GPU Resources

This is a collaborative resource; please improve it. Log in using your MCIN user name and ID, and add your discoveries.

* [ OpenACC - Tutorial - Steps to More Science ]( https://developer.nvidia.com/openacc/3-steps-to-more-science )

“Here are three simple steps to start accelerating your code with GPUs. We will be using PGI OpenACC compiler for C, C++, FORTRAN, along with tools from the PGI Community Edition.”

* [ Performance Portability from GPUs to CPUs with OpenACC ](https://devblogs.nvidia.com/parallelforall/performance-portability-gpus-cpus-openacc/)

“…performance on multicore CPUs for HPC apps using MPI + OpenACC is equivalent to MPI + OpenMP code. Compiling and running the same code on a Tesla K80 GPU can provide large speedups.”

* [ Data Center Management Tools ]( http://www.nvidia.com/object/data-center-managment-tools.html )

  • The GPU Deployment Kit
  • Ganglia
  • Slurm
  • NVIDIA Docker
  • Others???

There are currently three GPUs in ace-gpu-1. To select one of the three (0, 1, or 2), set the CUDA_VISIBLE_DEVICES environment variable. This can be done by adding the following line to your ~/.bash_profile file on ace-gpu-1, where X is 0, 1 or 2:

export CUDA_VISIBLE_DEVICES=X

This only takes effect at login, so log out and back in, then run the following to confirm that it worked:

echo $CUDA_VISIBLE_DEVICES

If it prints the ID you selected, you're ready to use the GPU.
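Alternatively, the variable can be set from inside a script, before any CUDA-using library initializes; a minimal Python sketch (the device ID and the TensorFlow import are only examples):

import os

# Must be set before TensorFlow (or any other CUDA library) touches the GPU;
# "1" is an example device ID -- use 0, 1 or 2 as appropriate.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf  # imported only after the variable is set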

To configure TensorFlow so that it does not pre-allocate all GPU memory, you can use the following Python code:

import tensorflow as tf
from keras import backend as K

# Configure TensorFlow to not grab all the GPU memory up front:
# allow_growth allocates memory on demand instead of reserving it all,
# and allow_soft_placement falls back to the CPU for ops with no GPU kernel.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)  # make Keras use this session

This has been found to work only up to a point: when several jobs each use a significant share of the GPU's resources, jobs can still crash even when using the code above.
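If allow_growth alone is not enough, one option worth trying is a hard per-process memory cap via the same TF 1.x config object; a sketch, where the 0.3 fraction is an arbitrary example value:

import tensorflow as tf
from keras import backend as K

# Let this process reserve at most ~30% of the GPU's memory (example value),
# so several jobs can share one card with a known upper bound each.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.per_process_gpu_memory_fraction = 0.3
K.set_session(tf.Session(config=config))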

To monitor CPU and GPU usage:

glances
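Note: glances only shows a GPU panel when its optional GPU plugin dependencies are installed; if the panel is missing, the following may help (assuming a pip-managed install):

pip install glances[gpu]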

Other useful commands for inspecting the GPU setup:

nvcc -V                      # CUDA compiler version
nvidia-smi                   # GPU status, memory use and running processes
lspci -vnn | grep VGA -A 12  # PCI details for the video devices
dpkg -l | grep -i nvidia     # installed NVIDIA packages
ssh -X ace-gpu-1             # log in with X forwarding
nsight                       # NVIDIA Nsight IDE (needs X forwarding)

The NVIDIA Visual Profiler (https://developer.nvidia.com/nvidia-visual-profiler) would be useful for GPU profiling if we had X visualization, but we do not:

/usr/local/cuda/bin/nvvp
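As a possible workaround without X, nvprof (shipped with the same CUDA toolkit) can record a profile on the headless machine for later import into nvvp on a workstation; for example, with ./my_app standing in for your program:

/usr/local/cuda/bin/nvprof -o my_app.nvprof ./my_app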

SysAdmins: to enable accounting mode (shown here for GPU 0):

sudo nvidia-smi -i 0 -am ENABLED

Users: to check whether accounting mode is enabled or disabled:

nvidia-smi -i 0 -q -d ACCOUNTING

Output example:

==============NVSMI LOG==============

Timestamp                           : Thu Apr 27 09:09:50 2017
Driver Version                      : 375.39

Attached GPUs                       : 1
GPU 0000:01:00.0
    Accounting Mode                 : Enabled
    Accounting Mode Buffer Size     : 1920
    Accounted Processes
        Process ID                  : 15819
            GPU Utilization         : 100 %
            Memory Utilization      : 6 %
            Max memory usage        : 187 MiB
            Time                    : 3769 ms
            Is Running              : 0
...

Users: to check GPU stats per process (a Python parsing sketch follows the sample output):

nvidia-smi -i 0 --query-accounted-apps=gpu_name,pid,gpu_util,max_memory_usage,time --format=csv

Output example:

gpu_name, pid, gpu_utilization [%], max_memory_usage [MiB], time [ms]
TITAN X (Pascal), 15819, 100 %, 187 MiB, 3769 ms
TITAN X (Pascal), 15633, 87 %, 8465 MiB, 200626 ms
TITAN X (Pascal), 15944, 0 %, 153 MiB, 382 ms
TITAN X (Pascal), 16000, 0 %, 155 MiB, 299 ms
TITAN X (Pascal), 15862, 80 %, 8465 MiB, 215039 ms
TITAN X (Pascal), 15842, 41 %, 425 MiB, 721223 ms
TITAN X (Pascal), 16294, 74 %, 8465 MiB, 231517 ms
TITAN X (Pascal), 16436, 70 %, 10425 MiB, 229470 ms
TITAN X (Pascal), 16118, 40 %, 155 MiB, 1310156 ms
TITAN X (Pascal), 16908, 72 %, 8465 MiB, 511122 ms
TITAN X (Pascal), 17102, 73 %, 8465 MiB, 833806 ms
TITAN X (Pascal), 17900, 0 %, 153 MiB, 358 ms
TITAN X (Pascal), 18018, 0 %, 153 MiB, 235 ms
TITAN X (Pascal), 17632, 75 %, 8465 MiB, 823193 ms
TITAN X (Pascal), 18376, 74 %, 8529 MiB, 827336 ms
TITAN X (Pascal), 18637, 74 %, 8465 MiB, 547161 ms
TITAN X (Pascal), 16377, 54 %, 153 MiB, 0 ms
TITAN X (Pascal), 18752, 55 %, 8465 MiB, 0 ms
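To post-process these statistics, the CSV output can be parsed from Python; a minimal sketch, assuming nvidia-smi is on the PATH and GPU 0 is the target (the field list matches the query above):

import csv
import subprocess

# Query per-process accounting stats from GPU 0 as CSV (header plus rows).
out = subprocess.check_output(
    ["nvidia-smi", "-i", "0",
     "--query-accounted-apps=gpu_name,pid,gpu_util,max_memory_usage,time",
     "--format=csv"],
    universal_newlines=True,
)

# csv.reader handles the quoting; the unit suffixes stay in the fields.
rows = list(csv.reader(out.strip().splitlines()))
header, records = rows[0], rows[1:]
for gpu_name, pid, gpu_util, max_mem, time_ms in records:
    print(gpu_name.strip(), pid.strip(), gpu_util.strip(),
          max_mem.strip(), time_ms.strip())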

Users: Accounting help

nvidia-smi --help-query-accounted-apps
    -i,   --id=                 Target a specific GPU.
    -am   --accounting-mode=    Enable or disable Accounting Mode: 0/DISABLED, 1/ENABLED
    -q,   --query               Display GPU or Unit info.
    -d,   --display=            Display only selected information: MEMORY,
                                    UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK,
                                    COMPUTE, PIDS, PERFORMANCE, SUPPORTED_CLOCKS,
                                    PAGE_RETIREMENT, ACCOUNTING.
                                Flags can be combined with comma e.g. ECC,POWER.
                                Sampling data with max/min/avg is also returned 
                                for POWER, UTILIZATION and CLOCK display types.
                                Doesn't work with -u or -x flags.

* http://docs.nvidia.com/deploy/driver-persistence/index.html#persistence-mode

* http://docs.nvidia.com/deploy/driver-persistence/index.html#persistence-daemon

* https://surfer.nmr.mgh.harvard.edu/fswiki/SystemRequirements
* https://surfer.nmr.mgh.harvard.edu/fswiki/DevelopersGuide

FreeSurfer 6.0 builds with CUDA (as well as OpenMP). We have had issues compiling FreeSurfer with CUDA in the recent past; GPU/CUDA support is no longer actively maintained, as FreeSurfer is permanently stuck in the past on CUDA version 5.0.35…


Request to install the following on ACE-GPU-1 so that we can use nvidia-docker (a quick smoke test follows the list):

  • Docker: Docker >= 1.9 (official docker-engine only)
  • NVIDIA drivers: >= 340.29 with binary nvidia-modprobe
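Once installed, a quick smoke test would be to run nvidia-smi inside a CUDA container (the upstream project's standard example, not yet verified on ace-gpu-1):

nvidia-docker run --rm nvidia/cuda nvidia-smi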

Why

Status

* openacc.org

OpenACC directives are complementary to and interoperate with existing HPC programming models including OpenMP, MPI, and CUDA.

The directives and programming model defined in the OpenACC API document allow programmers to create high-level host+accelerator programs without the need to explicitly initialize the accelerator, manage data or program transfers between the host and accelerator, or initiate accelerator startup and shutdown.

The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator. OpenACC is designed for portability across operating systems, host CPUs, and a wide range of accelerators, including APUs, GPUs, and many-core coprocessors.

Status
