

GPU Resources

Today I was training a model and inadvertently kicked Konrad's job off the GPU. I discovered how to configure TensorFlow so that it doesn't grab all the GPU memory up front:

# configures TensorFlow to not try to grab all the GPU memory:
# allow_soft_placement lets ops fall back to the CPU, allow_growth allocates GPU memory on demand
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

We should develop some kind of policy for running jobs on ace-gpu-1 so that we don't inadvertently ruin other people's processes; one possible convention is sketched below.
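A minimal sketch of such a convention (not an agreed policy, and train.py is only a placeholder script name): pin each job to a single GPU with CUDA_VISIBLE_DEVICES so it cannot touch the cards other people are using.

# check which GPU is free, then make only that card visible to the job
nvidia-smi
CUDA_VISIBLE_DEVICES=0 python train.py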

For CPU and GPU usage:

glances
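If glances is not available, a plain watch on nvidia-smi gives a similar rolling view of GPU load (a simple alternative, not a replacement for glances' CPU view):

watch -n 1 nvidia-smi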

Other info

nvcc -V                          # CUDA toolkit (compiler) version
nvidia-smi                       # GPU utilisation, memory use and running processes
lspci -vnn | grep VGA -A 12      # PCI details of the graphics card
dpkg -l | grep -i nvidia         # installed NVIDIA packages / driver version
ssh -X ace-gpu-1                 # log in with X forwarding (needed for GUI tools)
nsight                           # NVIDIA Nsight Eclipse Edition (IDE/profiler)

SysAdmins: to enable Accounting mode

sudo nvidia-smi -i 0 -am ENABLED

Users: to check whether Accounting mode is enabled or disabled

nvidia-smi -i 0 -q -d ACCOUNTING

Users: to check GPU stats per process:

nvidia-smi -i 0 --query-accounted-apps=gpu_name,pid,gpu_util,max_memory_usage,time --format=csv

Users: Accounting help

nvidia-smi --help-query-accounted-apps
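The accounted-apps query above lends itself to periodic logging. A minimal sketch, assuming a per-user crontab (the log path and the 10-minute interval are arbitrary choices):

# crontab -e
*/10 * * * * nvidia-smi -i 0 --query-accounted-apps=gpu_name,pid,gpu_util,max_memory_usage,time --format=csv,noheader >> $HOME/gpu_accounting.csv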

* https://surfer.nmr.mgh.harvard.edu/fswiki/SystemRequirements
* https://surfer.nmr.mgh.harvard.edu/fswiki/DevelopersGuide

FreeSurfer 6.0 with CUDA (as well as OpenMP): we have had issues compiling FreeSurfer with CUDA in the recent past, and GPU/CUDA is no longer actively supported, as FreeSurfer is permanently stuck in the past on version 5.0.35…


Request to install the following on ace-gpu-1 so that we can use nvidia-docker (a quick smoke test is sketched after the requirements):

  • Docker: Docker >= 1.9 (official docker-engine only)
  • NVIDIA drivers: >= 340.29 with binary nvidia-modprobe
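Once both are in place, a quick smoke test might look like this (a sketch, assuming the public nvidia/cuda image from Docker Hub):

# should print the same GPU table inside the container as on the host
nvidia-docker run --rm nvidia/cuda nvidia-smi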

Why

Status

* openacc.org

OpenACC directives are complementary to and interoperate with existing HPC programming models including OpenMP, MPI, and CUDA.

The directives and programming model defined in the OpenACC API document allow programmers to create high-level host+accelerator programs without the need to explicitly initialize the accelerator, manage data or program transfers between the host and accelerator, or initiate accelerator startup and shutdown.

The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator. OpenACC is designed for portability across operating systems, host CPUs, and a wide range of accelerators, including APUs, GPUs, and many-core coprocessors.

Status
