

GPU Resources

Today I was training a model and inadvertently kicked Konrad's job off the GPU. I discovered how to configure TensorFlow so that it doesn't grab all the GPU memory up front:

# configures TensorFlow to not try to grab all the GPU memory:
# allow_soft_placement lets ops fall back to the CPU, allow_growth allocates GPU memory on demand
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

We should develop some kind of policy for running jobs on ace-gpu-1 so that we don't inadvertently ruin other people's processes; one possible convention is sketched below.
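A minimal sketch of such a convention (not an agreed policy, and train.py is only a placeholder script name): pin each job to a single GPU with CUDA_VISIBLE_DEVICES so it cannot touch the cards other people are using.

# check which GPU is free, then make only that card visible to the job
nvidia-smi
CUDA_VISIBLE_DEVICES=0 python train.py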

For CPU and GPU usage:

glances
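If glances is not available, a plain watch on nvidia-smi gives a similar rolling view of GPU load (a simple alternative, not a replacement for glances' CPU view):

watch -n 1 nvidia-smi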

Other info

nvcc -V                          # CUDA toolkit (compiler) version
nvidia-smi                       # GPU utilisation, memory use and running processes
lspci -vnn | grep VGA -A 12      # PCI details of the graphics card
dpkg -l | grep -i nvidia         # installed NVIDIA packages / driver version
ssh -X ace-gpu-1                 # log in with X forwarding (needed for GUI tools)
nsight                           # NVIDIA Nsight Eclipse Edition (IDE/profiler)

SysAdmins: to enable Accounting mode

sudo nvidia-smi -i 0 -am ENABLED

Users: to check whether Accounting mode is enabled or disabled

nvidia-smi -i 0 -q -d ACCOUNTING

Users: to check GPU stats per process:

nvidia-smi -i 0 --query-accounted-apps=gpu_name,pid,gpu_util,max_memory_usage,time --format=csv

Users: Accounting help

nvidia-smi --help-query-accounted-apps
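The accounted-apps query above lends itself to periodic logging. A minimal sketch, assuming a per-user crontab (the log path and the 10-minute interval are arbitrary choices):

# crontab -e
*/10 * * * * nvidia-smi -i 0 --query-accounted-apps=gpu_name,pid,gpu_util,max_memory_usage,time --format=csv,noheader >> $HOME/gpu_accounting.csv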

* https://surfer.nmr.mgh.harvard.edu/fswiki/SystemRequirements
* https://surfer.nmr.mgh.harvard.edu/fswiki/DevelopersGuide

FreeSurfer 6.0 with CUDA (as well as OpenMP): we have had issues compiling FreeSurfer with CUDA in the recent past, and GPU/CUDA is no longer actively supported, as FreeSurfer is permanently stuck in the past on version 5.0.35…


Request to install the following on ace-gpu-1 so that we can use nvidia-docker (a quick smoke test is sketched after the requirements):

  • Docker: Docker >= 1.9 (official docker-engine only)
  • NVIDIA drivers: >= 340.29 with binary nvidia-modprobe
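Once both are in place, a quick smoke test might look like this (a sketch, assuming the public nvidia/cuda image from Docker Hub):

# should print the same GPU table inside the container as on the host
nvidia-docker run --rm nvidia/cuda nvidia-smi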

Why

Status

* openacc.org

OpenACC directives are complementary to and interoperate with existing HPC programming models including OpenMP, MPI, and CUDA.

The directives and programming model defined in the OpenACC API document allow programmers to create high-level host+accelerator programs without the need to explicitly initialize the accelerator, manage data or program transfers between the host and accelerator, or initiate accelerator startup and shutdown.

The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator. OpenACC is designed for portability across operating systems, host CPUs, and a wide range of accelerators, including APUs, GPUs, and many-core coprocessors.

Status
