====== GPU Resources ======

  
This is a collaborative resource; please improve it. Log in using your MCIN user name and ID and add your discoveries.

===== Items of Interest / for Discussion =====

==== Resources ====

  * [[https://developer.nvidia.com/openacc/3-steps-to-more-science|OpenACC Tutorial: Steps to More Science]]

"Here are three simple steps to start accelerating your code with GPUs. We will be using PGI OpenACC compiler for C, C++, FORTRAN, along with tools from the PGI Community Edition."

  * [[https://devblogs.nvidia.com/parallelforall/performance-portability-gpus-cpus-openacc/|Performance Portability from GPUs to CPUs with OpenACC]]

"...performance on multicore CPUs for HPC apps using MPI + OpenACC is equivalent to MPI + OpenMP code. Compiling and running the same code on a Tesla K80 GPU can provide large speedups."

  * [[http://www.nvidia.com/object/data-center-managment-tools.html|Data Center Management Tools]]
    * The GPU Deployment Kit
    * Ganglia
    * Slurm
    * NVIDIA Docker
    * Others?

  
===== Preventing Job Clobbering =====
  
There are currently three GPUs in ace-gpu-1. To select one of them (0, 1, or 2), set the CUDA_VISIBLE_DEVICES environment variable. This can be accomplished by adding the following line to your ~/.bash_profile file on ace-gpu-1, where X is either 0, 1 or 2:

<code>
export CUDA_VISIBLE_DEVICES=X
</code>

This will only take effect when you log in, so log out and back in, then try the following to ensure that it worked:

<code>
echo $CUDA_VISIBLE_DEVICES
</code>

If it outputs the ID that you selected, then you're ready to use the GPU.
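If you prefer to pick a GPU per process rather than per login session, the same variable can also be set from Python, as long as it is set before any CUDA-using library is imported. A minimal sketch (the index 1 below is an arbitrary example):

```python
import os

# CUDA_VISIBLE_DEVICES must be set before TensorFlow (or any other
# CUDA-using library) is imported, because the CUDA runtime reads it
# once at initialization.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # arbitrary example: expose only GPU 1

# Libraries imported after this point see a single device, which they
# renumber as device 0.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Note that inside the process the visible GPU is always renumbered from 0, regardless of which physical index was selected.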

==== Sharing a single GPU ====

To configure TensorFlow (1.x) so that it does not pre-allocate all GPU memory, enable memory growth in the session configuration:

<code>
import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all at start-up.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
</code>
  
This has been found to work only to a certain extent: when several jobs each use a significant amount of GPU resources, jobs can still be clobbered even with the configuration above.
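One lightweight way to avoid clobbering is to check how loaded each GPU is before exporting CUDA_VISIBLE_DEVICES. Below is a sketch of that idea; the helper name least_loaded_gpu and the hard-coded sample are illustrative, and in practice the CSV lines would come from nvidia-smi's documented query flags (--query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits):

```python
def least_loaded_gpu(csv_lines):
    """Return the index of the GPU with the smallest fraction of memory used.

    csv_lines: text with one "index, memory.used, memory.total" line per
    GPU (values in MiB), as printed by:
        nvidia-smi --query-gpu=index,memory.used,memory.total \
                   --format=csv,noheader,nounits
    """
    best_index, best_load = None, None
    for line in csv_lines.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        load = int(used) / int(total)
        if best_load is None or load < best_load:
            best_index, best_load = int(index), load
    return best_index


# Illustrative sample: GPU 1 is nearly idle, GPUs 0 and 2 are busy.
sample = """\
0, 10500, 11439
1, 2, 11439
2, 8000, 11439
"""
print(least_loaded_gpu(sample))  # -> 1
```

A wrapper script could run the query, pick the least-loaded index, and export it as CUDA_VISIBLE_DEVICES before launching the job. It is only a heuristic: another job may start between the check and the launch.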
===== GPU Info =====
  
<code>
nsight
</code>

Nvidia Visual Profiler ([[https://developer.nvidia.com/nvidia-visual-profiler]]) would be useful for GPU monitoring if we had X visualization, but we do not:

<code>
/usr/local/cuda/bin/nvvp
</code>
  
===== GPU Accounting =====
  
Output example:

<code>
==============NVSMI LOG==============

Timestamp                           : Thu Apr 27 09:09:50 2017
Driver Version                      : 375.39

Attached GPUs                       : 1
GPU 0000:01:00.0
    Accounting Mode                 : Enabled
    Accounting Mode Buffer Size     : 1920
    Accounted Processes
        Process ID                  : 15819
            GPU Utilization         : 100 %
            Memory Utilization      : 6 %
            Max memory usage        : 187 MiB
            Time                    : 3769 ms
            Is Running              : 0
...
</code>
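For scripting around accounting data, the indented "key : value" layout shown above is easy to parse. A minimal sketch (the function name and the hard-coded sample are illustrative):

```python
def parse_nvsmi_fields(text):
    """Collect 'key : value' pairs from nvidia-smi -q style output.

    Splits each line on its first colon; later keys overwrite earlier
    ones, which is fine for pulling fields of a single process record.
    """
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


sample = """\
        Process ID                  : 15819
            GPU Utilization         : 100 %
            Max memory usage        : 187 MiB
"""
record = parse_nvsmi_fields(sample)
print(record["Process ID"])       # -> 15819
print(record["GPU Utilization"])  # -> 100 %
```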
Users: to check GPU stats per process:
<code>
                                Doesn't work with -u or -x flags.
</code>

  * [[http://docs.nvidia.com/deploy/driver-persistence/index.html#persistence-mode]]

  * [[http://docs.nvidia.com/deploy/driver-persistence/index.html#persistence-daemon]]
===== Deep Learning =====
  
Last modified: 2024/03/26 13:52 (external edit)