Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
gpu_resources [2017/04/27 13:19] – csteel | gpu_resources [2024/03/26 13:52] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 2: | Line 2: | ||
This is a collaborative resource, please improve it. Login using your MCIN user name and ID and add your discoveries. | This is a collaborative resource, please improve it. Login using your MCIN user name and ID and add your discoveries. | ||
+ | |||
+ | ===== Items of Interest / for Discussion? ===== | ||
+ | |||
+ | |||
+ | |||
+ | ==== Resources ==== | ||
+ | |||
+ | * [ OpenACC - Tutorial - Steps to More Science ]( https:// | ||
+ | |||
+ | "Here are three simple steps to start accelerating your code with GPUs. We will be using PGI OpenACC compiler for C, C++, FORTRAN, along with tools from the PGI Community Edition." | ||
+ | |||
+ | * [ Performance Portability from GPUs to CPUs with OpenACC ](https:// | ||
+ | |||
+ | * [ Data Center Management Tools ]( http:// | ||
+ | |||
+ | * The GPU Deployment Kit | ||
+ | * Ganglia | ||
+ | * Slurm | ||
+ | * NVIDIA Docker | ||
+ | * Others??? | ||
+ | |||
+ | " | ||
+ | |||
===== Preventing Job Clobbering ===== | ===== Preventing Job Clobbering ===== | ||
- | Today I was training a model and inadvertently kicked Konrad' | + | There are currently 3 GPU' |
+ | |||
+ | < | ||
+ | export CUDA_VISIBLE_DEVICES=X | ||
+ | </ | ||
+ | |||
+ | This will only take effect when you log in, so log out and back in and try the following to ensure | ||
+ | |||
+ | < | ||
+ | echo $CUDA_VISIBLE_DEVICES | ||
+ | </ | ||
+ | |||
+ | If it outputs the ID that you selected then you're ready to use the GPU. | ||
+ | |||
+ | ==== Sharing a single GPU ==== | ||
+ | To configure TensorFlow to not pre-allocate all GPU memory you can use the following Python code: | ||
< | < | ||
Line 15: | Line 53: | ||
</ | </ | ||
- | We should develop some kind of policy | + | This has been found to work only to a certain extent, and when there are several |
===== GPU Info ===== | ===== GPU Info ===== | ||
Line 47: | Line 84: | ||
nsight | nsight | ||
</ | </ | ||
+ | |||
+ | Nvidia Visual Profiler (https:// | ||
+ | < | ||
+ | / | ||
+ | </ | ||
+ | |||
===== GPU Accounting ===== | ===== GPU Accounting ===== | ||
Line 60: | Line 103: | ||
</ | </ | ||
+ | Output example: | ||
+ | |||
+ | < | ||
+ | ==============NVSMI LOG============== | ||
+ | |||
+ | Timestamp | ||
+ | Driver Version | ||
+ | |||
+ | Attached GPUs : 1 | ||
+ | GPU 0000: | ||
+ | Accounting Mode : Enabled | ||
+ | Accounting Mode Buffer Size : 1920 | ||
+ | Accounted Processes | ||
+ | Process ID : 15819 | ||
+ | GPU Utilization | ||
+ | Memory Utilization | ||
+ | Max memory usage : 187 MiB | ||
+ | Time : 3769 ms | ||
+ | Is Running | ||
+ | ... | ||
+ | </ | ||
Users: to check GPU stats per process: | Users: to check GPU stats per process: | ||
< | < | ||
Line 109: | Line 173: | ||
Doesn' | Doesn' | ||
</ | </ | ||
+ | |||
+ | * [[http:// | ||
+ | |||
+ | * [[http:// | ||
===== Deep Learning ===== | ===== Deep Learning ===== | ||