My Personal Notes from the Leeds University HPC5 training course
Course Link: goo.gl/NMGpWk
The code follows some of the tasks in the course worksheet, with a commit for every question and bugfix.
🚧 These are just notes from a course, this is not developed work 🚧
- As clock speeds stall, it is best to move to accelerators
- ARC3 has the now-discontinued Xeon Phi nodes, which go largely unused
- NVIDIA produces Tesla GPUs, similar to GeForce but aimed at HPC
- ARC3 has 6 GPU nodes (4 P100s each) and 2 nodes (2 K80s each). ARC4 will have fewer.
```
#$ -l coproc_p100=1   # requests 1 of the node's 4 P100s; =4 gives the whole node
#$ -l coproc_k80=1    # requests 1 of the node's 2 K80s; =2 gives the whole node
```
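For reference, these resource requests sit in an ordinary batch script. A minimal sketch, where the runtime, module name, and program name are assumptions (check ARC's own documentation for the real values):

```bash
#!/bin/bash
# Hypothetical submission script requesting one P100 on ARC3
#$ -cwd               # run from the current working directory
#$ -l h_rt=1:00:00    # wall-clock limit (assumed value)
#$ -l coproc_p100=1   # 1 of the node's 4 P100s; =4 for the whole node
module load cuda       # module name is an assumption
./my_gpu_program       # placeholder executable
```

Submit with `qsub script.sh` as usual.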
- ARC3's GPUs are under-utilised
- Always use CPUs and GPUs together: the GPUs do the heavy lifting, while the CPU requirement depends on the code
- Use accelerated libraries and applications where possible, e.g. MATLAB, Ansys
- Directive-based approaches like OpenACC auto-generate GPU code at compile time
- CUDA, created by NVIDIA, is an extension to C for NVIDIA GPUs, whereas OpenCL works cross-platform
- Not especially user-friendly, even with PyCUDA
- Works by running host code on the CPU and separate kernel code on the GPU
- Each Streaming Multiprocessor (SM) has multiple cores
- ARC3's P100s have 56 SMs, each with 64 single-precision cores or 32 double-precision cores
- CUDA's built-in thread and block index variables help control where work sits in the GPU architecture
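The points above can be sketched as a minimal CUDA kernel. This is not from the course sheet; the vector-add task, sizes, and launch configuration are arbitrary illustrations of how the built-in index variables locate a thread:

```cuda
#include <cstdio>

// Each thread derives its global position from the built-in
// blockIdx/blockDim/threadIdx variables and handles one element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 1 << 10;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);  // unified memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the GPU before reading results

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Compile with something like `nvcc vecadd.cu -o vecadd`; note the `.cu` extension, which nvcc requires.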
- NB: `nvcc` doesn't work with every version of every host compiler, and it's hard to find out which combinations are supported
- PyCUDA (can be used in GPU interactive sessions)
  - `import pycuda.autoinit` is important!
  - `a_gpu = cuda.mem_alloc(a.nbytes)`: it's important to cast arrays (e.g. to `numpy.float32`) before allocating and copying memory
- Kernels still have to be written in C, but the pros are skipping the manual compile step and being able to use NumPy functions
- CUDA Python (Numba)
- AlexNet example on GitHub
- The NVIDIA ecosystem has a whole suite of accelerated code, libraries, etc.
- The nvcc compiler requires the `.cu` extension
- Tabs are not accepted in the source
The course was from the ARC Training Course: HPC5, delivered by Martin Callaghan.