Demonstrations of cuBLAS usage in several contexts

RonRahaman/cublas-demos

cuBLAS Demos

This repo contains cuBLAS demos from several sources of documentation.

src/

  • cublas_acc_device calls cublasSswap from an OpenACC device kernel. It is from Section 6.2 of PGI's Fortran CUDA Library Interfaces, v. 2017.

  • cublas_stream calls cublasDgemm from the host using multiple streams. It is from OLCF's tutorial, Concurrent Kernels II: Batched Library Calls. Note that it uses a custom Fortran interface to the C cuBLAS v2 functions, apparently because NVIDIA did not provide a Fortran interface to cuBLAS v2 when the tutorial was written.

  • cublas_stream_no_c is a version of cublas_stream that uses NVIDIA's current (v2017) Fortran interfaces to cuBLAS v2. It was written by me, Ron Rahaman.

  • cublas_batch calls cublasDgemmBatched to launch multiple dgemm operations with one call. It is also from OLCF's tutorial, Concurrent Kernels II: Batched Library Calls. Like cublas_stream, it uses a custom Fortran interface to the C cuBLAS v2 functions.

  • cublas_batch_no_c is a version of cublas_batch that uses NVIDIA's current (v2017) Fortran interfaces to cuBLAS v2. It was written by me, Ron Rahaman.

  • cublas_batch_acc is a version of cublas_batch_no_c that uses OpenACC data directives for host/device data transfers. It is intended to demonstrate the use of cuBLAS batched dgemm in a code that uses OpenACC for everything else. It was written by me, Ron Rahaman.
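
For readers new to batched dgemm, the operation that the cublas_batch demos launch in a single call can be sketched on the CPU. The following is a plain C reference, not cuBLAS code and not taken from this repo. The name dgemm_batched_ref and its argument order are hypothetical; they loosely mirror the pointer-array style of cublasDgemmBatched. It assumes column-major storage and handles only the no-transpose case.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major offset of element (row, col) in a matrix whose
 * leading dimension is ld, the layout BLAS and cuBLAS expect. */
static size_t idx(int row, int col, int ld)
{
    return (size_t)col * (size_t)ld + (size_t)row;
}

/* Hypothetical CPU reference (not part of cuBLAS) for the no-transpose
 * case of a batched dgemm:
 *     C[b] = alpha * A[b] * B[b] + beta * C[b],  b = 0 .. batch_count-1
 * where every A[b] is m x k, every B[b] is k x n, every C[b] is m x n.
 * Each *_array holds one pointer per matrix in the batch. */
void dgemm_batched_ref(int m, int n, int k, double alpha,
                       const double *const a_array[], int lda,
                       const double *const b_array[], int ldb,
                       double beta,
                       double *const c_array[], int ldc,
                       int batch_count)
{
    assert(m >= 0 && n >= 0 && k >= 0 && batch_count >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);

    for (int b = 0; b < batch_count; ++b) {
        const double *a = a_array[b];
        const double *bm = b_array[b];
        double *c = c_array[b];

        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                /* Dot product of row i of A[b] with column j of B[b]. */
                double sum = 0.0;
                for (int p = 0; p < k; ++p) {
                    sum += a[idx(i, p, lda)] * bm[idx(p, j, ldb)];
                }
                c[idx(i, j, ldc)] = alpha * sum + beta * c[idx(i, j, ldc)];
            }
        }
    }
}
```

The batches are independent of one another, so a GPU library can process them concurrently. That independence is why many small multiplications are launched together in one batched call, or spread over several streams as in cublas_stream.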

data/

Testbeds

The following testbeds were used to gather the results contained here:

  • neddy is a GPU node provided by JLSE at ANL. It contains 1x NVIDIA P100 GPU. Details are found here.

Results

  • cublas_stream_times.p100.csv contains timings from running cublas_stream_no_c on neddy.

  • cublas_batch_times.p100.csv contains timings from running cublas_batch_no_c on neddy.
