A simple multithreaded version of the Hennessy-Patterson optimized dgemm routine in C++

NikitaMatckevich/FastDGEMM

FastDGEMM

A simple multithreaded version of the Hennessy-Patterson optimized dgemm routine in C++.

A short benchmark against the OpenBLAS library is included for comparison.

I would be happy to know your opinion about this experiment and get some feedback about the general methodology, benchmark reliability, and quality of the code itself.

I would also appreciate any ideas on how to improve the performance of this code.

Stages of improvement ("Getting faster"):

Compiler optimizations

AVX x86 intrinsics

Loop unrolling (the -O3 compiler option should be enabled)

New feature: multithreading using std::thread (the implementation may not be perfect)

Cache blocking (controlling the age of the array accesses)

Comparison with OpenBLAS. There is still a long way to go...
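
As a rough illustration of how the cache blocking and std::thread stages can fit together, here is a minimal sketch. It is not the code from this repository: the names `kBlock`, `do_block`, and `dgemm_blocked_mt` are hypothetical, the matrices are assumed to be square, column-major, and stored in plain `double` arrays, and the tile size of 32 is only a placeholder to tune. AVX intrinsics and manual unrolling are left out to keep the example portable, so any vectorization of the inner loop is left to the compiler.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical tile edge (assumption); tune it for the target cache sizes.
constexpr std::size_t kBlock = 32;

// Accumulate one tile: C[si.., sj..] += A[si.., sk..] * B[sk.., sj..].
// All matrices are n x n and column-major: element (i, j) is at index i + j * n.
static void do_block(std::size_t n, std::size_t si, std::size_t sj, std::size_t sk,
                     const double* A, const double* B, double* C) {
    const std::size_t ei = std::min(si + kBlock, n);
    const std::size_t ej = std::min(sj + kBlock, n);
    const std::size_t ek = std::min(sk + kBlock, n);
    for (std::size_t j = sj; j < ej; ++j) {
        for (std::size_t k = sk; k < ek; ++k) {
            const double b = B[k + j * n];
            // Unit-stride inner loop over a column of A and a column of C.
            for (std::size_t i = si; i < ei; ++i) {
                C[i + j * n] += A[i + k * n] * b;
            }
        }
    }
}

// Blocked C += A * B. Column tiles of C are dealt out round-robin to threads,
// so no two threads ever write to the same element of C.
void dgemm_blocked_mt(std::size_t n, const double* A, const double* B, double* C,
                      unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t num_col_blocks = (n + kBlock - 1) / kBlock;

    auto worker = [=](unsigned t) {
        for (std::size_t bj = t; bj < num_col_blocks; bj += num_threads) {
            const std::size_t sj = bj * kBlock;
            for (std::size_t si = 0; si < n; si += kBlock) {
                for (std::size_t sk = 0; sk < n; sk += kBlock) {
                    do_block(n, si, sj, sk, A, B, C);
                }
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 1; t < num_threads; ++t) {
        pool.emplace_back(worker, t);
    }
    worker(0);  // the calling thread handles the first share of tiles
    for (auto& th : pool) {
        th.join();
    }
}
```

A call such as `dgemm_blocked_mt(n, A.data(), B.data(), C.data(), std::thread::hardware_concurrency())` adds the product to whatever `C` already holds, so `C` should be zeroed first if a plain product is wanted. With GCC or Clang, something like `g++ -O3 -std=c++17 -pthread` should be enough to build it.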
