
Compare with optimized convolutions #40

Open
mratsim opened this issue Nov 26, 2018 · 1 comment
mratsim commented Nov 26, 2018

Hey there, promising work on a C++ JIT.

Can you compare your JIT results with state-of-the-art convolution implementations, or at least an im2col + GEMM convolution, and report the GFLOP/s reached versus the theoretical peak?

Here are all the resources I gathered regarding convolution optimisation.

The main issues with naive direct convolution are cache misses and poor utilisation of the CPU cache hierarchy.
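For reference, a minimal sketch of the naive direct convolution I mean (assuming NCHW layout, no padding, unit stride; names are illustrative). Every output point re-walks a strided window of the input, so the access pattern makes poor use of cache lines:

```cpp
// Naive direct 2D convolution: single image, NCHW layout, no padding,
// unit stride. The input is re-read once per (filter, kernel position)
// with strided accesses, which is what thrashes the cache.
void conv2d_naive(const float* in,  int C, int H, int W,
                  const float* ker, int F, int KH, int KW,  // F filters
                  float* out) {                             // F x OH x OW
  const int OH = H - KH + 1, OW = W - KW + 1;
  for (int f = 0; f < F; ++f)
    for (int oy = 0; oy < OH; ++oy)
      for (int ox = 0; ox < OW; ++ox) {
        float acc = 0.0f;
        for (int c = 0; c < C; ++c)
          for (int ky = 0; ky < KH; ++ky)
            for (int kx = 0; kx < KW; ++kx)
              acc += in[(c * H + oy + ky) * W + (ox + kx)]
                   * ker[((f * C + c) * KH + ky) * KW + kx];
        out[(f * OH + oy) * OW + ox] = acc;
      }
}
```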

In benchmarks on my CPU, an i5-5257U (2.7 GHz dual-core Broadwell supporting AVX+FMA), the theoretical compute peak is 172.8 GFLOP/s, yet a naive convolution reaches only 2.6 GFLOP/s. When the convolution is reframed as im2col + GEMM (matrix multiplication), I can reach 20+ GFLOP/s.
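For context, the peak figure comes out of: 2.7 GHz × 2 cores × 2 FMAs/cycle × 8 single-precision AVX lanes × 2 flops per FMA = 172.8 GFLOP/s. And here is a minimal im2col sketch (same assumptions as above: NCHW, no padding, unit stride). Once the input is unfolded this way, the convolution becomes a single F × (C·KH·KW) by (C·KH·KW) × (OH·OW) matrix multiply that a tuned GEMM handles with good cache behaviour:

```cpp
// im2col: unfold every KHxKW receptive field of the input into one
// column of a (C*KH*KW) x (OH*OW) matrix. The convolution then becomes
// a plain GEMM: out[F][OH*OW] = ker[F][C*KH*KW] * cols.
void im2col(const float* in, int C, int H, int W,
            int KH, int KW, float* cols) {
  const int OH = H - KH + 1, OW = W - KW + 1;
  int row = 0;
  for (int c = 0; c < C; ++c)
    for (int ky = 0; ky < KH; ++ky)
      for (int kx = 0; kx < KW; ++kx, ++row)
        for (int oy = 0; oy < OH; ++oy)
          for (int ox = 0; ox < OW; ++ox)
            cols[row * OH * OW + oy * OW + ox] =
                in[(c * H + oy + ky) * W + (ox + kx)];
}
```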

I haven't finished yet, but I hope to reach 120+ GFLOP/s by using my own BLAS, which attains 98% of OpenBLAS's speed (72.4 vs 73.8 GFLOP/s single-threaded, 136 vs 145 GFLOP/s multithreaded), and by fusing im2col with the matrix multiplication's repacking step.
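To sketch what I mean by fusing (hypothetical and heavily simplified; a real BLAS packs into architecture-specific micro-panels, which I omit here): instead of materializing the full im2col matrix to memory and then packing it again inside the GEMM, the GEMM's B-panel packing routine can compute the im2col indices on the fly:

```cpp
// Hypothetical sketch of fusing im2col into the GEMM's B-panel packing:
// rather than writing the (C*KH*KW) x (OH*OW) im2col matrix out and
// repacking it, the pack routine gathers input elements directly while
// filling the panel, saving one full pass over memory.
void pack_B_fused_im2col(const float* in, int C, int H, int W,
                         int KH, int KW,
                         int col_start, int ncols,   // slice of OH*OW
                         float* packed) {
  const int OW = W - KW + 1;
  const int K  = C * KH * KW;                 // GEMM inner dimension
  for (int j = 0; j < ncols; ++j) {
    const int col = col_start + j;
    const int oy = col / OW, ox = col % OW;   // output pixel coords
    int k = 0;
    for (int c = 0; c < C; ++c)
      for (int ky = 0; ky < KH; ++ky)
        for (int kx = 0; kx < KW; ++kx, ++k)
          packed[j * K + k] = in[(c * H + oy + ky) * W + (ox + kx)];
  }
}
```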

Other promising approaches that should reach 100+ GFLOP/s are MKL-DNN and libxsmm; the latter is described in great detail in this paper.

Halide also offers optimised JIT code generation for image processing pipelines and already relies on LLVM.

jmmartinez self-assigned this Dec 23, 2018

jmmartinez (Owner) commented

Hello,
First of all, sorry for the ridiculous delay in my response.

Do you have small reference C/C++ code I could use to benchmark any of those methods? I would love to check what happens when I use the JIT compiler on them.

One thing to take into account: the convolution benchmark I used is just there to illustrate the use of the JIT compiler. It's not meant to be a real-world scenario; the library remains a toy.

Thanks!
