CPU performance regression: tons of allocations #675

Closed
ali-ramadhan opened this issue Mar 5, 2020 · 2 comments · Fixed by #685
Labels: bug 🐞 (Even a perfect program still has bugs), performance 🏍️ (So we can get the wrong answer even faster)

Comments

@ali-ramadhan
Member

Reported in PR #666. Might be serious enough that it is slowing down documentation building in PR #671.

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
  GPU: TITAN V

 ──────────────────────────────────────────────────────────────────────────────────────
        Static ocean benchmarks                Time                   Allocations      
                                       ──────────────────────   ───────────────────────
           Tot / % measured:                 173s / 51.2%           43.6GiB / 64.5%    

 Section                       ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────
  32× 32× 32  [CPU, Float32]       10    127ms  0.14%  12.7ms    170MiB  0.59%  17.0MiB
  32× 32× 32  [CPU, Float64]       10    153ms  0.17%  15.3ms    170MiB  0.59%  17.0MiB
  32× 32× 32  [GPU, Float32]       10   24.4ms  0.03%  2.44ms   10.0MiB  0.03%  1.00MiB
  32× 32× 32  [GPU, Float64]       10   24.2ms  0.03%  2.42ms   10.0MiB  0.03%  1.00MiB
  64× 64× 64  [CPU, Float32]       10    713ms  0.81%  71.3ms    676MiB  2.35%  67.6MiB
  64× 64× 64  [CPU, Float64]       10    868ms  0.98%  86.8ms    676MiB  2.35%  67.6MiB
  64× 64× 64  [GPU, Float32]       10   24.8ms  0.03%  2.48ms   10.0MiB  0.03%  1.00MiB
  64× 64× 64  [GPU, Float64]       10   25.2ms  0.03%  2.52ms   10.0MiB  0.03%  1.00MiB
 128×128×128  [CPU, Float32]       10    5.22s  5.90%   522ms   2.64GiB  9.39%   270MiB
 128×128×128  [CPU, Float64]       10    5.44s  6.14%   544ms   2.64GiB  9.39%   270MiB
 128×128×128  [GPU, Float32]       10   46.3ms  0.05%  4.63ms   10.0MiB  0.03%  1.00MiB
 128×128×128  [GPU, Float64]       10   45.6ms  0.05%  4.56ms   10.0MiB  0.03%  1.00MiB
 256×256×256  [CPU, Float32]       10    37.4s  42.3%   3.74s   10.5GiB  37.5%  1.05GiB
 256×256×256  [CPU, Float64]       10    37.7s  42.6%   3.77s   10.5GiB  37.5%  1.05GiB
 256×256×256  [GPU, Float32]       10    338ms  0.38%  33.8ms   10.0MiB  0.03%  1.00MiB
 256×256×256  [GPU, Float64]       10    336ms  0.38%  33.6ms   10.0MiB  0.03%  1.00MiB
 ──────────────────────────────────────────────────────────────────────────────────────
@ali-ramadhan
Member Author

ali-ramadhan commented Mar 5, 2020

Looks like a couple of lines in apply_flux_bcs.jl (42 and 54) are the culprits, but they're hitting the loop macros and I'm not sure what's wrong in there... @glwagner Any ideas?

julia> analyze_malloc(".")
305-element Array{CoverageTools.MallocInfo,1}:                          
                                                                                                 
 CoverageTools.MallocInfo(1280, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 49)    
 CoverageTools.MallocInfo(1664, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 52)    
 CoverageTools.MallocInfo(1664, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 53)    
 CoverageTools.MallocInfo(1664, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 56)    
 CoverageTools.MallocInfo(1664, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 57)    
 CoverageTools.MallocInfo(2368, "./src/TimeSteppers/adams_bashforth.jl.18917.mem", 41)            
 CoverageTools.MallocInfo(4400, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 26)    
 CoverageTools.MallocInfo(4400, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 27)    
 CoverageTools.MallocInfo(4400, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 28)    
 CoverageTools.MallocInfo(4400, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 29)    
 CoverageTools.MallocInfo(4400, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 30)    
 CoverageTools.MallocInfo(4640, "./src/BoundaryConditions/apply_flux_bcs.jl.18917.mem", 17)       
 CoverageTools.MallocInfo(4640, "./src/BoundaryConditions/apply_flux_bcs.jl.18917.mem", 30)       
 CoverageTools.MallocInfo(5632, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 20)    
 CoverageTools.MallocInfo(9504, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 25)    
 CoverageTools.MallocInfo(24960, "./src/BoundaryConditions/fill_halo_regions.jl.18917.mem", 70)   
 CoverageTools.MallocInfo(35389680, "./src/BoundaryConditions/apply_flux_bcs.jl.18917.mem", 42)   
 CoverageTools.MallocInfo(35389680, "./src/BoundaryConditions/apply_flux_bcs.jl.18917.mem", 54)
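
For context, `analyze_malloc` reads the `.mem` files that Julia writes when it is started with `--track-allocation=user`. A quicker spot check on a single suspect function is `@allocated`. The sketch below is only an illustration of that kind of check: `add_flux_inplace!`, `add_flux_copying!`, and `measure` are hypothetical stand-ins, not the code in apply_flux_bcs.jl, and the copying variant is just one way a kernel can end up allocating on every call, not a diagnosis of this regression.

```julia
# Hypothetical stand-ins for a boundary-flux kernel; not Oceananigans code.

# In-place update of the top layer: written so that no temporary array is created.
function add_flux_inplace!(G, flux)
    Nz = size(G, 3)
    @inbounds for j in axes(G, 2), i in axes(G, 1)
        G[i, j, Nz] += flux
    end
    return nothing
end

# Same result, but the slice on the right-hand side copies the layer first.
function add_flux_copying!(G, flux)
    Nz = size(G, 3)
    G[:, :, Nz] = G[:, :, Nz] .+ flux
    return nothing
end

# Wrap @allocated in a function so global-scope effects are not counted.
measure(f!, G, flux) = @allocated f!(G, flux)

G = zeros(Float32, 32, 32, 32)

# Warm up so compilation is excluded from the measurement.
measure(add_flux_inplace!, G, 1f0)
measure(add_flux_copying!, G, 1f0)

bytes_inplace = measure(add_flux_inplace!, G, 1f0)
bytes_copying = measure(add_flux_copying!, G, 1f0)

println("in-place: ", bytes_inplace, " bytes per call")
println("copying:  ", bytes_copying, " bytes per call")
```

Running the same measurement on the real kernels at v0.22.0 and at v0.23.0 would be one way to narrow down where the per-call allocations started.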

@ali-ramadhan
Member Author

Hmmm, so this regression happened somewhere between v0.22.0 and v0.23.0; we just haven't been very careful...
