Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory allocations on CPU #685

Merged
merged 2 commits into from
Mar 6, 2020
Merged

Fix memory allocations on CPU #685

merged 2 commits into from
Mar 6, 2020

Conversation

ali-ramadhan
Copy link
Member

My fault, must have messed up and forgot to skip _apply_*_bcs! for NotFluxBC in PR #631.

Fixes #675 and should fix failing tests on PR #671.

Before:

 ──────────────────────────────────────────────────────────────────────────────────────
        Static ocean benchmarks                Time                   Allocations      
                                       ──────────────────────   ───────────────────────
           Tot / % measured:                24.3s / 2.17%           1.83GiB / 18.1%    

 Section                       ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────
  32× 32× 32  [CPU, Float32]       10    256ms  48.6%  25.6ms    170MiB  50.0%  17.0MiB
  32× 32× 32  [CPU, Float64]       10    270ms  51.4%  27.0ms    170MiB  50.0%  17.0MiB
 ──────────────────────────────────────────────────────────────────────────────────────

After:

 ──────────────────────────────────────────────────────────────────────────────────────
       Static ocean benchmarks                Time                   Allocations      
                                      ──────────────────────   ───────────────────────
          Tot / % measured:                 126s / 62.8%           1.46GiB / 0.13%    

Section                       ncalls     time   %tot     avg     alloc   %tot      avg
──────────────────────────────────────────────────────────────────────────────────────
 32× 32× 32  [CPU, Float32]       10    39.4s  49.9%   3.94s   1.00MiB  50.0%   102KiB
 32× 32× 32  [CPU, Float64]       10    39.5s  50.1%   3.95s   1.00MiB  50.0%   102KiB
──────────────────────────────────────────────────────────────────────────────────────

Unfortunately still a bit higher than v0.22.0 (~50 KiB allocations) but much better and more acceptable than 17 MiB!

Remaining memory allocations seem to be occuring in fill_halo_regions.jl but tried inlining some functions and didn't help so I'll revisit the problem in the future.

julia> analyze_malloc(".")
323-element Array{CoverageTools.MallocInfo,1}:        
 ⋮                                                                                                    
 CoverageTools.MallocInfo(5008, "./benchmark/benchmark_static_ocean.jl.32885.mem", 36)                
 CoverageTools.MallocInfo(5952, "./benchmark/benchmark_utils.jl.32885.mem", 35)                       
 CoverageTools.MallocInfo(6080, "./src/TimeSteppers/time_stepping_kernels.jl.32885.mem", 139)         
 CoverageTools.MallocInfo(7136, "./src/Architectures.jl.32885.mem", 39)                               
 CoverageTools.MallocInfo(12160, "./src/Utils/tuple_utils.jl.32885.mem", 14)                          
 CoverageTools.MallocInfo(12480, "./src/BoundaryConditions/apply_no_penetration_bcs.jl.32885.mem", 20)
 CoverageTools.MallocInfo(12480, "./src/BoundaryConditions/apply_no_penetration_bcs.jl.32885.mem", 38)
 CoverageTools.MallocInfo(12800, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 45)       
 CoverageTools.MallocInfo(12800, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 49)       
 CoverageTools.MallocInfo(16640, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 52)       
 CoverageTools.MallocInfo(16640, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 53)       
 CoverageTools.MallocInfo(16640, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 56)       
 CoverageTools.MallocInfo(16640, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 57)       
 CoverageTools.MallocInfo(44000, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 26)       
 CoverageTools.MallocInfo(44000, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 27)       
 CoverageTools.MallocInfo(44000, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 28)       
 CoverageTools.MallocInfo(44000, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 29)       
 CoverageTools.MallocInfo(44000, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 30)       
 CoverageTools.MallocInfo(46400, "./src/BoundaryConditions/apply_flux_bcs.jl.32885.mem", 17)          
 CoverageTools.MallocInfo(46400, "./src/BoundaryConditions/apply_flux_bcs.jl.32885.mem", 30)          
 CoverageTools.MallocInfo(56320, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 20)       
 CoverageTools.MallocInfo(95040, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 25)       
 CoverageTools.MallocInfo(235751, "./benchmark/benchmark_static_ocean.jl.32885.mem", 55)              
 CoverageTools.MallocInfo(249600, "./src/BoundaryConditions/fill_halo_regions.jl.32885.mem", 70)      
 CoverageTools.MallocInfo(739712, "./benchmark/benchmark_utils.jl.32885.mem", 20)                     
 CoverageTools.MallocInfo(1686022, "./benchmark/benchmark_static_ocean.jl.32885.mem", 50) 

@codecov
Copy link

codecov bot commented Mar 6, 2020

Codecov Report

Merging #685 into master will decrease coverage by 0.43%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #685      +/-   ##
==========================================
- Coverage   77.99%   77.56%   -0.44%     
==========================================
  Files         120      120              
  Lines        2413     2478      +65     
==========================================
+ Hits         1882     1922      +40     
- Misses        531      556      +25     
Impacted Files Coverage Δ
src/BoundaryConditions/apply_flux_bcs.jl 53.84% <0.00%> (-29.49%) ⬇️
src/BoundaryConditions/boundary_function.jl 71.42% <0.00%> (-28.58%) ⬇️
src/Fields/field.jl 75.17% <0.00%> (-0.73%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74cfb85...6eb8db9. Read the comment docs.

@glwagner
Copy link
Member

glwagner commented Mar 6, 2020

@views allocates because it creates a new view every time it runs. Not sure if that’s the issue. To avoid this we have to preallocate the view, somehow.

@ali-ramadhan ali-ramadhan merged commit e4b7763 into master Mar 6, 2020
@ali-ramadhan ali-ramadhan deleted the ar/fix-cpu-allocs branch March 6, 2020 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CPU performance regression: tons of allocations
2 participants