Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simpler boundary condition functions and forcing functions #697

Merged
merged 22 commits into from
Mar 15, 2020

Conversation

glwagner
Copy link
Member

@glwagner glwagner commented Mar 14, 2020

This PR simplifies the function signatures for boundary condition functions and forcing functions.
It also nukes the model.parameters field in favor of more local "parameters" functionality, and adds ParameterizedForcing and ParameterizedBoundaryCondition convenience types and functions.

The new forcing function signature is

F(i, j, k, grid, clock, state)

while the new boundary condition function signature is

bc(i, j, grid, clock, state)

where i, j are indices along the boundary.

state is a NamedTuple with fields :velocities, :tracers, and :diffusivities, each corresponding to an OffsetArray that references the data associated with each field.

In the future, if we make substantial changes to model, the hope is that we can modify/extend state appropriately and thus leave user code unbroken.

We should probably release a new minor version when this is merged.

Resolves #682 .

Copy link
Member

@ali-ramadhan ali-ramadhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful PR ❤️

Also thanks for updating stratified_couette_flow.jl.

Agree that we should release v0.26 once this PR merged as there are breaking changes for user scripts.

GPU tests fail (looks like clocks are not isbits...) which I can look into a bit. Also need to check performance benchmarks.

+ F.u(i, j, k, grid, time, U, C, parameters))
+ x_curl_Uˢ_cross_U(i, j, k, grid, surface_waves, U, clock.time)
+ ∂t_uˢ(i, j, k, grid, surface_waves, clock.time)
+ F.u(i, j, k, grid, clock, (velocities=U, tracers=C, diffusivities=K)))
Copy link
Member

@ali-ramadhan ali-ramadhan Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should double check to make sure these tuple constructions dont't degrade performance. I doubt it will but should always double check performance benchmarks before tagging a release...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, it’s good to check. This stuff gets inlined so my hypothesis is that it compiles to the same thing.

@ali-ramadhan
Copy link
Member

mutable struct Clock{T}
       time :: T
  iteration :: Int
end

Ah right so mutable structs are not isbits and so they cannot be passed into GPU kernels...

We might have to roll back to using time, iteration in kernel and forcing/boundary function signatures instead of clock.

But maybe it's simple enough that it could be adapted to the GPU...?

@vchuravy @maleadt Would it be possible to adapt a very simple mutable struct like Clock{T} to work in GPU kernels with Adapt.jl? It's never modified in a GPU kernel.

@ali-ramadhan
Copy link
Member

An alternative solution is to make Clock an immutable struct then redefine tick!(clock, Δt) to tick!(model, Δt) which creates a new clock and assigns it to model.clock.

This allocates memory would could become an issue for 1D models though (but maybe we're fine with that for now as 1D models have much large inefficiencies). But if we end up going back to time, iteration function signatures, then we'll end up introducing a second breaking change...

@glwagner
Copy link
Member Author

Hmm! Not a huge sacrifice to go back to (time, iteration) rather than clock, but certainly would be good to know what we can/can’t do.

@maleadt
Copy link
Collaborator

maleadt commented Mar 14, 2020

@vchuravy @maleadt Would it be possible to adapt a very simple mutable struct like Clock{T} to work in GPU kernels with Adapt.jl? It's never modified in a GPU kernel.

Sure, you could e.g. have an adapt rule return a named tuple so that you can still do .time etc, as long as you don't have any signatures relying on ::Clock. Getting mutables to work is much tougher, but not impossible either, especially if the object doesn't contain pointers (as Clock here).

@codecov
Copy link

codecov bot commented Mar 15, 2020

Codecov Report

Merging #697 into master will increase coverage by 0.10%.
The diff coverage is 78.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #697      +/-   ##
==========================================
+ Coverage   78.35%   78.45%   +0.10%     
==========================================
  Files         120      122       +2     
  Lines        2462     2469       +7     
==========================================
+ Hits         1929     1937       +8     
+ Misses        533      532       -1     
Impacted Files Coverage Δ
src/BoundaryConditions/BoundaryConditions.jl 100.00% <ø> (ø)
src/Forcing/Forcing.jl 100.00% <ø> (ø)
src/Models/incompressible_model.jl 88.88% <ø> (ø)
src/TimeSteppers/generic_time_stepping.jl 100.00% <ø> (ø)
src/TimeSteppers/time_stepping_kernels.jl 90.41% <ø> (ø)
src/TimeSteppers/velocity_and_tracer_tendencies.jl 100.00% <ø> (ø)
src/BoundaryConditions/boundary_function.jl 60.00% <33.33%> (ø)
src/Models/clock.jl 80.00% <50.00%> (-8.89%) ⬇️
src/BoundaryConditions/apply_no_penetration_bcs.jl 57.14% <66.66%> (ø)
src/BoundaryConditions/fill_halo_regions.jl 84.00% <100.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a47b92...15beefe. Read the comment docs.

@glwagner
Copy link
Member Author

@ali-ramadhan looks like tests pass. Check it out. Thanks @maleadt for the tip.

Copy link
Member

@ali-ramadhan ali-ramadhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance benchmarks look good (show no regression, this PR actually reduces CPU allocations):

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
  GPU: TITAN V

 ──────────────────────────────────────────────────────────────────────────────────────
        Static ocean benchmarks                Time                   Allocations      
                                       ──────────────────────   ───────────────────────
           Tot / % measured:                 152s / 45.2%           13.1GiB / 0.71%    

 Section                       ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────
  32× 32× 32  [CPU, Float32]       10   52.8ms  0.08%  5.28ms    743KiB  0.76%  74.3KiB
  32× 32× 32  [CPU, Float64]       10   55.2ms  0.08%  5.52ms    743KiB  0.76%  74.3KiB
  32× 32× 32  [GPU, Float32]       10   26.6ms  0.04%  2.66ms   11.2MiB  11.7%  1.12MiB
  32× 32× 32  [GPU, Float64]       10   26.7ms  0.04%  2.67ms   11.2MiB  11.7%  1.12MiB
  64× 64× 64  [CPU, Float32]       10    385ms  0.56%  38.5ms    743KiB  0.76%  74.3KiB
  64× 64× 64  [CPU, Float64]       10    395ms  0.57%  39.5ms    743KiB  0.76%  74.3KiB
  64× 64× 64  [GPU, Float32]       10   27.6ms  0.04%  2.76ms   11.2MiB  11.7%  1.12MiB
  64× 64× 64  [GPU, Float64]       10   28.0ms  0.04%  2.80ms   11.2MiB  11.7%  1.12MiB
 128×128×128  [CPU, Float32]       10    3.36s  4.90%   336ms    743KiB  0.76%  74.3KiB
 128×128×128  [CPU, Float64]       10    3.41s  4.97%   341ms    743KiB  0.76%  74.3KiB
 128×128×128  [GPU, Float32]       10   46.7ms  0.07%  4.67ms   11.2MiB  11.7%  1.12MiB
 128×128×128  [GPU, Float64]       10   46.0ms  0.07%  4.60ms   11.2MiB  11.7%  1.12MiB
 256×256×256  [CPU, Float32]       10    30.1s  43.8%   3.01s    743KiB  0.76%  74.3KiB
 256×256×256  [CPU, Float64]       10    30.0s  43.7%   3.00s    743KiB  0.76%  74.3KiB
 256×256×256  [GPU, Float32]       10    340ms  0.50%  34.0ms   11.2MiB  11.7%  1.12MiB
 256×256×256  [GPU, Float64]       10    337ms  0.49%  33.7ms   11.2MiB  11.7%  1.12MiB
 ──────────────────────────────────────────────────────────────────────────────────────

Master branch:

 ──────────────────────────────────────────────────────────────────────────────────────
        Static ocean benchmarks                Time                   Allocations      
                                       ──────────────────────   ───────────────────────
           Tot / % measured:                 153s / 45.0%           13.2GiB / 0.74%    

 Section                       ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────
  32× 32× 32  [CPU, Float32]       10   54.2ms  0.08%  5.42ms   1.00MiB  1.01%   102KiB
  32× 32× 32  [CPU, Float64]       10   55.1ms  0.08%  5.51ms   1.00MiB  1.01%   102KiB
  32× 32× 32  [GPU, Float32]       10   28.1ms  0.04%  2.81ms   11.4MiB  11.5%  1.14MiB
  32× 32× 32  [GPU, Float64]       10   27.6ms  0.04%  2.76ms   11.4MiB  11.5%  1.14MiB
  64× 64× 64  [CPU, Float32]       10    402ms  0.58%  40.2ms   1.00MiB  1.01%   102KiB
  64× 64× 64  [CPU, Float64]       10    404ms  0.58%  40.4ms   1.00MiB  1.01%   102KiB
  64× 64× 64  [GPU, Float32]       10   30.1ms  0.04%  3.01ms   11.4MiB  11.5%  1.14MiB
  64× 64× 64  [GPU, Float64]       10   28.6ms  0.04%  2.86ms   11.4MiB  11.5%  1.14MiB
 128×128×128  [CPU, Float32]       10    3.37s  4.88%   337ms   1.00MiB  1.01%   102KiB
 128×128×128  [CPU, Float64]       10    3.41s  4.94%   341ms   1.00MiB  1.01%   102KiB
 128×128×128  [GPU, Float32]       10   46.6ms  0.07%  4.66ms   11.4MiB  11.5%  1.14MiB
 128×128×128  [GPU, Float64]       10   46.0ms  0.07%  4.60ms   11.4MiB  11.5%  1.14MiB
 256×256×256  [CPU, Float32]       10    30.3s  43.8%   3.03s   1.00MiB  1.01%   102KiB
 256×256×256  [CPU, Float64]       10    30.2s  43.8%   3.02s   1.00MiB  1.01%   102KiB
 256×256×256  [GPU, Float32]       10    338ms  0.49%  33.8ms   11.4MiB  11.5%  1.14MiB
 256×256×256  [GPU, Float64]       10    337ms  0.49%  33.7ms   11.4MiB  11.5%  1.14MiB
 ──────────────────────────────────────────────────────────────────────────────────────

@ali-ramadhan ali-ramadhan merged commit 56c3608 into master Mar 15, 2020
@ali-ramadhan ali-ramadhan deleted the glw/boundary-condition-function-arguments branch March 15, 2020 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stabilizing forcing function arguments and boundary condition function arguments
3 participants