-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve kernel efficiency of the WENO algorithm #3518
base: main
Are you sure you want to change the base?
Conversation
….jl into ss/correct-partitions
@@ -1,6 +1,6 @@ | |||
using Oceananigans.Fields: OneField | |||
using Oceananigans.Grids: architecture | |||
using Oceananigans.Architectures: on_architecture | |||
import Oceananigans.Architectures: on_architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we put this in another PR? this seems important...
@@ -1,4 +1,4 @@ | |||
using Oceananigans.Architectures: architecture, on_architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these on_architecture
changes create some noise in this PR which is making it harder to review, might make sense to put them in another PR
using Printf | ||
|
||
@inline getvalue(i, j, k, grid, ψ, args...) = @inbounds ψ[i, j, k] | ||
@inline getvalue(i, j, k, grid, ψ::Function, args...) = ψ(i, j, k, grid, args...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we have an identity
function that does this same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oceananigans.jl/src/Operators/interpolation_utils.jl
Lines 39 to 41 in 427c92f
@inline $identity(i, j, k, grid, c) = @inbounds c[i, j, k] | |
@inline $identity(i, j, k, grid, a::Number) = a | |
@inline $identity(i, j, k, grid, F::TF, args...) where TF<:Function = F(i, j, k, grid, args...) |
I think the algorithm for saving register usage could be easier to understand if it is written abstractly (ie within a loop that goes to WENO order The main advantage of using metaprogramming is that it will be easier to maintain if this code needs to change in the future (ie even for the trivial reason that julia syntax changes). Rather than having to inspect and change 7 functions we can change one. It'll also main we can probably get away with fewer regression tests. Otherwise, to prevent the code from returning wrong results when/if it needs to be updated in the future, we need to test every WENO order... These seem like pretty significant advantages, but I understand that everyone is busy. |
…to ss/test-scaling
This PR tries to improve the GPU efficiency of the WENO algorithm by
WENO-Z weights are calculated as$$\alpha_s = C_s \left( 1 + \left(\frac{\tau}{\beta_s +\varepsilon}\right)^2 \right)$$ $$\psi =\frac{1}{\sum \alpha_s} \sum \psi_s \alpha_s$$ $\psi$ as $$\psi = \frac{ \tau^2 \hat{\psi}_1 + \hat{\psi}_2}{ \tau^2 \sum \alpha^{\star}_s + 1}$$ $$\hat{\psi}_1 = \sum\psi_s \alpha^{\star}_s$$ and $$\hat{\psi}_2 = \sum \psi_s C_s$$ and $\alpha^{\star}_s$ are the WENO-JS coefficients that depend only on the local stencil: $$\alpha^{\star}_s = \frac{C_s}{(\beta_s + \varepsilon)^2}$$
and the interpolation is calculated as
so if we reorder we can calculate
where
We can then calculate stencils one by one by accumulating the results and "throwing away" registers we don't need after the computation.
This PR is a draft because, despite it works, everything is written down manually unrolled and maybe there is a way to express the same concept with metaprogramming