Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve kernel efficiency of the WENO algorithm #3518

Draft
wants to merge 159 commits into
base: main
Choose a base branch
from

Conversation

simone-silvestri
Copy link
Collaborator

This PR tries to improve the GPU efficiency of the WENO algorithm by

  • using fast math for smoothness calculation (this should not be a problem since weights are normalized)
  • resorting the WENO algorithm to enforce register reuse by accumulating the solution instead of computing all the stencils at tge same time

WENO-Z weights are calculated as $$\alpha_s = C_s \left( 1 + \left(\frac{\tau}{\beta_s +\varepsilon}\right)^2 \right)$$
and the interpolation is calculated as $$\psi =\frac{1}{\sum \alpha_s} \sum \psi_s \alpha_s$$
so if we reorder we can calculate $\psi$ as $$\psi = \frac{ \tau^2 \hat{\psi}_1 + \hat{\psi}_2}{ \tau^2 \sum \alpha^{\star}_s + 1}$$
where $$\hat{\psi}_1 = \sum\psi_s \alpha^{\star}_s$$ and $$\hat{\psi}_2 = \sum \psi_s C_s$$ and $\alpha^{\star}_s$ are the WENO-JS coefficients that depend only on the local stencil: $$\alpha^{\star}_s = \frac{C_s}{(\beta_s + \varepsilon)^2}$$
We can then calculate stencils one by one by accumulating the results and "throwing away" registers we don't need after the computation.

This PR is a draft because, despite it works, everything is written down manually unrolled and maybe there is a way to express the same concept with metaprogramming

@simone-silvestri simone-silvestri added the 🚨 DO NOT MERGE 🚨 IN BIG BOLD RED CAPS WITH FLASHING SIRENS label Mar 21, 2024
@@ -1,6 +1,6 @@
using Oceananigans.Fields: OneField
using Oceananigans.Grids: architecture
using Oceananigans.Architectures: on_architecture
import Oceananigans.Architectures: on_architecture
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we put this in another PR? this seems important...

@@ -1,4 +1,4 @@
using Oceananigans.Architectures: architecture, on_architecture
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these on_architecture changes create some noise in this PR which is making it harder to review, might make sense to put them in another PR

using Printf

@inline getvalue(i, j, k, grid, ψ, args...) = @inbounds ψ[i, j, k]
@inline getvalue(i, j, k, grid, ψ::Function, args...) = ψ(i, j, k, grid, args...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we have an identity function that does this same thing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inline $identity(i, j, k, grid, c) = @inbounds c[i, j, k]
@inline $identity(i, j, k, grid, a::Number) = a
@inline $identity(i, j, k, grid, F::TF, args...) where TF<:Function = F(i, j, k, grid, args...)

@glwagner
Copy link
Member

I think the algorithm for saving register usage could be easier to understand if it is written abstractly (ie within a loop that goes to WENO order N rather than written manually.

The main advantage of using metaprogramming is that it will be easier to maintain if this code needs to change in the future (ie even for the trivial reason that julia syntax changes). Rather than having to inspect and change 7 functions we can change one. It'll also main we can probably get away with fewer regression tests. Otherwise, to prevent the code from returning wrong results when/if it needs to be updated in the future, we need to test every WENO order...

These seem like pretty significant advantages, but I understand that everyone is busy.

@navidcy navidcy added performance 🏍️ So we can get the wrong answer even faster numerics 🧮 So things don't blow up and boil the lobsters alive GPU 👾 Where Oceananigans gets its powers from labels Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🚨 DO NOT MERGE 🚨 IN BIG BOLD RED CAPS WITH FLASHING SIRENS GPU 👾 Where Oceananigans gets its powers from numerics 🧮 So things don't blow up and boil the lobsters alive performance 🏍️ So we can get the wrong answer even faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants