Improve kernel efficiency of the WENO algorithm #3518

simone-silvestri · 2024-03-21T16:10:10Z

This PR tries to improve the GPU efficiency of the WENO algorithm by:

- using fast math for the smoothness calculation (this should not be a problem since the weights are normalized);
- reordering the WENO algorithm to encourage register reuse, by accumulating the solution instead of computing all the stencils at the same time.

WENO-Z weights are calculated as $$\alpha_s = C_s \left( 1 + \left(\frac{\tau}{\beta_s +\varepsilon}\right)^2 \right)$$
and the interpolation is calculated as $$\psi =\frac{1}{\sum \alpha_s} \sum \psi_s \alpha_s$$
so, if we reorder the sums and use the fact that the linear weights satisfy $\sum C_s = 1$, we can calculate $\psi$ as $$\psi = \frac{ \tau^2 \hat{\psi}_1 + \hat{\psi}_2}{ \tau^2 \sum \alpha^{\star}_s + 1}$$
where $$\hat{\psi}_1 = \sum\psi_s \alpha^{\star}_s$$ and $$\hat{\psi}_2 = \sum \psi_s C_s$$ and $\alpha^{\star}_s$ are the WENO-JS coefficients that depend only on the local stencil: $$\alpha^{\star}_s = \frac{C_s}{(\beta_s + \varepsilon)^2}$$
We can then calculate stencils one by one by accumulating the results and "throwing away" registers we don't need after the computation.
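A minimal Julia sketch of the reordering, checking that the one-stencil-at-a-time accumulation matches the direct WENO-Z average. The function names, the sample inputs, and the WENO-5-style choice $\tau = |\beta_1 - \beta_3|$ (1-based indexing) are assumptions for illustration; this is not the code in the PR.

```julia
# Direct WENO-Z average: build all weights α_s, then normalize.
function weno_z_direct(ψs, βs, Cs, τ, ε)
    αs = Cs .* (1 .+ (τ ./ (βs .+ ε)) .^ 2)
    return sum(ψs .* αs) / sum(αs)
end

# Reordered form: visit one stencil at a time and keep only three running sums.
function weno_z_accumulated(ψs, βs, Cs, τ, ε)
    ψhat1 = zero(τ)    # Σ ψ_s α⋆_s
    ψhat2 = zero(τ)    # Σ ψ_s C_s
    αsum  = zero(τ)    # Σ α⋆_s
    for s in eachindex(ψs)
        αstar  = Cs[s] / (βs[s] + ε)^2   # WENO-JS coefficient α⋆_s
        ψhat1 += ψs[s] * αstar
        ψhat2 += ψs[s] * Cs[s]
        αsum  += αstar
    end
    τ2 = τ^2
    return (τ2 * ψhat1 + ψhat2) / (τ2 * αsum + 1)   # relies on Σ C_s = 1
end

# Made-up sample values for three stencils.
ψs = (1.0, 1.2, 0.9)
βs = (0.01, 0.2, 0.05)
Cs = (0.1, 0.6, 0.3)          # linear weights, summing to 1
τ  = abs(βs[1] - βs[3])       # WENO-5-style τ, assumed for illustration
ε  = 1e-8

weno_z_direct(ψs, βs, Cs, τ, ε) ≈ weno_z_accumulated(ψs, βs, Cs, τ, ε)   # true
```

In the accumulated version only `ψhat1`, `ψhat2` and `αsum` stay live across iterations, which is the register-saving idea; how much that helps in practice depends on what the GPU compiler does with the unrolled code.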

This PR is a draft because, although it works, everything is manually unrolled, and there may be a way to express the same concept with metaprogramming.


glwagner · 2024-03-21T16:58:57Z

src/AbstractOperations/conditional_operations.jl

```diff
@@ -1,6 +1,6 @@
 using Oceananigans.Fields: OneField
 using Oceananigans.Grids: architecture
-using Oceananigans.Architectures: on_architecture
+import Oceananigans.Architectures: on_architecture
```


Should we put this in another PR? This seems important...
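Presumably the switch matters because this file adds methods to `on_architecture`: in Julia, a function brought in with `using M: f` cannot be extended by a bare `f(...) = ...` definition, whereas `import M: f` allows it. A minimal sketch with hypothetical stand-in modules (`Architectures`, `ConditionalOperations` and `MyOperand` are illustrative, not the Oceananigans definitions):

```julia
module Architectures                 # hypothetical stand-in for Oceananigans.Architectures
on_architecture(arch, x) = x         # generic fallback: return the object unchanged
end

module ConditionalOperations         # hypothetical stand-in for the module in the diff
import ..Architectures: on_architecture   # `import` lets this module add methods

struct MyOperand                     # hypothetical type that needs its own method
    data::Vector{Float64}
end

# This extends Architectures.on_architecture. With `using ..Architectures: on_architecture`
# instead, Julia rejects this definition because the function was not imported for extension.
on_architecture(arch, op::MyOperand) = MyOperand(on_architecture(arch, op.data))
end

op = ConditionalOperations.MyOperand([1.0, 2.0])
Architectures.on_architecture(nothing, op).data   # dispatches to the method added above
```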

glwagner · 2024-03-21T17:00:06Z

src/TurbulenceClosures/turbulence_closure_implementations/ri_based_vertical_diffusivity.jl

```diff
@@ -1,4 +1,4 @@
-using Oceananigans.Architectures: architecture, on_architecture
```


All these `on_architecture` changes create some noise in this PR, which makes it harder to review; it might make sense to put them in another PR.

glwagner · 2024-03-21T17:01:15Z

src/Advection/weno_default_stencils.jl

```diff
+using Printf
+
+@inline getvalue(i, j, k, grid, ψ, args...) = @inbounds ψ[i, j, k]
+@inline getvalue(i, j, k, grid, ψ::Function, args...) = ψ(i, j, k, grid, args...)
```


Don't we have an identity function that does this same thing?

Oceananigans.jl/src/Operators/interpolation_utils.jl

Lines 39 to 41 in 427c92f

```julia
@inline $identity(i, j, k, grid, c) = @inbounds c[i, j, k]
@inline $identity(i, j, k, grid, a::Number) = a
@inline $identity(i, j, k, grid, F::TF, args...) where TF<:Function = F(i, j, k, grid, args...)
```
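Both `getvalue` and the identity functions rely on the same dispatch pattern: arrays are indexed, numbers pass through, and functions are called with the remaining arguments. A self-contained sketch of that pattern, using a hypothetical name `value_at`, a made-up kernel function `double_at`, and a placeholder `grid = nothing` (the `$identity` in the quoted lines suggests they are generated with `@eval` in Oceananigans; here a single function is written by hand):

```julia
# Hypothetical `value_at` mirroring the dispatch in `getvalue` / `$identity`.
@inline value_at(i, j, k, grid, c) = @inbounds c[i, j, k]        # arrays: index
@inline value_at(i, j, k, grid, a::Number) = a                     # numbers: constant
@inline value_at(i, j, k, grid, F::TF, args...) where TF<:Function =
    F(i, j, k, grid, args...)                                      # functions: call

c = reshape(collect(1.0:8.0), 2, 2, 2)      # small 3D array standing in for a field
grid = nothing                              # placeholder; a real grid would go here
double_at(i, j, k, grid, c) = 2 * c[i, j, k] # made-up kernel function in the same style

value_at(1, 2, 2, grid, c)             # indexes the array
value_at(1, 2, 2, grid, 3.0)           # returns the number unchanged
value_at(1, 2, 2, grid, double_at, c)  # calls the function with (i, j, k, grid, c)
```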

glwagner · 2024-03-21T17:08:13Z

I think the algorithm for saving register usage would be easier to understand if it were written abstractly (i.e., within a loop that goes up to WENO order N) rather than unrolled manually.

The main advantage of using metaprogramming is that the code will be easier to maintain if it needs to change in the future (e.g., even for the trivial reason that Julia syntax changes). Rather than having to inspect and change 7 functions, we can change one. It also means we can probably get away with fewer regression tests. Otherwise, to prevent the code from returning wrong results when/if it needs to be updated in the future, we need to test every WENO order...

These seem like pretty significant advantages, but I understand that everyone is busy.
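One way the suggestion could look is a Julia `@generated` function that writes the stencil loop once and emits the unrolled accumulation for any number of stencils `N` at compile time. This is a sketch under stated assumptions, not the PR's implementation: `accumulate_weno_z` is a hypothetical name, the stencil values $\psi_s$, smoothness indicators $\beta_s$ and linear weights $C_s$ are passed as tuples, and $\tau$ is an input because its formula depends on the WENO order.

```julia
# Hypothetical sketch: one definition emits the unrolled accumulation for any N.
@generated function accumulate_weno_z(ψs::NTuple{N, T}, βs::NTuple{N, T}, Cs::NTuple{N, T},
                                      τ::T, ε::T) where {N, T}
    body = Expr(:block)
    # Running sums: Σ ψ_s α⋆_s, Σ ψ_s C_s and Σ α⋆_s
    push!(body.args, :(ψhat1 = zero($T)), :(ψhat2 = zero($T)), :(αsum = zero($T)))
    for s in 1:N
        # Fold stencil s into the running sums; its temporaries are not needed afterwards.
        push!(body.args, quote
            αstar = Cs[$s] / (βs[$s] + ε)^2
            ψhat1 += ψs[$s] * αstar
            ψhat2 += ψs[$s] * Cs[$s]
            αsum  += αstar
        end)
    end
    # Combine at the end (assumes sum(Cs) == 1)
    push!(body.args, :((τ^2 * ψhat1 + ψhat2) / (τ^2 * αsum + 1)))
    return body
end

# Usage with three and four stencils; all inputs are made up.
accumulate_weno_z((1.0, 1.2, 0.9), (0.01, 0.2, 0.05), (0.1, 0.6, 0.3), 0.04, 1e-8)
accumulate_weno_z((1.0, 1.1, 0.95, 1.05), (0.02, 0.1, 0.03, 0.2), (0.1, 0.4, 0.3, 0.2), 0.05, 1e-8)
```

Because `N` is a type parameter, the generator runs once per `N` and produces straight-line code, so every order gets its own unrolled method from a single definition.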


simone-silvestri and others added 30 commits March 5, 2024 13:21

- correct partition (884f884)
- Merge branch 'main' into ss/correct-partitions (7a53b67)
- inothing -> isnothing (a335280)
- Merge branch 'ss/correct-partitions' of github.com:CliMA/Oceananigans.jl into ss/correct-partitions (db5ef62)
- Merge branch 'main' into ss/correct-partitions (f702cbe)
- Merge branch 'main' into ss/correct-partitions (ff158dd)
- does not work on julia 1.9? (73fd9c6)
- hopefully last one? (d2d2449)
- this was the last (eaf318f)
- other ones slipped (81b3339)
- other ones remaining (efe7b82)
- let's test it out! (d34e07b)
- using fastmath? (1e9cb8e)
- fast math and reduce divisions by 4 (6e06ac7)
- remove inbounds where I don't need it (2a39065)
- try fastmath (159aab3)
- try simplifying (622fb37)
- bugfix (8f23c87)
- bugfix (79c3699)
- remove JS weno (9fef163)
- remove js weno (15c764f)
- reducing registers? (e91aa61)
- reduce registers (492f3af)
- try it now (70fa840)
- bugfix (c969ebc)
- json3 (d906af0)
- try it now (47078e1)
- ready to go (da87968)
- new stencils (845a252)
- bugfix (6c3c136)

simone-silvestri added 12 commits March 19, 2024 11:41

- bugfix (f555571)
- fix getvalue (46d306b)
- improve scheme (cae5f02)
- remove useless file (8ed2480)
- bugfix (83447bc)
- bugfixes (58aad89)
- all done (5d7d243)
- bugfix (43f852d)
- bugfix (93a09d4)
- bugfix (39fdad4)
- more bugfixes (a7deaa0)
- add mention (a93197b)

simone-silvestri added the 🚨 DO NOT MERGE 🚨 IN BIG BOLD RED CAPS WITH FLASHING SIRENS label Mar 21, 2024

glwagner reviewed Mar 21, 2024

navidcy added the performance 🏍️, numerics 🧮, and GPU 👾 labels Mar 21, 2024

simone-silvestri and others added 10 commits March 22, 2024 09:32

- remove non-necessary directions (a878de2)
- add some metaprogramming (b4b79d2)
- full metaprogramming (9e653a0)
- this works (8e8112e)
- some comments (1d5d66b)
- Merge branch 'main' into ss/test-scaling (7898398)
- bugfix (194b684)
- Merge branch 'ss/test-scaling' of github.com:CliMA/Oceananigans.jl into ss/test-scaling (530f2d8)
- small try (e02a9d0)
- Merge remote-tracking branch 'origin/main' into ss/test-scaling (d489cda)
