Improve kernel efficiency of the WENO algorithm #3518

Draft
wants to merge 159 commits into
base: main
159 commits
884f884
correct partition
simone-silvestri Mar 5, 2024
7a53b67
Merge branch 'main' into ss/correct-partitions
navidcy Mar 5, 2024
a335280
inothing -> isnothing
simone-silvestri Mar 6, 2024
db5ef62
Merge branch 'ss/correct-partitions' of github.com:CliMA/Oceananigans…
simone-silvestri Mar 6, 2024
f702cbe
Merge branch 'main' into ss/correct-partitions
simone-silvestri Mar 12, 2024
ff158dd
Merge branch 'main' into ss/correct-partitions
simone-silvestri Mar 12, 2024
73fd9c6
does not work on julia 1.9?
simone-silvestri Mar 12, 2024
d2d2449
hopefully last one?
simone-silvestri Mar 12, 2024
eaf318f
this was the last
simone-silvestri Mar 12, 2024
81b3339
other ones slipped
simone-silvestri Mar 12, 2024
efe7b82
other ones remaining
simone-silvestri Mar 12, 2024
d34e07b
let's test it out!
simone-silvestri Mar 12, 2024
1e9cb8e
using fastmath?
simone-silvestri Mar 12, 2024
6e06ac7
fast math and reduce divisions by 4
simone-silvestri Mar 13, 2024
2a39065
remove inbounds where I don't need it
simone-silvestri Mar 13, 2024
159aab3
try fastmath
simone-silvestri Mar 13, 2024
622fb37
try simplifying
simone-silvestri Mar 13, 2024
8f23c87
bugfix
simone-silvestri Mar 13, 2024
79c3699
bugfix
simone-silvestri Mar 13, 2024
9fef163
remove JS weno
simone-silvestri Mar 13, 2024
15c764f
remove js weno
simone-silvestri Mar 13, 2024
e91aa61
reducing registers?
simone-silvestri Mar 13, 2024
492f3af
reduce registers
simone-silvestri Mar 13, 2024
70fa840
try it now
simone-silvestri Mar 13, 2024
c969ebc
bugfix
simone-silvestri Mar 13, 2024
d906af0
json3
simone-silvestri Mar 13, 2024
47078e1
try it now
simone-silvestri Mar 13, 2024
da87968
ready to go
simone-silvestri Mar 13, 2024
845a252
new stencils
simone-silvestri Mar 13, 2024
6c3c136
bugfix
simone-silvestri Mar 13, 2024
44312e0
go back
simone-silvestri Mar 13, 2024
89028ad
sadly just weno js
simone-silvestri Mar 13, 2024
9026f36
optimized WENO
simone-silvestri Mar 13, 2024
f68cd61
try it out
simone-silvestri Mar 13, 2024
c084ae6
compiles
simone-silvestri Mar 13, 2024
05d8dbd
bugfix
simone-silvestri Mar 13, 2024
cfdfec8
go like this
simone-silvestri Mar 13, 2024
681bf51
another bugfix
simone-silvestri Mar 13, 2024
8066b63
try now
simone-silvestri Mar 13, 2024
7af0939
working!
simone-silvestri Mar 13, 2024
4cad70d
removing JS weno
simone-silvestri Mar 13, 2024
d58ef5f
the ntuple trick
simone-silvestri Mar 13, 2024
748728b
always zweno
simone-silvestri Mar 13, 2024
3330e6d
let's go
simone-silvestri Mar 13, 2024
3556c99
make this gpu work
simone-silvestri Mar 13, 2024
76b9cdb
test it out
simone-silvestri Mar 13, 2024
9df2bd0
just add refs
simone-silvestri Mar 13, 2024
f295384
Optimization done!
simone-silvestri Mar 13, 2024
108cb78
test
simone-silvestri Mar 13, 2024
9f2419b
correct
simone-silvestri Mar 13, 2024
f8d0356
weno js
simone-silvestri Mar 13, 2024
7f034c0
another fix
simone-silvestri Mar 13, 2024
d4651e2
add this
simone-silvestri Mar 13, 2024
6aea6b4
minimum registers?
simone-silvestri Mar 13, 2024
bff5c23
with bugfix
simone-silvestri Mar 13, 2024
2fa3ecd
try like this
simone-silvestri Mar 13, 2024
bb80e23
just write down the loop
simone-silvestri Mar 13, 2024
027dceb
unroll manually
simone-silvestri Mar 13, 2024
9bbe281
bugfix
simone-silvestri Mar 13, 2024
a0f3b87
add here
simone-silvestri Mar 13, 2024
893fccf
let's go with weno 4
simone-silvestri Mar 13, 2024
b756aff
go with weno4
simone-silvestri Mar 13, 2024
040b68d
go with weno 4
simone-silvestri Mar 13, 2024
af1603b
wenos till 5
simone-silvestri Mar 13, 2024
e46f935
last correction?
simone-silvestri Mar 13, 2024
6b95382
test it now
simone-silvestri Mar 13, 2024
629bd49
correct weno 5
simone-silvestri Mar 13, 2024
1f8a6ad
should be ok?
simone-silvestri Mar 13, 2024
7edc1ef
again
simone-silvestri Mar 13, 2024
548f7d2
try it like this?
simone-silvestri Mar 13, 2024
72ed359
using mod
simone-silvestri Mar 13, 2024
ccae862
added static arrays
simone-silvestri Mar 13, 2024
596d512
bugfix
simone-silvestri Mar 13, 2024
1ee5385
try it like this
simone-silvestri Mar 13, 2024
4826f0c
back
simone-silvestri Mar 13, 2024
c86cd99
done
simone-silvestri Mar 13, 2024
161d5d7
try shared
simone-silvestri Mar 13, 2024
b50b7e6
shared memory
simone-silvestri Mar 13, 2024
0c58a85
try
simone-silvestri Mar 13, 2024
f11b4fe
bugfix
simone-silvestri Mar 13, 2024
d5c9b8c
back to correct weno
simone-silvestri Mar 13, 2024
7e85c03
try it out
simone-silvestri Mar 13, 2024
880ea78
works? maybe...
simone-silvestri Mar 13, 2024
a23e066
shared mem let's go
simone-silvestri Mar 13, 2024
0b41b22
only some shared mem
simone-silvestri Mar 13, 2024
ed6b357
probably it will not work?
simone-silvestri Mar 13, 2024
bc5b1d9
test it out
simone-silvestri Mar 13, 2024
f84d257
this should work?
simone-silvestri Mar 13, 2024
c147599
retry unrolling
simone-silvestri Mar 13, 2024
3ab1b7d
let's try it out
simone-silvestri Mar 13, 2024
83778ea
remove tid and wrk
simone-silvestri Mar 13, 2024
defdb88
remove tid wrk
simone-silvestri Mar 13, 2024
746b4f4
remove all this
simone-silvestri Mar 13, 2024
86d0748
small correction
simone-silvestri Mar 13, 2024
0e3df4d
show registers
simone-silvestri Mar 13, 2024
4bf023a
test registers
simone-silvestri Mar 13, 2024
b5e7fd9
change stuff
simone-silvestri Mar 13, 2024
73a4c47
this should go here
simone-silvestri Mar 13, 2024
d72670a
final version?
simone-silvestri Mar 13, 2024
bd4d1dd
some cleaning up
simone-silvestri Mar 13, 2024
a46a942
still a problem on GPUs
simone-silvestri Mar 14, 2024
7fa8968
some changes
simone-silvestri Mar 14, 2024
bd2eb01
let's try it out!
simone-silvestri Mar 14, 2024
b1fffa0
some comments
simone-silvestri Mar 14, 2024
bc01a9f
good show method
simone-silvestri Mar 14, 2024
b2ae570
changes
simone-silvestri Mar 14, 2024
20b4dd2
some change
simone-silvestri Mar 15, 2024
3adc569
small correction
simone-silvestri Mar 15, 2024
c0343c9
unravel even more
simone-silvestri Mar 16, 2024
d876733
now it should work?
simone-silvestri Mar 16, 2024
0a9f530
let's see if tests pass
simone-silvestri Mar 16, 2024
6efe4f4
all the way to weno 6
simone-silvestri Mar 16, 2024
1ae9427
possibility of doing it another way
simone-silvestri Mar 16, 2024
b3dd2c4
mah
simone-silvestri Mar 16, 2024
1d0fde2
let's try it now
simone-silvestri Mar 16, 2024
88b67ad
give it a small try
simone-silvestri Mar 17, 2024
25d3e21
try again
simone-silvestri Mar 17, 2024
34e4b29
some bugfix
simone-silvestri Mar 17, 2024
870c23e
bugfix
simone-silvestri Mar 17, 2024
c768169
try again
simone-silvestri Mar 17, 2024
96eaaec
bugfix
simone-silvestri Mar 17, 2024
9746f5f
try it now
simone-silvestri Mar 17, 2024
49a60de
bugfix
simone-silvestri Mar 17, 2024
4be9503
go with the change
simone-silvestri Mar 17, 2024
0f29b68
tests
simone-silvestri Mar 17, 2024
eae5665
bugfix
simone-silvestri Mar 17, 2024
4ba20ee
final optimization
simone-silvestri Mar 17, 2024
0acf1e0
test it out
simone-silvestri Mar 17, 2024
5d843c0
this is probably the best?
simone-silvestri Mar 17, 2024
1a2d5ba
boh
simone-silvestri Mar 19, 2024
cb84026
bugfix
simone-silvestri Mar 19, 2024
39bbdd5
bugfix
simone-silvestri Mar 19, 2024
22dac23
using the reciprocal
simone-silvestri Mar 19, 2024
27d8775
let's goooo!
simone-silvestri Mar 19, 2024
d50b457
final versions?
simone-silvestri Mar 19, 2024
c7f9dd9
optimize
simone-silvestri Mar 19, 2024
b3da014
adding weno 6?
simone-silvestri Mar 19, 2024
f555571
bugfix
simone-silvestri Mar 19, 2024
46d306b
fix getvalue
simone-silvestri Mar 19, 2024
cae5f02
improve scheme
simone-silvestri Mar 19, 2024
8ed2480
remove useless file
simone-silvestri Mar 19, 2024
83447bc
bugfix
simone-silvestri Mar 19, 2024
58aad89
bugfixes
simone-silvestri Mar 20, 2024
5d7d243
all done
simone-silvestri Mar 20, 2024
43f852d
bugfix
simone-silvestri Mar 20, 2024
93a09d4
bugfix
simone-silvestri Mar 20, 2024
39fdad4
bugfix
simone-silvestri Mar 20, 2024
a7deaa0
more bugfixes
simone-silvestri Mar 20, 2024
a93197b
add mention
simone-silvestri Mar 21, 2024
a878de2
remove non-necessary directions
simone-silvestri Mar 22, 2024
b4b79d2
add some metaprogramming
simone-silvestri Mar 22, 2024
9e653a0
full metaprogramming
simone-silvestri Mar 22, 2024
8e8112e
this works
simone-silvestri Mar 24, 2024
1d5d66b
some comments
simone-silvestri Mar 24, 2024
7898398
Merge branch 'main' into ss/test-scaling
simone-silvestri Mar 25, 2024
194b684
bugfix
simone-silvestri Mar 26, 2024
530f2d8
Merge branch 'ss/test-scaling' of github.com:CliMA/Oceananigans.jl in…
simone-silvestri Mar 26, 2024
e02a9d0
small try
simone-silvestri Apr 2, 2024
d489cda
Merge remote-tracking branch 'origin/main' into ss/test-scaling
simone-silvestri Jun 19, 2024
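
Several commit messages above mention "the ntuple trick", "unroll manually", and "using the reciprocal". The kernel code behind those commits is not part of the diff shown here, so the following is only a minimal sketch of what such optimizations can look like in Julia, not the PR's implementation. `normalized_weights` is a hypothetical helper: it normalizes a small tuple of raw weights with a single division and builds the result with `ntuple(f, Val(N))`, which lets the compiler unroll the loop because `N` is known at compile time.

```julia
# Hypothetical helper, not taken from the PR.
# Normalizes N raw weights so that they sum to one.
@inline function normalized_weights(α::NTuple{N, FT}) where {N, FT}
    inv_sum = one(FT) / sum(α)                  # the only division
    return ntuple(i -> α[i] * inv_sum, Val(N))  # N multiplications, unrollable for small N
end

w = normalized_weights((1.0, 2.0, 1.0))  # (0.25, 0.5, 0.25)
```

Multiplying by a precomputed reciprocal replaces N divisions with one division and N multiplications, which is usually cheaper on GPUs. The result may differ from per-element division in the last bit.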
12 changes: 3 additions & 9 deletions Manifest.toml
@@ -2,7 +2,7 @@

julia_version = "1.10.3"
manifest_format = "2.0"
-project_hash = "04d395caf937b0921325a77873167e8baa293a99"
+project_hash = "ebd246712231f728236bc79e507cd0d30883b432"

[[deps.AbstractFFTs]]
deps = ["LinearAlgebra"]
@@ -403,15 +403,9 @@ version = "1.5.0"

[[deps.JSON3]]
deps = ["Dates", "Mmap", "Parsers", "PrecompileTools", "StructTypes", "UUIDs"]
-git-tree-sha1 = "eb3edce0ed4fa32f75a0a11217433c31d56bd48b"
+git-tree-sha1 = "95220473901735a0f4df9d1ca5b171b568b2daa3"
uuid = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
-version = "1.14.0"
-
-[deps.JSON3.extensions]
-JSON3ArrowExt = ["ArrowTypes"]
-
-[deps.JSON3.weakdeps]
-ArrowTypes = "31f734f8-188a-4ce0-8406-c8a06bd891cd"
+version = "1.13.2"

[[deps.JuliaNVTXCallbacks_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
2 changes: 2 additions & 0 deletions Project.toml
@@ -17,6 +17,7 @@ IncompleteLU = "40713840-3770-5561-ab4c-a76e7d0d7895"
InteractiveUtils = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
IterativeSolvers = "42fd0dbc-a981-5370-80f2-aaf504508153"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
+JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
@@ -32,6 +33,7 @@ Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Rotations = "6038ab10-8711-5258-84ad-4b1120ba62dc"
SeawaterPolynomials = "d496a93d-167e-4197-9f49-d3af4ff8fe40"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
+StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StructArrays = "09ab397b-f2b6-538f-b94a-2f83cf4a842a"

5 changes: 5 additions & 0 deletions src/Advection/Advection.jl
@@ -40,6 +40,8 @@ import Base: show, summary
import Oceananigans.Grids: required_halo_size
import Oceananigans.Architectures: on_architecture

+using KernelAbstractions.Extras.LoopInfo: @unroll
+
abstract type AbstractAdvectionScheme{B, FT} end
abstract type AbstractCenteredAdvectionScheme{B, FT} <: AbstractAdvectionScheme{B, FT} end
abstract type AbstractUpwindBiasedAdvectionScheme{B, FT} <: AbstractAdvectionScheme{B, FT} end
@@ -65,6 +67,9 @@ include("centered_reconstruction.jl")
include("upwind_biased_reconstruction.jl")
include("weno_reconstruction.jl")
include("weno_interpolants.jl")
+include("weno_default_stencils.jl")
+include("weno_function_stencils.jl")
+include("weno_velocity_stencils.jl")
include("stretched_weno_smoothness.jl")
include("multi_dimensional_reconstruction.jl")
include("vector_invariant_upwinding.jl")
1 change: 1 addition & 0 deletions src/Advection/tracer_advection_operators.jl
@@ -41,6 +41,7 @@ end
##### Tracer advection operator
#####


"""
div_uc(i, j, k, grid, advection, U, c)

2 changes: 1 addition & 1 deletion src/Advection/upwind_biased_advective_fluxes.jl
@@ -100,7 +100,7 @@ end
##### Tracer advection operators
#####

-@inline function advective_tracer_flux_x(i, j, k, grid, scheme::UpwindScheme, U, c)
+@inline function advective_tracer_flux_x(i, j, k, grid, scheme::UpwindScheme, U, c)

@inbounds ũ = U[i, j, k]
cᴸ = _left_biased_interpolate_xᶠᵃᵃ(i, j, k, grid, scheme, c)
46 changes: 34 additions & 12 deletions src/Advection/vector_invariant_advection.jl
@@ -108,9 +108,9 @@ Vector Invariant, Dimension-by-dimension reconstruction
└── smoothness δv²: FunctionStencil f = v_smoothness
```
"""
-function VectorInvariant(; vorticity_scheme = EnstrophyConserving(),
+function VectorInvariant(; vorticity_scheme = EnstrophyConserving(),
vorticity_stencil = VelocityStencil(),
-vertical_scheme = EnergyConserving(),
+vertical_scheme = EnergyConserving(),
divergence_scheme = vertical_scheme,
kinetic_energy_gradient_scheme = divergence_scheme,
upwinding = OnlySelfUpwinding(; cross_scheme = divergence_scheme),
@@ -152,15 +152,33 @@ const VectorInvariantVelocityVerticalUpwinding = VectorInvariant{<:Any, <:Any,
Base.summary(a::VectorInvariant) = string("Vector Invariant, Dimension-by-dimension reconstruction")
Base.summary(a::MultiDimensionalVectorInvariant) = string("Vector Invariant, Multidimensional reconstruction")

-Base.show(io::IO, a::VectorInvariant{N, FT}) where {N, FT} =
-    print(io, summary(a), " \n",
-          " Vorticity flux scheme: ", "\n",
-          " $(a.vorticity_scheme isa WENO ? "├" : "└")── $(summary(a.vorticity_scheme))",
-          " $(a.vorticity_scheme isa WENO ? "\n └── smoothness ζ: $(a.vorticity_stencil)\n" : "\n")",
-          " Vertical advection / Divergence flux scheme: ", "\n",
-          " $(a.vertical_scheme isa WENO ? "├" : "└")── $(summary(a.vertical_scheme))",
-          "$(a.vertical_scheme isa AbstractUpwindBiasedAdvectionScheme ?
-          "\n └── upwinding treatment: $(a.upwinding)" : "")")
+function Base.show(io::IO, a::VectorInvariant{N, FT}) where {N, FT}
+
+    δscheme = a.divergence_scheme
+    vscheme = a.vertical_scheme
+    ζscheme = a.vorticity_scheme
+    kscheme = a.kinetic_energy_gradient_scheme
+
+    msg1 = " Vorticity flux scheme: \n"
+    msg2 = "└── $(summary(ζscheme)) \n"
+    msg3 = " Kinetic energy gradient flux scheme: \n"
+    msg4 = "└── $(summary(kscheme)) \n"
+    msg5 = " Vertical advection scheme: \n"
+    msg6 = "└── $(summary(vscheme)) \n"
+    msg7 = (a.vertical_scheme isa EnergyConserving) ? "" : " Divergence flux scheme: \n"
+    msg8 = isempty(msg7) ? "" : "└── $(summary(a.divergence_scheme)) \n"
+
+    upwinding = (δscheme isa WENO) || (kscheme isa WENO) || (ζscheme isa WENO)
+
+    msg9 = upwinding ? " WENO smoothness stencils: \n" : ""
+    msg10 = !(ζscheme isa WENO) ? "" : "└── smoothness ζ: $(a.vorticity_stencil)\n"
+    msg11 = !(δscheme isa WENO) ? "" : "└── smoothness δx_U : $(a.upwinding.δU_stencil)\n"
+    msg12 = !(δscheme isa WENO) ? "" : "└── smoothness δy_V : $(a.upwinding.δV_stencil)\n"
+    msg13 = !(kscheme isa WENO) ? "" : "└── smoothness δx_u² : $(a.upwinding.δu²_stencil)\n"
+    msg14 = !(kscheme isa WENO) ? "" : "└── smoothness δy_v² : $(a.upwinding.δv²_stencil)\n"
+
+    return print(io, summary(a), "\n", msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10, msg11, msg12, msg13, msg14)
+end

#####
##### Convenience for WENO Vector Invariant
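
The `Base.show` method in this hunk assembles its output from message fragments that are empty when they do not apply. A minimal sketch of the same idea, using hypothetical toy types rather than Oceananigans' `VectorInvariant` and `WENO`, collects the lines in a vector and appends each optional line only when its condition holds, so no numbered variable has to be tracked or reused:

```julia
# Hypothetical stand-in types; these are not Oceananigans types.
struct ToyWENO end
struct ToyCentered end

Base.summary(::ToyWENO)     = "WENO"
Base.summary(::ToyCentered) = "Centered"

struct ToyScheme{V, D}
    vorticity_scheme  :: V
    divergence_scheme :: D
end

function Base.show(io::IO, a::ToyScheme)
    lines = String["Toy scheme"]
    push!(lines, " Vorticity flux scheme: ",  "└── " * summary(a.vorticity_scheme))
    push!(lines, " Divergence flux scheme: ", "└── " * summary(a.divergence_scheme))
    # Optional lines are appended only when the corresponding scheme is WENO-like.
    a.vorticity_scheme  isa ToyWENO && push!(lines, "└── smoothness ζ")
    a.divergence_scheme isa ToyWENO && push!(lines, "└── smoothness δ")
    return print(io, join(lines, "\n"))
end

text = sprint(show, ToyScheme(ToyWENO(), ToyCentered()))
```

Here `text` contains the `smoothness ζ` line but not the `smoothness δ` line, because only the vorticity scheme is a `ToyWENO`.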
@@ -309,6 +327,10 @@ end
return 1/Vᶜᶠᶜ(i, j, k, grid) * (Φᵟ + 𝒜ᶻ)
end

+# Fallback for centered advection schemes
+@inline upwinded_divergence_flux_Uᶠᶜᶜ(i, j, k, grid, scheme, u, v) = @inbounds u[i, j, k] * _symmetric_interpolate_xᶠᵃᵃ(i, j, k, grid, scheme.divergence_scheme, flux_div_xyᶜᶜᶜ, u, v)
+@inline upwinded_divergence_flux_Vᶜᶠᶜ(i, j, k, grid, scheme, u, v) = @inbounds v[i, j, k] * _symmetric_interpolate_yᵃᶠᵃ(i, j, k, grid, scheme.divergence_scheme, flux_div_xyᶜᶜᶜ, u, v)
+
#####
##### Horizontal advection 4 formulations:
##### 1. Energy conservative
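
The two methods added under "Fallback for centered advection schemes" leave the `scheme` argument untyped, so Julia's multiple dispatch selects them only when no more specific method (such as the self-upwinding ones) matches. A minimal sketch of that pattern, with hypothetical toy types and a toy flux function that are not part of Oceananigans:

```julia
# Hypothetical toy types and functions; none of these names come from Oceananigans.
abstract type DemoScheme end
struct DemoCentered <: DemoScheme end
struct DemoUpwind   <: DemoScheme end

# Generic fallback: `scheme` is untyped, so this method applies to any scheme
# without a more specific method. It uses a symmetric (centered) average.
demo_flux(scheme, u, cL, cR) = u * (cL + cR) / 2

# Specialized method: dispatch prefers it for DemoUpwind. It picks the
# upstream value according to the sign of u.
demo_flux(::DemoUpwind, u, cL, cR) = u * ifelse(u > 0, cL, cR)

centered = demo_flux(DemoCentered(), 2.0, 1.0, 3.0)  # falls back to the generic method
upwind   = demo_flux(DemoUpwind(),   2.0, 1.0, 3.0)  # uses the specialized method
```

With these inputs the centered fallback returns `4.0` and the upwind method returns `2.0`.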
@@ -346,7 +368,7 @@ end
return - upwind_biased_product(v̂, ζᴸ, ζᴿ)
end

-@inline function horizontal_advection_V(i, j, k, grid, scheme::VectorInvariantUpwindVorticity, u, v)
+@inline function horizontal_advection_V(i, j, k, grid, scheme::VectorInvariantUpwindVorticity, u, v)

Sζ = scheme.vorticity_stencil

2 changes: 1 addition & 1 deletion src/Advection/vector_invariant_self_upwinding.jl
@@ -26,7 +26,7 @@
end

@inline function upwinded_divergence_flux_Vᶜᶠᶜ(i, j, k, grid, scheme::VectorInvariantSelfVerticalUpwinding, u, v)

δV_stencil = scheme.upwinding.δV_stencil
cross_scheme = scheme.upwinding.cross_scheme
