Slow initialization after updating to v0.90.0 #3381

Closed
zhihua-zheng opened this issue Nov 7, 2023 · 35 comments
Labels
performance 🏍️ So we can get the wrong answer even faster

Comments

@zhihua-zheng
Contributor

After the recent updates of Oceananigans, I noticed unusual behavior that often leads to much slower model initialization.

Below is a minimal working example that demonstrates the issue:

using Oceananigans
using Oceananigans.Units

grid = RectilinearGrid(CPU(),
                       size = (3, 3, 3),
                       extent = (1, 1, 1),
                       topology = (Periodic, Bounded, Bounded))
model = HydrostaticFreeSurfaceModel(; grid)
simulation = Simulation(model, Δt=20minutes, stop_time=20days)

u, v, w = model.velocities
ζ = ∂x(v) - ∂y(u)
fields_slice = Dict("u" => u, "v" => v, "w" => w, "ζ" => ζ)
simulation.output_writers[:top] = NetCDFOutputWriter(model, fields_slice;
                                                     filename = "mwe.nc",
                                                     schedule = TimeInterval(0.5day),
                                                     overwrite_existing = true,
                                                     indices = (:, :, grid.Nz))
run!(simulation)

Running this code with Julia 1.9.3 and Oceananigans v0.90.0 gives an initialization time of ~15 minutes, much longer than the typical few seconds. The same issue also appears on the GPU.

This speed issue disappears either when ζ = ∂x(v) - ∂y(u) is replaced with ζ = Field(∂x(v) - ∂y(u)), or when ζ is the only variable in fields_slice. However, as pointed out by @tomchor, wrapping outputs in Field() tends to waste more memory (a very precious resource on the GPU), so it may be worthwhile to investigate further why this happens.
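
For reference, a minimal sketch of the Field() workaround described above (not from the original report; everything else in the MWE stays the same):

u, v, w = model.velocities
ζ = Field(∂x(v) - ∂y(u))   # eagerly wrap the operation in a Field, which avoids the slow initialization
fields_slice = Dict("u" => u, "v" => v, "w" => w, "ζ" => ζ)   # pass the wrapped field to the output writer as before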

@navidcy
Collaborator

navidcy commented Nov 7, 2023

Interesting! Which version were you on before updating?

@navidcy
Collaborator

navidcy commented Nov 7, 2023

When you say "initialization", do you mean the time between when you call run!(simulation) and when the simulation actually starts stepping?

@navidcy navidcy changed the title Slow initialization Slow initialization after updating to v0.90.0 Nov 7, 2023
@navidcy navidcy added the performance 🏍️ So we can get the wrong answer even faster label Nov 7, 2023
@zhihua-zheng
Contributor Author

Interesting! Which version were you on before updating?

I used v0.88.0 before. But even if I pin Oceananigans to v0.88.0, the issue still occurs, and the contents of Manifest.toml are not the same as before.

@zhihua-zheng
Contributor Author

When you say "initialization", do you mean the time between when you call run!(simulation) and when the simulation actually starts stepping?

I think so, it corresponds to this line in the log for example:

[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (16.509 minutes)

@glwagner
Member

glwagner commented Nov 7, 2023

Interesting! Which version were you on before updating?

I used v0.88.0 before. But even if I pin Oceananigans to v0.88.0, the issue still occurs, and the contents of Manifest.toml are not the same as before.

That suggests that this could be an issue with your environment...

What other packages are you using?

What can happen is that a dependency got upgraded when you bumped up to 0.90.0, and then, because compat was still satisfied with that upgraded package, it did not get changed when you subsequently bumped back down to 0.88.0. Upgrading is conservative; it doesn't happen unless you ask for it.
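
As a sketch (not from this thread) of one way to check what the resolver actually left in place, one could list the full manifest and, if desired, force a fresh resolution:

using Pkg
Pkg.status(; mode=Pkg.PKGMODE_MANIFEST)   # list every package version recorded in Manifest.toml
# Pkg.update()                            # re-resolve all dependencies to their newest compatible versions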

@navidcy
Collaborator

navidcy commented Nov 7, 2023

Can you post the output of

using Pkg; Pkg.status()

@zhihua-zheng
Contributor Author

That suggests that this could be an issue with your environment...

What other packages are you using?

This issue occurs in a clean Julia environment. No other package is added explicitly.

@navidcy
Collaborator

navidcy commented Nov 7, 2023

I'll try to reproduce it!

@zhihua-zheng
Contributor Author

Can you post the output of

using Pkg; Pkg.status()
Status `~/Projects/TRACE-SEAS/Test-LK/Project.toml`
  [9e8cae18] Oceananigans v0.90.0

@navidcy
Collaborator

navidcy commented Nov 7, 2023

Can you post the output of

using Pkg; Pkg.status()
Status `~/Projects/TRACE-SEAS/Test-LK/Project.toml`
  [9e8cae18] Oceananigans v0.90.0

OK, I see. Pretty clean environment! :)

@glwagner
Member

glwagner commented Nov 8, 2023

Possibly it matters what is in the global environment, but I'm not sure...

@navidcy
Collaborator

navidcy commented Nov 9, 2023

hm..

@navidcy
Collaborator

navidcy commented Nov 9, 2023

So I did a benchmark. I ran this:

using Oceananigans
using Oceananigans.Units

grid = RectilinearGrid(CPU(),
                       size = (3, 3, 3),
                       extent = (1, 1, 1),
                       topology = (Periodic, Bounded, Bounded))
model = HydrostaticFreeSurfaceModel(; grid)

Δt=20minutes
simulation = Simulation(model, Δt=20minutes, stop_time=4Δt)

u, v, w = model.velocities
ζ = ∂x(v) - ∂y(u)
fields_slice = Dict("u" => u, "v" => v, "w" => w, "ζ" => ζ)
simulation.output_writers[:top] = NetCDFOutputWriter(model, fields_slice;
                                                     filename = "mwe.nc",
                                                     schedule = TimeInterval(0.5day),
                                                     overwrite_existing = true,
                                                     indices = (:, :, grid.Nz))

@time run!(simulation)

on my laptop, in an environment with only Oceananigans.

Just for the record, my general env contains:

(@v1.9) pkg> st
Status `~/.julia/environments/v1.9/Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [13f3f980] CairoMakie v0.10.12
  [e9467ef8] GLMakie v0.8.12
  [db073c08] GeoMakie v0.5.1
  [7073ff75] IJulia v1.24.2
  [12c4ca38] Imaginocean v0.1.0 `https://github.com/navidcy/Imaginocean.jl#main`
  [85f8d34a] NCDatasets v0.13.1
  [5fb14364] OhMyREPL v0.5.23
  [c3e4b0f8] Pluto v0.19.32
  [295af30f] Revise v3.5.7

Now, in an environment with Oceananigans v0.89.3, I get:

julia> @time run!(simulation)
[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (18.715 minutes)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (7.933 seconds).
[ Info: Simulation is stopping after running for 18.861 minutes.
[ Info: Simulation time 1.333 hours equals or exceeds stop time 1.333 hours.
1134.663423 seconds (2.18 G allocations: 1.143 TiB, 10.69% gc time, 100.03% compilation time)

while with Oceananigans v0.90.1 I get:

julia> @time run!(simulation)
[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (13.845 minutes)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (8.351 seconds).
[ Info: Simulation is stopping after running for 13.998 minutes.
[ Info: Simulation time 1.333 hours equals or exceeds stop time 1.333 hours.
842.743291 seconds (2.18 G allocations: 1.143 TiB, 13.55% gc time, 100.04% compilation time)

Seems that v0.90.1 is even slightly better?

@zhihua-zheng, what do you reckon? Can you check what your general Julia environment includes?

@jagoosw
Collaborator

jagoosw commented Nov 9, 2023

I had a quick go at running this and got the same issue, with v0.90.1 taking a very long time. What computers are you using? Perhaps this is an Apple Silicon problem?

@navidcy
Collaborator

navidcy commented Nov 9, 2023

I'm on an Apple Silicon M1:

julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: macOS (arm64-apple-darwin22.6.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_EDITOR = code

@navidcy
Collaborator

navidcy commented Nov 9, 2023

And with

julia> Threads.nthreads()
8

@zhihua-zheng
Contributor Author

Can you check what your general Julia environment includes?

My global Julia environment is empty, or maybe contains just Oceananigans. The same issue occurs on non-Apple computers.

@navidcy
Collaborator

navidcy commented Nov 9, 2023

I'm puzzled; why can't I reproduce the issue?

@zhihua-zheng
Contributor Author

Wait, isn't your initialization time also very long with both versions of Oceananigans? Is that expected?

@navidcy
Collaborator

navidcy commented Nov 9, 2023

Well, I don't know what counts as long or short (but I agree, O(10 mins) seems a bit long).
But most importantly, in my benchmarks I find that v0.90.1 is faster than v0.89.3, contrary to what you claim.

@zhihua-zheng
Contributor Author

Well, I don't know what counts as long or short (but I agree, O(10 mins) seems a bit long). But most importantly, in my benchmarks I find that v0.90.1 is faster than v0.89.3, contrary to what you claim.

Do you see a difference if you wrap ζ in Field? I guess that is my major claim.

@glwagner
Member

Ah yes, as noted here:

This speed issue disappears either when ζ = ∂x(v) - ∂y(u) is replaced with ζ = Field(∂x(v) - ∂y(u)), or when ζ is the only variable in fields_slice.

That's pretty bizarre. That makes me think it's some kind of weird interaction between NCDatasets and Oceananigans. I can say that with ζ = Field(∂x(v) - ∂y(u)) the output type is different. It would in fact seem more complex, because it has one more layer of indirection (i.e. it's a window that refers to a 3D computed field, rather than a windowed computed field). So I don't know why that would compile faster. Honestly, I don't think any of us has much experience optimizing compile time. Perhaps first reading this blog post:

https://julialang.org/blog/2020/08/invalidations/

and then the source code for NetCDFOutputWriter will lead to revelations.
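
A rough sketch (not from the thread) of how one might look for invalidations along the lines of that blog post, assuming SnoopCompileCore and SnoopCompile are added to the environment:

using SnoopCompileCore
invalidations = @snoopr using Oceananigans   # record method invalidations triggered while loading
using SnoopCompile
trees = invalidation_trees(invalidations)    # group them by the method that caused them
display(trees)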

@navidcy
Collaborator

navidcy commented Nov 10, 2023

@zhihua-zheng is this a non-issue with JLD2?

@jagoosw
Collaborator

jagoosw commented Nov 10, 2023

Before I updated and induced the problem I was getting O(seconds) initialisation times on an M1 Mac, so I think you're experiencing the problem with both versions, @navidcy.

I tried using JLD2 and got the same slow initialisation. Do they have a common dependency that does something to the field?

@jagoosw
Collaborator

jagoosw commented Nov 10, 2023

Another thing: I think this is coming from BinaryOperation only, because if I try to save e.g. x = u + v I also get the problem, but if I save ∂x(v) only I do not, so it's not the saving of all AbstractOperations. See the sketch below.
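
A minimal sketch (not from the thread) contrasting the two cases described above, reusing the model from the MWE; the output names are just illustrative:

u, v, w = model.velocities
slow_outputs = Dict("s" => u + v)    # a BinaryOperation: triggers the long initialization
fast_outputs = Dict("d" => ∂x(v))    # a plain derivative: initializes quickly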

@glwagner
Member

They do... they both call

function output_indices(output::Union{AbstractField, Reduction}, grid, indices, with_halos)
    indices = validate_indices(indices, location(output), grid)

    if !with_halos # Maybe chop those indices
        loc = map(instantiate, location(output))
        topo = map(instantiate, topology(grid))
        indices = map(restrict_to_interior, indices, loc, topo, size(grid))
    end

    return indices
end

function construct_output(user_output::Union{AbstractField, Reduction}, grid, user_indices, with_halos)
    indices = output_indices(user_output, grid, user_indices, with_halos)
    return Field(user_output; indices)
end

But the code is identical for both until

return Field(user_output; indices)

Then if we have an AbstractOperation we go to

function Field(operand::AbstractOperation;

whereas if you call Field(field::Field; indices) you get

Field(f::Field; indices=f.indices) = view(f, indices...) # hmm...

Of course, if you use Field(abstract_op) then you also have to call the above code.

Here's an idea: what if you execute compute!(Field(abstract_op)) but still pass abstract_op to the output writers? Is it slow in that case? (Wondering whether the ordering matters.)
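
A sketch of that experiment (not from the thread), reusing the MWE; the filename is just a placeholder:

ζ = ∂x(v) - ∂y(u)
compute!(Field(ζ))             # build and compute a Field from the operation first...
fields_slice = Dict("ζ" => ζ)  # ...but still hand the raw AbstractOperation to the writer
simulation.output_writers[:top] = NetCDFOutputWriter(model, fields_slice;
                                                     filename = "ordering_test.nc",
                                                     schedule = TimeInterval(0.5day),
                                                     overwrite_existing = true,
                                                     indices = (:, :, grid.Nz))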

@jagoosw
Collaborator

jagoosw commented Nov 10, 2023

Hmm, I'll have a go at that tomorrow. Weirdly, when I try to run the above script from the Oceananigans repo (i.e. not installed with Pkg), I do not get this problem.

@glwagner
Member

glwagner commented Nov 10, 2023

Hmm, I'll have a go at that tomorrow. Weirdly, when I try to run the above script from the Oceananigans repo (i.e. not installed with Pkg), I do not get this problem.

That might suggest it's an interaction with another package, because when you use the Oceananigans repo you probably use the repo's Manifest.toml; otherwise you may be pulling in different packages. You can compare the Oceananigans Manifest to whatever Manifest gets used in the slow case. I think there may be tools for comparing Manifests?
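
For the record, a sketch (not from the thread) of one way to compare two Manifest.toml files, assuming both use manifest format 2.0; the paths are placeholders:

using TOML

manifest_versions(path) = Dict(name => get(first(entries), "version", "stdlib")
                               for (name, entries) in TOML.parsefile(path)["deps"])

repo  = manifest_versions("Oceananigans.jl/Manifest.toml")
clean = manifest_versions("my-clean-env/Manifest.toml")

for name in sort!(collect(intersect(keys(repo), keys(clean))))
    repo[name] == clean[name] || println(name, ": ", repo[name], " vs ", clean[name])
end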

@johnryantaylor
Contributor

Another PhD student working with me ran into this problem of very long initialization times too. He was using a Windows laptop and, like @jagoosw, I have been able to reproduce it on an Apple Silicon Mac. We are also using JLD2 instead of NetCDF, so I don't think the problem is specific to Apple Silicon or NetCDF. The problem also goes away when I downgrade to Oceananigans v0.85.0. Wrapping the output in Field() as @zhihua-zheng suggested does seem to help, but even when doing that, startup is quite slow with v0.90.1 when complex diagnostics are calculated. Downgrading to v0.85.0 downgrades other dependencies, so it's hard to tell where the problem is arising. In case it's useful, here is the list of packages that are downgraded when I go from 0.90.1 to 0.85.0:
⌅ [052768ef] ↓ CUDA v5.1.0 ⇒ v4.4.1
⌅ [0c68f7d7] ↓ GPUArrays v9.1.0 ⇒ v8.8.1
⌅ [61eb1bfa] ↓ GPUCompiler v0.25.0 ⇒ v0.21.4
⌅ [85f8d34a] ↓ NCDatasets v0.13.1 ⇒ v0.12.17
⌃ [9e8cae18] ↓ Oceananigans v0.90.1 ⇒ v0.85.0
⌅ [0e08944d] ↓ PencilArrays v0.19.2 ⇒ v0.18.1
⌅ [4ee394cb] ↓ CUDA_Driver_jll v0.7.0+0 ⇒ v0.5.0+1
⌅ [76a88914] ↓ CUDA_Runtime_jll v0.10.0+1 ⇒ v0.6.0+0

@glwagner
Member

Thank you all for your informative responses! I dug into this a bit...

tl;dr It looks like the issue may be "fixed" on julia 1.10-beta3 (and there is a 1.10-rc1 now).

Benchmarks

Using an empty Project.toml, I can reproduce the issue when using julia 1.9:

(base) gregorywagner:test/ $ julia19 --project test.jl                                                      [3:18:25]
┌ Warning: Overwriting existing ./mwe.nc.
└ @ Oceananigans.OutputWriters ~/.julia/packages/Oceananigans/f5Cpw/src/OutputWriters/netcdf_output_writer.jl:359
[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (15.100 minutes)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (8.244 seconds).
[ Info: Simulation is stopping after running for 15.251 minutes.
[ Info: Simulation time 1.333 hours equals or exceeds stop time 1.333 hours.
919.254871 seconds (2.19 G allocations: 1.143 TiB, 12.80% gc time, 100.00% compilation time)

Way, way too long. (So it's good we have this issue.)

But on julia 1.10-beta3 I get

[ Info: Simulation time 1.333 hours equals or exceeds stop time 1.333 hours.
 17.237010 seconds (26.28 M allocations: 1.741 GiB, 2.14% gc time, 99.58% compilation time: <1% of which was recompilation)

Much better. (Note that on 1.10 we get a lot of annoying warnings, which are documented in #3374 and are relatively easily fixed.)

Also, things are fine if I use the Oceananigans Manifest.toml, even with julia 1.9:

(base) gregorywagner:Oceananigans.jl/ (main✗) $ julia19 --project test.jl                                                                                      [3:13:09]
┌ Warning: Overwriting existing ./mwe.nc.
└ @ Oceananigans.OutputWriters ~/Projects/Oceananigans.jl/src/OutputWriters/netcdf_output_writer.jl:359
[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (2.381 seconds)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (8.751 seconds).
[ Info: Simulation is stopping after running for 12.013 seconds.
[ Info: Simulation time 1.333 hours equals or exceeds stop time 1.333 hours.
 16.285006 seconds (38.66 M allocations: 2.598 GiB, 5.35% gc time, 99.82% compilation time)

That's weird...

What's the problem?

We haven't figured it out. One clue could be that downgrading to 0.85.0 fixes the problem.

For completeness, here's a diff of the Oceananigans Manifest.toml, and the Manifest.toml in my "clean" environment:

3c3
< julia_version = "1.9.3"
---
> julia_version = "1.9.2"
5c5
< project_hash = "72ed8b1b7715053c6d7b675f75dd867b9f153685"
---
> project_hash = "bfbc7775b0a550569ac26abdec5f544ef80e881c"
23c23
< git-tree-sha1 = "76289dc51920fdc6e0013c872ba9551d54961c24"
---
> git-tree-sha1 = "02f731463748db57cc2ebfbd9fbc9ce8280d3433"
25c25
< version = "3.6.2"
---
> version = "3.7.1"
37c37
< git-tree-sha1 = "f83ec24f76d4c8f525099b2ac475fc098138ec31"
---
> git-tree-sha1 = "16267cf279190ca7c1b30d020758ced95db89cd0"
39c39
< version = "7.4.11"
---
> version = "7.5.1"
93,94c93,94
< deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "Statistics", "UnsafeAtomicsLLVM"]
< git-tree-sha1 = "f062a48c26ae027f70c44f48f244862aec47bf99"
---
> deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "Statistics", "UnsafeAtomicsLLVM"]
> git-tree-sha1 = "64461b0e9df3069248979113ce8ab6d11bd371cf"
96,97c96
< version = "5.0.0"
< weakdeps = ["SpecialFunctions"]
---
> version = "5.1.0"
99a99
>     ChainRulesCoreExt = "ChainRulesCore"
101a102,105
>     [deps.CUDA.weakdeps]
>     ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
>     SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
>
104c108
< git-tree-sha1 = "35a37bb72b35964f2895c12c687ae263b4ac170c"
---
> git-tree-sha1 = "1e42ef1bdb45487ff28de16182c0df4920181dc3"
106c110
< version = "0.6.0+3"
---
> version = "0.7.0+0"
116c120
< git-tree-sha1 = "178a606891f82b6d734f0eadd5336b7aad44d5df"
---
> git-tree-sha1 = "92394521ec4582c11d089a3b15b76ef2cb850994"
118c122
< version = "0.9.2+1"
---
> version = "0.10.0+1"
174c178
< git-tree-sha1 = "131498c78453d02b4821d8b93f6e44595399f19f"
---
> git-tree-sha1 = "253193dfb0384646936c5ff3230b27a20d91261e"
176c180
< version = "0.2.3"
---
> version = "0.2.4"
282c286
< git-tree-sha1 = "8ad8f375ae365aa1eb2f42e2565a40b55a4b69a8"
---
> git-tree-sha1 = "85d7fb51afb3def5dcb85ad31c3707795c8bccc1"
284c288
< version = "9.0.0"
---
> version = "9.1.0"
294c298
< git-tree-sha1 = "5e4487558477f191c043166f8301dd0b4be4e2b2"
---
> git-tree-sha1 = "a846f297ce9d09ccba02ead0cae70690e072a119"
296c300
< version = "0.24.5"
---
> version = "0.25.0"
308a313,318
> [[deps.Hwloc_jll]]
> deps = ["Artifacts", "JLLWrappers", "Libdl"]
> git-tree-sha1 = "8ecb0b34472a3c98f945e3c75fc7d5428d165511"
> uuid = "e33a78d0-f292-5ffc-b300-72abe9b543c8"
> version = "2.9.3+0"
>
348c358
< git-tree-sha1 = "1169632f425f79429f245113b775a0e3d121457c"
---
> git-tree-sha1 = "b435d190ef8369cf4d79cc9dd5fba88ba0165307"
350c360
< version = "0.9.2"
---
> version = "0.9.3"
358,359c368,369
< deps = ["FileIO", "MacroTools", "Mmap", "OrderedCollections", "Pkg", "Printf", "Reexport", "Requires", "TranscodingStreams", "UUIDs"]
< git-tree-sha1 = "c11d691a0dc8e90acfa4740d293ade57f68bfdbb"
---
> deps = ["FileIO", "MacroTools", "Mmap", "OrderedCollections", "Pkg", "PrecompileTools", "Printf", "Reexport", "Requires", "TranscodingStreams", "UUIDs"]
> git-tree-sha1 = "9bbb5130d3b4fa52846546bca4791ecbdfb52730"
361c371
< version = "0.4.35"
---
> version = "0.4.38"
383c393
< git-tree-sha1 = "5f1ecfddb6abde48563d08b2cc7e5116ebcd6c27"
---
> git-tree-sha1 = "95063c5bc98ba0c47e75e05ae71f1fed4deac6f6"
385c395
< version = "0.9.10"
---
> version = "0.9.12"
394,395c404,405
< deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Printf", "Unicode"]
< git-tree-sha1 = "4ea2928a96acfcf8589e6cd1429eff2a3a82c366"
---
> deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Requires", "Unicode"]
> git-tree-sha1 = "c879e47398a7ab671c782e02b51a4456794a7fa3"
397c407,411
< version = "6.3.0"
---
> version = "6.4.0"
> weakdeps = ["BFloat16s"]
>
>     [deps.LLVM.extensions]
>     BFloat16sExt = "BFloat16s"
401c415
< git-tree-sha1 = "e7c01b69bcbcb93fd4cbc3d0fea7d229541e18d2"
---
> git-tree-sha1 = "a84f8f1e8caaaa4e3b4c101306b9e801d3883ace"
403c417,422
< version = "0.0.26+0"
---
> version = "0.0.27+0"
>
> [[deps.LLVMLoopInfo]]
> git-tree-sha1 = "2e5c102cfc41f48ae4740c7eca7743cc7e7b75ea"
> uuid = "8b046642-f1f6-4319-8d3c-209ddc03c586"
> version = "1.0.0"
412c431
< git-tree-sha1 = "f2355693d6778a178ade15952b7ac47a4ff97996"
---
> git-tree-sha1 = "50901ebc375ed41dbf8058da26f9de442febbbec"
414c433
< version = "1.3.0"
---
> version = "1.3.1"
499c518
< git-tree-sha1 = "781916a2ebf2841467cda03b6f1af43e23839d85"
---
> git-tree-sha1 = "8f6af051b9e8ec597fa09d8885ed79fd582f33c9"
501c520
< version = "0.1.9"
---
> version = "0.1.10"
526c545
< git-tree-sha1 = "a7023883872e52bc29bcaac74f19adf39347d2d5"
---
> git-tree-sha1 = "b01beb91d20b0d1312a9471a36017b5b339d26de"
528c547
< version = "10.1.4+0"
---
> version = "10.1.4+1"
570a590,601
> [[deps.Oceananigans]]
> deps = ["Adapt", "CUDA", "Crayons", "CubedSphere", "Dates", "Distances", "DocStringExtensions", "FFTW", "Glob", "IncompleteLU", "InteractiveUtils", "IterativeSolvers", "JLD2", "KernelAbstractions", "LinearAlgebra", "Logging", "MPI", "NCDatasets", "OffsetArrays", "OrderedCollections", "PencilArrays", "PencilFFTs", "Pkg", "Printf", "Random", "Rotations", "SeawaterPolynomials", "SparseArrays", "Statistics", "StructArrays"]
> path = "/Users/gregorywagner/Projects/Oceananigans.jl"
> uuid = "9e8cae18-63c1-5223-a75c-80ca9d6e9a09"
> version = "0.90.1"
>
>     [deps.Oceananigans.extensions]
>     OceananigansEnzymeCoreExt = "EnzymeCore"
>
>     [deps.Oceananigans.weakdeps]
>     EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
>
588,589c619,620
< deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "TOML"]
< git-tree-sha1 = "e25c1778a98e34219a00455d6e4384e017ea9762"
---
> deps = ["Artifacts", "CompilerSupportLibraries_jll", "Hwloc_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "PMIx_jll", "TOML", "Zlib_jll", "libevent_jll", "prrte_jll"]
> git-tree-sha1 = "694458ae803b684f09c07f90459cb79655fb377d"
591c622
< version = "4.1.6+0"
---
> version = "5.0.0+0"
595c626
< git-tree-sha1 = "ceeda72c9fd6bbebc4f4f598560789145a8b6c4c"
---
> git-tree-sha1 = "cc6e1927ac521b659af340e0ca45828a3ffc748f"
597c628
< version = "3.0.11+0"
---
> version = "3.0.12+0"
609a641,646
> [[deps.PMIx_jll]]
> deps = ["Artifacts", "Hwloc_jll", "JLLWrappers", "Libdl", "Zlib_jll", "libevent_jll"]
> git-tree-sha1 = "8b3b19351fa24791f94d7ae85faf845ca1362541"
> uuid = "32165bc3-0280-59bc-8c0b-c33b6203efab"
> version = "4.2.7+0"
>
618c655
< git-tree-sha1 = "716e24b21538abc91f6205fd1d8363f39b442851"
---
> git-tree-sha1 = "a935806434c9d4c506ba941871b327b96d41f2bf"
620c657
< version = "2.7.2"
---
> version = "2.8.0"
673c710
< git-tree-sha1 = "ee094908d720185ddbdc58dbe0c1cbe35453ec7a"
---
> git-tree-sha1 = "6842ce83a836fbbc0cfeca0b5a4de1a4dcbdb8d1"
675c712
< version = "2.2.7"
---
> version = "2.2.8"
748c785
< git-tree-sha1 = "30449ee12237627992a99d5e30ae63e4d78cd24a"
---
> git-tree-sha1 = "3bac05bc7e74a75fd9cba4295cde4045d9fe2386"
750c787
< version = "1.2.0"
---
> version = "1.2.1"
759c796
< git-tree-sha1 = "04bdff0b09c65ff3e06a05e3eb7b120223da3d39"
---
> git-tree-sha1 = "0e7508ff27ba32f26cd459474ca2ede1bc10991f"
761c798
< version = "1.4.0"
---
> version = "1.4.1"
771c808
< git-tree-sha1 = "c60ec5c62180f27efea3ba2908480f8055e17cee"
---
> git-tree-sha1 = "5165dfb9fd131cf0c6957a3a7605dede376e7b63"
773c810
< version = "1.1.1"
---
> version = "1.2.0"
895c932
< git-tree-sha1 = "a1f34829d5ac0ef499f6d84428bd6b4c71f02ead"
---
> git-tree-sha1 = "cb76cf677714c095e535e3501ac7954732aeea2d"
897c934
< version = "1.11.0"
---
> version = "1.11.1"
927,928c964
< deps = ["Random", "Test"]
< git-tree-sha1 = "9a6ae7ed916312b41236fcef7e0af564ef934769"
---
> git-tree-sha1 = "1fbeaaca45801b4ba17c251dd8603ef24801dd84"
930c966,970
< version = "0.9.13"
---
> version = "0.10.2"
> weakdeps = ["Random", "Test"]
>
>     [deps.TranscodingStreams.extensions]
>     TestExt = ["Test", "Random"]
987a1028,1033
> [[deps.libevent_jll]]
> deps = ["Artifacts", "JLLWrappers", "Libdl", "OpenSSL_jll"]
> git-tree-sha1 = "f04ec6d9a186115fb38f858f05c0c4e1b7fc9dcb"
> uuid = "1080aeaf-3a6a-583e-a51c-c537b09f60ec"
> version = "2.1.13+1"
>
996a1043,1048
>
> [[deps.prrte_jll]]
> deps = ["Artifacts", "Hwloc_jll", "JLLWrappers", "Libdl", "PMIx_jll", "libevent_jll"]
> git-tree-sha1 = "5adb2d7a18a30280feb66cad6f1a1dfdca2dc7b0"
> uuid = "eb928a42-fffd-568d-ab9c-3f5d54fc65b9"
> version = "3.0.2+0"

There are quite a few differences in some suspicious packages (e.g. those involved in LLVM), so...

(PS, is there a better way to compare Manifests? I wonder.)

It might not have anything to do with our code. Nevertheless, @navidcy and I combed through the git blame for two files, output_construction.jl and computed_field.jl. This line was changed:

@apply_regionally boundary_conditions = FieldBoundaryConditions(indices, boundary_conditions)

but... I tested this by changing just that line back to the 0.85 version, and still hit the very long compile time.

@jagoosw
Collaborator

jagoosw commented Nov 13, 2023

Just to add to this: I started manually installing the versions of packages from the Oceananigans manifest to try to weed out which one it was, and none of the suspicious ones like LLVM made a difference. I didn't get around to trying them all.
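
For anyone repeating that bisection, a sketch (not from the thread) of pinning one dependency to the version in the repo Manifest; the LLVM version here is taken from the diff above:

using Pkg
Pkg.add(name="LLVM", version="6.3.0")   # install the version recorded in the Oceananigans Manifest
Pkg.pin("LLVM")                         # keep it fixed while testing the other candidates
# Pkg.free("LLVM")                      # undo the pin afterwards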

@glwagner
Member

glwagner commented Nov 13, 2023

Thanks, that's helpful @jagoosw. Just one more thought... I realized after doing the testing for my previous post that the hang occurs at "Initializing simulation...". This implies that the problem probably isn't with any constructors (e.g. the Field constructor above) but rather with the actual computations.

A big change from 0.85 (which occurred in 0.88) is the introduction of the KernelParameters abstraction for offsetting indices within kernels, used here:

parameters = KernelParameters(size(comp), map(offset_index, comp.indices))

and I think in other places; @simone-silvestri can advise.

KernelParameters extends some KernelAbstractions functionality in a non-trivial way, I think. Maybe there are some things we can improve there:

https://github.com/CliMA/Oceananigans.jl/blob/main/src/Utils/kernel_launching.jl

Even if the issue is fixed on 1.10, I think we still ought to understand this problem better since it might come back in the future (things like this often do...)
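
One way (a sketch, not from the thread) to check whether the cost sits in compiling the windowed compute kernel, independently of the output writer:

ζ = ∂x(v) - ∂y(u)
ζ_top = Field(ζ; indices=(:, :, grid.Nz))   # windowed computed field, as the output writer builds it
@time compute!(ζ_top)                       # first call pays the kernel compilation cost
@time compute!(ζ_top)                       # second call should be fast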

@navidcy
Collaborator

navidcy commented Mar 11, 2024

What's the status of this issue?

@johnryantaylor
Contributor

I just tested this with Julia v1.10 and Oceananigans v0.90.11 and the problem seems to have gone away. A simulation that had taken 18 minutes to initialize now takes about 20 seconds! I think we can close this issue now, but I'm still not sure what the underlying issue was, so it's something to keep in mind, as @glwagner says above.

@navidcy
Collaborator

navidcy commented Mar 20, 2024

I'm delighted to hear about the speedup. Still wondering what the culprit was, though.

@navidcy navidcy closed this as completed Mar 20, 2024