Slow initialization after updating to v0.90.0 #3381
Interesting! Which version were you before updating? |
When you say "initialization" you mean the time between when you call |
I used v0.88.0 before. But even if I pin Oceananigans to v0.88.0, the issue still occurs, and the contents of |
I think so, it corresponds to this line in the log for example:
|
That suggests that this could be an issue with your environment... What other packages are you using? What can happen is that a package got upgraded when you bumped up to 0.90.0, but then because compat was satisfied with that upgraded package, it did not get changed when you subsequently bumped down to 0.88.0. Upgrading is conservative, it doesn't happen unless you ask for it. |
Can you post the output of using Pkg; Pkg.status() |
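One way to rule out a stale dependency left over from an earlier upgrade is to resolve everything from scratch in a throwaway environment. The sketch below is only an illustration: the helper name `status_in_fresh_environment` is hypothetical, and it assumes network access so that Pkg can download the requested version.

```julia
using Pkg

# Hypothetical helper (not part of Pkg or Oceananigans): build a throwaway
# environment that holds one pinned package, then print what was resolved.
function status_in_fresh_environment(name::AbstractString, version::AbstractString)
    Pkg.activate(temp=true)                # new, empty environment in a temporary directory
    Pkg.add(name=name, version=version)    # resolves all indirect dependencies from scratch
    Pkg.status()                           # direct dependencies (Project.toml)
    Pkg.status(mode=Pkg.PKGMODE_MANIFEST)  # every resolved package (Manifest.toml)
end
```

Calling `status_in_fresh_environment("Oceananigans", "0.88.0")` and then again with `"0.90.0"` would show which indirect dependencies differ between the two resolutions, independent of whatever the existing environment had already upgraded.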
This issue occurs in a clean Julia environment. No other package is added explicitly. |
I'll try to reproduce it! |
|
OK, I see. Pretty clean environment! :) |
Possibly it matters what is in the global environment, but I'm not sure... |
hm.. |
So I did a benchmark. I ran this:
using Oceananigans
using Oceananigans.Units
grid = RectilinearGrid(CPU(),
                       size = (3, 3, 3),
                       extent = (1, 1, 1),
                       topology = (Periodic, Bounded, Bounded))
model = HydrostaticFreeSurfaceModel(; grid)
Δt=20minutes
simulation = Simulation(model, Δt=20minutes, stop_time=4Δt)
u, v, w = model.velocities
ζ = ∂x(v) - ∂y(u)
fields_slice = Dict("u" => u, "v" => v, "w" => w, "ζ" => ζ)
simulation.output_writers[:top] = NetCDFOutputWriter(model, fields_slice;
                                                     filename = "mwe.nc",
                                                     schedule = TimeInterval(0.5day),
                                                     overwrite_existing = true,
                                                     indices = (:, :, grid.Nz))
@time run!(simulation)
on my laptop, in an environment with only Oceananigans. Just for the record, my general env contains:
(@v1.9) pkg> st
Status `~/.julia/environments/v1.9/Project.toml`
[6e4b80f9] BenchmarkTools v1.3.2
[13f3f980] CairoMakie v0.10.12
[e9467ef8] GLMakie v0.8.12
[db073c08] GeoMakie v0.5.1
[7073ff75] IJulia v1.24.2
[12c4ca38] Imaginocean v0.1.0 `https://github.com/navidcy/Imaginocean.jl#main`
[85f8d34a] NCDatasets v0.13.1
[5fb14364] OhMyREPL v0.5.23
[c3e4b0f8] Pluto v0.19.32
[295af30f] Revise v3.5.7
Now, in an environment with Oceananigans v0.89.3, I get:
julia> @time run!(simulation)
[ Info: Initializing simulation...
[ Info: ... simulation initialization complete (18.715 minutes)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (7.933 seconds).
[ Info: Simulation is stopping after running for 18.861 minutes.
[ Info: Simulation time 1.333 hours equals or exceeds stop time 1.333 hours.
1134.663423 seconds (2.18 G allocations: 1.143 TiB, 10.69% gc time, 100.03% compilation time)
while with Oceananigans v0.90.1 I get:
julia> @time run!(simulation)
[ Info: Initializing simulation...
[ Info: ... simulation initialization complete (13.845 minutes)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (8.351 seconds).
[ Info: Simulation is stopping after running for 13.998 minutes.
[ Info: Simulation time 1.333 hours equals or exceeds stop time 1.333 hours.
842.743291 seconds (2.18 G allocations: 1.143 TiB, 13.55% gc time, 100.04% compilation time)
It seems that v0.90.1 is even slightly better? @zhihua-zheng, what do you reckon? Can you check what your general Julia environment includes? |
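Both timings above report roughly 100% compilation time, which suggests the wait is spent compiling rather than time stepping. A minimal sketch of how to separate the two, assuming nothing about Oceananigans itself: time the same call twice, since only the first call pays for JIT compilation. The names `first_and_second_call` and `sum_of_squares` are hypothetical stand-ins.

```julia
# Time the same call twice: the first call includes JIT compilation,
# the second reuses the already compiled code.
function first_and_second_call(f, args...)
    first_call = @elapsed f(args...)
    second_call = @elapsed f(args...)
    return (; first_call, second_call)
end

# Hypothetical stand-in workload; in this thread the call of interest is run!(simulation).
sum_of_squares(x) = sum(abs2, x)

timings = first_and_second_call(sum_of_squares, rand(1000))
println("first call:  ", timings.first_call, " s")
println("second call: ", timings.second_call, " s")
```

For a simulation, the second measurement would need a freshly built `Simulation` with the same types, because one that has already reached its stop time has nothing left to run.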
I had a quick go at running this and got the same issue, with v0.90.1 taking a very long time. What computers are you using? Perhaps this is an Apple Silicon problem? |
I'm on an Apple Silicon M1:
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)
Build Info:
Note: This is an unofficial build, please report bugs to the project
responsible for this build and not to the Julia project unless you can
reproduce the issue using official builds available at https://julialang.org/downloads
Platform Info:
OS: macOS (arm64-apple-darwin22.6.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
Environment:
JULIA_EDITOR = code |
And with
julia> Threads.nthreads()
8 |
My global Julia environment is empty, or maybe just with Oceananigans. Same issue on non-Apple computers. |
I'm puzzled, why can't I reproduce the issue?.... |
Wait, isn't your initialization time also very long with both versions of Oceananigans? Is that expected? |
Well, I don't know what is long or short (but I agree, O(10 mins) seems a bit long). |
Do you see a difference if you wrap |
Ah yes as noted here:
That's pretty bizarre. That makes me think it's some kind of weird interaction between https://julialang.org/blog/2020/08/invalidations/ and then the source code for |
@zhihua-zheng is this a non-issue with JLD2? |
Before I updated and induced the problem, I was getting O(seconds) initialisation time on an M1 Mac, so I think you're experiencing the problem with both versions, @navidcy. I tried using JLD2 and got the same slow initialisation. Do they have a common dependency that does something to the field? |
Another thing, this is coming from |
They do... they both call Oceananigans.jl/src/OutputWriters/output_construction.jl Lines 32 to 47 in 7053657
But the code is identical for both until
Then if we have
whereas if you call Oceananigans.jl/src/Fields/field.jl Line 181 in 7053657
Of course, if you use Here's an idea. What if you execute |
Hmm I'll have a go at that tomorrow. Weirdly when I try and run the above script from the Oceananigans repo (i.e. not installed with Pkg) I do not get this problem. |
That might suggest it's an interaction with another package, because when you use Oceananigans' repo you probably use the repo's Manifest.toml; otherwise you may be pulling in different packages. You can compare the Oceananigans Manifest to whatever Manifest gets used in the slow case. I think there may be tools for comparing Manifests? |
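Comparing two Manifests does not need a dedicated tool. Here is a minimal sketch using only the TOML standard library; the function names `manifest_versions` and `compare_manifests` are hypothetical, and the code assumes the usual Manifest.toml layout, where each package is an array of tables, either under a top-level `deps` table or at top level in the older format.

```julia
using TOML

# Map package name => version string for one Manifest.toml.
# Standard libraries that carry no version are recorded as "stdlib".
function manifest_versions(path::AbstractString)
    manifest = TOML.parsefile(path)
    deps = get(manifest, "deps", manifest)      # newer layout nests packages under "deps"
    versions = Dict{String,String}()
    for (name, entries) in deps
        entries isa AbstractVector || continue  # skip metadata such as "julia_version"
        isempty(entries) && continue
        versions[name] = get(first(entries), "version", "stdlib")
    end
    return versions
end

# Return name => (version in A, version in B) for every package that differs.
function compare_manifests(path_a::AbstractString, path_b::AbstractString)
    a, b = manifest_versions(path_a), manifest_versions(path_b)
    differences = Dict{String,Tuple{String,String}}()
    for name in union(keys(a), keys(b))
        va, vb = get(a, name, "absent"), get(b, name, "absent")
        va == vb || (differences[name] = (va, vb))
    end
    return differences
end
```

Passing the path of the repository's Manifest.toml and the path of the Manifest from the slow environment to `compare_manifests` would list only the packages whose versions differ, which narrows down the candidates to bisect.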
Another PhD student working with me ran into this problem of very long initialization times too. He was using a Windows laptop and, like @jagoosw, I have been able to reproduce it using an Apple Silicon Mac. Also, we are using JLD2 instead of NetCDF, so I don't think the problem is specific to Apple Silicon or NetCDF. Also, the problem goes away when I downgrade to Oceananigans v0.85.0. Wrapping the output in Field() as @zhihua-zheng suggested does seem to help, but even when doing that, the startup is quite slow using v0.90.1 when complex diagnostics are calculated. Downgrading to v0.85.0 downgrades other dependencies, so it's hard to tell where the problem is arising. In case it's useful, here is the list of packages that are downgraded when I go from 0.90.1 to 0.85.0: |
Thank you all for your informative responses! I dug into this a bit... tl;dr It looks like the issue may be "fixed" on julia 1.10-beta3 (and there is a 1.10-rc1 now).
Benchmarks
Using an empty
Way, way too long. (So it's good we have this issue.) But on julia 1.10-beta3 I get
much better. (Note that on 1.10 we get a lot of annoying warnings, which are documented in #3374 and are relatively easily fixed.) Also, things are fine if I use the Oceananigans Manifest.toml, even with julia 1.9:
That's weird...
What's the problem?
We haven't figured it out. One clue could be that downgrading to 0.85.0 fixes the problem. For completeness, here's a Click me
There are quite a few differences to some suspicious packages (eg those involved in LLVM) so... (PS, is there a better way to compare Manifests? I wonder.) It might not be anything to do with our code. Nevertheless, @navidcy and I combed through the
but... I tested this by changing just that line back to the 0.85 version, and still hit the very long compile time. |
Just to add to this, I started going through manually installing the version of packages in the Oceananigans manifest to try and weed out which one it was and none of the suspicious ones like |
Thanks, that's helpful @jagoosw. Just one more thought... I realized after I did the testing for my previous post that the hang occurs at "Initializing simulation...". This implies that the problem isn't with any constructors (eg the A big change from 0.85 (which occurred in 0.88) is the introduction of the
and I think other places, which @simone-silvestri can advise.
https://github.com/CliMA/Oceananigans.jl/blob/main/src/Utils/kernel_launching.jl Even if the issue is fixed on 1.10, I think we still ought to understand this problem better since it might come back in the future (things like this often do...) |
What's the status of this issue? |
I just tested this with Julia v1.10 and Oceananigans v0.90.11 and the problem seems to have gone away. A simulation that had taken 18 minutes to initialize now takes about 20 seconds! I think we can close this issue now, but I'm still not sure what the underlying issue was, so something to keep in mind as @glwagner says above. |
I'm delighted to hear about the speedup. Still also wondering what was the culprit. |
With the recent updates of Oceananigans, I noticed an unusual behavior of the model that often leads to much slower initialization.
Below is a minimum working example that demonstrates the issue:
Running this code with Julia 1.9.3 and Oceananigans v0.90.0 gives an initialization time of ~ 15 minutes, much longer than common values of a few seconds. The same issue also appears on GPU.
This speed issue disappears either when `ζ = ∂x(v) - ∂y(u)` is replaced with `ζ = Field(∂x(v) - ∂y(u))`, or when `ζ` is the only variable in `fields_slice`. However, as pointed out by @tomchor, wrapping outputs in `Field()` tends to waste more memory (a very precious resource on the GPU), so it may be worthwhile to investigate further why this happens.