Fill velocity halos in a single pass for ConformalCubedSphereGrid #3201
Comments
Is this task required to complete the cubed sphere, or should we regard it as an optimization that's important for performance but not functionality?
It's really a "performance" task, but I have a gut feeling that it might be impeding performance so much that we won't be able to consider the cubed sphere done if we don't deal with it. So it's probably a good idea to leave it in the milestone for global simulations using the cubed sphere, as it is now?
"Done" isn't very precise since the cubed sphere will never be "done". But perhaps we can put a number on performance for the first milestone, which will allow us to conclude whether we need this optimization or not. Can you explain where the gut feeling comes from? Will filling halos be so expensive even on just one GPU, or is this a distributed problem? Currently, 1/4 degree is performant on one GPU.
True. Ideally we want to be close to the scaling/performance we got with the lat-lon grid? That's perhaps not feasible..? I don't know how close is good enough, though.
Well, at least some of the gut feeling comes from the fact that I'm pretty sure the halo-filling cost can be cut in half by doing it in a single pass. But you are on point: I don't have a gut feeling for how much impact the two passes have on overall performance.
We expect to be at lower performance. For that reason we have dedicated two independent milestones to the cubed sphere. The first milestone is rather succinct: "complete the cubed sphere implementation". The second milestone pertains to performance: "achieve 10 SYPD at 25 km resolution". I think this is nice, because we want to separate tasks that are required for correct functionality from tasks that are oriented towards performance rather than correctness.
I think high performance at 25 km resolution will also prove difficult because we are effectively reducing our kernel size to 1/6 (unless we figure out how to coalesce kernels across panels). On a large GPU this will lead to performance degradation at 25 km resolution, because even a single-panel kernel covering the whole globe at 25 km barely saturates one GPU. Recovering that performance for multi-region simulations may be difficult, especially in the face of the added complexity of distribution across multiple GPUs.
Closing this; closed by #3488.
At the moment we fill the velocity halos with multiple passes, e.g.,
Oceananigans.jl/validation/multi_region/multi_region_cubed_sphere.jl
Lines 115 to 119 in 2447ea7
We should utilize the grid's connectivity and develop a method to fill the velocity halos that only requires one pass. This is very important for performance and scaling on distributed systems.
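To make the idea concrete, here is a toy sketch in plain Python. It is not Oceananigans code: the panel layout, the `CONNECTIVITY` table, the rotation convention (the receiving panel's `u` taken from the neighbor's `v`, and its `v` from minus the neighbor's `u`) and all function names are assumptions made up for illustration. It contrasts a two-pass fill (copy component-for-component, then swap and re-sign) with a single-pass fill that reads the correct source component and sign straight from the connectivity.

```python
# Toy illustration only; every name here is hypothetical, not an Oceananigans API.
N = 4  # interior points per panel edge


def make_panel(offset):
    """Build a panel with distinguishable u and v values."""
    u = [[offset + 10 * i + j for j in range(N)] for i in range(N)]
    v = [[1000 + offset + 10 * i + j for j in range(N)] for i in range(N)]
    return {"u": u, "v": v}


# Assumed connectivity: panel A's east halo is fed by panel B, whose local axes
# are rotated so that A's u corresponds to B's v and A's v to minus B's u.
CONNECTIVITY = {
    ("A", "east"): {"neighbor": "B", "u_from": ("v", +1), "v_from": ("u", -1)},
}


def edge_strip(field):
    """Neighbor values adjacent to the shared edge (toy choice: first row)."""
    return list(field[0])


def fill_two_pass(panels, panel, side):
    link = CONNECTIVITY[(panel, side)]
    neighbor = panels[link["neighbor"]]
    # Pass 1: scalar-style copy, u from u and v from v (orientation still wrong).
    halo = {"u": edge_strip(neighbor["u"]), "v": edge_strip(neighbor["v"])}
    # Pass 2: swap components and fix signs using the connectivity.
    fixed = {}
    for comp in ("u", "v"):
        src, sign = link[comp + "_from"]
        fixed[comp] = [sign * x for x in halo[src]]
    return fixed


def fill_single_pass(panels, panel, side):
    link = CONNECTIVITY[(panel, side)]
    neighbor = panels[link["neighbor"]]
    # One pass: read the right source component with the right sign directly.
    halo = {}
    for comp in ("u", "v"):
        src, sign = link[comp + "_from"]
        halo[comp] = [sign * x for x in edge_strip(neighbor[src])]
    return halo


panels = {"A": make_panel(0), "B": make_panel(100)}
two_pass = fill_two_pass(panels, "A", "east")
single_pass = fill_single_pass(panels, "A", "east")
```

In this toy both routes produce the same halo values. The point is only that once the source component and sign are encoded in the connectivity, the intermediate copy becomes unnecessary; whether that maps cleanly onto the real grid (corners, index reversal, staggering) is left open here.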