Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General-purpose batched tridiagonal solver #533

Merged
merged 30 commits into from
Dec 11, 2019

Conversation

ali-ramadhan
Copy link
Member

@ali-ramadhan ali-ramadhan commented Nov 25, 2019

This PR adds a general-purpose batched tridiagonal solver which we can use for vertically stretched pressure solves, implicit vertical diffusion, sub-stepping of atmospheric acoustic waves, etc.

The coefficients and right-hand-side can be specified as a 1D array (shared by all the tridiagonal systems), a 3D array (different for each tridiagonal system), or a function (coefficient is calculated on the fly).

Should probably add proper docstrings before merging.

The vertically stretched pressure solver (PR #306) can be built on top of this batched tridiagonal solver.

@ali-ramadhan ali-ramadhan added feature 🌟 Something new and shiny numerics 🧮 So things don't blow up and boil the lobsters alive labels Nov 25, 2019
@codecov
Copy link

codecov bot commented Dec 2, 2019

Codecov Report

Merging #533 into master will increase coverage by 0.37%.
The diff coverage is 93.93%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #533      +/-   ##
==========================================
+ Coverage   71.09%   71.47%   +0.37%     
==========================================
  Files          69       70       +1     
  Lines        1958     1991      +33     
==========================================
+ Hits         1392     1423      +31     
- Misses        566      568       +2
Impacted Files Coverage Δ
src/Solvers/Solvers.jl 100% <ø> (ø) ⬆️
src/Solvers/batched_tridiagonal_solver.jl 100% <100%> (ø)
src/utils.jl 71.29% <66.66%> (-0.28%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cdf5091...3f82f44. Read the comment docs.

@ali-ramadhan
Copy link
Member Author

@glwagner @suyashbire1 This PR should be ready for review. Let me know what you guys think.

Right now it's a pretty standalone solver but I'd like to convert the example in PR #306 to a pressure solver that uses BatchedTridiagonalSolver so I can test the vertically stretched grid in PR #543.

Future improvements for solving batched tridiagonal systems on the GPU:

  1. Instead of using a 3D temporary array to store intermediate computations, can probably recompute the coefficients on the fly. I initially thought we just needed a 1D temporary array but the GPU needs a 3D array so threads aren't reading/writing to the same locations.
  2. Once vertically stretched grid tests pass, should probably upgrade to a faster TDMA algorithm courtesy of @maleadt. See https://gist.github.com/maleadt/1ec91b3b12ede9898958c95596cabe8b

@ali-ramadhan
Copy link
Member Author

GitLab CI seems to have crapped out but GPU tests pass on Cyclops: https://gist.github.com/ali-ramadhan/ec039bd71c21daff23c140b8aec180eb

@maleadt
Copy link
Collaborator

maleadt commented Dec 4, 2019

GitLab CI seems to have crapped out but GPU tests pass on Cyclops: https://gist.github.com/ali-ramadhan/ec039bd71c21daff23c140b8aec180eb

There's some people that are heavily using GPUs on cyclops, resulting in OOMs even before we allocate anything (hence the vague error code). You can use the change from JuliaGPU/CuArrays.jl#526 to select a device more intelligently (drop the thorough check if you don't care about compute capability -- you probably don't).

@ali-ramadhan
Copy link
Member Author

@maleadt Oh that's awesome, thanks for the update! So we just update to CuArrays#master and the GPU device with the most free memory gets automatically selected or do we have to manually select a device during testing like CuArrays

candidates = [(dev=dev,
                cap=capability(dev),
                mem=CuContext(ctx->CUDAdrv.available_memory(), dev))
              for dev in devices()]

thorough = parse(Bool, get(ENV, "CI_THOROUGH", "false"))
if thorough
    sort!(candidates, by=x->(x.cap, x.mem))
else
    sort!(candidates, by=x->x.mem)
end
pick = last(candidates)
device!(pick.dev)

@maleadt
Copy link
Collaborator

maleadt commented Dec 4, 2019

do we have to manually select a device during testing like CuArrays

Like that. By default, CuArrays follows the default CUDA order, I think it only makes sense for CI to pick a device with as much memory as possible.

@ali-ramadhan
Copy link
Member Author

Ah thanks for the tip @maleadt! I'll make a PR to improve the GPU CI.

Copy link
Member

@glwagner glwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, and nice code. I'm impressed.

src/Solvers/batched_tridiagonal_solver.jl Outdated Show resolved Hide resolved
ϕ[i, j, 1] = f₁ / β

@unroll for k = 2:Nz
cₖ₋₁ = get_coefficient(c, i, j, k-1, grid, p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These symbols don't render on github; all I see is cₖ₋₁, for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hmmm it's just subscript k-1. Looks fine on my browser so must be font-dependent.

Could change to superscript as we seem to have had fewer issues with superscript in the past.

It's pretty low-level/private code so I'll leave it with subscripts for now but if becomes an issue outside of GitHub once more then we can change it.

image

test/test_solvers.jl Outdated Show resolved Hide resolved

solve_batched_tridiagonal_system!(ϕ, arch, btsolver)

return ϕ[:] ≈ ϕ_correct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the [:]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ϕ_correct is a 1D array because backslash returns a vector while ϕ is a 1×1×Nz array so I use ϕ[:] to flatten it to a 1D vector for to work.

@ali-ramadhan ali-ramadhan merged commit b1fe0ab into master Dec 11, 2019
@ali-ramadhan ali-ramadhan deleted the ar/batched-tridiagonal-solver branch December 11, 2019 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature 🌟 Something new and shiny numerics 🧮 So things don't blow up and boil the lobsters alive
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants