General-purpose batched tridiagonal solver #533

ali-ramadhan · 2019-11-25T12:42:06Z

This PR adds a general-purpose batched tridiagonal solver which we can use for vertically stretched pressure solves, implicit vertical diffusion, sub-stepping of atmospheric acoustic waves, etc.

The coefficients and right-hand side can each be specified as a 1D array (shared by all the tridiagonal systems), a 3D array (different for each tridiagonal system), or a function (the coefficient is calculated on the fly).

Should probably add proper docstrings before merging.

The vertically stretched pressure solver (PR #306) can be built on top of this batched tridiagonal solver.
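
One way to picture the three coefficient forms described above is a small accessor with one method per form. This is only a sketch: the name `get_coefficient` and its argument order are taken from the call visible in the reviewed diff, but the method bodies, the assumption that a 1D array is indexed by `k`, and the example values (`shared`, `per_column`, `on_the_fly`, `p.scale`) are illustrative guesses, not the PR's actual code.

```julia
# Hypothetical sketch: one accessor method per way of specifying a coefficient.
# 1D array: the same value at level k for every (i, j) column (assumed indexing).
@inline get_coefficient(a::AbstractVector, i, j, k, grid, p) = a[k]

# 3D array: a different value for every tridiagonal system.
@inline get_coefficient(a::AbstractArray{<:Any, 3}, i, j, k, grid, p) = a[i, j, k]

# Function: the value is computed on the fly from the indices, grid and parameters.
@inline get_coefficient(a::Function, i, j, k, grid, p) = a(i, j, k, grid, p)

# Illustrative inputs (all made up for this example).
shared     = [1.0, 2.0, 3.0, 4.0]
per_column = reshape(collect(1.0:24.0), 2, 3, 4)
on_the_fly = (i, j, k, grid, p) -> p.scale * k

c1 = get_coefficient(shared,     1, 1, 2, nothing, nothing)
c2 = get_coefficient(per_column, 2, 3, 4, nothing, nothing)
c3 = get_coefficient(on_the_fly, 1, 1, 3, nothing, (scale = 10.0,))
```

With dispatch like this, the solver kernel can call the same accessor regardless of how each coefficient was supplied.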

codecov · 2019-12-02T20:17:22Z

Codecov Report

Merging #533 into master will increase coverage by 0.37%.
The diff coverage is 93.93%.

@@            Coverage Diff             @@
##           master     #533      +/-   ##
==========================================
+ Coverage   71.09%   71.47%   +0.37%     
==========================================
  Files          69       70       +1     
  Lines        1958     1991      +33     
==========================================
+ Hits         1392     1423      +31     
- Misses        566      568       +2

Impacted Files	Coverage Δ
src/Solvers/Solvers.jl	`100% <ø> (ø)`	⬆️
src/Solvers/batched_tridiagonal_solver.jl	`100% <100%> (ø)`
src/utils.jl	`71.29% <66.66%> (-0.28%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cdf5091...3f82f44.

ali-ramadhan · 2019-12-03T22:14:56Z

@glwagner @suyashbire1 This PR should be ready for review. Let me know what you guys think.

Right now it's a pretty standalone solver but I'd like to convert the example in PR #306 to a pressure solver that uses BatchedTridiagonalSolver so I can test the vertically stretched grid in PR #543.

Future improvements for solving batched tridiagonal systems on the GPU:

- Instead of using a 3D temporary array to store intermediate computations, we can probably recompute the coefficients on the fly. I initially thought we just needed a 1D temporary array, but the GPU needs a 3D array so that threads aren't reading from and writing to the same locations.
- Once the vertically stretched grid tests pass, we should probably upgrade to a faster TDMA algorithm courtesy of @maleadt. See https://gist.github.com/maleadt/1ec91b3b12ede9898958c95596cabe8b
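
For readers unfamiliar with the plain TDMA (Thomas) algorithm and the scratch-array issue mentioned above, here is a minimal CPU sketch. It is not the PR's kernel: the function name `batched_thomas!`, the restriction to shared 1D coefficients, and the test values are assumptions made for illustration. The scratch array `t` is 3D so that each `(i, j)` column has its own entries; on a GPU, where each column would presumably be handled by its own thread, a shared 1D scratch array would be written by all threads at once.

```julia
using LinearAlgebra

# Hypothetical helper: solve Nx*Ny independent tridiagonal systems that share
# 1D coefficients a (lower), b (diagonal), c (upper), with a different
# right-hand side f[i, j, :] per column. t is per-column scratch storage.
function batched_thomas!(ϕ, a, b, c, f, t)
    Nx, Ny, Nz = size(f)
    for j in 1:Ny, i in 1:Nx
        # Forward sweep: eliminate the lower diagonal.
        β = b[1]
        ϕ[i, j, 1] = f[i, j, 1] / β
        for k in 2:Nz
            t[i, j, k] = c[k-1] / β
            β = b[k] - a[k-1] * t[i, j, k]
            ϕ[i, j, k] = (f[i, j, k] - a[k-1] * ϕ[i, j, k-1]) / β
        end
        # Back substitution.
        for k in Nz-1:-1:1
            ϕ[i, j, k] -= t[i, j, k+1] * ϕ[i, j, k+1]
        end
    end
    return ϕ
end

# Usage with made-up, diagonally dominant coefficients.
Nx, Ny, Nz = 2, 3, 5
a = fill(-1.0, Nz - 1)
b = fill(5.0, Nz)
c = fill(-2.0, Nz - 1)
f = reshape(collect(1.0:Nx*Ny*Nz), Nx, Ny, Nz)
ϕ = zeros(Nx, Ny, Nz)
t = zeros(Nx, Ny, Nz)
batched_thomas!(ϕ, a, b, c, f, t)
```

Each column can be checked against a direct solve with `Tridiagonal(a, b, c) \ f[i, j, :]`.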

ali-ramadhan · 2019-12-04T01:21:48Z

GitLab CI seems to have crapped out but GPU tests pass on Cyclops: https://gist.github.com/ali-ramadhan/ec039bd71c21daff23c140b8aec180eb

maleadt · 2019-12-04T14:54:17Z

> GitLab CI seems to have crapped out but GPU tests pass on Cyclops: https://gist.github.com/ali-ramadhan/ec039bd71c21daff23c140b8aec180eb

There are some people heavily using the GPUs on Cyclops, resulting in OOMs even before we allocate anything (hence the vague error code). You can use the change from JuliaGPU/CuArrays.jl#526 to select a device more intelligently (drop the thorough check if you don't care about compute capability -- you probably don't).

ali-ramadhan · 2019-12-04T19:28:24Z

@maleadt Oh that's awesome, thanks for the update! So do we just update to CuArrays#master and the GPU device with the most free memory gets selected automatically, or do we have to manually select a device during testing, as CuArrays does?

candidates = [(dev=dev,
                cap=capability(dev),
                mem=CuContext(ctx->CUDAdrv.available_memory(), dev))
              for dev in devices()]

thorough = parse(Bool, get(ENV, "CI_THOROUGH", "false"))
if thorough
    sort!(candidates, by=x->(x.cap, x.mem))
else
    sort!(candidates, by=x->x.mem)
end
pick = last(candidates)
device!(pick.dev)

maleadt · 2019-12-04T20:53:27Z

> do we have to manually select a device during testing like CuArrays

Like that. By default, CuArrays follows the default CUDA device order; I think it only makes sense for CI to pick the device with as much free memory as possible.

ali-ramadhan · 2019-12-06T13:24:03Z

Ah thanks for the tip @maleadt! I'll make a PR to improve the GPU CI.

glwagner

Looks pretty good, and nice code. I'm impressed.

src/Solvers/batched_tridiagonal_solver.jl

glwagner · 2019-12-09T13:06:40Z

+            ϕ[i, j, 1] = f₁ / β
+
+            @unroll for k = 2:Nz
+                cₖ₋₁ = get_coefficient(c, i, j, k-1, grid, p)


These symbols don't render on GitHub; all I see is cₖ₋₁, for example.

ali-ramadhan · 2019-12-11

Ah hmmm, it's just subscript k-1. It looks fine in my browser, so it must be font-dependent.

We could change to superscripts, as we seem to have had fewer issues with superscripts in the past.

It's pretty low-level/private code so I'll leave it with subscripts for now, but if it becomes an issue outside of GitHub as well then we can change it.

test/test_solvers.jl

glwagner · 2019-12-09T13:09:07Z

+
+    solve_batched_tridiagonal_system!(ϕ, arch, btsolver)
+
+    return ϕ[:] ≈ ϕ_correct


Why do we need the [:]?

ali-ramadhan · 2019-12-11

ϕ_correct is a 1D array because backslash returns a vector, while ϕ is a 1×1×Nz array, so I use ϕ[:] to flatten it to a 1D vector for ≈ to work.
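
The shape mismatch described in this reply can be reproduced with a small stand-alone example. This is a sketch, not the test from the PR: the matrix entries, `Nz`, and the way `ϕ` is built (by reshaping the reference solution) are made up purely to show the shapes involved.

```julia
using LinearAlgebra

Nz = 4
dl = fill(-1.0, Nz - 1)   # lower diagonal
d  = fill( 2.0, Nz)       # main diagonal
du = fill(-1.0, Nz - 1)   # upper diagonal
f  = collect(1.0:Nz)      # right-hand side

# Backslash on a Tridiagonal matrix returns a plain vector of length Nz.
ϕ_correct = Tridiagonal(dl, d, du) \ f

# Stand-in for the solver output: the same values stored as a 1×1×Nz array.
ϕ = reshape(copy(ϕ_correct), 1, 1, Nz)

# ϕ[:] flattens the 3D array into a vector with the same shape as ϕ_correct.
flattened = ϕ[:]
matches = flattened ≈ ϕ_correct
```

Here `size(ϕ)` is `(1, 1, Nz)` while `size(ϕ_correct)` is `(Nz,)`, which is why the comparison is made on the flattened array.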

ali-ramadhan added 11 commits November 19, 2019 19:27

Prototyping a batched tridiagonal solver

bc1093c

Plain TDMA algorithm

f5d7fbd

Small fixes and add to Solvers module.

adc202e

Add test for 1D tridiagonal system.

81937f4

Better constructor for BatchedTridiagonalSolver

d6ce352

We only need a single solve_batched_tridiagonal_system! method.

f9a104f

Batched tridiagonal solver test with multiple right hand sides.

a59fcd3

BatchedTridiagonalSolver stores grid and params now.

24a4ed9

Test BatchedTridiagonalSolver with functions for coefficients

04968b4

Switch to GPUifyLoops kernel for solve_batched_tridiagonal_system!

247ca39

Test BatchedTridiagonalSolver on GPU as well.

a88d4a5

ali-ramadhan added the feature 🌟 and numerics 🧮 labels Nov 25, 2019

Merge branch 'master' into ar/batched-tridiagonal-solver

6d391a5

ali-ramadhan added 13 commits December 2, 2019 15:40

Docstrings for batched tridiagonal solver.

a322ccb

array_type utility that makes it easier to juggle Arrays and CuArrays

49b4c60

Ensure that temporary solver array is CuArray on GPU.

028ebc4

Gotta import array_type

6c603ef

Update CUDA packages and GPUifyLoops

59066f6

Gotta split the solver into a wrapper and kernel

95157d6

Forgot to pass arch to BatchedTridiagonalSolver

17ebe6f

Split CPU and GPU tests to use different sizes

c23ef19

Fix a GPU test

5640c09

Merge branch 'master' into ar/batched-tridiagonal-solver

6d931a7

Gotta make the temporary storage a 3D array to avoid GPU race conditions

f4f6f3e

Had to add Test as a dependency for some reason.

d663dfd

Remerge CPU and GPU tests for Batched tridiagonal solver.

5388806

ali-ramadhan requested a review from glwagner December 3, 2019 22:01

Forgot to nuke test_poisson_solvers.jl

7e5b3a8

ali-ramadhan requested a review from suyashbire1 December 3, 2019 22:06

No idea how this typo made it through.

d644f16

glwagner approved these changes Dec 9, 2019

ali-ramadhan added 3 commits December 10, 2019 21:23

Improve BatchedTridiagonalSolver docstring.

928f5fe

Perform backslash Tridiagonal solves on CPU to avoid GPU scalar ops.

b7bfd31

Merge branch 'master' into ar/batched-tridiagonal-solver

3f82f44

ali-ramadhan merged commit b1fe0ab into master Dec 11, 2019

ali-ramadhan deleted the ar/batched-tridiagonal-solver branch December 11, 2019 11:14

ali-ramadhan mentioned this pull request Jan 30, 2020

Batched tridiagonal solver? ali-ramadhan/Atmosfoolery.jl#17

Closed
