Victor A. P. Magri edited this page Aug 2, 2024 · 12 revisions

GPU support for solvers and preconditioners

The table below lists which solvers and preconditioners in HYPRE are GPU-enabled. Note that not all options for each solver or preconditioner have been ported to run on GPUs. "N/A" indicates that a method is not accessible via the corresponding interface.

Preconditioners

| Preconditioner | Struct Interface | SStruct Interface | IJ Interface |
|----------------|------------------|-------------------|--------------|
| BoomerAMG      | N/A              | Yes               | Yes          |
| MGR            | N/A              | N/A               | Yes          |
| AMS            | N/A              | N/A               | Yes          |
| ADS            | N/A              | N/A               | Yes          |
| FSAI           | N/A              | N/A               | Yes          |
| ILU            | N/A              | N/A               | Yes          |
| Hybrid         | N/A              | N/A               | Yes          |
| PFMG           | Yes              | N/A               | N/A          |
| SMG            | Yes              | N/A               | N/A          |
| Split          | N/A              | Yes               | N/A          |
| ParaSails      | N/A              | N/A               | No           |
| Euclid         | N/A              | N/A               | No           |
| ParILUT        | N/A              | N/A               | No           |

Solvers

| Solver    | Struct Interface | SStruct Interface | IJ Interface |
|-----------|------------------|-------------------|--------------|
| PCG       | Yes              | Yes               | Yes          |
| GMRES     | Yes              | Yes               | Yes          |
| FlexGMRES | Yes              | Yes               | Yes          |
| LGMRES    | Yes              | Yes               | Yes          |
| BiCGSTAB  | Yes              | Yes               | Yes          |

Memory locations and execution policies

Basic information on how to compile and run on GPUs can be found in the Users Manual. This document is intended to provide additional details.

Hypre provides two user-level memory locations, HYPRE_MEMORY_HOST and HYPRE_MEMORY_DEVICE. HYPRE_MEMORY_HOST is always the CPU memory, while HYPRE_MEMORY_DEVICE can be mapped to different memory spaces depending on hypre's configure options. When built with --with-cuda or --with-device-openmp, HYPRE_MEMORY_DEVICE is the GPU device memory; when additionally built with --enable-unified-memory, it is GPU unified memory (UM). For a non-GPU build, HYPRE_MEMORY_DEVICE is also mapped to the CPU memory. The default memory location of hypre's matrix and vector objects is HYPRE_MEMORY_DEVICE, which can be changed with HYPRE_SetMemoryLocation(...).
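As a sketch (comm, first_row, and last_row are placeholder names, not part of hypre's API), changing the default memory location before creating objects looks like:

```c
#include <mpi.h>
#include "HYPRE.h"
#include "HYPRE_IJ_mv.h"

/* Sketch: choose where subsequently created hypre objects live. */
void create_vector(MPI_Comm comm, HYPRE_BigInt first_row, HYPRE_BigInt last_row)
{
   HYPRE_IJVector ij_b;

   /* The default is HYPRE_MEMORY_DEVICE; uncomment to keep objects on the host:
      HYPRE_SetMemoryLocation(HYPRE_MEMORY_HOST); */

   HYPRE_IJVectorCreate(comm, first_row, last_row, &ij_b);
   HYPRE_IJVectorSetObjectType(ij_b, HYPRE_PARCSR);
   HYPRE_IJVectorInitialize(ij_b); /* allocated in the current memory location */
}
```

Note that the memory location is read at object creation, so it should be set before the matrices and vectors are built.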

The execution policy determines where computations run, based on the memory locations of the participating objects. The default policy is HYPRE_EXEC_HOST, i.e., execute on the host whenever the objects are accessible from the host. It can be changed with HYPRE_SetExecutionPolicy(...). Note that this policy only affects objects in UM, since UM is accessible from both CPUs and GPUs.
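For example (a sketch assuming a build with --enable-unified-memory, so that HYPRE_MEMORY_DEVICE maps to UM):

```c
/* With data in UM, the execution policy decides where kernels run. */
HYPRE_Init();
HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);  /* UM in this build */
HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);   /* run setup/solve on the GPU */
/* ... create matrices/vectors, set up, and solve ... */
HYPRE_Finalize();
```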

Current best practices configuration settings for SMG/PFMG on GPUs

No special changes to the solvers' interfaces need to be made other than to give GPU memory addresses for all input pointers.
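A sketch of this for PFMG in a CUDA build (grid, stencil, and right-hand-side setup are omitted; nentries, nvalues, ilower, iupper, stencil_indices, A, b, and x are placeholders):

```c
/* The Struct calls are unchanged on GPUs; the only difference is that
   d_values must point to GPU memory in a device build. */
double *d_values;
cudaMalloc((void **) &d_values, nentries * nvalues * sizeof(double));
/* ... fill d_values on the device ... */
HYPRE_StructMatrixSetBoxValues(A, ilower, iupper, nentries, stencil_indices, d_values);
HYPRE_StructMatrixAssemble(A);

HYPRE_StructSolver solver;
HYPRE_StructPFMGCreate(MPI_COMM_WORLD, &solver);
HYPRE_StructPFMGSetup(solver, A, b, x);
HYPRE_StructPFMGSolve(solver, A, b, x);
HYPRE_StructPFMGDestroy(solver);
```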

Current best practices configuration settings for BoomerAMG on GPUs

Current AMG setup and solve parameters that have GPU support are listed as follows:

  • AMG setup

    • Coarsening algorithm: PMIS (8) and aggressive coarsening
    • Interpolation algorithms: direct (3), BAMG-direct (15), extended+i (6), extended (14) and extended+e (18). Second-stage interpolation with aggressive coarsening: extended (5) and extended+e (7)
    • RAP: 2-multiplication R(AP), 1-multiplication RAP
  • AMG solve

    • Smoothers: Jacobi (7), two-stage Gauss-Seidel (11, 12), l1-Jacobi (18), and Chebyshev (16). Relaxation order can be lexicographic order, or C/F for (7) and (18).
    • Matrix-by-vector: save local transposes of P to explicitly multiply with P^{T}

Sample code that sets up an IJ matrix A and solves Ax=b with AMG-preconditioned CG is shown below.

 cudaSetDevice(device_id); /* GPU binding */
 ...
 HYPRE_Init(); /* must be the first HYPRE function call */
 ...
 /* AMG in GPU memory (default) */
 HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);
 /* setup AMG on GPUs */
 HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);
 /* use hypre's SpGEMM instead of cuSPARSE */
 HYPRE_SetSpGemmUseCusparse(FALSE);
 /* use GPU RNG */
 HYPRE_SetUseGpuRand(TRUE);
 if (useHypreGpuMemPool)
 {
    /* use hypre's GPU memory pool */
    HYPRE_SetGPUMemoryPoolSize(bin_growth, min_bin, max_bin, max_bytes);
 }
 else if (useUmpireGpuMemPool)
 {
    /* or use Umpire GPU memory pool */
    HYPRE_SetUmpireUMPoolName("HYPRE_UM_POOL_TEST");
    HYPRE_SetUmpireDevicePoolName("HYPRE_DEVICE_POOL_TEST");
 }
 ...
 /* setup IJ matrix A */
 HYPRE_IJMatrixCreate(comm, first_row, last_row, first_col, last_col, &ij_A);
 HYPRE_IJMatrixSetObjectType(ij_A, HYPRE_PARCSR);
 /* GPU pointers; efficient in large chunks */
 HYPRE_IJMatrixAddToValues(ij_A, num_rows, num_cols, rows, cols, data);
 HYPRE_IJMatrixAssemble(ij_A);
 HYPRE_IJMatrixGetObject(ij_A, (void **) &parcsr_A);
 ...
 /* setup AMG */
 HYPRE_ParCSRPCGCreate(comm, &solver);
 HYPRE_BoomerAMGCreate(&precon);
 HYPRE_BoomerAMGSetRelaxType(precon, rlx_type); /* 7, 18, 11, 12, (3, 4, 6) */
 HYPRE_BoomerAMGSetRelaxOrder(precon, FALSE); /* must be false */
 HYPRE_BoomerAMGSetCoarsenType(precon, coarsen_type); /* 8 */
 HYPRE_BoomerAMGSetInterpType(precon, interp_type); /* 3, 15, 6, 14, 18 */
 HYPRE_BoomerAMGSetAggNumLevels(precon, agg_num_levels);
 HYPRE_BoomerAMGSetAggInterpType(precon, agg_interp_type); /* 5 or 7 */
 HYPRE_BoomerAMGSetKeepTranspose(precon, TRUE); /* keep transpose to avoid SpMTV */
 HYPRE_BoomerAMGSetRAP2(precon, FALSE); /* RAP in two multiplications (default: FALSE) */
 HYPRE_ParCSRPCGSetPrecond(solver, HYPRE_BoomerAMGSolve, HYPRE_BoomerAMGSetup, precon);
 HYPRE_PCGSetup(solver, parcsr_A, b, x);
 ...
 /* solve */
 HYPRE_PCGSolve(solver, parcsr_A, b, x);
 ...
 HYPRE_Finalize(); /* must be the last HYPRE function call */

Build hypre with Umpire

Add the following configure options. By default, hypre then uses Umpire's pooling allocator for GPU device and unified memory.

--with-umpire --with-umpire-include=/path-of-umpire-install/include \
--with-umpire-lib-dirs=/path-of-umpire-install/lib \
--with-umpire-libs=umpire
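For example, a CUDA build with Umpire might be configured as follows (a sketch; the Umpire install path is a placeholder taken from above):

```shell
./configure --with-cuda \
            --with-umpire --with-umpire-include=/path-of-umpire-install/include \
            --with-umpire-lib-dirs=/path-of-umpire-install/lib \
            --with-umpire-libs=umpire
```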