feat: ICICLE msm integration #498

Merged
merged 27 commits into main from icicle-msm
Mar 8, 2024

Conversation

alxiong
Contributor

@alxiong alxiong commented Feb 27, 2024

Description

closes: #490

Unit tests already check the correctness of the MSM results.
The benchmark can be run via cargo bench --bench msm --features "test-srs icicle"

  • Fixed the issue where the cudatoolkit inside the nix shell is incompatible with the CUDA driver (causing a cudaErrorInsufficientDriver error)
  • Tweaked the benchmark code and setup
  • Updated the changelog

Benchmark

TL;DR: it's roughly a 60-70x speedup compared to arkworks! 🎉

Current criterion benchmark:

MSM with arkworks/19    time:   [1.5497 s 1.5526 s 1.5554 s]
MSM with arkworks/20    time:   [2.8726 s 2.8801 s 2.8848 s]
MSM with arkworks/21    time:   [5.8268 s 5.8572 s 5.8847 s]
MSM with arkworks/22    time:   [11.618 s 11.641 s 11.665 s]

MSM with ICICLE/19      time:   [26.753 ms 26.841 ms 27.018 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
MSM with ICICLE/20      time:   [48.716 ms 48.728 ms 48.737 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
MSM with ICICLE/21      time:   [87.649 ms 87.671 ms 87.697 ms]
MSM with ICICLE/22      time:   [164.33 ms 164.40 ms 164.45 ms]

Breakdown of a single GPU-accelerated MSM (2^20)

Start:   Committing to polynomial of degree 1048576
··Start:   Type Conversion: ark->ICICLE: Group
··End:     Type Conversion: ark->ICICLE: Group .....................................11.500ms
··Start:   Load group elements: CPU->GPU
··End:     Load group elements: CPU->GPU ...........................................5.197ms
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................5.835ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................2.624ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................20.493ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................26.000ms
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................27.140µs
End:     Committing to polynomial of degree 1048576  ...............................84.879ms

note: the "GPU-accelerated MSM" timer is somewhat misleading. Because we use a non-blocking, async MSM on the GPU, the computation hasn't actually finished by the time the CPU moves on, so part of "Load MSM result GPU->CPU" is really "synchronizing the result on the CUDA stream", i.e. waiting for the work to finish. Since loading a single projective group element should be nearly instant, a more accurate MSM computation time is 20.49 + 26.00 ≈ 46.5 ms, which aligns with the criterion output above.
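To make that timing caveat concrete, here is a minimal Rust sketch of the control flow; Stream, launch_msm_async, and copy_result_to_host are hypothetical stand-ins for the ICICLE calls used in this PR, not the real API:

// Hypothetical stand-in types/functions; only the control flow matters here.
struct Stream;
struct DeviceSlice;
struct Projective;

impl Stream {
    fn new() -> Self { Stream }
}

// Enqueues the MSM kernel on `stream` and returns immediately (non-blocking).
fn launch_msm_async(_bases: &DeviceSlice, _scalars: &DeviceSlice, _stream: &Stream) {}

// The device->host copy is ordered after the MSM kernel on the same stream, so it
// blocks until the kernel has finished before copying the single result point.
fn copy_result_to_host(_stream: &Stream) -> Projective { Projective }

fn commit_on_gpu(bases: &DeviceSlice, scalars: &DeviceSlice) -> Projective {
    let stream = Stream::new();
    // Timed as "GPU-accelerated MSM": measures only the kernel launch.
    launch_msm_async(bases, scalars, &stream);
    // Timed as "Load MSM result GPU->CPU": absorbs the rest of the kernel time
    // while waiting on the stream, which is why ~26 ms shows up for copying one
    // projective group element.
    copy_result_to_host(&stream)
}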


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (main)
  • Linked to GitHub issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the GitHub PR explorer

@alxiong
Contributor Author

alxiong commented Feb 27, 2024

The current benchmark doesn't look right: the difference is only ~5x, and we should expect at least a 10x speedup. In our initial exploration we observed 200x differences!

Again, this could be due to the setup or to warm-up (I believe criterion only warms up the CPU, not the GPU).

The numbers below come from computing the same MSM (committing to the same polynomial) multiple times and taking the average runtime, instead of running multiple MSM instances at the same time (which the GPU should be better at?). A minimal sketch of this benchmark setup follows the numbers.

MSM with arkworks/12    time:   [20.128 ms 20.157 ms 20.189 ms]
MSM with arkworks/13    time:   [35.529 ms 35.653 ms 35.778 ms]
MSM with arkworks/14    time:   [67.961 ms 68.246 ms 68.623 ms]
MSM with arkworks/15    time:   [119.05 ms 119.43 ms 119.77 ms]
MSM with arkworks/16    time:   [216.39 ms 216.61 ms 216.91 ms]
MSM with arkworks/17    time:   [435.28 ms 435.69 ms 436.10 ms]
MSM with arkworks/18    time:   [820.43 ms 821.17 ms 822.05 ms]
MSM with arkworks/19    time:   [1.5854 s 1.5876 s 1.5902 s]
MSM with arkworks/20    time:   [2.9161 s 2.9178 s 2.9199 s]
MSM with arkworks/21    time:   [5.9400 s 5.9642 s 5.9849 s]
MSM with arkworks/22    time:   [11.604 s 11.664 s 11.716 s]

MSM with ICICLE/12      time:   [15.503 ms 15.512 ms 15.523 ms]
MSM with ICICLE/13      time:   [17.806 ms 17.834 ms 17.862 ms]
MSM with ICICLE/14      time:   [22.151 ms 22.192 ms 22.222 ms]
MSM with ICICLE/15      time:   [30.990 ms 31.176 ms 31.352 ms]
MSM with ICICLE/16      time:   [51.316 ms 51.570 ms 51.919 ms]
MSM with ICICLE/17      time:   [89.319 ms 89.740 ms 90.149 ms]
MSM with ICICLE/18      time:   [160.70 ms 161.77 ms 163.22 ms]
MSM with ICICLE/19      time:   [318.50 ms 319.18 ms 319.94 ms]
MSM with ICICLE/20      time:   [635.62 ms 637.99 ms 640.69 ms]
MSM with ICICLE/21      time:   [1.2574 s 1.2616 s 1.2680 s]
MSM with ICICLE/22      time:   [2.5118 s 2.5236 s 2.5354 s]
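For reference, a minimal criterion sketch of the "same MSM repeated, average runtime" setup described above (not the actual bench file in this PR; msm_arkworks and msm_icicle are hypothetical wrappers around the two backends over fixed bases/scalars):

use criterion::{criterion_group, criterion_main, Criterion};

fn msm_arkworks() { /* arkworks VariableBaseMSM over fixed bases/scalars */ }
fn msm_icicle() { /* ICICLE MSM over the same fixed bases/scalars */ }

fn bench_msm(c: &mut Criterion) {
    let mut group = c.benchmark_group("MSM");
    // matches the "10 measurements" reported in the output above
    group.sample_size(10);
    // criterion runs each closure many times and reports the mean, so every sample
    // is the *same* MSM instance rather than a batch of independent MSMs
    group.bench_function("arkworks/20", |b| b.iter(msm_arkworks));
    group.bench_function("ICICLE/20", |b| b.iter(msm_icicle));
    group.finish();
}

criterion_group!(benches, bench_msm);
criterion_main!(benches);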

@philippecamacho philippecamacho mentioned this pull request Feb 27, 2024
end_timer!(conv_time);

// load them on host first
let bases = HostOrDeviceSlice::Host(bases);
Contributor

Do we need to convert and load the SRS bases every time we commit? Can we just load them once and reuse them?

Contributor

Another point: in VID, the polynomial degree is much lower, so we also won't need to upload that many powers_of_g elements to the GPU.

Contributor Author

@alxiong alxiong Feb 28, 2024

Do we need to convert and load the SRS bases every time we commit? Can we just load them once and reuse them?

this is exactly what I meant about being unsure of the API boundary

ultimately we probably won't use this function in a standalone way; instead, we will pick it apart and flesh out the full steps inside VID's function, to have fine-grained control over when data is loaded and to maximize reuse.

imo, we would modify our struct Advz to store an Option<&HostOrDeviceSlice<T>> holding a CUDA-memory reference to the SRS loaded in a previous run.

Contributor

I see. I feel it might be better to split it at the PCS level rather than in the VID code, because the PCS commit function itself shouldn't upload the SRS every time. Could we add another API, load_srs_to_gpu(), so that the commit_in_gpu function takes a &HostOrDeviceSlice<T>, no longer needs to upload the SRS, and returns an error if the SRS hasn't been uploaded?

The VID code would then call load_srs_to_gpu and commit_in_gpu as needed.

Contributor Author

@alxiong alxiong Feb 29, 2024

your suggestion makes more sense! I'll implement that!

to go even further, I would also separate out load_poly_coeffs_on_gpu(), so that commit_on_gpu() only takes in pointers; this is because we could be reusing the coefficients from an on-GPU FFT in the future, and this split accounts for that flexibility. A rough sketch of the proposed split is below.
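Sketch of the split discussed above (the function names follow this thread; ArkAffine, ArkScalar, GpuBases, GpuScalars, Commitment, and GpuError are stand-ins for the real arkworks/ICICLE/PCS types, so these are not the final signatures):

// Stand-in types; the real code uses arkworks/ICICLE types and the PCS error type.
struct ArkAffine;
struct ArkScalar;
struct GpuBases;    // SRS bases already resident on the device, reusable across commits
struct GpuScalars;  // polynomial coefficients already resident on the device
struct Commitment;
#[derive(Debug)]
struct GpuError;

// Convert and upload the SRS bases once; the returned handle can be cached
// (e.g. in an Option<...> field on Advz) and reused by later commits.
fn load_srs_to_gpu(bases: &[ArkAffine]) -> Result<GpuBases, GpuError> {
    let _ = bases; // ark -> ICICLE conversion + host -> device copy would go here
    Ok(GpuBases)
}

// Upload polynomial coefficients separately, so a future on-GPU FFT could hand
// its output straight to commit_on_gpu without a round trip through the host.
fn load_poly_coeffs_on_gpu(coeffs: &[ArkScalar]) -> Result<GpuScalars, GpuError> {
    let _ = coeffs;
    Ok(GpuScalars)
}

// Pure GPU MSM over data that is already on the device; it never uploads the SRS
// itself, and errors out if the SRS was never loaded.
fn commit_on_gpu(bases: &GpuBases, coeffs: &GpuScalars) -> Result<Commitment, GpuError> {
    let _ = (bases, coeffs);
    Ok(Commitment)
}

In VID this would look like: call load_srs_to_gpu once, keep the handle, then call load_poly_coeffs_on_gpu + commit_on_gpu per polynomial.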

Contributor Author

a slight annoyance is the lifetime on HostOrDeviceSlice<'a, T> as a return type; I've been fighting this today. I don't want to assign 'static to it, since we shouldn't expect the reference to live that long; we only want it to live as long as the CUDA pointer is active.

I'll figure something out, but just sharing some of the engineering journey. A simplified illustration of the issue is below.
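Simplified illustration of the lifetime friction (DeviceHandle is a stand-in for HostOrDeviceSlice<'a, T>, and GpuContext for whatever ends up owning the CUDA allocation; this is not the actual solution used in the PR):

use std::marker::PhantomData;

// Stand-in for HostOrDeviceSlice<'a, T>: a handle that borrows device memory.
struct DeviceHandle<'a> {
    _ptr: usize,                  // pretend device pointer
    _owner: PhantomData<&'a ()>,  // ties the handle's validity to its owner
}

// One way out: tie the handle's lifetime to a context object that owns the CUDA
// allocation, so the borrow can't outlive the pointer; using 'static instead
// would promise the reference stays valid forever, which it doesn't.
struct GpuContext;

impl GpuContext {
    fn load(&self, _len: usize) -> DeviceHandle<'_> {
        DeviceHandle { _ptr: 0xdead_beef, _owner: PhantomData }
    }
}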

Contributor Author

done in 514d479

@alxiong
Contributor Author

alxiong commented Mar 2, 2024

I'm so annoyed that I can't get my nix shell to work. Outside it, I can compile and run ICICLE code; inside it, everything compiles successfully, but any invocation of the CUDA FFI code fails.

$ nix develop .#cudaShell
$ cargo test gpu --release --features icicle
runtime error trace
---- pcs::univariate_kzg::tests::gpu_end_to_end_test stdout ----
thread 'pcs::univariate_kzg::tests::gpu_end_to_end_test' panicked at primitives/src/pcs/univariate_kzg/mod.rs:1026:14:
test failed for bn254: IcicleError("IcicleError { icicle_error_code: InternalCudaError, cuda_error: Some(cudaErrorInsufficientDriver), reason: Some(\"Runtime CUDA error.\") }")
stack backtrace:
   0:     0x555555b12366 - std::backtrace_rs::backtrace::libunwind::trace::hbee8a7973eeb6c93
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x555555b12366 - std::backtrace_rs::backtrace::trace_unsynchronized::hc8ac75eea3aa6899
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x555555b12366 - std::sys_common::backtrace::_print_fmt::hc7f3e3b5298b1083
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:68:5
   3:     0x555555b12366 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hbb235daedd7c6190
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x555555b3b720 - core::fmt::rt::Argument::fmt::h76c38a80d925a410
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/rt.rs:142:9
   5:     0x555555b3b720 - core::fmt::write::h3ed6aeaa977c8e45
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/mod.rs:1120:17
   6:     0x555555b0fbff - std::io::Write::write_fmt::h1299aa7741865f2b
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/io/mod.rs:1810:15
   7:     0x555555b12144 - std::sys_common::backtrace::_print::h5d645a07e0fcfdbb
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x555555b12144 - std::sys_common::backtrace::print::h85035a511aafe7a8
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x555555b13eb7 - std::panicking::default_hook::{{closure}}::hcce8cea212785a25
  10:     0x555555b13b9d - std::panicking::default_hook::hf5fcb0f213fe709a
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:289:9
  11:     0x555555a8d207 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h60265c2dfa87ee34
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2029:9
  12:     0x555555a8d207 - test::test_main::{{closure}}::h77865bd3127078c6
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:138:21
  13:     0x555555b144d6 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::hbc5ccf4eb663e1e5
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2029:9
  14:     0x555555b144d6 - std::panicking::rust_panic_with_hook::h095fccf1dc9379ee
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:783:13
  15:     0x555555b14222 - std::panicking::begin_panic_handler::{{closure}}::h032ba12139b353db
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:657:13
  16:     0x555555b12866 - std::sys_common::backtrace::__rust_end_short_backtrace::h9259bc2ff8fd0f76
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:171:18
  17:     0x555555b13f80 - rust_begin_unwind
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
  18:     0x5555555ace85 - core::panicking::panic_fmt::h784f20a50eaab275
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
  19:     0x5555555ad393 - core::result::unwrap_failed::h03d8a5018196e1cd
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/result.rs:1649:5
  20:     0x555555793ade - jf_primitives::pcs::univariate_kzg::tests::gpu_end_to_end_test::h4a2c05e21a017c3d
  21:     0x555555915b69 - core::ops::function::FnOnce::call_once::h34aa7a3b7c3c87aa
  22:     0x555555a92adf - core::ops::function::FnOnce::call_once::h8dc6907944022cf6
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
  23:     0x555555a92adf - test::__rust_begin_short_backtrace::haae1a87433f1efb3
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:627:18
  24:     0x555555a91861 - test::run_test_in_process::{{closure}}::h8c7decfa7c14e152
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:650:60
  25:     0x555555a91861 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h19e6ff056d9d21e9
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
  26:     0x555555a91861 - std::panicking::try::do_call::h89c848fcaa37c035
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
  27:     0x555555a91861 - std::panicking::try::h57ab3dc74e2839b8
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
  28:     0x555555a91861 - std::panic::catch_unwind::hfb6a1b1abc120fb9
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
  29:     0x555555a91861 - test::run_test_in_process::h5ae2f9875edd562d
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:650:27
  30:     0x555555a91861 - test::run_test::{{closure}}::h35d7300d8928a067
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:573:43
  31:     0x555555a58b96 - test::run_test::{{closure}}::h7525ced405d23d1b
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:601:41
  32:     0x555555a58b96 - std::sys_common::backtrace::__rust_begin_short_backtrace::h4e7db78ce05afad8
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:155:18
  33:     0x555555a5dbf7 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hcfbcb64f1a1b3482
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/mod.rs:529:17
  34:     0x555555a5dbf7 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h9d89c5c4108bd689
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
  35:     0x555555a5dbf7 - std::panicking::try::do_call::h8a4869bc94ec50c9
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
  36:     0x555555a5dbf7 - std::panicking::try::h9a576f20ff81ac30
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
  37:     0x555555a5dbf7 - std::panic::catch_unwind::hbcb4e3f860ef9830
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
  38:     0x555555a5dbf7 - std::thread::Builder::spawn_unchecked_::{{closure}}::h93c79a6be1505948
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/mod.rs:528:30
  39:     0x555555a5dbf7 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h426d96740c81bdaf
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
  40:     0x555555b19095 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h12de4fc57affb195
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
  41:     0x555555b19095 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h3c619f45059d5cf1
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
  42:     0x555555b19095 - std::sys::unix::thread::Thread::new::thread_start::hbac657605e4b7389
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17
  43:     0x7ffff76a2333 - start_thread
  44:     0x7ffff7724efc - __clone3
  45:                0x0 - <unknown>

I first suspected that clang-sys, which icicle's build.rs relies on, didn't get the correct env vars, but even after explicitly setting them to the proper clang and LLVM bin and lib paths, I still run into the above error.

I might need to dig deeper into what actually happens during the FFI call (which object files are being used, which dynamic libraries were used to build these executables, etc.) to pinpoint the source.

p.s. since icicle's build script explicitly uses /usr/local/cuda/lib64, I'm also pointing my CUDA_PATH etc. at a local installation instead of installing a cudatoolkit inside the nix shell

        baseShell = with pkgs;
          clang15Stdenv.mkDerivation {
            name = "clang15-nix-shell";
            buildInputs = [
              argbash
              openssl
              pkg-config
              git
              nixpkgs-fmt

              cargo-with-nightly
              stableToolchain
              nightlyToolchain
              cargo-sort
              clang-tools_15
              clangStdenv
              llvm_15
            ] ++ lib.optionals stdenv.isDarwin
              [ darwin.apple_sdk.frameworks.Security ];

            CARGO_TARGET_DIR = "target/nix_rustc";

            shellHook = ''
              export RUST_BACKTRACE=full
              export PATH="$PATH:$(pwd)/target/debug:$(pwd)/target/release"
              # Prevent cargo aliases from using programs in `~/.cargo` to avoid conflicts with local rustup installations.
              export CARGO_HOME=$HOME/.cargo-nix

              # Ensure `cargo fmt` uses `rustfmt` from nightly.
              export RUSTFMT="${nightlyToolchain}/bin/rustfmt"

              export C_INCLUDE_PATH="${llvmPackages_15.libclang.lib}/lib/clang/${llvmPackages_15.libclang.version}/include"
              export CC="${clang-tools_15.clang}/bin/clang"
              export CXX="${clang-tools_15.clang}/bin/clang++"
              export AR="${llvm_15}/bin/llvm-ar"
              export CFLAGS="-mcpu=generic"

              # ensure clang-sys got the correct version
              export LLVM_CONFIG_PATH="${llvmPackages_15.llvm.dev}/bin/llvm-config"
              export LIBCLANG_PATH=${llvmPackages_15.libclang.lib}/lib
              export CLANG_PATH=${clang-tools_15.clang}/bin/clang

              # by default choose u64_backend
              export RUSTFLAGS='--cfg curve25519_dalek_backend="u64"'
            ''
              # install pre-commit hooks
              + self.check.${system}.pre-commit-check.shellHook;
          };

# ...
        devShells = {
          # enter with `nix develop .#cudaShell`
          cudaShell = baseShell.overrideAttrs (oldAttrs: {
            # for GPU/CUDA env (e.g. to run ICICLE code)
            name = "cuda-env-shell";
            buildInputs = oldAttrs.buildInputs ++ [ cmake util-linux gcc11 ];
            # CXX is overridden to use gcc11 because icicle-curves' build scripts need it (gcc12 is not supported)
            shellHook = oldAttrs.shellHook + ''
              export CUDA_PATH=/usr/local/cuda
              export PATH="${pkgs.gcc11}/bin:$CUDA_PATH/bin:$CUDA_PATH/nvvm/bin:$PATH"
              export LD_LIBRARY_PATH="$CUDA_PATH/lib64:$LIBCLANG_PATH"
            '';
          });
        };

🤦 need more time on this

@alxiong alxiong marked this pull request as ready for review March 6, 2024 18:06
@alxiong
Contributor Author

alxiong commented Mar 6, 2024

The code is ready for review. Interestingly, I added a naive "warmup" function that makes the benchmark more accurate: the warmup absorbs a constant ~200 ms, which no longer shows up in the trace below (a sketch of the warmup idea follows the trace):

Start:   Committing to polynomial of degree 1048576
··Start:   Type Conversion: ark->ICICLE: Group
··End:     Type Conversion: ark->ICICLE: Group .....................................11.500ms
··Start:   Load group elements: CPU->GPU
··End:     Load group elements: CPU->GPU ...........................................5.197ms
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................5.835ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................2.624ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................20.493ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................26.000ms
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................27.140µs
End:     Committing to polynomial of degree 1048576  ...............................84.879ms
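Sketch of what such a warmup amounts to (tiny_gpu_msm is a hypothetical stand-in for a minimal ICICLE MSM call, not the PR's actual helper; the point is just to pay the one-off CUDA initialization cost before timing starts):

// Hypothetical minimal GPU call, e.g. an MSM over a handful of points, result discarded.
fn tiny_gpu_msm() {}

// Call once before the timed section so CUDA context creation / driver setup
// (a roughly constant ~200 ms) is not counted against the first measured commit.
fn warmup_gpu() {
    tiny_gpu_msm();
}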

There are some remaining tasks:

  • Get tests passing for sub-slices (right now construction is fine, but dropping the memory still panics; note that without sub-slices things are fine)
  • Add a batch_commit() API, which depends on sub-slicing (I don't want only an exact multiple of the number of bases as scalars; I want multiple polynomials of any degree <= supported_degree to be committed in a batch); a rough sketch of the intended shape follows this list
  • Update the criterion benchmark code
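Rough sketch of the intended batch_commit() shape, reusing the stand-in types and functions from the API-split sketch earlier in this thread (hypothetical signature, written sequentially for clarity; the real version would batch the MSMs on the device):

fn batch_commit(
    srs_on_gpu: &GpuBases,
    polys: &[Vec<ArkScalar>], // arbitrary degrees, each <= supported_degree
) -> Result<Vec<Commitment>, GpuError> {
    polys
        .iter()
        .map(|coeffs| {
            // each polynomial only needs the sub-slice of SRS bases matching its
            // degree, which is why sub-slicing device memory must work first
            let coeffs_gpu = load_poly_coeffs_on_gpu(coeffs)?;
            commit_on_gpu(srs_on_gpu, &coeffs_gpu)
        })
        .collect()
}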

Contributor

@chancharles92 chancharles92 left a comment

LGTM

@alxiong alxiong merged commit 0679d65 into main Mar 8, 2024
5 checks passed
@alxiong alxiong deleted the icicle-msm branch March 8, 2024 15:57