
Update CUDA custom call example code to use ffi_call #22141

Draft · wants to merge 2 commits into base: main
Conversation

@dfm (Member) commented Jun 27, 2024

Following up on #21925, we can update the example code in `docs/cuda_custom_call` to use `ffi_call` instead of manually registering `core.Primitive`s. This removes quite a bit of boilerplate and doesn't require direct use of MLIR. This is meant as a demonstration of how `ffi_call` can be used for a common use case.
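
For context, the pattern the updated example follows looks roughly like this (a minimal sketch, not the exact code in the PR; the target name `foo`, the extension module `foo_cuda`, and the registration details are illustrative):

```python
import jax
import jax.extend as jex

import foo_cuda  # hypothetical compiled extension exposing the CUDA kernel

# Register the XLA FFI target exported by the extension with JAX.
jex.ffi.register_ffi_target("foo", foo_cuda.foo(), platform="CUDA")


def foo(a):
  # Declare only the output shape/dtype and dispatch through ffi_call; no
  # core.Primitive, abstract eval rule, or MLIR lowering is needed.
  out_type = jax.ShapeDtypeStruct(a.shape, a.dtype)
  return jex.ffi.ffi_call("foo", out_type, a)
```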

dfm added 2 commits June 27, 2024 09:07
This could be useful for supporting the most common use cases for FFI custom
calls. It has several benefits over using the `Primitive`-based approach, but
the biggest one (in my opinion) is that it doesn't require interacting with
`mlir` at all. It does have the limitation that transforms would need to be
registered using interfaces like `custom_vjp`, but many users of custom calls
already do that (see the sketch below).
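
As a hedged illustration of that limitation, differentiation would be wired up through `custom_vjp` around the `ffi_call` wrapper rather than through a primitive's JVP/transpose rules (the `"foo"` and `"foo_bwd"` FFI targets here are hypothetical):

```python
import jax
import jax.extend as jex


@jax.custom_vjp
def foo(a):
  out_type = jax.ShapeDtypeStruct(a.shape, a.dtype)
  return jex.ffi.ffi_call("foo", out_type, a)


def foo_fwd(a):
  # Save the input as a residual for the backward pass.
  return foo(a), (a,)


def foo_bwd(res, ct):
  (a,) = res
  # The cotangent would come from a second FFI target (or from pure JAX); a
  # gradient kernel registered as "foo_bwd" is assumed here for illustration.
  out_type = jax.ShapeDtypeStruct(a.shape, a.dtype)
  return (jex.ffi.ffi_call("foo_bwd", out_type, a, ct),)


foo.defvjp(foo_fwd, foo_bwd)
```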

~~The easiest to-do item (I think) is to implement batching using a
`vectorized` parameter like `pure_callback`, but we could also think about more
sophisticated vmapping interfaces in the future.~~ Done.

The more difficult to-do is to think about how to support sharding, and we
might actually want to expose an interface similar to the one from
`custom_partitioning`. I have less experience with this part so I'll have to
think some more about it, and feedback would be appreciated!
Following up on google#21925, we can update the example code in
`docs/cuda_custom_call` to use `ffi_call` instead of manually
registering `core.Primitive`s. This removes quite a bit of boilerplate
and doesn't require direct use of MLIR.
@dfm self-assigned this Jun 27, 2024
Labels: none yet
Projects: none yet

1 participant