Research Triton server as a potential integration to support multiple model backends/frameworks #15

yondonfu opened this issue Sep 20, 2021 · 2 comments


@yondonfu

At the moment, the livepeer_dnn filter only supports a TensorFlow backend, which means that only TensorFlow models can be used. There are downsides to supporting only TensorFlow; for example, TensorFlow itself consumes a lot of GPU VRAM at runtime. We can address these downsides by supporting other deep learning backends/frameworks. Rather than implementing a standalone integration for each desired backend, we could research whether something like Triton server could be used to support a variety of different backends.

The goal of this research would be to determine the following:

  • The pros/cons of using Triton server
  • How Triton server could be integrated into ffmpeg
@cyberj0g

Findings so far:

  1. No maintained local build script (CMake, Makefile), which we would need if linking Triton directly into FFmpeg using the C API. Docker is the recommended way to build and deploy.
  2. Viable FFmpeg integration options are:
    a. link the library and use the C API (see the sketch after this list)
    b. the gRPC protocol. There's no official C client, but Protobuf bindings can be generated with protobuf-c.
  3. The Triton Inference Server Docker image is 13.3 GB. Its dependencies are nvidia-docker and an NVIDIA driver compatible with the container's CUDA version.
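
To make option 2a more concrete, here is a minimal sketch of bootstrapping Triton in-process from C via the tritonserver.h C API. The `/models` repository path is a placeholder, error handling is reduced to a helper, and the actual inference submission (`TRITONSERVER_InferenceRequestNew` plus response callbacks) is omitted, so treat this as an illustration of the linking option rather than a working FFmpeg integration.

```c
/*
 * Option 2a sketch: libtritonserver linked into an FFmpeg filter,
 * starting an in-process server from a local model repository.
 * "/models" is a placeholder path.
 */
#include <stdbool.h>
#include <stdio.h>
#include "tritonserver.h"

static int check(TRITONSERVER_Error* err, const char* what)
{
    if (err != NULL) {
        fprintf(stderr, "%s: %s\n", what, TRITONSERVER_ErrorMessage(err));
        TRITONSERVER_ErrorDelete(err);
        return -1;
    }
    return 0;
}

int main(void)
{
    TRITONSERVER_ServerOptions* options = NULL;
    TRITONSERVER_Server* server = NULL;

    /* Point the server at a repository that can hold TensorFlow, ONNX,
     * TensorRT, ... models side by side. */
    if (check(TRITONSERVER_ServerOptionsNew(&options), "options new") ||
        check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(options, "/models"),
              "set model repository"))
        return 1;

    /* Create the in-process server instance; no HTTP/gRPC frontend is involved. */
    if (check(TRITONSERVER_ServerNew(&server, options), "server new"))
        return 1;
    TRITONSERVER_ServerOptionsDelete(options);

    /* Readiness check before the filter starts submitting frames. */
    bool live = false;
    if (check(TRITONSERVER_ServerIsLive(server, &live), "is live"))
        return 1;
    printf("Triton in-process server live: %s\n", live ? "yes" : "no");

    /* Actual inference would go through TRITONSERVER_InferenceRequestNew()
     * and asynchronous response callbacks. */
    TRITONSERVER_ServerDelete(server);
    return 0;
}
```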

@cyberj0g

After further exploration, a viable option for accessing the model from FFmpeg C code seems to be the HTTP REST API combined with memory sharing. Triton server supports RAM/VRAM sharing, which is also managed through the HTTP REST API. Since no tensor data is transferred over the socket, inference requests/responses through HTTP would impose minimal overhead, and we would still benefit from dynamic batching and multi-backend support.
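
A rough sketch of that flow is below, assuming Triton's KServe v2 HTTP endpoints with the system shared memory extension. The model name (`sr_model`), input tensor name and shape, and region names are placeholders, error handling is omitted, and for brevity the output tensor comes back over HTTP rather than through a second shared region.

```c
/*
 * Sketch of the HTTP + system shared memory flow (not production code).
 * Build: gcc shm_infer.c -lcurl -lrt
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <curl/curl.h>

#define SHM_KEY   "/triton_input_shm"
#define SHM_BYTES (3 * 224 * 224 * sizeof(float))   /* one RGB frame, placeholder size */

static void post_json(const char* url, const char* body)
{
    CURL* curl = curl_easy_init();
    struct curl_slist* hdrs = curl_slist_append(NULL, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    curl_easy_perform(curl);
    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
}

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);

    /* 1. Create a POSIX shared memory region and map it; the FFmpeg filter
     *    would write the decoded frame tensor here instead of sending it
     *    over the socket. */
    int fd = shm_open(SHM_KEY, O_CREAT | O_RDWR, 0666);
    ftruncate(fd, SHM_BYTES);
    float* tensor = mmap(NULL, SHM_BYTES, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    memset(tensor, 0, SHM_BYTES);   /* frame data would be copied in here */

    /* 2. Register the region with Triton over HTTP (one-time setup). */
    char body[512];
    snprintf(body, sizeof(body),
             "{\"key\":\"%s\",\"offset\":0,\"byte_size\":%zu}",
             SHM_KEY, SHM_BYTES);
    post_json("http://localhost:8000/v2/systemsharedmemory/region/input_region/register",
              body);

    /* 3. Per-frame inference request: only JSON metadata goes over HTTP,
     *    the tensor itself is read by Triton directly from the region. */
    snprintf(body, sizeof(body),
             "{\"inputs\":[{\"name\":\"input\",\"shape\":[1,3,224,224],"
             "\"datatype\":\"FP32\",\"parameters\":{"
             "\"shared_memory_region\":\"input_region\","
             "\"shared_memory_byte_size\":%zu}}]}",
             SHM_BYTES);
    post_json("http://localhost:8000/v2/models/sr_model/infer", body);

    munmap(tensor, SHM_BYTES);
    close(fd);
    shm_unlink(SHM_KEY);
    curl_global_cleanup();
    return 0;
}
```

The per-frame request carries only metadata; the frame stays in the shared region, which is what keeps the HTTP round trip cheap relative to the inference itself.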
