# Machine Learning in Production

## Serving ML/DL Models

In most use cases, a RESTful API is the preferred way to deploy machine learning models. In this section, I describe some popular approaches.
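To make the RESTful pattern concrete, here is a minimal sketch using only the Python standard library: a toy "model" is exposed at a hypothetical `/predict` endpoint, and a client sends it a JSON request. The model, endpoint path, and payload shape are all illustrative assumptions; real serving systems like those below handle batching, versioning, and GPU scheduling on top of this basic idea.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical "model": doubles each input value. A real deployment
# would load a trained model artifact here instead.
def predict(inputs):
    return [2 * x for x in inputs]

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"outputs": predict(payload["inputs"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging to keep the example output clean.
        pass

# Bind to an ephemeral port and serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: POST a JSON payload and read back the prediction.
req = Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"inputs": [1, 2, 3]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)  # {'outputs': [2, 4, 6]}
server.shutdown()
```

The serving systems described next replace this hand-rolled loop with production-grade infrastructure, but the request/response contract is essentially the same.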

### Clipper

Clipper is a low-latency prediction serving system for machine learning. Clipper makes it simple to integrate machine learning into user-facing serving systems.

### GraphPipe

GraphPipe is a protocol and collection of software designed to simplify machine learning model deployment and decouple it from framework-specific model implementations.

### NVIDIA TensorRT Inference Server

TensorRT Inference Server (TRTIS) provides a cloud inference solution optimized for NVIDIA GPUs. The server exposes an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inference for any model managed by the server.

### Use an Nginx Proxy to Enable Multiple Services

See the Upgrade Django Built-in Server section in README.md.
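The idea of putting several model services behind one Nginx entry point can be sketched with a config like the following. The upstream names, ports, and URL prefixes are hypothetical; adjust them to match the services you actually run.

```nginx
# Hypothetical upstreams: two model-serving backends on local ports.
upstream model_service_a { server 127.0.0.1:1337; }
upstream model_service_b { server 127.0.0.1:8000; }

server {
    listen 80;

    # Route requests by URL prefix to the matching backend.
    location /service-a/ {
        proxy_pass http://model_service_a/;
    }
    location /service-b/ {
        proxy_pass http://model_service_b/;
    }
}
```

Note that the trailing slash in `proxy_pass` strips the location prefix before forwarding, so each backend sees paths relative to its own root.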