LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
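For flavor, a minimal sketch of LMDeploy's offline `pipeline` API; the model ID and prompts below are illustrative examples, not part of the listing:

```python
# Sketch: offline batched inference with LMDeploy's pipeline API.
# The model ID is an example; substitute any model LMDeploy supports.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe(["Hi, please introduce yourself", "What is LLM serving?"])
for r in responses:
    print(r.text)
```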
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
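A hedged sketch of the ipex-llm drop-in, HuggingFace-style loading path this entry describes; the 4-bit flag and model ID are illustrative assumptions, so check the project docs before relying on them:

```python
# Sketch: ipex-llm's transformers-compatible loader with low-bit weights.
# load_in_4bit and the model ID are illustrative assumptions.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example model
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is LLM inference?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```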
API aggregator for inference, fine-tuning, and model building.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
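As a minimal sketch of the service-definition style this entry describes, assuming BentoML's decorator-based API; the class name, endpoint, and echo logic are hypothetical:

```python
# Sketch of a BentoML-style service; class and method names are hypothetical.
import bentoml

@bentoml.service
class EchoLLM:
    @bentoml.api
    def generate(self, prompt: str) -> str:
        # A real service would call a model here; this just echoes the prompt.
        return f"echo: {prompt}"
```

Served with `bentoml serve`, a service like this would expose `generate` as an HTTP endpoint.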
Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint in the cloud.
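Because such a server speaks the OpenAI protocol, any OpenAI client can point at it; in this sketch the base URL, API key, and model name are placeholder assumptions:

```python
# Sketch: querying an OpenAI-compatible endpoint with the openai client.
# Base URL, API key, and model name are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```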
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
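The multi-LoRA pattern routes each request to a fine-tuned adapter by ID; a hedged sketch of an HTTP call in that style, where the endpoint path, payload shape, and adapter ID are illustrative assumptions:

```python
# Sketch: per-request adapter selection against a multi-LoRA server.
# Endpoint path, payload shape, and adapter ID are illustrative assumptions.
import requests

payload = {
    "inputs": "Summarize: LLM serving at scale...",
    "parameters": {"adapter_id": "my-org/my-finetuned-adapter"},  # hypothetical adapter
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json())
```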
Virtual assistant to help with administrative procedures.
🔮 SuperDuper: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs planned; PRs welcome).
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
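A minimal sketch of LitGPT's high-level Python API; the checkpoint name and generation argument are examples, not prescriptions:

```python
# Sketch of LitGPT's high-level Python API; the checkpoint is an example.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
print(llm.generate("What is LLM inference?", max_new_tokens=32))
```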
Implementation of Model-Distributed Inference for Large Language Models, built on top of LitGPT
dstack is an easy-to-use and flexible container orchestrator for running AI workloads in any cloud or data center.
Telegram bot for different language models. Supports system prompts and images
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
We examine whether LLMs can maintain consistency across extended, repeated text generation for 10 medical personas. We propose 5 novel plausibility metrics and an ontology of common LLM errors.
VELOCITI Benchmark Evaluation and Visualisation Code
Use your open-source local model from the terminal
LLM Serving Performance Evaluation Harness