LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
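For flavor, a minimal sketch of LMDeploy's offline `pipeline` API; the model ID and prompts below are illustrative examples, not part of the listing:

```python
# Sketch: offline batched inference with LMDeploy's pipeline API.
# The model ID is an example; substitute any model LMDeploy supports.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe(["Hi, please introduce yourself", "What is LLM serving?"])
for r in responses:
    print(r.text)
```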
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
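A hedged sketch of the ipex-llm drop-in, HuggingFace-style loading path this entry describes; the 4-bit flag and model ID are illustrative assumptions, so check the project docs before relying on them:

```python
# Sketch: ipex-llm's transformers-compatible loader with low-bit weights.
# load_in_4bit and the model ID are illustrative assumptions.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example model
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is LLM inference?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```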
API aggregator for inference, fine-tuning, and model building.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
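As a minimal sketch of the service-definition style this entry describes, assuming BentoML's decorator-based API; the class name, endpoint, and echo logic are hypothetical:

```python
# Sketch of a BentoML-style service; class and method names are hypothetical.
import bentoml

@bentoml.service
class EchoLLM:
    @bentoml.api
    def generate(self, prompt: str) -> str:
        # A real service would call a model here; this just echoes the prompt.
        return f"echo: {prompt}"
```

Served with `bentoml serve`, a service like this would expose `generate` as an HTTP endpoint.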
Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint in the cloud.
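Because such a server speaks the OpenAI protocol, any OpenAI client can point at it; in this sketch the base URL, API key, and model name are placeholder assumptions:

```python
# Sketch: querying an OpenAI-compatible endpoint with the openai client.
# Base URL, API key, and model name are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```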
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
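The multi-LoRA pattern routes each request to a fine-tuned adapter by ID; a hedged sketch of an HTTP call in that style, where the endpoint path, payload shape, and adapter ID are illustrative assumptions:

```python
# Sketch: per-request adapter selection against a multi-LoRA server.
# Endpoint path, payload shape, and adapter ID are illustrative assumptions.
import requests

payload = {
    "inputs": "Summarize: LLM serving at scale...",
    "parameters": {"adapter_id": "my-org/my-finetuned-adapter"},  # hypothetical adapter
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json())
```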
Virtual assistant to help with administrative procedures.
🔮 SuperDuper: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPUs planned; PRs welcome).
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
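A minimal sketch of LitGPT's high-level Python API; the checkpoint name and generation argument are examples, not prescriptions:

```python
# Sketch of LitGPT's high-level Python API; the checkpoint is an example.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
print(llm.generate("What is LLM inference?", max_new_tokens=32))
```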
Implementation of Model-Distributed Inference for Large Language Models, built on top of LitGPT
dstack is an easy-to-use and flexible container orchestrator for running AI workloads in any cloud or data center.
Telegram bot for different language models. Supports system prompts and images
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
We examine whether LLMs can maintain consistency across extended, repeated text generation for 10 medical personas. We propose 5 novel plausibility metrics and an ontology of common LLM errors.
VELOCITI Benchmark Evaluation and Visualisation Code
Use your open-source local model from the terminal
LLM Serving Performance Evaluation Harness