llm-inference

Here are 254 public repositories matching this topic...

bentoml / OpenLLM

Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.

Updated Jul 10, 2024
Python

Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

ai deep-learning artificial-intelligence large-language-models llm llms llm-inference

Updated Jul 10, 2024
Python

bentoml / BentoML

The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Jul 10, 2024
Python

SuperDuperDB / superduper

🔮 SuperDuper: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Includes streaming inference, scalable model training and vector search.

Updated Jul 10, 2024
Python

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

llama cuda-kernels deepspeed llm fastertransformer llm-inference turbomind internlm llama2 codellama llama3

Updated Jul 10, 2024
Python

neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs

nlp performance computer-vision inference machinelearning pruning object-detection pretrained-models quantization cpus onnx sparsification llm-inference deepsparse

Updated Jul 5, 2024
Python

databricks / dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

databricks llm generative-ai gen-ai llm-training llm-inference mosaic-ai

Updated May 1, 2024
Python

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

retrieval chatbot rag habana large-language-model chatpdf llm-inference 4-bits speculative-decoding llm-cpu streamingllm intel-optimized-llamacpp neural-chat neural-chat-7b autoround gaudi3

Updated Jul 10, 2024
Python

predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

transformers pytorch llama gpt lora model-serving fine-tuning llm llmops llm-serving llm-inference

Updated Jul 10, 2024
Python

dstackai / dstack

dstack is an easy-to-use and flexible container orchestrator for running AI workloads in any cloud or data center.

python training aws machine-learning cloud azure gpu gcp orchestration fine-tuning llms llmops llm-training llm-inference

Updated Jul 10, 2024
Python

ray-project / ray-llm

RayLLM - LLMs on Ray

distributed-systems transformers ray serving large-language-models llm llmops llm-serving llm-inference

Updated May 28, 2024
Python

SafeAILab / EAGLE

Official Implementation of EAGLE-1 and EAGLE-2

large-language-models llm-inference speculative-decoding

Updated Jul 1, 2024
Python

stoyan-stoyanov / llmflows

LLMFlows - Simple, Explicit and Transparent LLM Apps

python machine-learning ai openai question-answering vector-database gpt-4 llm prompt-engineering llms chatgpt llmops llm-inference

Updated Apr 7, 2024
Python

anarchy-ai / LLM-VM

Irresponsible innovation. Try now at https://chat.dev/

machine-learning deep-learning artificial-intelligence distillation distillation-model llm llm-agent llm-training llm-inference llm-local

Updated May 14, 2024
Python

run-ai / genv

GPU environment and cluster management with LLM support

Updated May 16, 2024
Python

hpcaitech / SwiftInfer

Efficient AI Inference & Serving

deep-learning inference artificial-intelligence llama gpt llm-serving llm-inference llama2

Updated Jan 8, 2024
Python

Kenza-AI / sagify

LLMs and Machine Learning done easily

openai cohere sagemaker large-language-models llm generative-ai langchain llmops large-language-model anthropic langchain-python llm-inference open-source-llm ai-gateway

Updated Mar 10, 2024
Python

FlagAI-Open / Aquila2

The official repo of the Aquila2 series proposed by BAAI, including pretrained and chat large language models.

llm llm-training llm-inference

Updated Feb 4, 2024
Python

EulerSearch / embedding_studio

Embedding Studio is a framework that lets you transform your vector database into a feature-rich search engine.

search-engine embeddings semantic-similarity search-algorithm query-parser fine-tuning unstructured-data vector-database embeddings-similarity unstructured-search llm-inference search-query-parser

Updated Mar 15, 2024
Python

rizerphe / local-llm-function-calling

A tool for generating function arguments and choosing what function to call with local LLMs

json-schema huggingface-transformers llm llm-inference openai-functions chatgpt-functions openai-function-call

Updated Mar 12, 2024
Python
