OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. (C++, updated Jul 26, 2024)
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
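As a hedged illustration of the tensor-parallelism idea these projects build on (not code from any listed repository): a linear layer's weight matrix is split column-wise across devices, each device computes a partial matmul on the same input, and concatenating the shards reproduces the full output while dividing per-device RAM. All function names below are illustrative.

```python
def matmul(x, w):
    # Plain dense matmul: x is (m x k), w is (k x n), result is (m x n).
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def split_columns(w, num_devices):
    # Split w into num_devices column shards (the last may be narrower).
    n = len(w[0])
    step = -(-n // num_devices)  # ceiling division
    return [[row[s:s + step] for row in w] for s in range(0, n, step)]

def column_parallel_matmul(x, w, num_devices):
    # Each "device" multiplies x by its own column shard; a concat
    # (an all-gather in a real cluster) recovers x @ w exactly.
    partials = [matmul(x, shard) for shard in split_columns(w, num_devices)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[5, 6, 7], [8, 9, 10]]
assert column_parallel_matmul(x, w, 2) == matmul(x, w)  # shards agree with full matmul
```

In a real engine each shard lives in a different process or on a different machine, so no single device ever holds the full weight matrix; the only communication per layer is the final gather.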
A high-performance inference system for large language models, designed for production environments.
Leverage tensor parallelism techniques to run large language models in the CPU memory of edge devices.
Pure C++ implementation of several models for real-time chatting on your computer (CPU)
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
LLMs as Copilots for Theorem Proving in Lean
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
LLM in Godot
CodeInferflow is an efficient inference engine, based on Inferflow, for code large language models (Code LLMs). With CodeInferflow, you can locally deploy popular code LLMs and use efficient code completion in VSCode.
Multi-Model and multi-tasking llama Discord Bot - Mirror of: https://gitlab.com/niansa/discord_llama
Super easy to use library for doing LLaMA/GPT-J stuff! - Mirror of: https://gitlab.com/niansa/libjustlm
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).