OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. (C++, updated Jul 26, 2024)
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
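As a hedged illustration of the tensor-parallelism idea these projects build on (not code from any listed repository): a linear layer's weight matrix is split column-wise across devices, each device computes a partial matmul on the same input, and concatenating the shards reproduces the full output while dividing per-device RAM. All function names below are illustrative.

```python
def matmul(x, w):
    # Plain dense matmul: x is (m x k), w is (k x n), result is (m x n).
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def split_columns(w, num_devices):
    # Split w into num_devices column shards (the last may be narrower).
    n = len(w[0])
    step = -(-n // num_devices)  # ceiling division
    return [[row[s:s + step] for row in w] for s in range(0, n, step)]

def column_parallel_matmul(x, w, num_devices):
    # Each "device" multiplies x by its own column shard; a concat
    # (an all-gather in a real cluster) recovers x @ w exactly.
    partials = [matmul(x, shard) for shard in split_columns(w, num_devices)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[5, 6, 7], [8, 9, 10]]
assert column_parallel_matmul(x, w, 2) == matmul(x, w)  # shards agree with full matmul
```

In a real engine each shard lives in a different process or on a different machine, so no single device ever holds the full weight matrix; the only communication per layer is the final gather.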
A high-performance inference system for large language models, designed for production environments.
Leverage tensor parallelism techniques to run large language models in the CPU memory of edge devices.
Pure C++ implementation of several models for real-time chatting on your computer (CPU)
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
LLMs as Copilots for Theorem Proving in Lean
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
LLM in Godot
CodeInferflow is an efficient inference engine, based on Inferflow, for code large language models (Code LLMs). With CodeInferflow, you can locally deploy popular code LLMs and use efficient code completion in VSCode.
Multi-Model and multi-tasking llama Discord Bot - Mirror of: https://gitlab.com/niansa/discord_llama
Super easy to use library for doing LLaMA/GPT-J stuff! - Mirror of: https://gitlab.com/niansa/libjustlm
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).