GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use. (C++, updated Sep 7, 2024)
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
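The core idea behind this kind of tensor parallelism can be sketched briefly: a layer's weight matrix is split column-wise into shards, each device holds and multiplies only its shard (dividing RAM usage), and the partial results are concatenated. A minimal NumPy sketch, assuming a single linear layer and using plain array shards as stand-ins for devices (the function name and shapes are illustrative, not from any of the listed projects):

```python
import numpy as np

def tensor_parallel_linear(x, weight, num_devices):
    """Column-wise tensor parallelism for one linear layer (illustrative sketch).

    Each shard holds only 1/num_devices of the weight columns, so per-device
    memory shrinks accordingly; the per-shard matmuls are independent and
    could run concurrently on separate devices before the gather step.
    """
    shards = np.array_split(weight, num_devices, axis=1)  # split weights
    partial_outputs = [x @ shard for shard in shards]     # one matmul per "device"
    return np.concatenate(partial_outputs, axis=-1)       # gather partial results

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))   # batch of activations
w = rng.standard_normal((16, 32))  # full weight matrix
y_parallel = tensor_parallel_linear(x, w, num_devices=4)
y_reference = x @ w
assert np.allclose(y_parallel, y_reference)  # sharded result matches the full matmul
```

In a real cluster the gather step is a network collective (e.g. an all-gather) rather than an in-process concatenation, which is why inference speed depends on interconnect bandwidth as well as per-device compute.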
LLMs as Copilots for Theorem Proving in Lean
A great project for campus recruiting (autumn/spring recruitment and internships): build an LLM inference framework supporting LLaMA from scratch, step by step.
Pure C++ implementation of several models for real-time chatting on your computer (CPU)
A high-performance inference system for large language models, designed for production environments.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
LLM in Godot
Aussie AI Base C++ Library is the source code repo for the book Generative AI in C++, along with various other AI/ML kernels.
Super easy to use library for doing LLaMA/GPT-J stuff! - Mirror of: https://gitlab.com/niansa/libjustlm
Multi-Model and multi-tasking llama Discord Bot - Mirror of: https://gitlab.com/niansa/discord_llama
Leverage tensor parallelism techniques to run large language models in the CPU memory of edge devices.
CodeInferflow is an efficient inference engine, based on Inferflow, for code large language models (Code LLMs). With CodeInferflow, you can deploy popular code LLMs locally and use efficient code completion in VSCode.