A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Fast format for datasets
A paper list on multimodal and large language models, used only to record papers I read on the daily arXiv for personal reference.
Build real-time multimodal AI applications 🤖🎙️📹
ms-swift: Use PEFT or full-parameter training to fine-tune 250+ LLMs or 40+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Data Infrastructure for Multimodal AI: Data, models, and orchestration in a unified declarative interface.
Automatically updated paper list
The enterprise-grade, production-ready multi-agent orchestration framework. Join our community: https://discord.com/servers/agora-999382051935506503
Multimodal Graph Learning: how to encode multiple multimodal neighbors with their relations into LLMs
Multimodal prototyping for cancer survival prediction - ICML 2024
Corpus of resources for multimodal machine learning with physiological signals
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
'Talk to your slide deck' (Multimodal RAG) using foundation models (FMs) hosted on Amazon Bedrock and Amazon SageMaker
Docker image for LLaVA: Large Language and Vision Assistant
Turn your screen into actions (using LLMs). Inspired by adept.ai, rewind.ai, and Apple Shortcuts. Rust + WASM.
Repository containing LinkedIn posts on Generative AI: knowledge sharing, learning resources, and research explanations.