[Feature Request]: Use RAG-Retrieved Images as Input for VLM-Enabled Chat Model #1255

Sam-Wu-dev opened this issue Jun 24, 2024 · 0 comments

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Yes, I'm frustrated that the images retrieved by the Retrieval-Augmented Generation (RAG) process cannot be used directly as input to a chat model with Vision-Language Model (VLM) capabilities. This limitation makes it harder to integrate visual information seamlessly into conversational AI.

Describe the feature you'd like

I would like a chat model with VLM capabilities to accept the images retrieved by the RAG process as input. The model could then analyze and generate responses based on both text and visual content, providing a richer and more comprehensive conversational experience.

Describe implementation you've considered

  1. Input Handling: Modify the chat model's input processing to accept images in addition to text, so the model can analyze visual content and incorporate it into its responses (a rough sketch follows this list).
  2. User Interface: Provide a user-friendly interface that lets users see both the text and the images being processed by the chat model.
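
For reference, here is a minimal sketch of what I have in mind for point 1, assuming an OpenAI-compatible multimodal chat API and assuming the RAG step already returns text chunks plus image file paths. The function names and the model choice are placeholders for illustration, not the project's actual API:

```python
import base64
from openai import OpenAI

client = OpenAI()


def encode_image(path: str) -> str:
    """Base64-encode a retrieved image file so it can be sent inline."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def answer_with_images(question: str, text_chunks: list[str], image_paths: list[str]) -> str:
    """Build one multimodal user message from RAG results and query a VLM-capable chat model."""
    # Retrieved text context and the user's question go in as a plain-text part.
    content = [{
        "type": "text",
        "text": "Context:\n" + "\n\n".join(text_chunks) + f"\n\nQuestion: {question}",
    }]
    # Each retrieved image is attached as an inline data URL so the VLM can see it.
    for path in image_paths:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{encode_image(path)}"},
        })
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any VLM-capable chat model would do
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content
```

The key point is simply that the retrieved images travel in the same message as the retrieved text, so the VLM can ground its answer in both.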

Documentation, adoption, use case

No response

Additional information

No response
