[Feature Request]: Use RAG-Retrieved Images as Input for VLM-Enabled Chat Model #1255

Sam-Wu-dev opened this issue Jun 24, 2024 · 0 comments

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Yes, I'm frustrated that the images retrieved by the Retrieval-Augmented Generation (RAG) process cannot be used directly as input to a chat model with Vision-Language Model (VLM) capabilities. This limitation makes it harder to integrate visual information seamlessly into conversational AI.

Describe the feature you'd like

I would like a chat model with VLM capabilities to accept the images retrieved by the RAG process as input. The model could then analyze and generate responses based on both text and visual content, providing a richer and more comprehensive conversational experience.

Describe implementation you've considered

  1. Input Handling: Modify the chat model's input processing to accept images in addition to text, so the model can analyze visual content and incorporate it into its responses (a rough sketch follows this list).
  2. User Interface: Provide a user-friendly interface that lets users see both the text and the images being processed by the chat model.
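
For reference, here is a minimal sketch of what I have in mind for point 1, assuming an OpenAI-compatible multimodal chat API and assuming the RAG step already returns text chunks plus image file paths. The function names and the model choice are placeholders for illustration, not the project's actual API:

```python
import base64
from openai import OpenAI

client = OpenAI()


def encode_image(path: str) -> str:
    """Base64-encode a retrieved image file so it can be sent inline."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def answer_with_images(question: str, text_chunks: list[str], image_paths: list[str]) -> str:
    """Build one multimodal user message from RAG results and query a VLM-capable chat model."""
    # Retrieved text context and the user's question go in as a plain-text part.
    content = [{
        "type": "text",
        "text": "Context:\n" + "\n\n".join(text_chunks) + f"\n\nQuestion: {question}",
    }]
    # Each retrieved image is attached as an inline data URL so the VLM can see it.
    for path in image_paths:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{encode_image(path)}"},
        })
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any VLM-capable chat model would do
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content
```

The key point is simply that the retrieved images travel in the same message as the retrieved text, so the VLM can ground its answer in both.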

Documentation, adoption, use case

No response

Additional information

No response
