[Feature Request]: Image2Text Model Integration for PDF Image Descriptions #1254

Open
1 task done
Sam-Wu-dev opened this issue Jun 24, 2024 · 0 comments

@Sam-Wu-dev

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Yes. I'm frustrated that images within a PDF are processed only with OCR, which extracts any text in the image but does not capture what the image shows or its context. This is particularly problematic with the General and Manual chunking methods, which produce no description of visual content.

Describe the feature you'd like

I would like to integrate an Image2Text model that processes and describes images within PDFs. It would work alongside the General and Manual chunking methods to produce detailed, contextually accurate descriptions of images, so that the extracted data captures visual content as well as text.

Describe implementation you've considered

  1. Integration: Modify the existing PDF processing pipeline to incorporate the Image2Text model. When an image is detected during the General or Manual chunking process, the image should be sent to the Image2Text model for analysis.
  2. Output Handling: The descriptive text generated by the Image2Text model should be incorporated into the final extracted data, alongside any text extracted via OCR.
  3. User Interface: Provide options for users to enable or disable Image2Text processing, allowing flexibility based on their specific needs.
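A minimal sketch of how steps 1 to 3 could fit together, assuming the chunker already yields the cropped bytes of each image it detects. `ImageRegion`, `ImageChunk`, `process_images`, and the `ocr`/`describe` callables are hypothetical placeholders, not existing project APIs; `describe` stands in for whichever Image2Text model is configured, and `enable_image2text` models the user-facing toggle.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ImageRegion:
    """Hypothetical: an image cropped from a PDF page during chunking."""
    page: int
    image_bytes: bytes


@dataclass
class ImageChunk:
    """Hypothetical: OCR text plus an optional Image2Text description."""
    page: int
    ocr_text: str
    description: Optional[str]

    def to_text(self) -> str:
        # Step 2: merge the OCR text and the model's description into one chunk body.
        parts = []
        if self.ocr_text:
            parts.append(f"OCR text: {self.ocr_text}")
        if self.description:
            parts.append(f"Image description: {self.description}")
        return "\n".join(parts)


def process_images(
    regions: List[ImageRegion],
    ocr: Callable[[bytes], str],
    describe: Callable[[bytes], str],
    enable_image2text: bool = True,
) -> List[ImageChunk]:
    """Run OCR on every detected image and, if enabled, an Image2Text model."""
    chunks: List[ImageChunk] = []
    for region in regions:
        text = ocr(region.image_bytes).strip()
        description: Optional[str] = None
        # Step 3: the model is only called when the user has enabled the option.
        if enable_image2text:
            description = describe(region.image_bytes).strip() or None
        chunks.append(ImageChunk(region.page, text, description))
    return chunks


if __name__ == "__main__":
    # Stub callables standing in for a real OCR engine and Image2Text model.
    regions = [ImageRegion(page=3, image_bytes=b"fake-png")]
    for chunk in process_images(
        regions,
        ocr=lambda data: "Fig. 2 Pump assembly",
        describe=lambda data: "Exploded diagram of a pump with labeled parts.",
    ):
        print(chunk.to_text())
```

Keeping the OCR text and the description as separate fields means a chunk still carries the OCR result when Image2Text is disabled or returns nothing, so enabling the feature only adds information.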

Documentation, adoption, use case

Use cases:

  • Technical manuals: Users processing technical manuals often need detailed descriptions of diagrams and images to fully understand the content.
  • Research papers: Researchers benefit from full descriptions of charts and figures, which are critical for interpreting data and results.
  • Educational materials: Educators and students can better understand visual content in educational PDFs such as textbooks and study guides.

Additional information

No response
