[Feature Request]: Image2Text Model Integration for PDF Image Descriptions #1254

Sam-Wu-dev · 2024-06-24T07:06:40Z

Is there an existing issue for the same feature request?

I have checked the existing issues.

Is your feature request related to a problem?

Yes. I'm frustrated that images within a PDF are processed only with OCR, which extracts any text in the image but captures neither what the image depicts nor its context. This limitation is particularly problematic with the General and Manual chunking methods, as neither produces a description of visual content.

Describe the feature you'd like

I would like to integrate an Image2Text model to process and describe images within PDFs. This model should be used in conjunction with the General or Manual chunking methods to provide detailed and contextually accurate descriptions of images. This integration would enhance the overall data extracted from PDFs by including rich, descriptive information about visual content.

Describe implementation you've considered

Integration: Modify the existing PDF processing pipeline to incorporate the Image2Text model. When an image is detected during the General or Manual chunking process, the image should be sent to the Image2Text model for analysis.
Output Handling: The descriptive text generated by the Image2Text model should be incorporated into the final extracted data, alongside any text extracted via OCR.
User Interface: Provide options for users to enable or disable Image2Text processing, allowing flexibility based on their specific needs.
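
The three points above could be sketched roughly as follows. This is only an illustration of the proposed flow, not the project's actual pipeline: `ImageChunk`, `build_chunk_text`, `enable_image2text`, and `fake_describe` are hypothetical names, and the Image2Text model is assumed to be any callable that takes image bytes and returns a description.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: an Image2Text model is any callable mapping image bytes to text.
Image2Text = Callable[[bytes], str]


@dataclass
class ImageChunk:
    """Hypothetical container for one image found while chunking a PDF."""
    image: bytes
    ocr_text: str


def build_chunk_text(
    chunk: ImageChunk,
    describe: Optional[Image2Text] = None,
    enable_image2text: bool = True,
) -> str:
    """Combine OCR text with an optional model-generated description."""
    parts = []
    ocr_text = chunk.ocr_text.strip()
    if ocr_text:
        parts.append(ocr_text)
    # The user-facing toggle: skip the model call entirely when disabled.
    if enable_image2text and describe is not None:
        description = describe(chunk.image).strip()
        if description:
            parts.append(f"[Image description] {description}")
    return "\n".join(parts)


def fake_describe(image: bytes) -> str:
    """Stand-in for a real Image2Text model, used only for illustration."""
    return "A block diagram with three connected stages."


chunk = ImageChunk(image=b"\x89PNG", ocr_text="Fig. 1")
print(build_chunk_text(chunk, fake_describe))
print(build_chunk_text(chunk, fake_describe, enable_image2text=False))
```

With the toggle off, the output falls back to the OCR text alone, so existing behavior would be preserved for users who do not want the extra model call.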

Documentation, adoption, use case

Use Cases:

Technical Manuals: Users processing technical manuals often need detailed descriptions of diagrams and images to fully understand the content.
Research Papers: Researchers can benefit from comprehensive descriptions of charts and figures, which are critical for interpreting data and results.
Educational Materials: Educators and students can gain a better understanding of visual content in educational PDFs, such as textbooks and study guides.

Additional information

No response

KevinHuSh added the Feature label Jun 25, 2024
