Enhanced Multimodal RAG with Hugging Face
Upload a PDF document and ask questions about its content, including images and tables!
Now with improved PDF processing and multiple extraction methods.
Upload Document
Chat Interface
Example Questions
Instructions:
- Get HF Token: Create an access token under Access Tokens in your Hugging Face account settings
- Upload PDF: Click "Choose File" and select your PDF document
- Process Document: Click "Process PDF" and wait for confirmation
- Ask Questions: Type questions or use example prompts
Enhanced Features:
- Multiple Text Extraction Methods: PyPDF2, PyMuPDF, OCR, and Unstructured
- Advanced Image Processing: Direct PDF image extraction plus vision models
- Robust PDF Handling: Works with scanned PDFs, complex layouts, and image-heavy documents
- Interactive Chat: Conversation history with multimodal understanding
- Error Recovery: Graceful fallbacks when one extraction method fails
- Processing Statistics: Detailed feedback on what was extracted
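The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names (`extract_with_pypdf2`, `extract_with_pymupdf`, `extract_text`) and the order in which methods are tried are assumptions. Each extractor imports its library lazily, so a missing package only disables that one method.

```python
from typing import Callable, List, Tuple

# (name, function) pairs; each function takes a PDF path and returns text.
Extractor = Tuple[str, Callable[[str], str]]


def extract_with_pypdf2(path: str) -> str:
    """Extract text page by page with PyPDF2."""
    from PyPDF2 import PdfReader  # lazy import: failure is handled by the caller
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_pymupdf(path: str) -> str:
    """Extract text page by page with PyMuPDF (imported as fitz)."""
    import fitz  # lazy import: failure is handled by the caller
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_text(path: str, extractors: List[Extractor]) -> Tuple[str, str, List[str]]:
    """Try each extractor in order; return (method_name, text, errors).

    An extractor that raises, or returns only whitespace, is recorded in
    `errors` and the next one is tried. Returns ("", "", errors) if all fail.
    """
    errors: List[str] = []
    for name, fn in extractors:
        try:
            text = fn(path)
        except Exception as exc:  # includes ImportError for missing libraries
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text, errors
        errors.append(f"{name}: no text found")
    return "", "", errors
```

A call such as `extract_text("report.pdf", [("PyPDF2", extract_with_pypdf2), ("PyMuPDF", extract_with_pymupdf)])` returns the first method that produced text, and the collected `errors` list could feed the processing statistics.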
Models Used:
- Multimodal: Microsoft GIT-Large (understands images + text together)
- Text Generation: Google FLAN-T5-Base (instruction-tuned, used here for Q&A)
- Vision: Salesforce BLIP (image captioning and understanding)
- Embeddings: Sentence Transformers all-MiniLM-L6-v2
- OCR: Tesseract for text recognition in images
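As a rough illustration of how the embedding model could be used for retrieval, the sketch below ranks document chunks by cosine similarity to a question. The page does not say which similarity measure or vector store the app uses, so cosine ranking and the helper names (`embed`, `cosine`, `top_k`) are assumptions made for this example.

```python
import math
from typing import List, Sequence


def embed(texts: List[str]) -> List[List[float]]:
    """Embed texts with all-MiniLM-L6-v2 (downloads the model on first use)."""
    from sentence_transformers import SentenceTransformer  # lazy import
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return model.encode(texts).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 3) -> List[int]:
    """Return indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

In use, one would embed the document chunks once after processing, embed each question as it arrives, and pass the top-ranked chunks to the text generation model.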
Multimodal Capabilities:
- Text + Images: Can answer questions about both text content and visual elements
- Image Understanding: Describes charts, diagrams, photos in your PDFs
- OCR Integration: Extracts text from images within PDFs
- Context Awareness: Combines text and visual information for comprehensive answers
- Fallback Strategy: Uses multiple methods to ensure successful text extraction
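One way the text and visual information listed above might be combined is to merge retrieved text chunks, image captions, and OCR snippets into a single context string before prompting the text model. The sketch below is hypothetical: `build_context`, `build_prompt`, the section labels, and the `max_chars` limit are illustrative choices, not the app's actual prompt format.

```python
from typing import List


def build_context(
    text_chunks: List[str],
    image_captions: List[str],
    ocr_snippets: List[str],
    max_chars: int = 2000,
) -> str:
    """Merge text and image-derived information into one context string."""
    sections = []
    if text_chunks:
        sections.append("Document text:\n" + "\n".join(text_chunks))
    if image_captions:
        sections.append(
            "Image descriptions:\n" + "\n".join(f"- {c}" for c in image_captions)
        )
    if ocr_snippets:
        sections.append(
            "Text found in images (OCR):\n" + "\n".join(f"- {s}" for s in ocr_snippets)
        )
    # Truncate so the prompt stays within a small model's input budget.
    return "\n\n".join(sections)[:max_chars]


def build_prompt(question: str, context: str) -> str:
    """Wrap the context and question in a simple Q&A prompt."""
    return (
        "Answer the question using the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Empty sources are simply skipped, so a text-only PDF produces a context with no image sections.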
Troubleshooting:
- No text extracted: Try a different PDF file and make sure it is not password-protected
- Large files: Keep PDFs under 50MB for optimal performance
- Scanned PDFs: OCR will automatically process image-based text
- Complex layouts: Multiple extraction methods handle various PDF formats
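The first two checks above can be run before uploading. The sketch below is an assumption about how such a pre-flight check could look, with hypothetical helper names (`size_ok`, `is_password_protected`); the 50 MB limit comes from the guidance above, and the encryption check relies on PyPDF2's `PdfReader.is_encrypted`.

```python
import os

MAX_MB = 50  # size limit suggested in the troubleshooting notes


def size_ok(path: str, max_mb: float = MAX_MB) -> bool:
    """Return True if the file is no larger than max_mb megabytes."""
    return os.path.getsize(path) <= max_mb * 1024 * 1024


def is_password_protected(path: str) -> bool:
    """Return True if the PDF is encrypted (requires PyPDF2)."""
    from PyPDF2 import PdfReader  # lazy import so size_ok works without it
    return PdfReader(path).is_encrypted
```

For example, `size_ok("report.pdf") and not is_password_protected("report.pdf")` is a quick test that a file is likely to process without hitting either issue.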