📚 Enhanced Multimodal RAG with Hugging Face

Upload a PDF document and ask questions about its content, including images and tables!

Now with improved PDF processing and multiple extraction methods.

📤 Upload Document

💬 Chat Interface

💡 Example Questions


📋 Instructions:

  1. Get HF Token: Open your Hugging Face account settings and create an access token
  2. Upload PDF: Click "Choose File" and select your PDF document
  3. Process Document: Click "Process PDF" and wait for confirmation
  4. Ask Questions: Type questions or use example prompts

✨ Enhanced Features:

  • 📄 Multiple Text Extraction Methods: PyPDF2, PyMuPDF, OCR, and Unstructured
  • 🖼️ Advanced Image Processing: Direct PDF image extraction + vision models
  • 🔍 Robust PDF Handling: Works with scanned PDFs, complex layouts, and image-heavy documents
  • 💬 Interactive Chat: Conversation history with multimodal understanding
  • ⚡ Error Recovery: Graceful fallbacks when one extraction method fails
  • 📊 Processing Statistics: Detailed feedback on what was extracted
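
The fallback and statistics behaviour listed above can be pictured as a chain of extractors tried in order. This is only a minimal sketch, not the app's actual code: the names `extract_with_fallback`, `pypdf2_text`, and `pymupdf_text` are hypothetical, and the two wrappers assume PyPDF2 and PyMuPDF are installed.

```python
from typing import Callable, Dict, List, Tuple

# An extractor takes a PDF path and returns the text it could recover.
Extractor = Callable[[str], str]


def extract_with_fallback(
    pdf_path: str, extractors: List[Tuple[str, Extractor]]
) -> Tuple[str, Dict[str, str]]:
    """Try each extractor in order; return the first non-empty text and a per-method report."""
    report: Dict[str, str] = {}
    for name, extract in extractors:
        try:
            text = extract(pdf_path).strip()
        except Exception as exc:  # any backend failure moves on to the next method
            report[name] = f"failed: {exc}"
            continue
        if text:
            report[name] = f"ok: {len(text)} characters"
            return text, report
        report[name] = "empty"
    return "", report


def pypdf2_text(pdf_path: str) -> str:
    """Hypothetical wrapper around PyPDF2 (imported lazily so the chain works without it)."""
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def pymupdf_text(pdf_path: str) -> str:
    """Hypothetical wrapper around PyMuPDF (module name `fitz`)."""
    import fitz

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)
```

A call such as `extract_with_fallback("report.pdf", [("PyPDF2", pypdf2_text), ("PyMuPDF", pymupdf_text)])` would return the text from the first method that succeeds, plus a report that could feed the processing statistics.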

🔧 Models Used:

  • 🎭 Multimodal: Microsoft GIT-Large (understands images + text together)
  • 📝 Text Generation: Google FLAN-T5-Base (optimized for Q&A)
  • 👁️ Vision: Salesforce BLIP (image captioning and understanding)
  • 🔍 Embeddings: Sentence Transformers all-MiniLM-L6-v2
  • 📖 OCR: Tesseract for text recognition in images
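
To illustrate how the embedding model could be used for retrieval, here is a minimal sketch that ranks document chunks by cosine similarity to a question. It is an assumption about the retrieval step, not the app's confirmed implementation; `cosine_similarity` and `top_k_chunks` are hypothetical names, and the vectors are plain Python lists so the example runs without any model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best match first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In a real pipeline the vectors would presumably come from the embedding model, for example `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)` from the sentence-transformers package, with the question encoded the same way.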

🎯 Multimodal Capabilities:

  • Text + Images: Can answer questions about both text content and visual elements
  • Image Understanding: Describes charts, diagrams, and photos in your PDFs
  • OCR Integration: Extracts text from images within PDFs
  • Context Awareness: Combines text and visual information for comprehensive answers
  • Fallback Strategy: Uses multiple methods to ensure successful text extraction
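
One way to picture the "Context Awareness" step is a prompt builder that merges retrieved text chunks with image captions before the text-generation model is called. This is a hedged sketch only: `build_prompt`, the `[text]` / `[image]` tags, the prompt wording, and the `max_chars` budget are all illustrative assumptions, not the app's actual format.

```python
from typing import List


def build_prompt(
    question: str,
    text_chunks: List[str],
    image_captions: List[str],
    max_chars: int = 1500,
) -> str:
    """Merge text chunks and image captions into one context block for a Q&A model."""
    # Tag each piece so the model can tell document text from image descriptions.
    parts = [f"[text] {chunk.strip()}" for chunk in text_chunks if chunk.strip()]
    parts += [f"[image] {cap.strip()}" for cap in image_captions if cap.strip()]
    # Truncate the context so the prompt stays within a small model's input budget.
    context = "\n".join(parts)[:max_chars]
    return (
        "Answer the question using the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

For example, `build_prompt("What does the chart show?", ["Sales rose in Q2."], ["a bar chart of quarterly sales"])` yields a single string that carries both the extracted text and the caption of the chart.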

🛠️ Troubleshooting:

  • No text extracted: Try a different PDF and make sure the file is not password-protected
  • Large files: Keep PDFs under 50 MB for optimal performance
  • Scanned PDFs: OCR will automatically process image-based text
  • Complex layouts: Multiple extraction methods handle various PDF formats
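
Some of these issues can be caught before uploading. The sketch below is a rough pre-flight check, not part of the app: `check_pdf_bytes` and `preflight_pdf` are hypothetical helpers, and looking for the `/Encrypt` key is only a heuristic for password protection that can give false positives.

```python
from typing import List

MAX_BYTES = 50 * 1024 * 1024  # the 50 MB guideline from the troubleshooting list


def check_pdf_bytes(data: bytes, max_bytes: int = MAX_BYTES) -> List[str]:
    """Return a list of likely problems; an empty list means nothing obvious was found."""
    problems: List[str] = []
    if len(data) > max_bytes:
        problems.append(
            f"file is {len(data) / 1_048_576:.1f} MB, above the "
            f"{max_bytes / 1_048_576:.1f} MB guideline"
        )
    # PDF files normally carry a %PDF- marker near the very start.
    if b"%PDF-" not in data[:1024]:
        problems.append("missing %PDF- header; the file may not be a PDF")
    # Heuristic: encrypted PDFs reference an /Encrypt dictionary.
    if b"/Encrypt" in data:
        problems.append("found /Encrypt; the file may be password-protected")
    return problems


def preflight_pdf(path: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Read a file from disk and run the checks above on its bytes."""
    with open(path, "rb") as fh:
        return check_pdf_bytes(fh.read(), max_bytes)
```

Running `preflight_pdf("report.pdf")` before upload would list any of these warnings; a scanned or unusually laid-out PDF passes this check and is left to the OCR and fallback extraction paths.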