🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐
Updated Dec 23, 2025 - Python
Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval
Python program for searching PDF text, ranking the results, and exporting highlighted search results to PDF. Uses a trie, stack, heap, and page graph. Converts queries to postfix notation. Supports logical expressions and phrases. Offers "did you mean" suggestions.
DocuVisQA (Document Visual Question Answering) is a Python project that leverages Google's Generative AI and LangChain for document processing, text splitting, and question answering. It also supports image processing, with Streamlit providing an interactive UI.
A web interface that allows searching for PDFs by their content
Use semantic search on PDFs locally
CLI for merging PDF contexts.
In Development
Given a set of PDFs and a query, finds the most relevant PDF using TF-IDF. The TF-IDF computation is implemented without any library.
Cognivia AI is a powerful AI-powered PDF search and question-answering system built with LangChain, Pinecone Vector Store, OpenAI, and Supabase. Upload PDFs, ask questions, and get intelligent answers with persistent conversation memory.
A tool to search for text in PDF files using multiple methods, including OCR (Optical Character Recognition).
Short on time? Can't search through all your PDFs one by one for the content you want? Well, PDF-Founder is here...
Program that searches for a list of names of parties to legal proceedings in the PDFs of the Diário Oficial.
This Python script allows users to search through PDF documents located in predefined directories for specific keywords. It uses PyPDF2 to extract text from PDFs and supports single or dual keyword searches.
Repository for the Indexing, Search and Evaluation of UniChemFinder
📄 PDF Search Engine – Advanced keyword-based PDF search with logical operators, graph-based ranking, autocomplete, and highlighted exports.
Python console app that performs smart searches over a provided PDF. It showcases the use of tries for word searching.
A high-performance RAG system for PDFs using multi-vector embeddings (ColPali / ColQwen / ColSmol) with vector search in Qdrant, prefetch optimization, and reranking for improved relevance. Designed for speed, accuracy, and scalability, this system is ideal for building intelligent search, document understanding, and QA applications.
An AI-powered Streamlit app for PDF and web-based Q&A using RAG (Retrieval-Augmented Generation), Groq’s Mixtral LLM, and DeepAI image generation.
Build a workflow using CrewAI tools to scrape the content from the docs and then perform RAG on it.
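Several entries above rank PDFs against a query, and one of them does so with a library-free TF-IDF. As a rough illustration of that idea only, here is a minimal sketch in plain Python. It is not taken from any of the listed repositories: the function names, the whitespace tokenizer, and the `+ 1.0` shift on the IDF term are all assumptions, and it expects the PDF text to have been extracted already.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def rank_by_tfidf(documents, query):
    """Return (name, score) pairs sorted from most to least relevant.

    `documents` maps a document name to its already-extracted text.
    Both function names here are hypothetical, chosen for illustration.
    """
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        score = 0.0
        for term in query_terms:
            if term not in counts:
                continue
            tf = counts[term] / total
            # Assumed variant: shift IDF by 1 so a term that appears in
            # every document still contributes a little to the score.
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            score += tf * idf
        scores[name] = score

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A call such as `rank_by_tfidf({"a.pdf": "...", "b.pdf": "..."}, "trie search")` returns the document names ordered by score. A real tool would also need a PDF text extractor and a more careful tokenizer, neither of which is covered here.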