DocuSense is an AI-powered platform for uploading, analyzing, and querying documents. Built on Google Gemini AI, Tesseract OCR, FAISS, and Sentence Transformers, it extracts insights, provides semantic search, and supports interactive Q&A, helping users manage and understand document data efficiently.
- Upload PDFs or scanned documents.
- Complete pipeline for AI-driven analysis including OCR, summarization, entity recognition, and classification.
- Accurate extraction of text from images and scanned PDFs using Tesseract OCR.
- Summarization: Generate both extractive and abstractive summaries.
- Named Entity Recognition (NER): Detect entities like names, locations, dates, and more.
- Document Classification: Automatically categorize documents (e.g., legal, academic, business).
- Vector-based search using FAISS and sentence-transformers.
- Index caching for fast retrieval and query responses.
- Ask questions in natural language and get context-aware answers powered by Google Gemini.
- Chat interface that understands the document's content.
- Visualize document summaries, classifications, and metadata.
- Modular backend services for AI, search, and document management.
- Easily extendable for future use cases.
- Legal Professionals: Extract key clauses and entities from case files and contracts.
- Researchers: Summarize papers, find related works, and explore topics faster.
- Business Analysts: Organize reports, extract financial data, and classify documents.
- Knowledge Workers: Build searchable archives of documents with intelligent interactions.
| Area | Tools & Frameworks |
|---|---|
| Backend | Python, FastAPI, Uvicorn, Pydantic, python-dotenv |
| AI / ML | Google Gemini AI, Sentence-Transformers, FAISS, Tesseract OCR |
| Frontend | React 18, Axios, React Dropzone, CSS |
| Database | SQLite (metadata), easily replaceable with PostgreSQL |
| DevOps | Docker, virtual environments (venv) |
.
├── app/              # Backend application (FastAPI)
├── data/             # Uploaded and processed document storage
├── frontend/         # Frontend application (React)
├── requirements.txt  # Python dependencies
└── README.md         # Project documentation
For full structure details, see the Getting Started section below.
- Python 3.10 or higher
- Node.js 16 or higher
- Tesseract OCR installed
- Google Gemini API key
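
Before installing, it can help to confirm that the tools listed above are available. The following is a minimal sketch, not part of the DocuSense codebase: the function name `check_prerequisites` is hypothetical, and it only checks that `node` and `tesseract` are on the `PATH`. It does not verify the Node.js version or the Gemini API key.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # The running interpreter must meet the minimum Python version.
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "node": shutil.which("node") is not None,
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Running the script prints one line per tool. A `missing` entry means that tool should be installed before continuing.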
git clone https://github.com/royxlead/DocuSense.git
cd DocuSense

python -m venv env-doc
source env-doc/bin/activate # On Linux/macOS
# .\env-doc\Scripts\activate # On Windows

pip install -r requirements.txt

Create a .env file in the root directory:

GEMINI_API_KEY="your-google-gemini-api-key"

cd frontend
npm install

Open two terminals:

Terminal 1 (Backend):

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 (Frontend):

cd frontend
npm start

Visit http://localhost:3000 to access the application.
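
The `GEMINI_API_KEY` entry in the .env file has to reach the backend as an environment variable. The sketch below shows one way a backend could read it and fail early when it is missing. The helper name `get_gemini_api_key` is hypothetical and the actual DocuSense code may do this differently. Since python-dotenv is in the stack, its `load_dotenv()` function would typically be called first to copy the .env values into `os.environ`. That call is left as a comment so the example runs without extra packages.

```python
import os

# Assumption: in the real backend, python-dotenv would populate os.environ:
#   from dotenv import load_dotenv
#   load_dotenv()


def get_gemini_api_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    # Strip whitespace and surrounding double quotes left over from the .env file.
    key = env.get("GEMINI_API_KEY", "").strip().strip('"')
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; add it to .env or the environment"
        )
    return key
```

Failing at startup with an explicit message is usually easier to debug than an authentication error on the first Q&A request.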
DocuSense follows a decoupled architecture:
- Frontend (React SPA): Handles file uploads, dashboards, and interactions.
- Backend (FastAPI): Provides API endpoints for document processing and querying.
- AI Pipeline: OCR, summarization, classification, and embedding pipelines for text analysis.
- Search Index: FAISS-based semantic search for efficient querying.
- Chat Interface: Real-time Q&A powered by Gemini's natural language models.
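
To illustrate how the Search Index and Chat Interface could fit together, here is a dependency-free sketch of the retrieve-then-prompt flow. It is a deliberate simplification: DocuSense uses Sentence-Transformers embeddings and a FAISS index, whereas this example substitutes a toy bag-of-words vector and a linear cosine-similarity scan, so it matches on shared words rather than meaning. All function names (`embed`, `cosine`, `search`, `build_prompt`) and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a term-count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble a context-grounded prompt that could be sent to a language model."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    chunks = [
        "The contract ends on 1 March 2025.",
        "Payment is due within 30 days.",
        "The office is in Berlin.",
    ]
    question = "When is payment due?"
    print(build_prompt(question, search(question, chunks, k=1)))
```

In a real deployment the linear scan would be replaced by a vector index lookup, and the assembled prompt would be passed to the Gemini API. Neither call is shown here.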
Contains all backend logic: routes, models, services, and utilities.
Stores user uploads and processed document data.
Single-page React application for interacting with DocuSense.
Contributions are welcome! Follow these steps:
- Fork the repository.
- Create a new branch (`git checkout -b feature/your-feature`).
- Commit your changes (`git commit -m "Add some feature"`).
- Push the branch (`git push origin feature/your-feature`).
- Open a pull request.
Please ensure your code follows best practices and includes tests where applicable.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or collaborations, feel free to open an issue or connect via GitHub discussions!
Here's a visual tour of DocuSense's key features.
The user-friendly interface allows you to easily drag and drop or select documents for processing.
After processing, all your documents are neatly organized on the dashboard, showing key information like classification and file size at a glance.
Dive deep into a single document's analysis, including its summary, extracted entities, and other metadata.
Engage in a conversation with your document. Ask questions in natural language and get intelligent answers powered by Gemini.
The chat interface is clean, intuitive, and provides helpful suggestions to start the conversation.
Let's make document intelligence accessible, scalable, and interactive. Welcome to DocuSense!


