A document classification system engineered to automate the categorization of complex paperwork using deep learning.
- Framework Choice (FastAI & PyTorch): I chose FastAI on top of PyTorch to use its high-level abstractions for rapid prototyping while keeping PyTorch's low-level control for fine-tuning individual model layers.
- Model Architecture: Implemented a ResNet18-based Convolutional Neural Network (CNN) optimized for document layout recognition, allowing the system to distinguish text-heavy documents from structured forms.
- Data Pipeline: Developed a custom Python-based preprocessing pipeline to normalize document images, ensuring consistent performance across varying scan qualities.
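The preprocessing step above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not IntelliScan's actual pipeline: the percentile-based contrast stretch, the nearest-neighbour resize to 224 x 224, and the helper names are all hypothetical. The last step replicates the grayscale page into three channels and applies the standard ImageNet mean and standard deviation, which pretrained ResNet18 weights expect.

```python
import numpy as np

# Standard ImageNet channel statistics used with pretrained ResNet weights.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def stretch_contrast(gray, low_pct=1.0, high_pct=99.0):
    """Map the [low_pct, high_pct] percentile range of a scan to [0, 1].

    Percentiles (rather than min/max) stop a few dark specks or glare
    spots from dominating the rescaling. Hypothetical helper.
    """
    img = gray.astype(np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        # Flat image (e.g. a blank page): fall back to plain 8-bit scaling.
        return np.clip(img / 255.0, 0.0, 1.0)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    h, w = img.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows[:, None], cols[None, :]]


def preprocess(gray, size=224):
    """Grayscale scan (H, W) -> normalised float array (3, size, size)."""
    img = resize_nearest(stretch_contrast(gray), size)
    rgb = np.stack([img, img, img])  # replicate to the 3 channels ResNet expects
    return (rgb - IMAGENET_MEAN) / IMAGENET_STD
```

Doing the contrast stretch per image, before the fixed ImageNet normalization, is what evens out differences between dark and washed-out scans; the ImageNet step only puts inputs on the scale the pretrained weights were trained on.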
One of the most significant challenges in building IntelliScan was handling noisy data.
Initial model iterations struggled with low-quality scans and varying lighting conditions. I resolved this by implementing an image augmentation layer that artificially introduced noise, rotations, and blur during training. This "robustness training" improved the model's real-world accuracy significantly, teaching me that in AI, the quality and diversity of the data pipeline are often more critical than the complexity of the architecture itself.
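A framework-agnostic sketch of this kind of augmentation, written in NumPy, is shown below. The noise level, rotation range, blur kernel size and blur probability are illustrative assumptions, not the values used in IntelliScan. In a fastai project, rotation and lighting changes are usually covered by `aug_transforms`, while noise and blur would typically need custom transforms like these.

```python
import numpy as np


def add_gaussian_noise(img, rng, sigma=0.05):
    """Add zero-mean Gaussian noise to mimic scanner grain; keep values in [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)


def rotate_nearest(img, angle_deg, fill=1.0):
    """Rotate a 2-D image about its centre using nearest-neighbour sampling.

    Output pixels whose source falls outside the page get `fill`
    (1.0 = white paper).
    """
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    # Inverse mapping: for every output pixel, find the source pixel it came from.
    dx, dy = xs - cx, ys - cy
    src_x = np.rint(np.cos(theta) * dx + np.sin(theta) * dy + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * dx + np.cos(theta) * dy + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full(img.shape, fill, dtype=float)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def box_blur(img, k=3):
    """Average over a k x k window (k odd), replicating edge pixels as padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for oy in range(k):
        for ox in range(k):
            out += padded[oy:oy + img.shape[0], ox:ox + img.shape[1]]
    return out / (k * k)


def augment(img, rng, max_rotate=5.0, p_blur=0.5, sigma=0.05):
    """Random small rotation, optional blur, then noise (hypothetical settings).

    Expects a grayscale page as floats in [0, 1], with white paper = 1.0.
    """
    out = rotate_nearest(img, rng.uniform(-max_rotate, max_rotate))
    if rng.random() < p_blur:
        out = box_blur(out, k=3)
    return add_gaussian_noise(out, rng, sigma=sigma)
```

Because `augment` draws fresh random parameters on every call, each training epoch sees a slightly different degraded copy of every page, which is what pushes the model toward features that survive poor scans.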
The current version runs inference locally; a review of the project points to the following enhancements:
- Distributed Inference: To handle thousands of concurrent scans, the Docker-containerized model should be deployed to a managed, autoscaling endpoint such as AWS SageMaker or Google Vertex AI.
- Vector Search: For large-scale document retrieval, implementing a vector database like Pinecone would allow users to search for "similar" documents based on AI-generated embeddings.
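A brute-force version of the proposed similarity search fits in a few lines of NumPy. It assumes each document already has an embedding vector (for example, the 512-dimensional pooled features of the ResNet18 backbone, which is an assumption rather than something the project specifies) and ranks stored documents by cosine similarity. A vector database such as Pinecone would replace this exhaustive scan with an approximate nearest-neighbour index as the collection grows; the sketch does not use Pinecone's API.

```python
import numpy as np


def normalize_rows(x):
    """Scale each row to unit L2 norm (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def top_k_similar(query, index, k=3):
    """Return (row indices, cosine scores) of the k rows of `index` closest to `query`.

    `index` is an (N, D) array of stored document embeddings and `query`
    a (D,) embedding; both are hypothetical inputs for illustration.
    """
    q = normalize_rows(query[None, :])[0]
    sims = normalize_rows(index) @ q          # cosine similarity per document
    order = np.argsort(-sims, kind="stable")[:k]
    return order, sims[order]
```

The exact scan costs O(N x D) per query, which is fine for thousands of documents but is exactly the cost a dedicated vector index is meant to avoid at larger scale.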
- Backend: FastAPI, Python
- ML: Fast.ai, ResNet18
- Frontend: Streamlit
- Deployment: Docker, Hugging Face
- Single & batch file processing
- Real-time classification
- CSV export functionality
- RESTful API with automatic docs
- Supported Types: Invoices, Receipts, Contracts, Research Papers
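The batch-processing and CSV-export features can be sketched with the standard library alone. The label order, the column names and the `predict_logits` callable are hypothetical: `predict_logits` stands in for the trained model and is assumed to return one logit per supported type. A single upload is simply a batch of one.

```python
import csv
import io
import math

# The four supported document types; this ordering is an assumption.
LABELS = ["Invoice", "Receipt", "Contract", "Research Paper"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_batch(files, predict_logits):
    """Classify each (filename, data) pair using a caller-supplied model function.

    `predict_logits` is a hypothetical stand-in for the model: it takes the
    raw file data and returns one logit per entry in LABELS.
    """
    rows = []
    for name, data in files:
        probs = softmax(predict_logits(data))
        best = max(range(len(LABELS)), key=probs.__getitem__)
        rows.append({"filename": name, "label": LABELS[best],
                     "confidence": round(probs[best], 4)})
    return rows


def rows_to_csv(rows):
    """Serialise classification rows to CSV text (header plus one line per file)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["filename", "label", "confidence"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping classification and serialisation separate means the same rows could back both a JSON response from the API and the CSV download, without the model code knowing about either format.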