An intelligent OCR system powered by DeepSeek models that extracts, understands, and structures text from images and documents. The system combines Optical Character Recognition (OCR) with Large Language Models (LLMs) to deliver clean, structured, and actionable data.
👉 Intelligent pipeline:
Image → OCR → Text → DeepSeek → Structured Output
- Hugging Face (platform)
- DeepSeek AI
- Model:
deepseek-ai/deepseek-coder-1.3b-instruct(free) - Role:
- Clean OCR text
- Correct recognition errors
- Structure data into JSON format
backend/: FastAPI, Tesseract OCR, Hugging Face integrationfrontend/: Vanilla JS, API service, upload UI
- Python (FastAPI)
- Tesseract OCR (pytesseract)
- Hugging Face Transformers
- DeepSeek Coder 1.3B
- Frontend (JS/HTML/CSS)
- OCR Extraction: Image → Raw text
- Intelligent cleaning (DeepSeek): Automatic correction (e.g. "Totl: 12O.OO USD" → "Total: 120.00 USD")
- JSON structuring: Automatic extraction of key fields
- Document types: Invoices, receipts, scanned documents
import pytesseract
from PIL import Image
text = pytesseract.image_to_string(Image.open("receipt.png"))from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "deepseek-ai/deepseek-coder-1.3b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
prompt = f"Clean and structure this OCR text into JSON:\n{text}"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
result = tokenizer.decode(outputs[0])The frontend is located in the /frontend folder. It allows you to upload an image and see in real time the raw extraction and the JSON result.
- Start the backend (see below)
- Open
frontend/index.html(via Live Server orpy -m http.server 5500) - Configure the API base URL if needed (default:
http://localhost:8000)
- Install dependencies:
cd backend
pip install -r requirements.txt- Start the API:
uvicorn api.main:app --reload- Swagger UI: http://localhost:8000/docs
cd frontend
py -m http.server 5500🚀 AI Engineer / Data Engineer project — OCR + LLM (DeepSeek + Hugging Face)
