A FastAPI-based OCR service that extracts text and layout structure from images. Beyond the raw text, it supports image preprocessing, OCR language selection, per-word confidence scores, bounding boxes, and optional deskewing.

Features:
- Extracts text from uploaded images or image URLs
- Returns detected languages with confidence
- Returns bounding boxes for blocks, paragraphs, lines, and words
- Includes confidence scores for each word
- Supports image preprocessing: grayscale, thresholding, denoising
- Lets you choose the Tesseract OCR language(s)
- Can optionally deskew (straighten) rotated images before OCR
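To make the `grayscale`, `threshold`, and `denoise` steps concrete, here is a standard-library-only sketch of what each one does to a tiny in-memory pixel grid. It is an illustration, not this service's implementation: the function names, the ITU-R 601-2 luma weights, the fixed threshold of 128, the 3x3 median filter used for denoising, and the step order are all assumptions.

```python
from statistics import median

Pixel = tuple[int, int, int]  # (R, G, B), each 0-255


def to_grayscale(rgb: list[list[Pixel]]) -> list[list[int]]:
    # ITU-R 601-2 luma transform: L = 0.299 R + 0.587 G + 0.114 B (integer math).
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row] for row in rgb]


def threshold(gray: list[list[int]], level: int = 128) -> list[list[int]]:
    # Global binarization: pixels at or above `level` become white, the rest black.
    return [[255 if v >= level else 0 for v in row] for row in gray]


def median_denoise(gray: list[list[int]]) -> list[list[int]]:
    # 3x3 median filter; edge pixels reuse the nearest in-bounds neighbours.
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                gray[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def preprocess(rgb: list[list[Pixel]], steps: list[str]) -> list[list[int]]:
    # This sketch always converts to grayscale first, because the other two
    # steps operate on single-channel brightness values. Denoising runs
    # before thresholding so isolated specks are removed first.
    img = to_grayscale(rgb)
    if "denoise" in steps:
        img = median_denoise(img)
    if "threshold" in steps:
        img = threshold(img)
    return img


# A 2x3 image: light background with a dark vertical stroke in the middle.
sample = [
    [(250, 250, 250), (20, 20, 20), (250, 250, 250)],
    [(250, 250, 250), (20, 20, 20), (250, 250, 250)],
]
print(preprocess(sample, "grayscale,threshold".split(",")))  # [[255, 0, 255], [255, 0, 255]]
```

A production service would typically delegate these operations to an imaging library rather than Python loops; the loops here only keep the sketch dependency-free.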
To install:

- Clone the repository and navigate to the project directory.
- Install the Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Make sure Tesseract OCR is installed and on your system PATH.
To run the server locally:

```bash
uvicorn main:app --reload
```

Or, if `uvicorn` is not on your PATH:

```bash
python -m uvicorn main:app --reload
```

Then visit http://localhost:8000/docs for the interactive API documentation.
The `POST /extract-text/` endpoint accepts either an image file upload or an image URL, plus optional processing parameters:
- `file` (optional): image file to upload
- `image_url` (optional): URL of an image
- `preprocess` (optional): comma-separated preprocessing steps: `grayscale`, `threshold`, `denoise`
- `ocr_lang` (optional): Tesseract language(s), e.g. `eng`, `fra`, `eng+fra` (default: `eng`)
- `deskew` (optional): `true` or `false` (default: `false`); when `true`, the image is deskewed before OCR
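The parameter rules above can be captured in a small validation helper. This is a hypothetical sketch (`parse_options` and `ExtractOptions` are illustrative names, not part of the service's documented code) showing how the form fields might be normalized before OCR: one image source, known preprocessing steps only, `+`-joined Tesseract language codes, and a string `deskew` flag turned into a boolean. It assumes that supplying both `file` and `image_url`, or neither, is an error.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Steps listed in this README; anything else is rejected in this sketch.
ALLOWED_STEPS = {"grayscale", "threshold", "denoise"}


@dataclass
class ExtractOptions:
    """Normalized form fields for a hypothetical /extract-text/ handler."""
    source: str  # "file" or "url"
    preprocess: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=lambda: ["eng"])
    deskew: bool = False

    @property
    def tesseract_lang(self) -> str:
        # Tesseract expects multiple languages joined with '+', e.g. "eng+fra".
        return "+".join(self.languages)


def parse_options(
    has_file: bool,
    image_url: str | None = None,
    preprocess: str | None = None,
    ocr_lang: str | None = None,
    deskew: str | None = None,
) -> ExtractOptions:
    # Assumption: supplying both a file and a URL (or neither) is an error.
    if has_file == bool(image_url):
        raise ValueError("provide exactly one of 'file' or 'image_url'")

    # "grayscale, threshold" -> ["grayscale", "threshold"]; blank entries are ignored.
    steps = [s.strip().lower() for s in (preprocess or "").split(",") if s.strip()]
    unknown = [s for s in steps if s not in ALLOWED_STEPS]
    if unknown:
        raise ValueError(f"unknown preprocessing step(s): {', '.join(unknown)}")

    # The documented default language is "eng".
    languages = [code.strip() for code in (ocr_lang or "eng").split("+") if code.strip()]
    if not languages:
        raise ValueError("ocr_lang must name at least one language")

    # Form fields arrive as strings; the documented default is "false".
    flag = (deskew or "false").strip().lower()
    if flag not in ("true", "false"):
        raise ValueError("deskew must be 'true' or 'false'")

    return ExtractOptions(
        source="file" if has_file else "url",
        preprocess=steps,
        languages=languages,
        deskew=flag == "true",
    )
```

For the parameters used in the curl example in this README, `parse_options(has_file=True, preprocess="grayscale,threshold", ocr_lang="eng+fra", deskew="true")` yields `preprocess == ["grayscale", "threshold"]`, `tesseract_lang == "eng+fra"`, and `deskew is True`.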
Example request:

```bash
curl -X POST "http://localhost:8000/extract-text/" \
  -F "file=@your_image.png" \
  -F "preprocess=grayscale,threshold" \
  -F "ocr_lang=eng+fra" \
  -F "deskew=true"
```

Example response (`[ ... ]` marks elided coordinate lists):

```json
{
  "status": true,
  "text": "...extracted text...",
  "boxCoordinates": [0.1, 0.2, 0.3, 0.4],
  "blocks": [
    {
      "boxCoordinates": [ ... ],
      "paragraphs": [
        {
          "boxCoordinates": [ ... ],
          "lines": [
            {
              "boxCoordinates": [ ... ],
              "text": "...",
              "words": [
                { "text": "...", "boxCoordinates": [ ... ], "confidence": 95.0 }
              ]
            }
          ]
        }
      ]
    }
  ],
  "detectedLanguages": [
    { "languageCode": "en", "confidence": 0.98 }
  ],
  "executionTimeMS": 1234
}
```

A Dockerfile is provided for containerized deployment. Build and run it with:
```bash
docker build -t image-to-text-api .
docker run -p 8000:8000 image-to-text-api
```

Notes:

- Tesseract must be installed and available on your system PATH.
- Preprocessing steps can be combined (e.g. `grayscale,threshold`), which may improve OCR accuracy on noisy or low-contrast images.
- The API supports both file uploads and image URLs.
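Because the response nests words inside lines, paragraphs, and blocks, a client that only needs word-level results has to walk that hierarchy after decoding the JSON body (e.g. with `json.loads`). The sketch below flattens a parsed response and flags words below a confidence cutoff. The key names come from the example response above; `iter_words`, `low_confidence_words`, and the cutoff of 60 are hypothetical choices, and the 0-100 confidence scale is an assumption based on the example value `95.0`.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_words(response: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every word object, walking blocks -> paragraphs -> lines -> words."""
    for block in response.get("blocks", []):
        for paragraph in block.get("paragraphs", []):
            for line in paragraph.get("lines", []):
                yield from line.get("words", [])


def low_confidence_words(response: dict[str, Any], cutoff: float = 60.0) -> list[tuple[str, float]]:
    """Return (text, confidence) pairs for words scoring below `cutoff`.

    Assumes word confidences are on a 0-100 scale.
    """
    flagged = []
    for word in iter_words(response):
        confidence = word.get("confidence")
        # Words without a confidence value are skipped rather than guessed.
        if confidence is not None and confidence < cutoff:
            flagged.append((word["text"], confidence))
    return flagged


# A trimmed-down response with the same nesting as the example above.
sample = {
    "status": True,
    "blocks": [
        {"paragraphs": [{"lines": [{
            "text": "Hello wrld",
            "words": [
                {"text": "Hello", "confidence": 95.0},
                {"text": "wrld", "confidence": 41.5},
            ],
        }]}]}
    ],
}

print(" ".join(word["text"] for word in iter_words(sample)))  # Hello wrld
print(low_confidence_words(sample))  # [('wrld', 41.5)]
```

Using a generator for the walk keeps memory flat for large documents and lets callers stop early, for example when searching for a single word.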
License: MIT