OCR-based-CRVS-digitization

OCR-Based CRVS Form Digitalization

A comprehensive web platform built with Node.js, React.js, and PostgreSQL for digitizing Civil Registration and Vital Statistics (CRVS) forms. The system uses Optical Character Recognition (OCR) to extract information from handwritten forms and provides built-in correction tools, so that extracted data can be reviewed and fixed before it is stored.

System Overview

This platform consists of three integrated components that work together to provide a complete form digitization solution:

Frontend Application: A React-based user interface for uploading, reviewing, validating, and managing CRVS forms.
Main Backend API: An Express.js service handling authentication, workspace management, and workflow orchestration.
OCR Processing Service: A specialized microservice for form data extraction and correction.

Together, these components create a workflow that transforms paper CRVS forms into structured digital data that can be validated, corrected, and stored.

Project Repositories

1. OCR-Based CRVS Digitization Frontend

The frontend application provides a user-friendly interface for the entire digitization process:

Secure JWT-based authentication
Workspace management for organizing forms
Form upload and lifecycle tracking
Side-by-side validation of OCR data with original form images
Responsive design built with Bootstrap

Tech Stack: React v18, React Router, Context API, Bootstrap v5, CSS Modules
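JWT-based authentication on the frontend usually means attaching the stored token to every request sent to the backend. The sketch below shows one way to do that. It assumes the backend accepts a standard "Authorization: Bearer <token>" header; createApiClient and the injected token getter are hypothetical names, not taken from the frontend repository. In a React app the getter could read the token from an auth context.

```javascript
// Hypothetical helper: wraps fetch so every request to the backend carries the JWT.
// The token getter is injected so it can read from React Context, localStorage, or a test stub.
function createApiClient(baseUrl, getToken, fetchImpl = globalThis.fetch) {
  return async function apiRequest(path, options = {}) {
    const headers = { ...(options.headers || {}) };
    const token = getToken();
    if (token) {
      // Assumption: the backend expects a Bearer token in the Authorization header.
      headers.Authorization = `Bearer ${token}`;
    }
    const response = await fetchImpl(`${baseUrl}${path}`, { ...options, headers });
    if (response.status === 401) {
      // Missing or expired token: the UI would typically send the user back to the login page.
      throw new Error("Unauthorized: please log in again");
    }
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  };
}
```

Injecting fetchImpl keeps the helper testable without a running backend.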

2. OCR-Based CRVS Digitization Backend

The main backend API manages the core business logic and coordinates the overall system:

User and admin authentication
Workspace organization and statistics
Form lifecycle management (Upload → Validate → Draft → Finalize)
Integration with Firebase for file storage
Communication with the OCR service

Tech Stack: Node.js, Express.js, Prisma ORM, PostgreSQL, JWT, Firebase Storage
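The form lifecycle (Upload → Validate → Draft → Finalize) can be enforced as a small state machine so that a form cannot skip a stage or move backwards. This is a minimal sketch assuming a strictly linear lifecycle; the stage names, nextStage, and advanceForm are hypothetical and may not match the backend's actual Prisma schema.

```javascript
// Hypothetical stage names mirroring the documented lifecycle:
// Upload -> Validate -> Draft -> Finalize. The real backend may name or order them differently.
const LIFECYCLE = ["UPLOAD", "VALIDATE", "DRAFT", "FINALIZE"];

// Returns the stage that follows `current`, or null if `current` is the terminal stage.
function nextStage(current) {
  const index = LIFECYCLE.indexOf(current);
  if (index === -1) {
    throw new Error(`Unknown stage: ${current}`);
  }
  return index < LIFECYCLE.length - 1 ? LIFECYCLE[index + 1] : null;
}

// Moves a form record forward by exactly one stage, rejecting skips and backward moves.
function advanceForm(form, requestedStage) {
  const expected = nextStage(form.stage);
  if (expected === null) {
    throw new Error(`Form ${form.id} is already finalized`);
  }
  if (requestedStage !== expected) {
    throw new Error(`Invalid transition ${form.stage} -> ${requestedStage}; expected ${expected}`);
  }
  // Return a new object instead of mutating, which keeps state changes easy to audit.
  return { ...form, stage: requestedStage, updatedAt: new Date().toISOString() };
}
```

Keeping the allowed order in one array means the API routes only need to call advanceForm rather than re-implementing the rules.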

3. OCR-Correction Service

A specialized microservice that handles the OCR processing and data extraction:

Form processing from PDFs or images
Tesseract.js OCR engine integration
Support for various field types (text, numeric, date, checkbox)
Automated correction and validation of extracted data
Schema-driven configuration for form fields

Tech Stack: Node.js, Express.js, Tesseract.js, Jimp, Prisma ORM, node-poppler
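Schema-driven correction means each field's declared type decides how raw OCR text is cleaned and checked. The sketch below illustrates the idea for the four listed field types. The schema format, the character-confusion table, the DD/MM/YYYY date convention, and integer-only numeric fields are all assumptions for illustration, not the OCR-Correction Service's real configuration.

```javascript
// Hypothetical schema: field names and formats are illustrative only.
const birthFormSchema = [
  { name: "childName", type: "text" },
  { name: "weightGrams", type: "numeric" },   // assumed to be an integer
  { name: "dateOfBirth", type: "date" },      // assumed to be written as DD/MM/YYYY
  { name: "bornInHospital", type: "checkbox" },
];

// Letters that OCR engines commonly confuse with digits in handwritten numbers.
const DIGIT_FIXES = { O: "0", o: "0", D: "0", I: "1", l: "1", "|": "1", S: "5", s: "5", B: "8", Z: "2" };
const fixDigits = (text) => [...text].map((ch) => DIGIT_FIXES[ch] ?? ch).join("");

function correctField(type, raw) {
  const text = String(raw ?? "").trim();
  switch (type) {
    case "text":
      // Collapse whitespace runs left over from line breaks in the scan.
      return { value: text.replace(/\s+/g, " "), valid: text.length > 0 };
    case "numeric": {
      const fixed = fixDigits(text).replace(/[^0-9]/g, "");
      return { value: fixed, valid: fixed.length > 0 };
    }
    case "date": {
      const match = fixDigits(text).match(/^(\d{1,2})[\/.\- ](\d{1,2})[\/.\- ](\d{4})$/);
      if (!match) return { value: text, valid: false };
      const [, dd, mm, yyyy] = match.map(Number);
      // Round-trip through Date to reject impossible dates such as 31/02.
      const date = new Date(Date.UTC(yyyy, mm - 1, dd));
      const valid =
        date.getUTCFullYear() === yyyy && date.getUTCMonth() === mm - 1 && date.getUTCDate() === dd;
      // ISO dates are easier to store and compare in PostgreSQL.
      return { value: valid ? date.toISOString().slice(0, 10) : text, valid };
    }
    case "checkbox": {
      const checked = /^(\[?x\]?|✓|✔|☑)$/i.test(text);
      // Anything other than a recognized mark or an empty box is flagged for human review.
      return { value: checked, valid: checked || text === "" || text === "[ ]" };
    }
    default:
      throw new Error(`Unsupported field type: ${type}`);
  }
}

// Applies the schema to raw OCR output and lists the fields a reviewer should check.
function processForm(schema, ocrOutput) {
  const record = {};
  const needsReview = [];
  for (const field of schema) {
    const { value, valid } = correctField(field.type, ocrOutput[field.name]);
    record[field.name] = value;
    if (!valid) needsReview.push(field.name);
  }
  return { record, needsReview };
}
```

For example, a weight read as "3O5O" becomes "3050", while an impossible date stays as written and is listed in needsReview for the side-by-side validation screen.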

System Architecture

┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
│                 │       │                 │       │                 │
│  React Frontend │◄─────►│  Backend API    │◄─────►│  OCR Service    │
│                 │  JWT  │                 │  API  │                 │
└─────────────────┘       └─────────────────┘       └─────────────────┘
        ▲                         ▲                         ▲
        │                         │                         │
        ▼                         ▼                         ▼
┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
│                 │       │                 │       │                 │
│  User Interface │       │   PostgreSQL    │       │ Form Processing │
│    Workflow     │       │    Database     │       │    & Schema     │
│                 │       │                 │       │                 │
└─────────────────┘       └─────────────────┘       └─────────────────┘
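The "API" link between the backend and the OCR service can be wrapped in a small client. This is a sketch only: the /process endpoint, the JSON body with fileUrl and formType, and createOcrClient are hypothetical, since the OCR service's actual routes are documented in its own repository. A timeout is included because OCR on multi-page PDFs can take a while.

```javascript
// Hypothetical client for the backend -> OCR service link in the diagram.
// The "/process" endpoint and request body shape are assumptions, not a documented API.
function createOcrClient(baseUrl, { timeoutMs = 60000, fetchImpl = globalThis.fetch } = {}) {
  return {
    async extract(fileUrl, formType) {
      // Bound the wait so a stuck OCR job does not hold the backend request open forever.
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      try {
        const res = await fetchImpl(`${baseUrl}/process`, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ fileUrl, formType }),
          signal: controller.signal,
        });
        if (!res.ok) {
          throw new Error(`OCR service responded with ${res.status}`);
        }
        return await res.json();
      } finally {
        // Always clear the timer, whether the call succeeded, failed, or was aborted.
        clearTimeout(timer);
      }
    },
  };
}
```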

Workflow

Form Upload: Users upload scanned CRVS forms (PDFs) through the frontend
Storage: The main backend stores the uploaded files in Firebase Storage
OCR Processing: The backend triggers the OCR service to extract data
Validation: Users review and correct extracted data in the frontend
Finalization: Corrected data is stored in the database as validated records
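The workflow above can be sketched as two backend functions: one that ingests an upload and runs OCR, and one that finalizes a form with the reviewer's corrections. The storage, ocrClient, and db objects are hypothetical stand-ins for Firebase Storage, the OCR service API, and Prisma/PostgreSQL; their method names are assumptions chosen so the flow can be tested with in-memory fakes.

```javascript
// Hypothetical orchestration of the documented workflow. `storage`, `ocrClient`, and `db`
// are injected stand-ins for Firebase Storage, the OCR service, and Prisma/PostgreSQL.
async function ingestForm(file, { storage, ocrClient, db }) {
  // Form Upload + Storage: persist the scanned PDF and record where it lives.
  const fileUrl = await storage.upload(file.name, file.bytes);
  const form = await db.createForm({ fileName: file.name, fileUrl });
  // OCR Processing: ask the OCR service for field values extracted from the stored file.
  const ocrData = await ocrClient.extract(fileUrl);
  return db.updateForm(form.id, { ocrData });
}

// Validation + Finalization: merge reviewer corrections over the OCR output and save the result.
async function finalizeForm(formId, corrections, { db }) {
  const form = await db.getForm(formId);
  if (!form) {
    throw new Error(`Form ${formId} not found`);
  }
  if (form.validatedData) {
    throw new Error(`Form ${formId} is already finalized`);
  }
  // Reviewer corrections win over OCR values for any field present in both.
  const validatedData = { ...form.ocrData, ...corrections };
  return db.updateForm(formId, { validatedData });
}
```

Keeping both the raw ocrData and the validatedData makes it possible to measure how often reviewers override the OCR output.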

Getting Started

To set up the complete system:

Clone all three repositories
Follow the installation instructions in each repository's README
Configure the environment variables to ensure proper communication between services
Start the services in this order:
- OCR Service
- Main Backend API
- Frontend Application
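Because the services must start in order and find each other through environment variables, the backend can validate its configuration and wait for the OCR service before accepting traffic. This is a sketch under assumptions: the variable names (DATABASE_URL, JWT_SECRET, OCR_SERVICE_URL, PORT), the default port 5000, and a GET /health endpoint on the OCR service are hypothetical; check each repository's README for the real names.

```javascript
// Hypothetical variable names: consult each repository's README for the actual ones.
const REQUIRED_BACKEND_VARS = ["DATABASE_URL", "JWT_SECRET", "OCR_SERVICE_URL"];

// Fails fast at startup if any required setting is missing.
function loadBackendConfig(env = process.env) {
  const missing = REQUIRED_BACKEND_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env.DATABASE_URL,
    jwtSecret: env.JWT_SECRET,
    // Strip trailing slashes so paths like "/process" can be appended safely.
    ocrServiceUrl: env.OCR_SERVICE_URL.replace(/\/+$/, ""),
    port: Number(env.PORT) || 5000, // hypothetical default port
  };
}

// Polls the OCR service until it responds, since it is meant to be started first.
// Assumption: the service answers GET /health; substitute any endpoint it actually exposes.
async function waitForService(baseUrl, { retries = 5, delayMs = 1000, fetchImpl = globalThis.fetch } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/health`);
      if (res.ok) return true;
    } catch {
      // Connection refused: the service is probably still starting up.
    }
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```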

Detailed setup instructions are available in each repository's documentation.

Use Cases

Educational Institutions: Digitize student registration forms
Government Agencies: Process civil registration documents
Healthcare Organizations: Convert patient intake forms
Research Institutions: Digitize survey responses
NGOs: Process beneficiary registration forms
