AI pipeline transforming integration data into multilingual information sheets for refugees and immigrants
Nexus is an AI-powered data pipeline that processes integration services data and generates high-quality, multilingual information sheets for Réfugiés.info — a trusted platform serving refugees and immigrants in France.
Information about integration services (French language learning, employment support, housing assistance) exists across fragmented sources but often fails to meet the needs of vulnerable populations:
- Language barriers: Content rarely available in the 8+ languages refugees speak
- Quality inconsistency: Information sheets vary wildly in clarity and completeness
- Manual overhead: Creating and maintaining multilingual content is time-intensive
- Accessibility gaps: Content not optimized for low-literacy or mobile-first users
Nexus automates the pipeline from raw data to publication-ready information sheets:
Data Sources → Ingestion → Reconciliation → Enrichment → Langage Clair → Translation → Validation → Publication

- Data Sources: Carif Oref, Data Inclusion
- Ingestion: clean & validate
- Reconciliation: merge data from APIs
- Enrichment: fill gaps via web scraping
- Langage Clair: AI-assisted transformation to plain language
- Translation: 8 languages + quality checks
- Validation: editorial charter compliance
- Publication: Réfugiés.info API
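The stage sequence above can be pictured as plain function composition, each stage taking a record and returning an enriched one. The sketch below is illustrative only: the `Sheet` dataclass, its field names, and the stage functions are invented here and are not the project's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical record flowing through the pipeline; field names are
# illustrative, not the real libs/common/ schema.
@dataclass
class Sheet:
    source_id: str
    text: str
    translations: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)  # observability: stages seen

def ingest(raw: dict) -> Sheet:
    sheet = Sheet(source_id=raw["id"], text=raw["description"])
    sheet.trace.append("ingestion")
    return sheet

def langage_clair(sheet: Sheet) -> Sheet:
    # Placeholder for the AI rewrite; a real stage would call a model.
    sheet.text = sheet.text.strip()
    sheet.trace.append("langage_clair")
    return sheet

def translate(sheet: Sheet, languages: list) -> Sheet:
    for lang in languages:
        # Placeholder translation; the real stage produces 8 languages.
        sheet.translations[lang] = f"[{lang}] {sheet.text}"
    sheet.trace.append("translation")
    return sheet

# Stages compose left to right, mirroring the diagram.
raw = {"id": "carif-001", "description": " Cours de français "}
sheet = translate(langage_clair(ingest(raw)), ["en", "ar"])
print(sheet.trace)  # ['ingestion', 'langage_clair', 'translation']
```

Because every stage has the same shape, stages can be reordered, stubbed out, or tested in isolation, which is the point of the modular architecture.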
Key Innovation: The Langage Clair stage uses AI to transform bureaucratic or technical source text into clear, accessible French, codifying the expertise of Réfugiés.info's editorial team. This AI-assisted process dramatically increases throughput compared to purely manual editorial work while maintaining quality standards.
Example Transformation:
- Before: "Dispositif d'apprentissage du français : permet de gagner en autonomie au quotidien grâce à des ateliers sociolinguistiques et cours de français langue professionnelle" (roughly: "French-learning scheme: helps you gain day-to-day autonomy through sociolinguistic workshops and professional French classes")
- After: "Des ateliers 2 fois par semaine pour progresser en français, mieux communiquer au quotidien et dans le milieu professionnel." (roughly: "Workshops twice a week to improve your French and communicate better in daily life and at work.")
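One plausible way such a stage drives a language model is by wrapping the source text in a prompt that encodes the editorial rules. The sketch below is a hypothetical prompt builder: the `GUIDELINES` list and `build_prompt` function are invented for illustration and the real `libs/langage_clair/` implementation may work quite differently.

```python
# Hypothetical editorial rules; the real charter lives with the
# Réfugiés.info editorial team.
GUIDELINES = [
    "Use short sentences.",
    "Prefer everyday words over administrative vocabulary.",
    "Keep concrete details: frequency, place, who can attend.",
]

def build_prompt(source_text: str) -> str:
    """Wrap bureaucratic source text in a plain-language rewrite prompt."""
    rules = "\n".join(f"- {g}" for g in GUIDELINES)
    return (
        "Rewrite the following French text in plain language "
        "(langage clair), following these rules:\n"
        f"{rules}\n\nText:\n{source_text}"
    )

prompt = build_prompt(
    "Dispositif d'apprentissage du français : ateliers sociolinguistiques"
)
print(prompt)
```

Keeping the rules as data rather than hard-coded prose makes it easy for the editorial team to refine them without touching pipeline code.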
- ✨ Langage Clair AI: Transforms bureaucratic text into clear, accessible French using AI trained on Réfugiés.info editorial expertise
- 🌍 Multilingual by Design: Generates content in 8 languages with quality validation
- 📊 Data Quality First: Validates and reconciles data at every pipeline stage
- 🔍 Observability: Complete traceability from source data to published content
- 🧩 Modular Architecture: Independent, testable pipeline stages
- 👥 User-Centered: Built with mandatory user research and iterative testing
- 🔒 Privacy-First: GDPR-compliant with AI-specific privacy safeguards
- 📝 Editorial Compliance: Adheres to Réfugiés.info editorial charter standards
Polyglot Monorepo (Python-first + Node.js tooling):
Following dsfr-kit convention: Python packages in libs/, Node.js packages in packages/
nexus/
├── libs/                      # Python packages (independent libraries)
│   ├── common/                # Shared utilities and types
│   │   ├── src/
│   │   └── tests/
│   ├── ingestion/             # Data ingestion stage
│   │   ├── src/
│   │   └── tests/
│   │       ├── contract/
│   │       ├── integration/
│   │       └── unit/
│   ├── reconciliation/        # Data reconciliation stage
│   │   ├── src/
│   │   └── tests/
│   ├── enrichment/            # Data enrichment stage
│   │   ├── src/
│   │   └── tests/
│   ├── langage_clair/         # ⭐ AI plain language transformation
│   │   ├── src/
│   │   └── tests/
│   ├── translation/           # Multilingual translation stage
│   │   ├── src/
│   │   └── tests/
│   ├── validation/            # Quality validation stage
│   │   ├── src/
│   │   └── tests/
│   └── publication/           # Publication to Réfugiés.info
│       ├── src/
│       └── tests/
├── packages/                  # Node.js packages
│   └── tooling/               # Build scripts, dev tools
├── notebooks/                 # Jupyter: exploratory analysis
└── docs/                      # Documentation
Core Technologies:
- Python 3.12+: Pipeline implementation (data processing, AI/ML)
- uv: Fast Python package management and workspace configuration
- ruff: Python linting and formatting
- Node.js 22+: Developer tooling
- pnpm: Node.js package management
- biome: JavaScript/TypeScript linting
- pytest: Testing framework (contract, integration, unit tests)
- just: Command runner for common development tasks
Each stage is implemented as an independent Python library in libs/, enabling:
- Independent development: Teams can work on different stages simultaneously
- Independent testing: Each stage has its own test suite (contract, integration, unit)
- Independent deployment: Stages can be deployed and scaled separately
- Clear dependencies: Shared code lives in libs/common/

- Ingestion (libs/ingestion/): Fetch data from the Data Inclusion API (includes Carif Oref data)
- Reconciliation (libs/reconciliation/): Merge and deduplicate data from multiple sources
- Enrichment (libs/enrichment/): Fill gaps via the Carif Oref API and web scraping
- Langage Clair (libs/langage_clair/) ⭐: AI-assisted transformation of bureaucratic/technical text into clear, accessible French (codifying Réfugiés.info editorial expertise)
- Translation (libs/translation/): Generate multilingual content (8 languages) from the plain language French
- Validation (libs/validation/): Ensure editorial charter compliance and quality standards
- Publication (libs/publication/): Push to Réfugiés.info via API (to be designed)
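For stages to be independently developed yet interoperable, they need a shared contract. One way to express that in Python is a `typing.Protocol`, sketched below; the `Stage` protocol, the `Validation` class, and the field names are assumptions for illustration, not the actual interfaces in libs/common/.

```python
from typing import Protocol

class Stage(Protocol):
    """Hypothetical contract every pipeline stage could satisfy."""
    name: str
    def run(self, record: dict) -> dict: ...

class Validation:
    """Toy validation stage: checks that required fields are present."""
    name = "validation"
    REQUIRED = ("title", "description", "languages")

    def run(self, record: dict) -> dict:
        missing = [k for k in self.REQUIRED if not record.get(k)]
        record["valid"] = not missing
        record["missing_fields"] = missing
        return record

# Any object with a `name` and a `run(record)` method satisfies Stage,
# so stages can be swapped or mocked without inheritance.
stage: Stage = Validation()
out = stage.run({"title": "Cours de français", "description": "..."})
print(out["valid"], out["missing_fields"])  # False ['languages']
```

A structural protocol like this keeps the libraries decoupled: a test harness or orchestrator depends only on the contract, never on a concrete stage.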
- Python 3.12+ with uv installed
- Node.js 22+ with pnpm installed
- just command runner (installation)
- Git for version control
# Clone the repository
git clone git@github.com:refugies-info/nexus.git
cd nexus
# Install all dependencies (Python + Node.js + pre-commit hooks)
just install
# Or install manually:
uv sync # Install Python dependencies
pnpm install # Install Node.js dependencies
uv run pre-commit install # Setup pre-commit hooks (includes nbstripout)

# List all available commands
just
# Run linting (Python + Node.js)
just lint
# Format code (Python + Node.js)
just format
# Run tests (all libraries)
just test
# Run tests for specific library
uv run pytest libs/ingestion/tests/
# Type checking
uv run mypy libs/
# Run notebooks (exploratory work)
jupyter lab notebooks/

Each pipeline stage is an independent library that can be developed and tested separately:
# Example: Run ingestion stage (once implemented)
uv run python -m libs.ingestion
# Example: Import shared utilities
uv run python -c "from libs.common import utils"

Nexus follows a specification-driven development approach with constitutional principles:
# Create feature branch
git checkout -b 001-data-ingestion
# Generate specification
/speckit.specify "Implement data ingestion from Data Inclusion API"

# Generate implementation plan
/speckit.plan

# Generate actionable tasks
/speckit.tasks

# Check for inconsistencies
/speckit.analyze

# Execute tasks with TDD (test-first approach)
/speckit.implement

Test-Driven Development (TDD) is mandatory per our constitution:
# Write tests FIRST (red phase)
uv run pytest tests/contract/test_ingestion.py # Should FAIL
# Implement feature (green phase)
# ... write code ...
# Tests should now PASS
uv run pytest tests/contract/test_ingestion.py
# Refactor while keeping tests green
# ... improve code quality ...

- Contract Tests: Validate API contracts and interfaces
- Integration Tests: Test end-to-end pipeline stages
- Unit Tests: Test individual functions and classes
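A contract test pins down a stage's output shape before the implementation exists, which is exactly the red phase of the TDD cycle above. The example below is illustrative: the `ingest()` stand-in, its input keys (`id`, `nom`, `presentation`), and `REQUIRED_KEYS` are invented, not the project's real contract.

```python
# Illustrative contract test in the style of tests/contract/.
REQUIRED_KEYS = {"source_id", "title", "description"}

def ingest(raw: dict) -> dict:
    # Minimal stand-in for the real ingestion stage; input field names
    # (id, nom, presentation) are assumed for the example.
    return {
        "source_id": raw["id"],
        "title": raw.get("nom", ""),
        "description": raw.get("presentation", ""),
    }

def test_ingest_output_matches_contract():
    record = ingest({"id": "di-42", "nom": "Atelier FLE", "presentation": "..."})
    # The contract: every required key is present, source id is preserved.
    assert REQUIRED_KEYS <= record.keys()
    assert record["source_id"] == "di-42"

test_ingest_output_matches_contract()
print("contract test passed")
```

Written first, this test fails until `ingest()` honours the contract; under pytest it would live in `tests/contract/` and run via `uv run pytest`.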
- Constitution: Project principles and governance (v1.5.0)
- Contributing: Development guidelines and workflow
- Architecture: Detailed system design
- API Specification: Réfugiés.info publication API design
Nexus is governed by 12 core principles:
- Data Quality First: Validation and reconciliation at every stage
- Pipeline Modularity: Independent, testable components
- Multilingual by Design: 8-language support as first-class requirement
- Editorial Compliance (NON-NEGOTIABLE): Réfugiés.info charter adherence
- Integration Independence: Standalone system with clean API contracts
- Observability & Traceability: Complete pipeline visibility
- Incremental Delivery: MVP-first with Carif Oref use case
- Technology Foundation: Polyglot monorepo (Python-first)
- User-Centered Development (NON-NEGOTIABLE): Mandatory user research
- Notebook Governance: Structured exploratory work with security
- Langage Clair (NON-NEGOTIABLE) ⭐: AI-assisted plain language transformation
- Culturally-Aware Translation (NON-NEGOTIABLE) ⭐: Cultural mediation with glossaries and annotations
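The glossary-and-annotation idea behind principle 12 can be sketched as a small lookup pass that explains French administrative terms inline. Everything below is invented for illustration: the `GLOSSARY` entries and the `annotate()` helper are not the project's actual mediation mechanism.

```python
# Hypothetical glossary of French administrative terms; the real
# glossaries are curated per target language.
GLOSSARY = {
    "CAF": "family benefits office",
    "OFII": "French office for immigration and integration",
}

def annotate(text: str) -> str:
    """Append a short explanation after the first use of each known term."""
    for term, note in GLOSSARY.items():
        if term in text:
            # Annotate only the first occurrence to avoid clutter.
            text = text.replace(term, f"{term} ({note})", 1)
    return text

print(annotate("Contactez la CAF pour vos allocations."))
# Contactez la CAF (family benefits office) pour vos allocations.
```

Keeping the acronym and adding the gloss, rather than translating it away, is what makes the mediation cultural: readers still recognise the term they will meet on official paperwork.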
We welcome contributions! Please see CONTRIBUTING.md for:
- Code of conduct
- Development setup
- Pull request process
- Testing requirements
- Constitutional compliance
- Fork & Clone: Fork the repo and clone locally
- Feature Branch: Create a branch following the ###-feature-name convention
- User Research: Conduct user research if the feature affects end users
- TDD: Write tests first, ensure they fail, then implement
- Constitution Check: Validate compliance with all principles
- Pull Request: Submit PR with test evidence and documentation
Focus: French language learning information sheets from Carif Oref
- Constitution ratified (v1.5.0)
- Project structure defined
- Development templates created
- Data ingestion implementation
- Translation pipeline
- Editorial validation
- Réfugiés.info API design
- User research for AI transparency
Q4 2025: MVP - Carif Oref French learning sheets
- Data ingestion from Data Inclusion
- Basic translation pipeline (8 languages)
- Editorial validation workflow
- Réfugiés.info API specification
Q1 2026: Expansion
- Additional data sources
- Enhanced translation quality
- User feedback integration
- Performance optimization
- Réfugiés.info: Main platform (publication target)
- Data Inclusion: Primary data source
- Carif Oref: French learning data provider
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: contact@refugies.info
- Réfugiés.info team: For platform integration and user research access
- Carif Oref: For providing French learning data
- Data Inclusion: For integration data aggregation
- ai-kit project: For constitutional principles inspiration
Built with ❤️ for refugees and immigrants in France