AI pipeline transforming integration data into multilingual information sheets for refugees and immigrants
Nexus is an AI-powered data pipeline that processes integration services data and generates high-quality, multilingual information sheets for Réfugiés.info — a trusted platform serving refugees and immigrants in France.
Information about integration services (French language learning, employment support, housing assistance) exists across fragmented sources but often fails to meet the needs of vulnerable populations:
- Language barriers: Content rarely available in the 8+ languages refugees speak
- Quality inconsistency: Information sheets vary wildly in clarity and completeness
- Manual overhead: Creating and maintaining multilingual content is time-intensive
- Accessibility gaps: Content not optimized for low-literacy or mobile-first users
Nexus automates the pipeline from raw data to publication-ready information sheets:
Data Sources → Ingestion → Reconciliation → Enrichment → Langage Clair → Translation → Validation → Publication

- Data Sources: Carif Oref, Data Inclusion
- Ingestion: clean & validate
- Reconciliation: merge data from APIs
- Enrichment: fill gaps via web scraping
- Langage Clair: AI-assisted transformation to plain language
- Translation: 8 languages + quality checks
- Validation: editorial charter compliance
- Publication: Réfugiés.info API
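The stage sequence above can be pictured as plain function composition, each stage taking a record and returning an enriched one. The sketch below is illustrative only: the `Sheet` dataclass, its field names, and the stage functions are invented here and are not the project's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical record flowing through the pipeline; field names are
# illustrative, not the real libs/common/ schema.
@dataclass
class Sheet:
    source_id: str
    text: str
    translations: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)  # observability: stages seen

def ingest(raw: dict) -> Sheet:
    sheet = Sheet(source_id=raw["id"], text=raw["description"])
    sheet.trace.append("ingestion")
    return sheet

def langage_clair(sheet: Sheet) -> Sheet:
    # Placeholder for the AI rewrite; a real stage would call a model.
    sheet.text = sheet.text.strip()
    sheet.trace.append("langage_clair")
    return sheet

def translate(sheet: Sheet, languages: list) -> Sheet:
    for lang in languages:
        # Placeholder translation; the real stage produces 8 languages.
        sheet.translations[lang] = f"[{lang}] {sheet.text}"
    sheet.trace.append("translation")
    return sheet

# Stages compose left to right, mirroring the diagram.
raw = {"id": "carif-001", "description": " Cours de français "}
sheet = translate(langage_clair(ingest(raw)), ["en", "ar"])
print(sheet.trace)  # ['ingestion', 'langage_clair', 'translation']
```

Because every stage has the same shape, stages can be reordered, stubbed out, or tested in isolation, which is the point of the modular architecture.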
Key Innovation: The Langage Clair stage uses AI to transform bureaucratic or technical source text into clear, accessible French, codifying the expertise of Réfugiés.info's editorial team. This AI-assisted process dramatically increases throughput compared to purely manual editorial work while maintaining quality standards.
Example Transformation:
- Before: "Dispositif d'apprentissage du français : permet de gagner en autonomie au quotidien grâce à des ateliers sociolinguistiques et cours de français langue professionnelle" (roughly: "French-learning scheme: helps you gain day-to-day autonomy through sociolinguistic workshops and professional French classes")
- After: "Des ateliers 2 fois par semaine pour progresser en français, mieux communiquer au quotidien et dans le milieu professionnel." (roughly: "Workshops twice a week to improve your French and communicate better in daily life and at work.")
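One plausible way such a stage drives a language model is by wrapping the source text in a prompt that encodes the editorial rules. The sketch below is a hypothetical prompt builder: the `GUIDELINES` list and `build_prompt` function are invented for illustration and the real `libs/langage_clair/` implementation may work quite differently.

```python
# Hypothetical editorial rules; the real charter lives with the
# Réfugiés.info editorial team.
GUIDELINES = [
    "Use short sentences.",
    "Prefer everyday words over administrative vocabulary.",
    "Keep concrete details: frequency, place, who can attend.",
]

def build_prompt(source_text: str) -> str:
    """Wrap bureaucratic source text in a plain-language rewrite prompt."""
    rules = "\n".join(f"- {g}" for g in GUIDELINES)
    return (
        "Rewrite the following French text in plain language "
        "(langage clair), following these rules:\n"
        f"{rules}\n\nText:\n{source_text}"
    )

prompt = build_prompt(
    "Dispositif d'apprentissage du français : ateliers sociolinguistiques"
)
print(prompt)
```

Keeping the rules as data rather than hard-coded prose makes it easy for the editorial team to refine them without touching pipeline code.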
- ✨ Langage Clair AI: Transforms bureaucratic text into clear, accessible French using AI trained on Réfugiés.info editorial expertise
- 🌍 Multilingual by Design: Generates content in 8 languages with quality validation
- 📊 Data Quality First: Validates and reconciles data at every pipeline stage
- 🔍 Observability: Complete traceability from source data to published content
- 🧩 Modular Architecture: Independent, testable pipeline stages
- 👥 User-Centered: Built with mandatory user research and iterative testing
- 🔒 Privacy-First: GDPR-compliant with AI-specific privacy safeguards
- 📝 Editorial Compliance: Adheres to Réfugiés.info editorial charter standards
Polyglot Monorepo (Python-first + Node.js tooling):
Following dsfr-kit convention: Python packages in libs/, Node.js packages in packages/
nexus/
├── libs/                      # Python packages (independent libraries)
│   ├── common/                # Shared utilities and types
│   │   ├── src/
│   │   └── tests/
│   ├── ingestion/             # Data ingestion stage
│   │   ├── src/
│   │   └── tests/
│   │       ├── contract/
│   │       ├── integration/
│   │       └── unit/
│   ├── reconciliation/        # Data reconciliation stage
│   │   ├── src/
│   │   └── tests/
│   ├── enrichment/            # Data enrichment stage
│   │   ├── src/
│   │   └── tests/
│   ├── langage_clair/         # ⭐ AI plain language transformation
│   │   ├── src/
│   │   └── tests/
│   ├── translation/           # Multilingual translation stage
│   │   ├── src/
│   │   └── tests/
│   ├── validation/            # Quality validation stage
│   │   ├── src/
│   │   └── tests/
│   └── publication/           # Publication to Réfugiés.info
│       ├── src/
│       └── tests/
├── packages/                  # Node.js packages
│   └── tooling/               # Build scripts, dev tools
├── notebooks/                 # Jupyter: exploratory analysis
└── docs/                      # Documentation
Core Technologies:
- Python 3.12+: Pipeline implementation (data processing, AI/ML)
- uv: Fast Python package management and workspace configuration
- ruff: Python linting and formatting
- Node.js 22+: Developer tooling
- pnpm: Node.js package management
- biome: JavaScript/TypeScript linting
- pytest: Testing framework (contract, integration, unit tests)
- just: Command runner for common development tasks
Each stage is implemented as an independent Python library in libs/, enabling:
- Independent development: Teams can work on different stages simultaneously
- Independent testing: Each stage has its own test suite (contract, integration, unit)
- Independent deployment: Stages can be deployed and scaled separately
- Clear dependencies: Shared code lives in libs/common/

- Ingestion (libs/ingestion/): Fetch data from the Data Inclusion API (includes Carif Oref data)
- Reconciliation (libs/reconciliation/): Merge and deduplicate data from multiple sources
- Enrichment (libs/enrichment/): Fill gaps via the Carif Oref API and web scraping
- Langage Clair (libs/langage_clair/) ⭐: AI-assisted transformation of bureaucratic/technical text into clear, accessible French (codifying Réfugiés.info editorial expertise)
- Translation (libs/translation/): Generate multilingual content (8 languages) from the plain language French
- Validation (libs/validation/): Ensure editorial charter compliance and quality standards
- Publication (libs/publication/): Push to Réfugiés.info via API (to be designed)
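For stages to be independently developed yet interoperable, they need a shared contract. One way to express that in Python is a `typing.Protocol`, sketched below; the `Stage` protocol, the `Validation` class, and the field names are assumptions for illustration, not the actual interfaces in libs/common/.

```python
from typing import Protocol

class Stage(Protocol):
    """Hypothetical contract every pipeline stage could satisfy."""
    name: str
    def run(self, record: dict) -> dict: ...

class Validation:
    """Toy validation stage: checks that required fields are present."""
    name = "validation"
    REQUIRED = ("title", "description", "languages")

    def run(self, record: dict) -> dict:
        missing = [k for k in self.REQUIRED if not record.get(k)]
        record["valid"] = not missing
        record["missing_fields"] = missing
        return record

# Any object with a `name` and a `run(record)` method satisfies Stage,
# so stages can be swapped or mocked without inheritance.
stage: Stage = Validation()
out = stage.run({"title": "Cours de français", "description": "..."})
print(out["valid"], out["missing_fields"])  # False ['languages']
```

A structural protocol like this keeps the libraries decoupled: a test harness or orchestrator depends only on the contract, never on a concrete stage.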
- Python 3.12+ with uv installed
- Node.js 22+ with pnpm installed
- just command runner (installation)
- Git for version control
# Clone the repository
git clone git@github.com:refugies-info/nexus.git
cd nexus
# Install all dependencies (Python + Node.js + pre-commit hooks)
just install
# Or install manually:
uv sync # Install Python dependencies
pnpm install # Install Node.js dependencies
uv run pre-commit install # Setup pre-commit hooks (includes nbstripout)

# List all available commands
just
# Run linting (Python + Node.js)
just lint
# Format code (Python + Node.js)
just format
# Run tests (all libraries)
just test
# Run tests for specific library
uv run pytest libs/ingestion/tests/
# Type checking
uv run mypy libs/
# Run notebooks (exploratory work)
jupyter lab notebooks/

Each pipeline stage is an independent library that can be developed and tested separately:
# Example: Run ingestion stage (once implemented)
uv run python -m libs.ingestion
# Example: Import shared utilities
uv run python -c "from libs.common import utils"

Nexus follows a specification-driven development approach with constitutional principles:
# Create feature branch
git checkout -b 001-data-ingestion
# Generate specification
/speckit.specify "Implement data ingestion from Data Inclusion API"

# Generate implementation plan
/speckit.plan

# Generate actionable tasks
/speckit.tasks

# Check for inconsistencies
/speckit.analyze

# Execute tasks with TDD (test-first approach)
/speckit.implement

Test-Driven Development (TDD) is mandatory per our constitution:
# Write tests FIRST (red phase)
uv run pytest tests/contract/test_ingestion.py # Should FAIL
# Implement feature (green phase)
# ... write code ...
# Tests should now PASS
uv run pytest tests/contract/test_ingestion.py
# Refactor while keeping tests green
# ... improve code quality ...

- Contract Tests: Validate API contracts and interfaces
- Integration Tests: Test end-to-end pipeline stages
- Unit Tests: Test individual functions and classes
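A contract test pins down a stage's output shape before the implementation exists, which is exactly the red phase of the TDD cycle above. The example below is illustrative: the `ingest()` stand-in, its input keys (`id`, `nom`, `presentation`), and `REQUIRED_KEYS` are invented, not the project's real contract.

```python
# Illustrative contract test in the style of tests/contract/.
REQUIRED_KEYS = {"source_id", "title", "description"}

def ingest(raw: dict) -> dict:
    # Minimal stand-in for the real ingestion stage; input field names
    # (id, nom, presentation) are assumed for the example.
    return {
        "source_id": raw["id"],
        "title": raw.get("nom", ""),
        "description": raw.get("presentation", ""),
    }

def test_ingest_output_matches_contract():
    record = ingest({"id": "di-42", "nom": "Atelier FLE", "presentation": "..."})
    # The contract: every required key is present, source id is preserved.
    assert REQUIRED_KEYS <= record.keys()
    assert record["source_id"] == "di-42"

test_ingest_output_matches_contract()
print("contract test passed")
```

Written first, this test fails until `ingest()` honours the contract; under pytest it would live in `tests/contract/` and run via `uv run pytest`.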
- Constitution: Project principles and governance (v1.5.0)
- Contributing: Development guidelines and workflow
- Architecture: Detailed system design
- API Specification: Réfugiés.info publication API design
Nexus is governed by 12 core principles:
- Data Quality First: Validation and reconciliation at every stage
- Pipeline Modularity: Independent, testable components
- Multilingual by Design: 8-language support as first-class requirement
- Editorial Compliance (NON-NEGOTIABLE): Réfugiés.info charter adherence
- Integration Independence: Standalone system with clean API contracts
- Observability & Traceability: Complete pipeline visibility
- Incremental Delivery: MVP-first with Carif Oref use case
- Technology Foundation: Polyglot monorepo (Python-first)
- User-Centered Development (NON-NEGOTIABLE): Mandatory user research
- Notebook Governance: Structured exploratory work with security
- Langage Clair (NON-NEGOTIABLE) ⭐: AI-assisted plain language transformation
- Culturally-Aware Translation (NON-NEGOTIABLE) ⭐: Cultural mediation with glossaries and annotations
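The glossary-and-annotation idea behind principle 12 can be sketched as a small lookup pass that explains French administrative terms inline. Everything below is invented for illustration: the `GLOSSARY` entries and the `annotate()` helper are not the project's actual mediation mechanism.

```python
# Hypothetical glossary of French administrative terms; the real
# glossaries are curated per target language.
GLOSSARY = {
    "CAF": "family benefits office",
    "OFII": "French office for immigration and integration",
}

def annotate(text: str) -> str:
    """Append a short explanation after the first use of each known term."""
    for term, note in GLOSSARY.items():
        if term in text:
            # Annotate only the first occurrence to avoid clutter.
            text = text.replace(term, f"{term} ({note})", 1)
    return text

print(annotate("Contactez la CAF pour vos allocations."))
# Contactez la CAF (family benefits office) pour vos allocations.
```

Keeping the acronym and adding the gloss, rather than translating it away, is what makes the mediation cultural: readers still recognise the term they will meet on official paperwork.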
We welcome contributions! Please see CONTRIBUTING.md for:
- Code of conduct
- Development setup
- Pull request process
- Testing requirements
- Constitutional compliance
- Fork & Clone: Fork the repo and clone locally
- Feature Branch: Create a branch following the ###-feature-name convention
- User Research: Conduct user research if the feature affects end users
- TDD: Write tests first, ensure they fail, then implement
- Constitution Check: Validate compliance with all principles
- Pull Request: Submit PR with test evidence and documentation
Focus: French language learning information sheets from Carif Oref
- Constitution ratified (v1.5.0)
- Project structure defined
- Development templates created
- Data ingestion implementation
- Translation pipeline
- Editorial validation
- Réfugiés.info API design
- User research for AI transparency
Q4 2025: MVP - Carif Oref French learning sheets
- Data ingestion from Data Inclusion
- Basic translation pipeline (8 languages)
- Editorial validation workflow
- Réfugiés.info API specification
Q1 2026: Expansion
- Additional data sources
- Enhanced translation quality
- User feedback integration
- Performance optimization
- Réfugiés.info: Main platform (publication target)
- Data Inclusion: Primary data source
- Carif Oref: French learning data provider
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: contact@refugies.info
- Réfugiés.info team: For platform integration and user research access
- Carif Oref: For providing French learning data
- Data Inclusion: For integration data aggregation
- ai-kit project: For constitutional principles inspiration
Built with ❤️ for refugees and immigrants in France