Consumer Duty Evidence Engine is a Django/React portfolio project that simulates a high-accountability evidence-review workflow inspired by FCA Consumer Duty monitoring expectations. It ingests complaints, disclosures, support transcripts, scripts, and policy materials; extracts structured claims and outcome-relevant facts; maps them to Consumer Duty outcome areas; scores evidence sufficiency; flags unsupported or contradictory evidence; routes uncertain cases into analyst review; and produces audit-ready outputs with traceable citations, evaluation metrics, and observable workflow state.
Most portfolio AI applications stop at retrieval or summarisation.
This project is designed to demonstrate how AI operates inside a controlled, auditable workflow where outputs must be:
- structured
- traceable
- reviewable
- auditable
- measurable
- safe under failure
The focus is not generating answers, but managing evidence under uncertainty.
- AI-assisted extraction with strict schema validation
- rule-assisted outcome mapping to a constrained taxonomy
- evidence sufficiency scoring:
  - supported
  - weak support
  - missing support
  - contradictory support
  - stale support
- contradiction detection across multi-document case bundles
- human review queues with assignment, approval, escalation, override controls, and audit logging
- explicit state machine enforcing workflow correctness
- observable async pipelines with WebSocket updates
- regression-tested evaluation harness with 40+ benchmark cases
- conservative fallback behaviour under provider failure or insufficient evidence
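The sufficiency statuses above form a closed taxonomy, so extraction output can be validated against an enum rather than accepted as free text. A minimal sketch (the `SupportStatus` identifiers are illustrative, not the project's actual names):

```python
from enum import Enum

class SupportStatus(str, Enum):
    """Closed set of evidence sufficiency labels (illustrative names)."""
    SUPPORTED = "supported"
    WEAK = "weak_support"
    MISSING = "missing_support"
    CONTRADICTORY = "contradictory_support"
    STALE = "stale_support"

def parse_status(raw: str) -> SupportStatus:
    """Reject anything outside the taxonomy instead of guessing."""
    try:
        return SupportStatus(raw.strip().lower())
    except ValueError:
        # Unknown labels are treated as a validation failure and routed
        # to human review rather than silently coerced to a valid value.
        raise ValueError(f"unrecognised support status: {raw!r}")
```

Parsing into a constrained enum at the boundary is what makes downstream routing decisions trustworthy: a model can never introduce a fifth-and-a-half status.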
System type: Django monolith with async workers and React frontend
Backend
- Django + Django REST Framework (API and orchestration)
- PostgreSQL (source of truth)
- Redis (cache, broker, Channels layer)
- Celery (async task pipeline)
- Django Channels (WebSockets)
Frontend
- React + TypeScript + Vite
- React Query (server state)
- Zod (schema validation)
- Zustand (client state)
Supporting systems
- pgvector (limited retrieval support)
- evaluation harness (synthetic datasets + regression runner)
- audit/event system
- observability and metrics layer
- Upload complaint and related artefacts
- Persist artefacts and enqueue ingestion
- Parse and segment documents asynchronously
- Extract structured claims using strict schema validation
- Map claims to Consumer Duty outcome areas
- Link supporting and contradicting evidence
- Assess evidence sufficiency
- Detect contradictions and stale evidence
- Generate structured recommendation memo (when safe)
- Route uncertain cases into human review
- Persist all actions in an audit timeline
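The pipeline steps above run as Celery tasks in the project; the sketch below shows the same ordering and state handoff as plain functions so the flow is easy to see in isolation. Stage names and the dict-based case record are illustrative:

```python
# Minimal sketch of the ingestion stages as ordered steps, each
# receiving the case record produced by the previous stage.

def parse_documents(case: dict) -> dict:
    # Placeholder segmentation: split raw text on blank lines.
    case["segments"] = [s for s in case["raw_text"].split("\n\n") if s]
    return case

def extract_claims(case: dict) -> dict:
    # Placeholder extraction: one unassessed "claim" per segment.
    case["claims"] = [{"text": s, "status": "unassessed"} for s in case["segments"]]
    return case

def assess_evidence(case: dict) -> dict:
    # Any claim without confirmed support forces human review.
    case["needs_review"] = any(c["status"] != "supported" for c in case["claims"])
    return case

PIPELINE = [parse_documents, extract_claims, assess_evidence]

def run_pipeline(case: dict) -> dict:
    for stage in PIPELINE:
        case = stage(case)
    return case
```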
The system enforces a strict state machine:
new → ingestion_pending → parsing → parsed → extraction → mapping → assessment → recommendation
Terminal paths:
approved / needs_review / escalated / failed / archived
Invalid transitions are explicitly rejected.
Review workflow states:
unassigned → assigned → in_review → approved / overridden / escalated → closed
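Enforcing transitions explicitly means the allowed moves live in one place and everything else is rejected. A sketch of that idea, using the case states listed above (the specific transition map is illustrative, not the project's exact graph):

```python
# Allowed outgoing transitions per state; anything absent is invalid.
ALLOWED = {
    "new": {"ingestion_pending"},
    "ingestion_pending": {"parsing", "failed"},
    "parsing": {"parsed", "failed"},
    "parsed": {"extraction"},
    "extraction": {"mapping", "needs_review", "failed"},
    "mapping": {"assessment"},
    "assessment": {"recommendation", "needs_review"},
    "recommendation": {"approved", "needs_review", "escalated"},
    # Terminal states have no outgoing transitions.
    "approved": set(), "needs_review": set(), "escalated": set(),
    "failed": set(), "archived": set(),
}

class InvalidTransition(Exception):
    pass

def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not permitted")
    return target
```

In Django this check would typically sit in the model's save path or a service layer, so no view or task can skip a state.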
Backend
- Django
- Django REST Framework
- PostgreSQL
- Redis
- Celery
- Django Channels
- pgvector
- drf-spectacular (OpenAPI)
Frontend
- React
- TypeScript
- Vite
- React Router
- React Query
- Zod
- Zustand
Tooling
- Pytest
- Vitest
- Ruff, Black, isort
- Docker Compose
- GitHub Actions CI
The commands below reflect the Windows development environment used for this project.
- Python 3.14
- Node.js 20+
- pnpm
- Docker Desktop
cd D:\AI-Projects\consumer-duty-evidence-engine
.\.venv\Scripts\Activate.ps1
python -m pip install -r backend\requirements\dev.txt

docker compose up -d db redis

cd backend
python manage.py migrate
python manage.py runserver

cd backend
celery -A config worker -l info

cd frontend
pnpm install
pnpm dev

Frontend: http://localhost:5173
Backend: http://localhost:8000
The project includes 12 seeded demo cases, covering:
- unclear fee disclosure
- contradictory support scripts
- missing evidence scenarios
- stale policy/script cases
- schema failure simulation
- provider failure simulation and safe fallback routing
- clearly supported cases
Seed data:
python infra/scripts/seed_demo_data.py

The system includes a synthetic evaluation harness with 40+ cases across:
- supported scenarios
- weak support
- contradictions
- missing evidence
- stale evidence
- adversarial formatting
- routing edge cases
- citation validation cases
Eval runner:
python infra/scripts/run_eval_suite.py

Metrics tracked:
- claim precision / recall
- outcome mapping accuracy
- support-status accuracy
- routing accuracy
- citation validity rate
- degraded-mode success rate
Reports:
evals/reports/latest-report.json
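The claim precision/recall metric tracked above reduces to a standard set comparison between extracted and expected claims per benchmark case. A minimal sketch (claim keys and the aggregation step are illustrative):

```python
def precision_recall(predicted: set[str], expected: set[str]) -> tuple[float, float]:
    """Precision/recall of predicted claim keys against labelled ground truth."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall
```

The regression runner would compute this per case and aggregate across the suite, so a prompt or model change that silently drops claims shows up as a recall regression in the report.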
The system explicitly models uncertainty and failure:
- correlation IDs for request tracing
- structured JSON logging
- audit events for all state transitions
- model execution logs (latency, status, cost)
- WebSocket status updates
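Correlation IDs plus structured JSON logging combine so every log line for one request or pipeline run can be filtered by a single key. An illustrative sketch (field names are assumptions, not the project's actual schema):

```python
import json
import logging
import uuid

logger = logging.getLogger("evidence_engine")

def log_event(event: str, correlation_id: str, **fields) -> str:
    """Emit one structured JSON log line tagged with a correlation id."""
    record = {"event": event, "correlation_id": correlation_id, **fields}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line

# A correlation id is typically minted once per request (e.g. in
# middleware) and threaded through every task the request spawns.
cid = str(uuid.uuid4())
```

Audit events for state transitions would use the same shape, which is what makes the audit timeline queryable rather than a pile of free-text lines.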
Failure handling:
- schema validation failures → forced review
- provider failures → conservative fallback or abstention
- contradictory evidence → review routing
- missing evidence → review routing
- stale evidence → review routing
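Each failure condition above routes to review rather than to an automatic recommendation, which collapses to one conservative check over the case's flags. A sketch (flag names mirror the list; the set-based check is illustrative):

```python
# Any of these conditions forces the case into the human review
# queue instead of auto-generating a recommendation memo.
ROUTE_TO_REVIEW = {
    "schema_failure",
    "provider_failure",
    "contradictory_support",
    "missing_support",
    "stale_support",
}

def needs_human_review(flags: set[str]) -> bool:
    """True if any flag on the case matches a forced-review condition."""
    return bool(flags & ROUTE_TO_REVIEW)
```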
Fallback modes
- rules-only mode when model unavailable
- source-only mode when generation unsafe
- request-more-evidence routing when the case lacks sufficient support for a safe recommendation
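The three fallback modes can be sketched as a single ordered decision that always prefers the more conservative option. Mode names mirror the list above; the threshold and ordering are illustrative assumptions:

```python
def choose_mode(model_available: bool, generation_safe: bool,
                support_score: float, threshold: float = 0.7) -> str:
    """Pick the most conservative viable mode (threshold is illustrative)."""
    if support_score < threshold:
        # Insufficient evidence trumps everything: ask for more.
        return "request_more_evidence"
    if not model_available:
        return "rules_only"
    if not generation_safe:
        return "source_only"
    return "full_generation"
```

Ordering matters here: an evidence shortfall is checked first, so a healthy model is never allowed to generate a memo the evidence cannot support.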
OpenAPI schema:
/api/schema/
Swagger UI:
/api/docs/
Key endpoints:
/api/cases/
/api/cases/{id}/claims/
/api/cases/{id}/assessments/
/api/cases/{id}/recommendation/
/api/review-tasks/
/api/metrics/overview/
/api/evals/runs/
See:
docs/architecture/
docs/adr/
docs/domain/
docs/demos/
Key documents:
- system overview
- state machine
- ingestion pipeline
- evaluation design
- failure modes
- observability
- Portfolio-grade simulation, not production compliance software
- Uses synthetic datasets rather than real regulated data
- Simplified Consumer Duty taxonomy
- Limited retrieval (pgvector used minimally)
- Mock or constrained model integration
- Built an AI-assisted evidence-review workflow with async ingestion, structured extraction, and human review routing
- Implemented evidence sufficiency scoring and contradiction detection across multi-artefact case bundles
- Designed an evaluation harness with regression datasets and measurable metrics
- Added conservative fallback and failure-aware routing for provider failure and low-support cases
- Exposed full audit trail, state transitions, and review actions through API and UI
This project is licensed under the MIT License.
See the LICENSE file for full details.