I build review-gated legal AI systems: evaluation harnesses, supervised workflows, and legal operating layers that make outputs cited, testable, and safe for human approval.
German-qualified lawyer, former NLP data scientist, and partner at gunnercooke. My work sits between legal engineering, product engineering, and AI governance: structured intake, deterministic checks, visible source provenance, human-approved outputs, and audit trails.
These are public-safe prototypes on synthetic data only — not client work, not production systems, not legal advice.
If you look at one repository, look at contract-review-eval-harness — an offline, deterministic evaluation harness for AI contract review. It scores model output against a hand-authored answer set and catches the failures lawyers actually care about: missed clauses, wrong risk severity, unsupported citations, and fabricated text.
git clone https://github.com/sebastianfoerste/contract-review-eval-harness
cd contract-review-eval-harness && make install && make test && make demo

In the sample run it catches a fabricated citation and marks the output for rejection. The thesis: legal AI quality should be measured, not asserted. A second idea runs alongside it: legal AI becomes useful when judgment is structured before the model acts, measured after it acts, and blocked until a human approves consequential use.
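To make the four failure categories concrete, here is a minimal sketch of how a deterministic scorer could compare model findings against a hand-authored answer set. It is an illustration only, not the repository's actual code: the `Finding` type, the `score` function, the issue labels, and the verdict names are all hypothetical, and the "fabricated text" check is assumed to be a plain substring match against the contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    clause: str    # clause type, e.g. "limitation_of_liability" (hypothetical label)
    severity: str  # "low", "medium", or "high"
    quote: str     # contract text the finding cites as support


def score(contract_text, expected, predicted):
    """Compare predicted findings with a hand-authored answer set (illustrative)."""
    issues = []
    predicted_by_clause = {f.clause: f for f in predicted}

    # Pass 1: every expected clause must be found, with the expected severity.
    for exp in expected:
        got = predicted_by_clause.get(exp.clause)
        if got is None:
            issues.append(("missed_clause", exp.clause))
        elif got.severity != exp.severity:
            issues.append(("wrong_severity", exp.clause))

    # Pass 2: every predicted finding must cite text that exists in the contract.
    for f in predicted:
        if not f.quote.strip():
            issues.append(("unsupported_citation", f.clause))
        elif f.quote not in contract_text:
            issues.append(("fabricated_text", f.clause))

    # Assumed policy: fabricated text rejects outright, anything else needs review.
    if any(kind == "fabricated_text" for kind, _ in issues):
        verdict = "reject"
    elif issues:
        verdict = "needs_review"
    else:
        verdict = "pass"
    return {"issues": issues, "verdict": verdict}


# Synthetic example data, invented for this sketch.
CONTRACT = (
    "The Supplier's liability is capped at EUR 100,000. "
    "Either party may terminate on 30 days' notice."
)
expected = [
    Finding("limitation_of_liability", "high", "liability is capped at EUR 100,000"),
    Finding("termination", "low", "terminate on 30 days' notice"),
]
predicted = [
    Finding("limitation_of_liability", "medium", "liability is capped at EUR 100,000"),
    Finding("indemnity", "high", "The Supplier shall indemnify the Customer"),
]

result = score(CONTRACT, expected, predicted)
# result["verdict"] is "reject": the indemnity quote does not appear in the contract.
```

Because the comparison involves no model call and no randomness, the same inputs always produce the same verdict, which is what lets a check like this run offline and in a test suite.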
| Layer | The question it answers | Repository |
|---|---|---|
| Evaluation | How do we know the legal AI output is any good? | contract-review-eval-harness |
| Supervised workflow | How do we keep agentic legal work accountable? | legal-ops-agent |
| Legal operating layer | How does a GC scale intake, routing, approvals, and reporting? | legal-function-operating-system, ai-saas-legal-ops-starter-kit |
| Domain checks | Can regulation become cited, reviewable first-pass checks? | dpa-and-data-transfer-review, eu-ai-act-classifier, micar-whitepaper-linter, dora-third-party-register-and-resilience-workbench |
| Adoption | How does legal AI move from demo to daily use? | legal-ai-workshop-kit, legal-ai-adoption-dashboard |
Pick the three that match why you're here:
- Legal Engineer / Legal AI product → contract-review-eval-harness · legal-ops-agent · dpa-and-data-transfer-review
- AI SaaS General Counsel / Product Counsel → ai-saas-legal-ops-starter-kit · legal-function-operating-system · dpa-and-data-transfer-review
- AI governance / model evaluation → eu-ai-act-classifier · contract-review-eval-harness · legal-ops-agent
- Adoption / enablement / solutions → legal-ai-workshop-kit · legal-ai-adoption-dashboard · legal-ops-agent
- Financial regulation / crypto / MiCAR → micar-whitepaper-linter · MiCAR-Authorization-Co-Pilot · eu-financial-reg-horizon-scanner
Useful legal AI is not about generating text. The questions I build around: is intake structured before drafting begins? Are assumptions, sources, and gaps visible? Can a user see what is draft, checked, approved, or blocked? Can quality be tested, not asserted? Can the workflow make a lawyer faster without pretending judgment has disappeared? That is why these projects lean on deterministic checks, evaluation scripts, explicit review states, blocked exports, and audit trails — not just prompts.
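As a rough illustration of explicit review states, blocked exports, and an audit trail working together, the sketch below models a single output as a small state machine. Everything in it is an assumption made for the example rather than code from any of the repositories: the state names mirror the "draft, checked, approved, or blocked" wording above, while `Artifact`, `ExportBlocked`, the transition table, and the `human:` actor prefix are hypothetical.

```python
from enum import Enum


class ReviewState(Enum):
    DRAFT = "draft"
    CHECKED = "checked"
    APPROVED = "approved"
    BLOCKED = "blocked"


# Assumed transition table: approval is only reachable from CHECKED.
ALLOWED = {
    ReviewState.DRAFT: {ReviewState.CHECKED, ReviewState.BLOCKED},
    ReviewState.CHECKED: {ReviewState.APPROVED, ReviewState.BLOCKED},
    ReviewState.BLOCKED: {ReviewState.DRAFT},
    ReviewState.APPROVED: set(),
}


class ExportBlocked(Exception):
    """Raised when an output is exported before a human has approved it."""


class Artifact:
    def __init__(self, text):
        self.text = text
        self.state = ReviewState.DRAFT
        self.audit_trail = []  # entries of (actor, from_state, to_state)

    def transition(self, to, actor):
        if to not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state.value} to {to.value}")
        # Hypothetical convention: human actors are identified by a "human:" prefix.
        if to is ReviewState.APPROVED and not actor.startswith("human:"):
            raise PermissionError("only a human reviewer can approve")
        self.audit_trail.append((actor, self.state.value, to.value))
        self.state = to

    def export(self):
        if self.state is not ReviewState.APPROVED:
            raise ExportBlocked(f"export blocked while state is {self.state.value}")
        return self.text


memo = Artifact("Draft clause summary")
memo.transition(ReviewState.CHECKED, "system:deterministic-checks")
memo.transition(ReviewState.APPROVED, "human:reviewer")
exported = memo.export()  # succeeds only because a human approved the output
```

The design choice worth noting is that the gate lives in `export` itself, so skipping review is an error raised by the code path rather than something a prompt or a convention has to prevent.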
Most relevant for: legal AI product work · AI deployment in legal/regulated environments · legal engineering · AI governance and model evaluation · SaaS legal operations · privacy, financial-regulation, and product-counsel workflows. The strongest proof points are the evaluation harness, the approval-gated supervised workflow, the cited regulatory checks, and the legal operating system.
Partner at gunnercooke in Germany, advising on AI, SaaS, crypto, capital markets, payments, and EU financial regulation. German-qualified lawyer, admitted 2012; trained at Hengeler Mueller, Freshfields Bruckhaus Deringer, and Cleary Gottlieb. Earlier, data scientist at Dudenverlag building Python NLP pipelines.
Languages: German (native), English (fluent), French (professional working knowledge).
Synthetic examples only. No client data, no privileged material, no confidential negotiation history, no candidate data, no personal data. Public outputs are draft and review artifacts; they are not legal advice.


