An LLM-powered system for extracting ICD/medical codes from clinical text and processing doctor feedback to update, remove, or add codes — built with BAML and dynamic enums.
Based on the AI That Works series on large-scale classification.
Medical coding is a classification problem with thousands of possible codes that change over time. This project demonstrates how to handle that with:
- Code Extraction: Parse clinical notes and extract relevant ICD-10 / CPT codes using an LLM constrained to a dynamic enum
- Feedback Analysis: Process doctor feedback on extracted codes and produce structured actions: `update_code`, `remove_code`, or `add_code`
The key insight: medical code sets are too large and change too frequently to hardcode. Using BAML's dynamic enums, the available codes are injected at runtime — the LLM only sees the codes relevant to the current patient encounter.
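How the relevant subset is chosen is not spelled out here; the related episode describes embeddings plus LLM selection. As a minimal stand-in, the sketch below ranks a catalog by plain keyword overlap with the note. `CatalogEntry` and `select_candidate_codes` are hypothetical names used only for illustration, not part of this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    code: str         # e.g. "E11.9"
    description: str  # human-readable description of the code


def select_candidate_codes(
    note: str, catalog: list[CatalogEntry], limit: int = 50
) -> list[CatalogEntry]:
    """Rank entries by how many description words also appear in the note.

    Keyword overlap is an assumed stand-in for a real retrieval step
    (for example, embedding search).
    """
    note_words = set(note.lower().split())
    scored = []
    for entry in catalog:
        overlap = len(note_words & set(entry.description.lower().split()))
        if overlap > 0:
            scored.append((overlap, entry))
    # Highest overlap first; ties are broken by code so the order is stable
    scored.sort(key=lambda pair: (-pair[0], pair[1].code))
    return [entry for _, entry in scored[:limit]]


catalog = [
    CatalogEntry("E11.9", "Type 2 diabetes mellitus without complications"),
    CatalogEntry("I10", "Essential (primary) hypertension"),
    CatalogEntry("J06.9", "Acute upper respiratory infection, unspecified"),
]
candidates = select_candidate_codes(
    "Patient with type 2 diabetes and hypertension", catalog
)
print([entry.code for entry in candidates])
```

Only the entries returned here would then be added to the dynamic enum for this encounter.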
 Clinical Text
       │
       ▼
┌──────────────┐      MedCodes (dynamic enum)
│ ExtractCodes │◄──── injected at runtime via TypeBuilder
└──────┬───────┘
       │ MedCodes[]
       ▼
┌───────────────┐
│ Doctor Review │  (human-in-the-loop)
└──────┬────────┘
       │ FeedbackInput
       ▼
┌─────────────────┐
│ AnalyzeFeedback │──► Action[]
└─────────────────┘     ├─ UpdateCodeAction { code, new_code, reason }
                        └─ RemoveCodeAction { code, reason }
                           (type: "remove_code" | "add_code")
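To make the last step of the flow concrete, here is a rough sketch of how the returned actions might be applied to a list of codes. The dataclasses are plain-Python stand-ins that mirror the shapes in the diagram (the generated BAML types may differ), and `apply_actions` is a hypothetical helper, not something this project is known to ship.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class UpdateCodeAction:
    code: str
    new_code: str
    reason: str
    type: Literal["update_code"] = "update_code"


@dataclass
class RemoveCodeAction:
    # Per the diagram, this shape carries both "remove_code" and "add_code"
    type: Literal["remove_code", "add_code"]
    code: str
    reason: str


Action = Union[UpdateCodeAction, RemoveCodeAction]


def apply_actions(codes: list[str], actions: list[Action]) -> list[str]:
    """Return a new code list with every action applied in order."""
    result = list(codes)
    for action in actions:
        if action.type == "update_code":
            result = [action.new_code if c == action.code else c for c in result]
        elif action.type == "remove_code":
            result = [c for c in result if c != action.code]
        elif action.type == "add_code" and action.code not in result:
            result.append(action.code)
    return result


updated = apply_actions(
    ["E11_9", "I10", "J06_9"],
    [
        UpdateCodeAction(code="E11_9", new_code="E11_65", reason="doctor update"),
        RemoveCodeAction(type="remove_code", code="J06_9", reason="not supported"),
        RemoveCodeAction(type="add_code", code="Z79_4", reason="long-term insulin"),
    ],
)
print(updated)
```

Returning a new list rather than mutating the input keeps the original extraction available for audit after the doctor's review.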
- Python 3.10+
- uv package manager
- An OpenAI API key (set `OPENAI_API_KEY` in your environment)
# Install dependencies
uv sync
# Generate the BAML client (converts .baml files to Python)
uv run baml-cli generate

# Run the end-to-end demo
uv run python main.py

The `MedCodes` enum is declared as `@@dynamic` in BAML, so it has no hardcoded values. At runtime, you populate it using the `TypeBuilder`:
from baml_client.type_builder import TypeBuilder
type_builder = TypeBuilder()
val = type_builder.MedCodes.add_value("E11_9")
val.description("Type 2 diabetes mellitus without complications")
type_builder.MedCodes.add_value("I10")
type_builder.MedCodes.add_value("J06_9")
# ... add as many codes as relevant to this encounter

This means the LLM's output is constrained to the set of codes you provide, which rules out hallucinated codes.
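The enum values above use `E11_9` where the clinical code is written `E11.9`. A small helper for that conversion, plus a loop that registers codes in bulk, might look like the sketch below. `to_enum_name`, `populate_codes`, and the `CPT_` prefix for numeric codes are assumptions made for illustration. The fake builder only imitates the `add_value(...)` and `description(...)` calls shown above so the sketch runs without the generated client.

```python
import re


def to_enum_name(code: str) -> str:
    """Convert a code such as "E11.9" into an identifier-like name ("E11_9")."""
    name = re.sub(r"[^0-9A-Za-z]", "_", code.strip())
    if name[:1].isdigit():
        # Hypothetical convention: prefix all-numeric CPT codes so the
        # name starts with a letter
        name = f"CPT_{name}"
    return name


def populate_codes(enum_builder, codes: dict[str, str]) -> dict[str, str]:
    """Register each code on the builder; return enum name -> original code."""
    name_to_code = {}
    for code, description in codes.items():
        name = to_enum_name(code)
        enum_builder.add_value(name).description(description)
        name_to_code[name] = code
    return name_to_code


class FakeValueBuilder:
    """Stand-in for the object returned by add_value."""

    def __init__(self):
        self.text = None

    def description(self, text):
        self.text = text
        return self


class FakeEnumBuilder:
    """Stand-in for type_builder.MedCodes, for demonstration only."""

    def __init__(self):
        self.values = {}

    def add_value(self, name):
        self.values[name] = FakeValueBuilder()
        return self.values[name]


builder = FakeEnumBuilder()
mapping = populate_codes(
    builder,
    {
        "E11.9": "Type 2 diabetes mellitus without complications",
        "I10": "Essential (primary) hypertension",
        "99213": "Established patient office visit",
    },
)
print(mapping)
```

In real use, `type_builder.MedCodes` would be passed in place of the fake builder, and the returned mapping lets you translate enum names back to the original codes when displaying results.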
`ExtractCodes` takes raw clinical text and returns a list of `MedCodes`:

codes = b.ExtractCodes("Patient presents with E11.9 diabetes...", {
    "tb": type_builder
})
# => [MedCodes.E11_9, MedCodes.I10, ...]

`AnalyzeFeedback` takes structured doctor feedback and returns a list of actions:
feedback = FeedbackInput(
    items=[
        FeedbackItem(code=codes[0], feedback="Update to E11.65 — peripheral angiopathy"),
        FeedbackItem(code=codes[1], feedback=None),  # no change needed
    ],
    other="Also add Z79.4 — long-term insulin"
)
actions = b.AnalyzeFeedback(feedback, {"tb": type_builder})
for action in actions:
    if action.type == "update_code":
        print(f"Update {action.code} → {action.new_code}: {action.reason}")
    elif action.type == "remove_code":
        print(f"Remove {action.code}: {action.reason}")
    elif action.type == "add_code":
        print(f"Add {action.code}: {action.reason}")

Tests are defined directly in BAML and can be run in the BAML VSCode Playground. Each test includes a `type_builder` block that populates the dynamic `MedCodes` enum with realistic ICD-10 codes plus distractors.
| Test | Function | What it covers |
|---|---|---|
| `extract_icd_codes` | `ExtractCodes` | Standard ICD-10 extraction from a clinical note |
| `extract_mixed_codes` | `ExtractCodes` | Mixed ICD-10 diagnosis + CPT procedure codes |
| `analyze_feedback_update_code` | `AnalyzeFeedback` | Doctor requests a code update |
| `analyze_feedback_remove_and_add` | `AnalyzeFeedback` | Remove one code + add a new one via free-text |
| `analyze_feedback_multiple_updates` | `AnalyzeFeedback` | Batch: two updates + one removal |
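Outside the Playground, one way to check an extraction result against expected codes and distractors is a small scoring function. The sketch below is an assumption about how such a check could look; `score_extraction` is a hypothetical name and is not part of the BAML tests themselves.

```python
def score_extraction(
    predicted: set[str], expected: set[str], distractors: set[str]
) -> dict:
    """Compute precision and recall, and list any distractor codes chosen."""
    true_positives = predicted & expected
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(expected) if expected else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "distractors_chosen": sorted(predicted & distractors),
    }


report = score_extraction(
    predicted={"E11_9", "I10", "J06_9"},
    expected={"E11_9", "I10"},
    distractors={"J06_9", "K21_9"},
)
print(report)
```

A non-empty `distractors_chosen` list points at codes the model picked even though the note does not support them, which is usually the first thing to inspect when tuning the prompt.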
├── main.py # Entry point — end-to-end demo
├── baml_src/
│ ├── update_codes.baml # Core functions, types, and tests
│ ├── resume.baml # Example: resume extraction
│ ├── clients.baml # LLM client configurations
│ └── generators.baml # BAML code generation settings
├── baml_client/ # Auto-generated Python client (do not edit)
├── pyproject.toml # Python project config (uv/pip)
└── uv.lock # Locked dependencies
| Episode | Topic | Video |
|---|---|---|
| #1 | Large Scale Classification | YouTube |
| #24 | Evals for Classification | YouTube |
- Large Scale Classification — How to classify into 1000+ categories using embeddings + LLM selection
- Evals for Classification — Evaluation, tuning, and building UIs around classification systems
- Episode #1 code — Full large-scale classification pipeline with vector store, narrowing strategies, and Streamlit dashboard
- Episode #24 code — Eval-focused iteration on classification with custom dashboards
- All AI That Works episodes & code — Full repo with every episode's project
- BAML Docs: Getting started, dynamic types, testing, and more
- Dynamic Types: How `@@dynamic` enums and classes work
- BAML Playground (VSCode): Run and test BAML functions directly in your editor
- BAML GitHub: Star the repo!
Built with BAML and OpenAI.