Medical Code Classification & Feedback Loop

An LLM-powered system for extracting ICD/medical codes from clinical text and processing doctor feedback to update, remove, or add codes — built with BAML and dynamic enums.

Based on the AI That Works series on large-scale classification.

Overview

Medical coding is a classification problem with thousands of possible codes that change over time. This project demonstrates how to handle that with:

Code Extraction — Parse clinical notes and extract relevant ICD-10 / CPT codes using an LLM constrained to a dynamic enum
Feedback Analysis — Process doctor feedback on extracted codes and produce structured actions: update_code, remove_code, or add_code

The key insight: medical code sets are too large and change too frequently to hardcode. Using BAML's dynamic enums, the available codes are injected at runtime — the LLM only sees the codes relevant to the current patient encounter.
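
One way to picture "only the codes relevant to the current encounter" is a narrowing step that runs before the LLM call. The sketch below is an illustration, not this project's code: `CODE_CATALOG`, `tokenize`, and `narrow_codes` are hypothetical names. The Large Scale Classification episode listed under Resources describes embeddings for this step, and plain keyword overlap is swapped in here only to keep the example dependency-free.

```python
import re

# Hypothetical mini-catalog: enum-safe name -> description.
# A real catalog would hold thousands of entries.
CODE_CATALOG = {
    "E11_9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J06_9": "Acute upper respiratory infection, unspecified",
    "M54_5": "Low back pain",
}


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def narrow_codes(note: str, catalog: dict[str, str], top_k: int = 3) -> list[str]:
    """Return up to top_k code names whose descriptions share words with the note."""
    note_words = tokenize(note)
    scored = []
    for name, description in catalog.items():
        overlap = len(note_words & tokenize(description))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically by code name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


candidates = narrow_codes(
    "Patient with type 2 diabetes and hypertension, no back pain.",
    CODE_CATALOG,
)
print(candidates)
```

Note that the negated "no back pain" still pulls in `M54_5`. A narrowing step like this only proposes candidates, and the final selection is left to the LLM constrained by the dynamic enum.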

Architecture

Clinical Text
      │
      ▼
┌─────────────┐     MedCodes (dynamic enum)
│ ExtractCodes │◄─── injected at runtime via TypeBuilder
└──────┬──────┘
       │ MedCodes[]
       ▼
┌──────────────┐
│ Doctor Review │  (human-in-the-loop)
└──────┬───────┘
       │ FeedbackInput
       ▼
┌─────────────────┐
│ AnalyzeFeedback  │──► Action[]
└─────────────────┘       ├─ UpdateCodeAction { code, new_code, reason }
                          └─ RemoveCodeAction { code, reason }
                                (type: "remove_code" | "add_code")

Quick Start

Prerequisites

Python 3.10+
uv package manager
An OpenAI API key (set OPENAI_API_KEY in your environment)

Installation

# Install dependencies
uv sync

# Generate the BAML client (converts .baml files to Python)
uv run baml-cli generate

Run

uv run python main.py

How It Works

Dynamic Enums

The MedCodes enum is declared as @@dynamic in BAML — it has no hardcoded values. At runtime, you populate it using the TypeBuilder:

from baml_client.type_builder import TypeBuilder

type_builder = TypeBuilder()
val = type_builder.MedCodes.add_value("E11_9")
val.description("Type 2 diabetes mellitus without complications")

type_builder.MedCodes.add_value("I10")
type_builder.MedCodes.add_value("J06_9")
# ... add as many codes as relevant to this encounter

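This constrains the typed result to codes from the set you provide: a code outside that set does not parse into MedCodes, so hallucinated codes cannot appear in the output. The model can still pick the wrong code from within the set, which is what the doctor review step is for.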
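
ICD-10 codes contain a dot (E11.9) while the enum values above use underscores (E11_9), so a small helper can handle the conversion and the registration in one pass. This is a minimal sketch that assumes only the `add_value(...)` and `description(...)` calls shown above. `to_enum_name` and `register_codes` are hypothetical names, not part of BAML or this repo.

```python
def to_enum_name(icd_code: str) -> str:
    """Convert a dotted code like 'E11.9' to an enum-safe name like 'E11_9'."""
    return icd_code.strip().upper().replace(".", "_")


def register_codes(enum_builder, codes: dict[str, str]) -> list[str]:
    """Add each code to a dynamic enum builder and return the names added.

    enum_builder is expected to behave like type_builder.MedCodes:
    add_value(name) returns an object with a description(text) method.
    """
    added = []
    for icd_code, description in codes.items():
        name = to_enum_name(icd_code)
        value = enum_builder.add_value(name)
        if description:
            # Only attach a description when one was supplied.
            value.description(description)
        added.append(name)
    return added


print(to_enum_name("e11.9"))
print(to_enum_name("I10"))
```

With a generated client, the call would look something like `register_codes(type_builder.MedCodes, {"E11.9": "Type 2 diabetes mellitus without complications"})`.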

ExtractCodes

Takes raw clinical text and returns a list of MedCodes:

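from baml_client.sync_client import b  # synchronous client; assumes the default generated layout

codes = b.ExtractCodes(
    "Patient presents with E11.9 diabetes...",
    baml_options={"tb": type_builder},  # pass the populated TypeBuilder
)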
# => [MedCodes.E11_9, MedCodes.I10, ...]

AnalyzeFeedback

Takes structured doctor feedback and returns a list of actions:

from baml_client.types import FeedbackInput, FeedbackItem

feedback = FeedbackInput(
    items=[
        FeedbackItem(code=codes[0], feedback="Update to E11.65 (type 2 diabetes with hyperglycemia)"),
        FeedbackItem(code=codes[1], feedback=None),  # no change needed
    ],
    other="Also add Z79.4 — long-term insulin"
)

actions = b.AnalyzeFeedback(feedback, baml_options={"tb": type_builder})
for action in actions:
    if action.type == "update_code":
        print(f"Update {action.code} → {action.new_code}: {action.reason}")
    elif action.type == "remove_code":
        print(f"Remove {action.code}: {action.reason}")
    elif action.type == "add_code":
        print(f"Add {action.code}: {action.reason}")
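
Once the actions come back, they still have to be applied to the encounter's code list. The following is a sketch of that last step, using a plain dataclass as a stand-in for the generated types. `Action` and `apply_actions` are hypothetical, and the field names (type, code, new_code, reason) are taken from the loop above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """Stand-in for the generated action types (hypothetical, for illustration)."""
    type: str                       # "update_code", "remove_code", or "add_code"
    code: str
    reason: str
    new_code: Optional[str] = None  # only used by "update_code"


def apply_actions(codes: list[str], actions: list[Action]) -> list[str]:
    """Return a new code list with the actions applied in order."""
    result = list(codes)  # copy so the caller's list is not mutated
    for action in actions:
        if action.type == "update_code":
            if action.new_code is None:
                raise ValueError("update_code requires new_code")
            # Replace in place so the original ordering is kept.
            result = [action.new_code if c == action.code else c for c in result]
        elif action.type == "remove_code":
            result = [c for c in result if c != action.code]
        elif action.type == "add_code":
            if action.code not in result:
                result.append(action.code)
        else:
            raise ValueError(f"Unknown action type: {action.type}")
    return result


updated = apply_actions(
    ["E11_9", "I10"],
    [
        Action("update_code", "E11_9", "more specific diagnosis", new_code="E11_65"),
        Action("add_code", "Z79_4", "long-term insulin use"),
    ],
)
print(updated)
```

Returning a new list rather than editing in place keeps the originally extracted codes available for auditing what the doctor changed.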

Testing

Tests are defined directly in BAML and can be run in the BAML VSCode Playground. Each test includes a type_builder block that populates the dynamic MedCodes enum with realistic ICD-10 codes plus distractors.

Test	Function	What it covers
`extract_icd_codes`	`ExtractCodes`	Standard ICD-10 extraction from a clinical note
`extract_mixed_codes`	`ExtractCodes`	Mixed ICD-10 diagnosis + CPT procedure codes
`analyze_feedback_update_code`	`AnalyzeFeedback`	Doctor requests a code update
`analyze_feedback_remove_and_add`	`AnalyzeFeedback`	Remove one code + add a new one via free-text
`analyze_feedback_multiple_updates`	`AnalyzeFeedback`	Batch: two updates + one removal
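
Because each test mixes expected codes with distractors, one way to summarize a run outside the Playground is set-based precision and recall. This is a sketch and not part of the project's test suite: `score_extraction` is a hypothetical helper and the inputs below are made up.

```python
def score_extraction(predicted: list[str], expected: list[str]) -> dict[str, float]:
    """Compare predicted codes with expected codes, ignoring order and duplicates."""
    predicted_set, expected_set = set(predicted), set(expected)
    true_positives = len(predicted_set & expected_set)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(expected_set) if expected_set else 0.0
    return {"precision": precision, "recall": recall}


# Made-up run: the model found both expected codes but also returned two distractors.
scores = score_extraction(
    predicted=["E11_9", "I10", "J06_9", "Z00_0"],
    expected=["E11_9", "I10"],
)
print(scores)
```

Low precision with high recall, as in this made-up run, would suggest the model is picking up distractors rather than missing diagnoses.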

Project Structure

├── main.py                      # Entry point — end-to-end demo
├── baml_src/
│   ├── update_codes.baml        # Core functions, types, and tests
│   ├── resume.baml              # Example: resume extraction
│   ├── clients.baml             # LLM client configurations
│   └── generators.baml          # BAML code generation settings
├── baml_client/                 # Auto-generated Python client (do not edit)
├── pyproject.toml               # Python project config (uv/pip)
└── uv.lock                      # Locked dependencies

Resources

Videos

Episode	Topic	Video
#1	Large Scale Classification	YouTube
#24	Evals for Classification	YouTube

Episode Pages

Large Scale Classification — How to classify into 1000+ categories using embeddings + LLM selection
Evals for Classification — Evaluation, tuning, and building UIs around classification systems

Code

Episode #1 code — Full large-scale classification pipeline with vector store, narrowing strategies, and Streamlit dashboard
Episode #24 code — Eval-focused iteration on classification with custom dashboards
All AI That Works episodes & code — Full repo with every episode's project

Documentation

BAML Docs — Getting started, dynamic types, testing, and more
Dynamic Types — How @@dynamic enums and classes work
BAML Playground (VSCode) — Run and test BAML functions directly in your editor
BAML GitHub — Star the repo!

Built with BAML and OpenAI.
