BoundaryML/icd-codes-classification

Medical Code Classification & Feedback Loop

An LLM-powered system for extracting ICD/medical codes from clinical text and processing doctor feedback to update, remove, or add codes — built with BAML and dynamic enums.

Based on the AI That Works series on large-scale classification.

Overview

Medical coding is a classification problem with thousands of possible codes that change over time. This project demonstrates how to handle that with:

  1. Code Extraction — Parse clinical notes and extract relevant ICD-10 / CPT codes using an LLM constrained to a dynamic enum
  2. Feedback Analysis — Process doctor feedback on extracted codes and produce structured actions: update_code, remove_code, or add_code

The key insight: medical code sets are too large and change too frequently to hardcode. Using BAML's dynamic enums, the available codes are injected at runtime — the LLM only sees the codes relevant to the current patient encounter.
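The README does not say how the codes "relevant to the current patient encounter" are chosen before they are injected. As a rough sketch only, the narrowing step could look like the following, where plain word overlap stands in for whatever retrieval method the project really uses (embeddings, a search index, or similar). The names `tokenize`, `select_candidates`, and `catalog` are hypothetical and do not come from the repository.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_candidates(note: str, catalog: dict[str, str], limit: int = 50) -> list[str]:
    """Rank catalog codes by word overlap between the note and each description.

    `catalog` maps an enum-safe code name to its description. This scoring
    is an assumption for illustration, not the project's method.
    """
    note_tokens = tokenize(note)
    scored = []
    for code, description in catalog.items():
        overlap = len(note_tokens & tokenize(description))
        if overlap > 0:
            scored.append((overlap, code))
    # Highest overlap first; ties are broken alphabetically so the result is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:limit]]
```

Only the codes returned by a step like this would then be added to the dynamic enum, which keeps the prompt small even when the full catalog holds thousands of entries.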

Architecture

Clinical Text
      │
      ▼
┌──────────────┐     MedCodes (dynamic enum)
│ ExtractCodes │◄─── injected at runtime via TypeBuilder
└──────┬───────┘
       │ MedCodes[]
       ▼
┌───────────────┐
│ Doctor Review │  (human-in-the-loop)
└──────┬────────┘
       │ FeedbackInput
       ▼
┌─────────────────┐
│ AnalyzeFeedback │──► Action[]
└─────────────────┘      ├─ UpdateCodeAction { code, new_code, reason }
                         └─ RemoveCodeAction { code, reason }
                               (type: "remove_code" | "add_code")

Quick Start

Prerequisites

  • Python 3.10+
  • uv package manager
  • An OpenAI API key (set OPENAI_API_KEY in your environment)

Installation

# Install dependencies
uv sync

# Generate the BAML client (converts .baml files to Python)
uv run baml-cli generate

Run

uv run python main.py

How It Works

Dynamic Enums

The MedCodes enum is declared as @@dynamic in BAML — it has no hardcoded values. At runtime, you populate it using the TypeBuilder:

from baml_client.type_builder import TypeBuilder

type_builder = TypeBuilder()
val = type_builder.MedCodes.add_value("E11_9")
val.description("Type 2 diabetes mellitus without complications")

type_builder.MedCodes.add_value("I10")
type_builder.MedCodes.add_value("J06_9")
# ... add as many codes as relevant to this encounter

This means the LLM is constrained to only output valid codes from the set you provide — no hallucinated codes.
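The enum values above use underscores (`E11_9`) where the published ICD-10 code has a dot (`E11.9`). A small pair of helpers can keep that mapping in one place. This is a sketch under the assumption that the project follows the dot-to-underscore convention seen in the examples; `to_enum_name` and `to_display_code` are hypothetical names, and it handles ICD-10-style codes only (the README does not say how numeric CPT codes are named).

```python
import re


def to_enum_name(icd_code: str) -> str:
    """Turn a display code such as "E11.9" into an identifier-safe name ("E11_9")."""
    name = re.sub(r"[^0-9A-Za-z]", "_", icd_code.strip().upper())
    # ICD-10 codes start with a letter; reject anything that would not
    # produce a usable identifier under that assumption.
    if not name or not name[0].isalpha():
        raise ValueError(f"cannot derive an enum name from {icd_code!r}")
    return name


def to_display_code(enum_name: str) -> str:
    """Reverse the mapping for ICD-10-style names: "E11_9" becomes "E11.9"."""
    return enum_name.replace("_", ".", 1)
```

The result of `to_enum_name` could be passed to `type_builder.MedCodes.add_value(...)`, and `to_display_code` applied to the returned values before showing them to a reviewer.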

ExtractCodes

Takes raw clinical text and returns a list of MedCodes:

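from baml_client.sync_client import b  # synchronous generated client

codes = b.ExtractCodes(
    "Patient presents with E11.9 diabetes...",
    {"tb": type_builder},  # call options: supplies the runtime MedCodes values
)
# => [MedCodes.E11_9, MedCodes.I10, ...]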

AnalyzeFeedback

Takes structured doctor feedback and returns a list of actions:

from baml_client.types import FeedbackInput, FeedbackItem

feedback = FeedbackInput(
    items=[
        FeedbackItem(code=codes[0], feedback="Update to E11.65 (type 2 diabetes with hyperglycemia)"),
        FeedbackItem(code=codes[1], feedback=None),  # no change needed
    ],
    other="Also add Z79.4 — long-term insulin"
)

actions = b.AnalyzeFeedback(feedback, {"tb": type_builder})
for action in actions:
    if action.type == "update_code":
        print(f"Update {action.code} -> {action.new_code}: {action.reason}")
    elif action.type == "remove_code":
        print(f"Remove {action.code}: {action.reason}")
    elif action.type == "add_code":
        print(f"Add {action.code}: {action.reason}")
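The dispatch loop above only prints each action. To close the feedback loop, the actions also have to be applied to the extracted code list. The following is a minimal sketch of that step, assuming codes are handled as plain strings and using a hypothetical `Action` dataclass as a stand-in for the generated `UpdateCodeAction` / `RemoveCodeAction` types. The field names follow the architecture diagram; `apply_actions` itself is not part of the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """Hypothetical stand-in for the generated action types."""
    type: str                       # "update_code" | "remove_code" | "add_code"
    code: str
    reason: str
    new_code: Optional[str] = None  # only set for "update_code"


def apply_actions(codes: list[str], actions: list[Action]) -> list[str]:
    """Return a new code list with the actions applied in order."""
    result = list(codes)  # copy so the caller's list is left untouched
    for action in actions:
        if action.type == "update_code":
            if action.new_code is None:
                raise ValueError("update_code action is missing new_code")
            result = [action.new_code if c == action.code else c for c in result]
        elif action.type == "remove_code":
            result = [c for c in result if c != action.code]
        elif action.type == "add_code":
            if action.code not in result:  # avoid duplicate entries
                result.append(action.code)
        else:
            raise ValueError(f"unknown action type: {action.type!r}")
    return result
```

Returning a new list rather than mutating the input keeps the original extraction available for auditing alongside the reviewed result.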

Testing

Tests are defined directly in BAML (in baml_src/update_codes.baml) and can be run in the BAML playground for VS Code. Each test includes a type_builder block that populates the dynamic MedCodes enum with realistic ICD-10 codes plus distractors.

Test                               Function         What it covers
extract_icd_codes                  ExtractCodes     Standard ICD-10 extraction from a clinical note
extract_mixed_codes                ExtractCodes     Mixed ICD-10 diagnosis + CPT procedure codes
analyze_feedback_update_code       AnalyzeFeedback  Doctor requests a code update
analyze_feedback_remove_and_add    AnalyzeFeedback  Remove one code + add a new one via free-text
analyze_feedback_multiple_updates  AnalyzeFeedback  Batch: two updates + one removal

Project Structure

├── main.py                      # Entry point — end-to-end demo
├── baml_src/
│   ├── update_codes.baml        # Core functions, types, and tests
│   ├── resume.baml              # Example: resume extraction
│   ├── clients.baml             # LLM client configurations
│   └── generators.baml          # BAML code generation settings
├── baml_client/                 # Auto-generated Python client (do not edit)
├── pyproject.toml               # Python project config (uv/pip)
└── uv.lock                      # Locked dependencies

Resources

Videos

Episode  Topic                       Video
#1       Large Scale Classification  YouTube
#24      Evals for Classification    YouTube

Episode Pages

Code

Documentation

Built with BAML and OpenAI.
