MedLabelIQ

MedLabelIQ is a production-oriented, evidence-grounded medication question-answering system that combines:

official DailyMed SPL drug-label evidence,
RxNorm medication identity reasoning,
source-aware orchestration,
and mixed-source query decomposition + grounded synthesis.

The system answers medication questions only when its selected knowledge source directly supports the response. Otherwise, it returns a deterministic insufficient_evidence result.

MedLabelIQ was designed as a modern replacement for an earlier web-scraped medical QA prototype, addressing its core limitations:

weak knowledge-source organization,
unstructured document handling,
shallow retrieval,
unreliable chatbot responses,
limited grounding,
and lack of production observability.

Full Technical Documentation

For a detailed project walkthrough, UI demonstrations, API examples, evaluation proof, and observability analysis, see:

MedLabelIQ Comprehensive Documentation

Key Capabilities

1. Multi-Knowledge-Source QA

MedLabelIQ uses two distinct knowledge channels:

Source	Purpose
DailyMed SPL labels	Clinical label-grounded QA: indications, warnings, interactions, adverse reactions, dosage, etc.
RxNorm	Medication identity reasoning: brand/generic equivalence, active ingredients, brand-name lookup, identity definitions.

Examples:

Is Eliquis the same as apixaban?
→ RxNorm identity route

Can apixaban be taken with aspirin?
→ DailyMed clinical-label route

Is Eliquis the same as apixaban and can it prevent stroke?
→ Mixed-source composed route

2. Source-Aware Orchestration

Before answering, the system performs:

drug mention detection,
optional RxNorm-based drug normalization,
retrieval-family planning,
source-route planning,
execution of one of three answer branches:

RxNorm identity branch
DailyMed label branch
Mixed-source composition branch

The router selects exactly one of three source routes:

rxnorm_identity
dailymed_label
multi_source_composed
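
As a rough illustration of the routing decision, the sketch below picks one of the three routes from simple phrase cues. The cue patterns and the `plan_source` function are hypothetical and are not taken from the MedLabelIQ codebase; the actual router may rely on different signals.

```python
import re
from enum import Enum


class SourceRoute(str, Enum):
    RXNORM_IDENTITY = "rxnorm_identity"
    DAILYMED_LABEL = "dailymed_label"
    MULTI_SOURCE_COMPOSED = "multi_source_composed"


# Hypothetical cue patterns, chosen only to cover the README examples.
IDENTITY_CUES = re.compile(
    r"\b(the same as|generic name|active ingredient|brand name)\b", re.IGNORECASE
)
CLINICAL_CUES = re.compile(
    r"\b(used for|taken with|cause|treat|prevent)\b", re.IGNORECASE
)


def plan_source(query: str) -> SourceRoute:
    """Pick a source route from the identity and clinical cues found in the query."""
    has_identity = bool(IDENTITY_CUES.search(query))
    has_clinical = bool(CLINICAL_CUES.search(query))
    if has_identity and has_clinical:
        return SourceRoute.MULTI_SOURCE_COMPOSED
    if has_identity:
        return SourceRoute.RXNORM_IDENTITY
    # Assumption: everything else is treated as a clinical label question.
    return SourceRoute.DAILYMED_LABEL
```

With these cues, the three example questions above map to `rxnorm_identity`, `dailymed_label`, and `multi_source_composed` respectively.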

3. Mixed-Source Query Decomposition and Synthesis

For compound questions that combine identity and clinical intent, MedLabelIQ decomposes the original query into branch-specific subqueries.

Example:

Original:
Is Eliquis the same as apixaban and can it prevent stroke?

Identity subquery:
Is Eliquis the same as apixaban?

Clinical subquery:
Can apixaban prevent stroke?

The system then:

answers the identity branch using RxNorm,
answers the clinical branch using DailyMed label evidence,
synthesizes one grounded final answer,
preserves both:
- R* citations for RxNorm identity support,
- E* citations for DailyMed label evidence.

Example composed answer:

Yes. RxNorm maps Eliquis and apixaban to the same ingredient concept: apixaban.
Yes. Apixaban is indicated to reduce the risk of stroke and systemic embolism
in patients with nonvalvular atrial fibrillation.
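
The decomposition step for this example can be sketched as a single pattern match. This is a minimal sketch that assumes one conjunction pattern ("Is X the same as Y and ...?"); the regular expression and the `decompose` helper are hypothetical, and the real decomposer may support other patterns.

```python
import re
from typing import NamedTuple, Optional


class Decomposition(NamedTuple):
    identity_subquery: str
    clinical_subquery: str


# Hypothetical pattern: "Is <brand> the same as <generic> and <clinical clause>?"
_PATTERN = re.compile(
    r"^(?P<identity>Is (?P<brand>[\w-]+) the same as (?P<generic>[\w-]+))"
    r" and (?P<clinical>.+?)\??$",
    re.IGNORECASE,
)


def decompose(query: str) -> Optional[Decomposition]:
    """Split a compound identity + clinical question into two subqueries."""
    match = _PATTERN.match(query.strip())
    if match is None:
        return None  # not a supported conjunction pattern
    # Replace the pronoun "it" with the generic name so the clinical
    # subquery can be answered on its own.
    clinical = re.sub(r"\bit\b", match["generic"], match["clinical"], count=1)
    clinical = clinical[0].upper() + clinical[1:]
    return Decomposition(f"{match['identity']}?", f"{clinical}?")
```

Returning `None` for unmatched queries keeps the behavior conservative: a question that does not fit the pattern is simply not treated as a mixed-source query.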

4. Official Structured Medication Knowledge

The DailyMed pipeline ingests SPL XML labels rather than loosely scraped web pages.

It preserves:

label metadata,
SET IDs,
label versions,
product and ingredient records,
section hierarchy,
section codes,
retrieval-family mappings,
and evidence provenance.

5. Section-Aware Knowledge Engineering

MedLabelIQ parses nested label sections and maps them to canonical retrieval families such as:

warnings_and_precautions
boxed_warning
indications_and_usage
adverse_reactions
drug_interactions
dosage_and_administration
contraindications
clinical_studies
medication_guide

This allows retrieval to respect the clinical structure of the label rather than treating labels as flat documents.
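
One way to picture the mapping is a lookup from normalized section headings to family names, with nested subsections inheriting the family of their parent. The table and helper names below are illustrative assumptions; the actual pipeline also has section codes available and may key on those instead.

```python
import re
from typing import Optional

# Hypothetical heading-to-family table covering the families listed above.
HEADING_TO_FAMILY = {
    "boxed warning": "boxed_warning",
    "indications and usage": "indications_and_usage",
    "dosage and administration": "dosage_and_administration",
    "contraindications": "contraindications",
    "warnings and precautions": "warnings_and_precautions",
    "adverse reactions": "adverse_reactions",
    "drug interactions": "drug_interactions",
    "clinical studies": "clinical_studies",
    "medication guide": "medication_guide",
}


def normalize_heading(heading: str) -> str:
    """Drop leading section numbers, expand '&', and collapse whitespace."""
    text = re.sub(r"^\s*\d+(\.\d+)*\s*", "", heading)
    text = text.replace("&", " and ")
    text = re.sub(r"[^a-z ]+", " ", text.lower())
    return " ".join(text.split())


def family_for(heading: str, parent_family: Optional[str] = None) -> Optional[str]:
    """Map a heading to a family; unknown subsections inherit the parent's family."""
    family = HEADING_TO_FAMILY.get(normalize_heading(heading))
    return family or parent_family
```

Under this sketch, a subsection such as "5.1 Lactic Acidosis" nested under "5 WARNINGS AND PRECAUTIONS" resolves to `warnings_and_precautions` through inheritance.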

6. Hybrid Retrieval

The DailyMed QA branch uses:

PostgreSQL lexical retrieval,
Qdrant dense vector retrieval,
hybrid Reciprocal Rank Fusion,
drug-concept filtering,
retrieval-family filtering,
compact evidence-pack selection.

This improves relevance while reducing redundant prompt context.
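
Reciprocal Rank Fusion combines the lexical and dense rankings by summing 1 / (k + rank) for each chunk across the lists. The sketch below shows the standard formula; the constant `k = 60` is the commonly used default and is an assumption here, as is the function name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk IDs into one list ordered by RRF score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

A chunk that appears near the top of both lists outranks one that is first in only a single list, which is why fusion helps when lexical and dense retrieval disagree.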

7. Grounded Answer Generation

The DailyMed answer branch uses a Groq-hosted LLM with:

strict grounded answer schema,
explicit evidence citations,
evidence summaries,
verifier integration,
deterministic safety-note insertion.

Clinical answers cite evidence IDs such as:

E1, E2, E3

RxNorm identity answers cite:

R1, R2

Mixed-source composed answers may cite both:

R1, R2, E1
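
A citation policy along these lines can be checked mechanically. The sketch below assumes that identity answers cite only R* IDs, label answers cite only E* IDs, and composed answers cite at least one of each; the function name and the exact rules are assumptions, not the project's verified policy.

```python
import re
from typing import Iterable, Set

CITATION_ID = re.compile(r"^(?P<source>[ER])(?P<index>[1-9]\d*)$")

# Assumed mapping from source route to permitted citation prefixes.
ALLOWED_PREFIXES = {
    "rxnorm_identity": {"R"},
    "dailymed_label": {"E"},
    "multi_source_composed": {"R", "E"},
}


def citations_are_valid(citations: Iterable[str], available_ids: Set[str], route: str) -> bool:
    """Check that every citation is well formed, backed by evidence, and allowed for the route."""
    cited = list(citations)
    if not cited:
        return False  # an answered response must cite something
    prefixes = set()
    for citation in cited:
        match = CITATION_ID.match(citation)
        if match is None or citation not in available_ids:
            return False  # malformed ID or no matching evidence item
        prefixes.add(match["source"])
    allowed = ALLOWED_PREFIXES[route]
    if not prefixes <= allowed:
        return False
    # Assumed rule: a composed answer needs support from both sources.
    return route != "multi_source_composed" or prefixes == allowed
```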

8. Abstention and Safety Controls

MedLabelIQ is intentionally conservative.

When evidence is not sufficient, it returns:

{
  "status": "insufficient_evidence",
  "answer": "The retrieved drug-label evidence is not sufficient to answer this question reliably.",
  "citations": [],
  "evidence_summary": "No retrieved evidence directly established the requested claim."
}

The system includes:

deterministic insufficient-evidence fallbacks,
post-generation verifier support,
guardrails for unsupported high-certainty claims,
guardrails for unsupported negative treatment-use claims,
conservative behavior when branch-specific support is incomplete.

9. Production-Oriented Application Layer

MedLabelIQ includes:

FastAPI backend,
Streamlit front end,
PostgreSQL persistence,
Qdrant vector store,
Dockerized local stack,
observability request logs,
source-aware analytics exports,
evaluation harnesses,
pytest suite,
GitHub Actions CI.

Why This Project Matters

Medication QA is high-stakes. A system that retrieves topically related text but fabricates unsupported conclusions is not trustworthy.

MedLabelIQ is built around a stricter principle:

Only answer when the selected knowledge source directly supports the response. Otherwise, abstain.

This project demonstrates practical engineering across:

domain-grounded RAG,
structured medical data ingestion,
biomedical entity normalization,
multi-source orchestration,
query decomposition,
hybrid search,
LLM grounding,
deterministic safety controls,
API design,
observability,
evaluation,
and containerized deployment.

System Architecture

flowchart TD
    A[User Question] --> B[FastAPI /qa/answer]

    B --> C[Drug Mention Detection]
    C --> D[RxNorm Drug Normalization]
    D --> E[Retrieval-Family Planner]
    E --> F[Source Router]

    F -->|Identity query| G[RxNorm Identity Branch]
    F -->|Clinical label query| H[DailyMed Label QA Branch]
    F -->|Mixed identity + clinical query| I[Mixed-Source Composition Branch]

    subgraph DailyMed Knowledge Pipeline
        J[DailyMed SPL APIs] --> K[Label Discovery and History Fetch]
        K --> L[SPL XML Download]
        L --> M[Structured XML Parser]
        M --> N[Canonical Section Mapping]
        N --> O[PostgreSQL Metadata Store]
        N --> P[Section-Aware Chunk Builder]
        P --> Q[PostgreSQL Lexical Index]
        P --> R[Qdrant Dense Vector Index]
    end

    H --> S[Hybrid Retrieval]
    Q --> S
    R --> S
    S --> T[Compact Evidence Pack]
    T --> U[Groq Grounded Answer Generator]
    U --> V[Verifier and Deterministic Guardrails]

    G --> W[Structured RxNorm Identity Answer]
    I --> X[Identity Subquery]
    I --> Y[Clinical Subquery]
    X --> G
    Y --> H
    W --> Z[Mixed / Final Answer Synthesis]
    V --> Z

    Z --> AA[Final Answer or Abstention]
    AA --> AB[Streamlit UI]
    AA --> AC[QA Request Logs]
    T --> AD[DailyMed Evidence Logs]
    AC --> AE[Source-Aware Analytics]
    AD --> AE

End-to-End Workflow

A. RxNorm Identity Workflow

Used for identity-style questions such as:

Is Eliquis the same as apixaban?
What is the generic name of Glucophage?
What is the active ingredient in Eliquis?
Is Glucophage a brand name?

Flow:

Query
→ Identity intent detection
→ RxNorm term resolution
→ Ingredient / brand concept traversal
→ Deterministic structured answer
→ R-citations

B. DailyMed Label QA Workflow

Used for clinical label questions such as:

What is omeprazole used for?
Can apixaban be taken with aspirin?
Can metformin cause lactic acidosis?
Does apixaban treat bacterial infections?

Flow:

Query
→ Drug detection / normalization
→ Retrieval-family planning
→ Hybrid label retrieval
→ Compact evidence pack
→ Grounded answer generation
→ Verification + guardrails
→ E-citations or abstention

C. Mixed-Source Composition Workflow

Used for compound questions such as:

Is Glucophage the same as metformin and what is it used for?
Is Eliquis the same as apixaban and can it prevent stroke?
Is Glucophage a brand name and what is it used for?

Flow:

Original query
→ Mixed-source route detection
→ Identity subquery decomposition
→ Clinical subquery decomposition
→ RxNorm identity execution
→ DailyMed clinical execution
→ Evidence-aware synthesis
→ R-citations + E-citations

DailyMed Corpus and Chunking

The ingestion pipeline builds a reproducible smoke corpus of 12 representative medication concepts:

acetaminophen
ibuprofen
metformin
lisinopril
atorvastatin
amoxicillin
sertraline
albuterol
omeprazole
apixaban
isotretinoin
methotrexate

The pipeline:

discovers label metadata,
retrieves label version history,
downloads SPL XML packages,
stores manifests and checksums,
validates artifact consistency,
parses hierarchical SPL sections,
chunks retrievable clinical text,
indexes chunks for lexical and dense retrieval.

Current Section-Aware Chunking Output

Metric	Value
Drugs in smoke corpus	12
Retrievable sections processed	520
Chunks created	867
Maximum words per chunk	220
Chunk overlap	40 words
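
The chunk-size and overlap settings in the table correspond to a sliding word window. The sketch below shows only that windowing step under those two parameters; the function name is hypothetical, and the real chunker is section-aware and may apply additional rules.

```python
from typing import List


def chunk_words(text: str, max_words: int = 220, overlap: int = 40) -> List[str]:
    """Split text into word windows of at most max_words, sharing overlap words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    chunks: List[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the section
    return chunks
```

With the defaults, each new chunk starts 180 words after the previous one, so consecutive chunks share 40 words of context.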

Retrieval Example

uv run python -m medlabeliq.retrieval.search_cli `
  --query "acid-mediated GERD" `
  --drug omeprazole `
  --family indications_and_usage `
  --limit 5

Evaluation Results

1. Lexical Retrieval Evaluation

Exact terminology smoke set

Metric	Score
Cases	12
Hit@1	1.000
Hit@5	1.000
MRR	1.000

Paraphrase stress test

Metric	Score
Cases	12
Hit@1	0.333
Hit@5	0.333
MRR	0.333

This gap motivated the addition of dense retrieval and hybrid Reciprocal Rank Fusion.
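
For reference, Hit@k and MRR can be computed from the rank of the first relevant result in each test case. This is a generic sketch of the standard definitions with hypothetical function names, not the project's evaluation harness.

```python
from typing import Optional, Sequence, Set


def first_relevant_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> Optional[int]:
    """Return the 1-based rank of the first relevant result, or None if none was retrieved."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id in relevant_ids:
            return rank
    return None


def hit_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose first relevant result is within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all cases; a miss contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)
```

Under these definitions, 12 cases with four hits at rank 1 and eight misses give 0.333 for Hit@1, Hit@5, and MRR, which is one pattern consistent with the paraphrase table.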

2. Grounded DailyMed QA Evaluation

QA smoke set

Metric	Score
Overall pass	12/12
Status accuracy	12/12
Answered-case pass	8/8
Abstention-case pass	4/4
Citation-policy pass	12/12
Cited-heading pass	12/12
Safety-note pass	12/12

QA challenge set

Metric	Score
Overall pass	16/16
Status accuracy	16/16
Answered-case pass	10/10
Abstention-case pass	6/6
Citation-policy pass	16/16
Cited-heading pass	16/16
Safety-note pass	16/16

The challenge set includes:

paraphrased answerable questions,
negative unsupported treatment claims,
unsupported claims requiring abstention,
guarantee-style overgeneralization traps,
medically sensitive warning and contraindication questions.

3. Multi-Source Orchestration Evaluation

Multi-source smoke benchmark

Metric	Score
Cases	11
Overall pass	11/11
Status accuracy	11/11
Source-route accuracy	11/11
Source-route-status accuracy	11/11
Family-plan-status accuracy	11/11
Retrieval-family accuracy	3/3
Citation-policy pass	11/11
Citation-reference pass	11/11
Safety-note pass	11/11

Multi-source challenge benchmark

Metric	Score
Cases	19
Overall pass	19/19
Status accuracy	19/19
Source-route accuracy	19/19
Source-route-status accuracy	19/19
Family-plan-status accuracy	19/19
Retrieval-family accuracy	9/9
Citation-policy pass	19/19
Citation-reference pass	19/19
Safety-note pass	19/19

The challenge benchmark covers:

supported RxNorm identity queries,
unsupported identity queries requiring abstention,
brand-name clinical questions,
interaction and indication routing,
ambiguous clinical queries,
mixed-source identity + clinical questions,
composed answers requiring both R* and E* citations.

Grounding and Safety Design

Deterministic Insufficient-Evidence Response

When support is insufficient, MedLabelIQ returns:

{
  "status": "insufficient_evidence",
  "answer": "The retrieved drug-label evidence is not sufficient to answer this question reliably.",
  "citations": [],
  "evidence_summary": "No retrieved evidence directly established the requested claim."
}
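
"Deterministic" here means the fallback is a fixed payload rather than model-generated text. A minimal sketch of that idea follows; the `finalize` helper and its `verifier_supported` flag are assumptions introduced for illustration.

```python
import copy
from typing import Any, Dict

_INSUFFICIENT_EVIDENCE: Dict[str, Any] = {
    "status": "insufficient_evidence",
    "answer": "The retrieved drug-label evidence is not sufficient to answer this question reliably.",
    "citations": [],
    "evidence_summary": "No retrieved evidence directly established the requested claim.",
}


def insufficient_evidence_response() -> Dict[str, Any]:
    """Return a fresh copy so callers cannot mutate the shared template."""
    return copy.deepcopy(_INSUFFICIENT_EVIDENCE)


def finalize(proposed: Dict[str, Any], verifier_supported: bool) -> Dict[str, Any]:
    """Keep the proposed answer only if it is answered, cited, and verified."""
    if (
        proposed.get("status") != "answered"
        or not proposed.get("citations")
        or not verifier_supported
    ):
        return insufficient_evidence_response()
    return proposed
```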

Guardrail 1: Guarantee-Style Claim Suppression

Example:

Does metformin guarantee weight loss?

The system abstains unless retrieved label evidence explicitly supports guarantee-level certainty.
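
A crude version of this check can be written as a cue match on both the question and the evidence. The cue list and function name below are hypothetical, and a keyword match is much weaker than a real verifier; the sketch only illustrates the abstain-unless-explicit rule.

```python
import re
from typing import Iterable

# Hypothetical guarantee-level cue words.
GUARANTEE_CUES = re.compile(
    r"\b(guarantee[sd]?|always|definitely|certainly)\b", re.IGNORECASE
)


def needs_guarantee_abstention(query: str, evidence_texts: Iterable[str]) -> bool:
    """True when the query asks for guarantee-level certainty that no evidence text states."""
    if not GUARANTEE_CUES.search(query):
        return False  # the guardrail only applies to guarantee-style questions
    return not any(GUARANTEE_CUES.search(text) for text in evidence_texts)
```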

Guardrail 2: Unsupported Negative Treatment Claim Suppression

Example:

Does apixaban treat bacterial infections?

The system does not infer a negative claim merely because the retrieved label lists other uses. If the target claim is not explicitly established, it abstains.

Mixed-Source Composition Safety

For a mixed query, both branches must produce sufficient support:

Identity branch must be supported
+
Clinical label branch must be supported

Otherwise, the system returns insufficient_evidence rather than composing a partial answer.
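
The all-or-nothing rule can be sketched as a small composition function. The branch dictionaries and the `compose` name are assumptions for illustration; the fallback payload reuses the deterministic response shown earlier in this README.

```python
from typing import Any, Dict

FALLBACK: Dict[str, Any] = {
    "status": "insufficient_evidence",
    "answer": "The retrieved drug-label evidence is not sufficient to answer this question reliably.",
    "citations": [],
    "evidence_summary": "No retrieved evidence directly established the requested claim.",
}


def compose(identity: Dict[str, Any], clinical: Dict[str, Any]) -> Dict[str, Any]:
    """Compose a mixed-source answer only when both branches are answered and cited."""
    both_supported = all(
        branch.get("status") == "answered" and branch.get("citations")
        for branch in (identity, clinical)
    )
    if not both_supported:
        # Never return a partial answer: abstain for the whole query.
        return {**FALLBACK, "citations": []}
    return {
        "status": "answered",
        "answer": f"{identity['answer']} {clinical['answer']}",
        # R* citations first, then E* citations.
        "citations": list(identity["citations"]) + list(clinical["citations"]),
    }
```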

API

The FastAPI backend exposes:

Method	Endpoint	Purpose
`GET`	`/`	Service overview
`GET`	`/health`	PostgreSQL, Qdrant, and LLM health
`GET`	`/drugs`	Indexed drug concept summaries
`GET`	`/families`	Retrieval-family summaries
`GET`	`/corpus/stats`	Corpus build and indexing statistics
`GET`	`/rxnorm/version`	RxNorm API version metadata
`POST`	`/normalize/drug`	Normalize a brand, generic, or noisy medication mention
`POST`	`/qa/answer`	Grounded medication QA
`POST`	`/retrieval/debug`	Retrieval-only evidence inspection

Example 1: DailyMed Clinical QA

$body = @{
    query = "Can metformin cause dangerous acid buildup in the blood?"
    drug = "metformin"
    family = "warnings_and_precautions"
    include_evidence = $true
    include_diagnostics = $true
} | ConvertTo-Json

Invoke-RestMethod `
    -Method Post `
    -Uri "http://127.0.0.1:8011/qa/answer" `
    -ContentType "application/json" `
    -Body $body |
    ConvertTo-Json -Depth 40
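
For readers not using PowerShell, the same request can be sent from Python with only the standard library. This is a sketch assuming the backend is running locally on port 8011 as described in this README; `build_qa_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:8011"


def build_qa_request(
    query: str,
    drug: Optional[str] = None,
    family: Optional[str] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST /qa/answer request with the fields used in the example above."""
    payload: Dict[str, Any] = {
        "query": query,
        "include_evidence": True,
        "include_diagnostics": True,
    }
    if drug:
        payload["drug"] = drug
    if family:
        payload["family"] = family
    return urllib.request.Request(
        f"{base_url}/qa/answer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, timeout: float = 60.0, **kwargs: Any) -> Dict[str, Any]:
    """Send the request and decode the JSON response (requires a running backend)."""
    with urllib.request.urlopen(build_qa_request(query, **kwargs), timeout=timeout) as response:
        return json.load(response)
```

For example, `ask("Can metformin cause dangerous acid buildup in the blood?", drug="metformin", family="warnings_and_precautions")` mirrors the PowerShell call.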

Example 2: RxNorm Identity QA

$body = @{
    query = "Is Eliquis the same as apixaban?"
    include_evidence = $true
    include_diagnostics = $true
} | ConvertTo-Json

Invoke-RestMethod `
    -Method Post `
    -Uri "http://127.0.0.1:8011/qa/answer" `
    -ContentType "application/json" `
    -Body $body |
    ConvertTo-Json -Depth 80

Expected high-level behavior:

planned_source = rxnorm_identity
result.status = answered
citations = R1, R2

Example 3: Mixed-Source Composed QA

$body = @{
    query = "Is Eliquis the same as apixaban and can it prevent stroke?"
    include_evidence = $true
    include_diagnostics = $true
} | ConvertTo-Json

Invoke-RestMethod `
    -Method Post `
    -Uri "http://127.0.0.1:8011/qa/answer" `
    -ContentType "application/json" `
    -Body $body |
    ConvertTo-Json -Depth 100

Expected high-level behavior:

planned_source = multi_source_composed
result.status = answered
citations = R1, R2, E1
identity_evidence = present
evidence = present
mixed_source_composition.status = composed_answered

Streamlit UI

The Streamlit front end includes:

backend health panel,
corpus snapshot,
drug and retrieval-family filters,
six example prompts,
grounded answer display,
status pills,
source-route badges,
citation chips,
citation legend:
- E* = DailyMed label evidence,
- R* = RxNorm identity evidence,
DailyMed evidence expanders,
RxNorm identity evidence expanders,
routing and source-plan expander,
mixed-source decomposition panel,
verifier and guardrail diagnostics,
raw diagnostics JSON,
retrieval-debug tab,
recent query history.

Local UI:

http://127.0.0.1:8501

Observability

Every QA request can be logged to PostgreSQL in two tables:

qa_request_log
qa_evidence_log

Logged request fields include:

query text,
requested and resolved drug,
drug-resolution status,
detected drug mention,
drug-mention detection status,
requested and planned retrieval family,
family-plan status and intent,
planned source,
source-plan status and intent,
mixed-source composition status,
final answer status,
citations,
evidence summary,
safety note,
proposed answer status,
verifier verdict and rationale,
guardrail state,
DailyMed evidence count,
RxNorm identity evidence count,
API latency,
timestamp.

Analytics Generation

uv run python -m medlabeliq.observability.generate_qa_analytics

Outputs:

data/interim/qa_analytics/
outputs/qa_analytics/

Generated analytics include:

final answer status counts,
latency summary statistics,
intervention counts,
verifier verdict distribution,
requests by planned source,
source-plan status distribution,
family-plan status distribution,
mixed-source composition status distribution,
final answer status by source type,
latency by planned source,
identity-evidence count distribution,
total support-evidence count distribution,
evidence-family usage,
cited evidence-family usage,
daily request volume,
CSV exports,
PNG plots.

Dockerized Deployment

Launch the full stack:

docker compose up --build -d

Services:

Service	Port
PostgreSQL	`55432`
Qdrant	`6333`
FastAPI backend	`8011`
Streamlit UI	`8501`

After startup:

API:  http://127.0.0.1:8011
Docs: http://127.0.0.1:8011/docs
UI:   http://127.0.0.1:8501

Check health:

Invoke-RestMethod `
    -Method Get `
    -Uri "http://127.0.0.1:8011/health" |
    ConvertTo-Json -Depth 10

Local Development Setup

1. Clone the repository

git clone <YOUR_REPOSITORY_URL>
cd MedLabelIQ

2. Create environment variables

Copy-Item .env.example .env

Fill in:

LLM_API_KEY=<your-groq-api-key>

3. Install dependencies

uv sync

4. Start infrastructure

docker compose up -d postgres qdrant

5. Initialize / refresh observability schema

uv run python -m medlabeliq.db.create_observability_schema

6. Start the FastAPI backend

uv run uvicorn medlabeliq.api.app:app --host 127.0.0.1 --port 8011 --reload

7. Start the Streamlit UI

uv run streamlit run src\medlabeliq\ui\streamlit_app.py --server.port 8501

Core Validation Commands

Validate structured ingestion

uv run python -m medlabeliq.validation.validate_step3_artifacts

Parse the smoke-set labels

uv run python -m medlabeliq.parsing.parse_smoke_set

Validate section hierarchy

uv run python -m medlabeliq.validation.validate_section_hierarchy

Build section-aware chunks

uv run python -m medlabeliq.chunking.build_section_chunks

Validate chunking

uv run python -m medlabeliq.chunking.validate_section_chunks

Evaluate lexical retrieval

uv run python -m medlabeliq.evaluation.evaluate_lexical_retrieval

Evaluate grounded DailyMed QA smoke set

uv run python -m medlabeliq.evaluation.evaluate_grounded_qa

Evaluate grounded DailyMed QA challenge set

uv run python -m medlabeliq.evaluation.evaluate_grounded_qa `
  --eval-set data\evaluation\qa_generation_eval_challenge.yaml `
  --output data\interim\grounded_qa_eval_challenge_results.csv

Evaluate multi-source orchestration smoke set

uv run python -m medlabeliq.evaluation.evaluate_multisource_orchestration

Evaluate multi-source orchestration challenge set

uv run python -m medlabeliq.evaluation.evaluate_multisource_orchestration `
  --eval-set data\evaluation\multisource_orchestration_eval_challenge.yaml `
  --output data\interim\multisource_orchestration_eval_challenge_results.csv

Generate QA observability analytics

uv run python -m medlabeliq.observability.generate_qa_analytics

Tests and CI

Run tests locally:

uv run pytest

Current local test result:

64 passed

The repository includes GitHub Actions CI to automatically run tests on pushes and pull requests.

Project Structure

MedLabelIQ/
├── .github/
│   └── workflows/
│       └── ci.yml
├── data/
│   ├── evaluation/
│   ├── interim/
│   └── raw/
├── outputs/
├── src/
│   └── medlabeliq/
│       ├── api/
│       ├── chunking/
│       ├── config/
│       ├── db/
│       ├── evaluation/
│       ├── generation/
│       ├── ingestion/
│       ├── observability/
│       ├── orchestration/
│       ├── parsing/
│       ├── qdrant_store/
│       ├── retrieval/
│       ├── rxnorm/
│       ├── ui/
│       └── validation/
├── tests/
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
├── uv.lock
└── README.md

Technology Stack

Language: Python 3.12
Dependency management: uv
API: FastAPI
UI: Streamlit
Structured database: PostgreSQL
Vector database: Qdrant
Medication identity source: RxNorm
Medication label source: DailyMed SPL
LLM provider: Groq
Embeddings: sentence-transformer-based dense retrieval
Containerization: Docker, Docker Compose
Testing: pytest
CI: GitHub Actions

Limitations

The current DailyMed corpus is a curated 12-drug smoke set, not the full DailyMed universe.
RxNorm identity routing is deterministic and scoped to identity-style questions currently supported by the orchestration logic.
Mixed-source composition supports intentionally structured identity + clinical conjunction patterns rather than arbitrary multi-hop natural language decomposition.
The system summarizes official label evidence; it is not a diagnostic, prescribing, or clinical decision-support tool.
Evaluation sets are project benchmarks rather than large-scale clinician-authored gold standards.
Guardrails target observed failure modes and can be expanded further.

Future Work

Scale the DailyMed corpus beyond the 12-drug smoke set.
Add larger clinician-reviewed benchmark suites.
Broaden mixed-source decomposition patterns.
Add support for more complex multi-branch query plans.
Extend operational dashboards beyond CSV/PNG analytics outputs.
Introduce authentication, rate limiting, and deployment hardening.
Add continuous ingestion for updated DailyMed label versions.
Explore retrieval reranking and evidence sufficiency scoring improvements.

Disclaimer

MedLabelIQ is an educational and research-oriented medication question-answering system.

It summarizes retrieved medication identity relationships and official drug-label evidence. It is not a substitute for medical advice from a qualified clinician or pharmacist.
