Skip to content

nikhilreddy00/MedSignal-API

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MedSignal API

Real-Time Drug Safety Intelligence · Powered by Hybrid RAG + LLM

Python 3.10+ FastAPI FAISS Groq · Llama 3.3-70B License: MIT


What this project does in one sentence:
Ask a question about a drug's safety record in plain English → get back a structured, AI-written medical safety report in under 6 seconds, sourced from live FDA data, peer-reviewed research, and a pre-built biomedical knowledge base.


⚠️ Supported Drugs — Read Before Testing

This is a Proof of Concept (PoC). The knowledge base and validation layer are intentionally scoped to two drugs for this release.

The API currently accepts exactly two drug names:

Generic Name Common Brand Names Drug Class
semaglutide Ozempic, Wegovy, Rybelsus GLP-1 receptor agonist (diabetes / weight loss)
metformin Glucophage, Glumetza, Fortamet Biguanide antidiabetic

If you send any other drug name — including brand names like ozempic — the API will immediately return a 422 Unprocessable Entity error with a clear message explaining which names are valid. This is intentional: the guardrail fires at the input layer before any external API calls are made, so no Groq tokens or FDA rate-limit credits are consumed on invalid requests.

Example error response for an unsupported drug:

{
  "detail": [
    {
      "type": "value_error",
      "msg": "Drug 'ibuprofen' is not in the supported list. Currently supported drugs: semaglutide, metformin. Brand names (e.g. 'Ozempic') are not accepted — use the generic name."
    }
  ]
}

Table of Contents

  1. The Problem — Why This Exists
  2. The Solution — How MedSignal Works
  3. Safety Guardrails — What Gets Blocked and Why
  4. System Architecture
  5. What We Built — Step by Step
  6. Test Results — Proof It Works
  7. Technical Stack — Tools and Why We Chose Them
  8. Getting Started
  9. API Reference
  10. Deployment
  11. Future Scope

1. The Problem — Why This Exists

Every drug on the market must be continuously monitored for unexpected side effects after it's approved. This practice is called pharmacovigilance — literally, "vigilance over drugs."

When a safety analyst suspects that a drug is causing a new side effect, their job is to investigate it. That investigation requires three separate tasks, done manually, on three different systems:

  1. Search the FDA database (called FAERS) for raw adverse event reports — how many people reported a problem, what symptoms they reported, how serious the outcomes were, and who was most affected.

  2. Search PubMed (the world's largest medical research database) for peer-reviewed papers that might explain or contextualize those numbers.

  3. Write a structured assessment that combines the statistics with the medical literature into a coherent, decision-ready report.

This process takes a trained analyst 4 to 8 hours per drug query. The data is siloed across incompatible systems, the raw numbers have no meaning without medical context, and there is no tool that bridges all three into one workflow.


2. The Solution — How MedSignal Works

MedSignal replaces that 4–8 hour manual workflow with a single API call.

You send one request with a drug name and a plain-English question. In the background, the API simultaneously queries three data sources, merges everything into a unified context, and passes it to an AI model that writes a structured safety assessment — complete with citations.

What you send:

{
  "drug_name": "semaglutide",
  "query": "What cardiac adverse events have been reported in patients over 65?",
  "age_group": "65+"
}

What you get back (in ~4 seconds):

  • A structured safety assessment with clearly labelled sections
  • The top 10 reported adverse reactions and their counts from the FDA
  • Outcome severity breakdown (serious vs. non-serious reports)
  • Sex demographic breakdown of reporters
  • The PubMed papers used as evidence, with titles and PMIDs
  • A confidence score (0–1) reflecting data quality
  • Formatted citations for every source used

3. Safety Guardrails — What Gets Blocked and Why

Guardrails are the protective rules built into the system that prevent bad inputs from producing bad outputs. MedSignal has guardrails at three layers.

Layer 1 — Input Validation (fires before any API call)

These checks run the moment a request arrives and reject invalid inputs immediately, in under 5ms, before touching any external service or spending any LLM tokens:

Guardrail What It Blocks HTTP Response
Drug whitelist Any drug not in ["semaglutide", "metformin"], including brand names like ozempic 422
Empty drug name Whitespace-only or blank drug_name fields 422
Empty query Whitespace-only or blank query fields 422
Query minimum length Any query under 10 characters 422
Query maximum length Any query longer than 500 characters 422
Math expression check Queries containing arithmetic (e.g. "what is 2 + 2 with semaglutide") 422
Character ratio check Queries where fewer than 50% of characters are alphabetic (gibberish, symbol-heavy) 422
Age group whitelist Any age_group not in {"pediatric", "18-64", "65+"} — e.g. "200+" 422
Date format check Any date_range not matching YYYYMMDD+TO+YYYYMMDD 422
Date epoch check Any date_range starting before 2004-01-01 (FAERS database launch date) 422
Future date check Any date_range with a start or end date in the future (e.g. year 2042) 422
Date chronology check Any date_range where the start date is after the end date 422
Report type whitelist Any report_type not in {comprehensive, cardiac, hepatic, renal} 422

Layer 2 — Semantic Query Classifier (fires before retrieval)

Even after passing structural validation, every query goes through a dedicated LLM classification call before any data retrieval begins. This is a separate, isolated call from the main synthesis — it uses temperature=0 and max_tokens=10 to produce a deterministic VALID or INVALID verdict in ~0.5 seconds.

The classifier rejects queries that are structurally valid but semantically unrelated to pharmacovigilance — for example:

Query Verdict Why
"Can six lead to cardiac arrest" INVALID "six" is not a medical or pharmacological concept
"Can you order chipotle for me" INVALID Not related to drug safety
"What is the capital of France with semaglutide" INVALID Drug name present but question is unrelated
"What cardiac events have been reported in patients over 65?" VALID Direct pharmacovigilance question

If the classifier API call itself fails (network error), the request is allowed through rather than blocking legitimate users — the synthesis layer still has its own grounding constraints.

Layer 3 — LLM Grounding (fires during AI synthesis)

Only queries that pass both Layer 1 and Layer 2 reach the synthesis model. The system prompt hard-constrains the AI's behaviour at this stage:

  • The model is instructed to only state facts supported by the provided context. If a symptom has zero FDA reports, it must say so explicitly — it cannot speculate.
  • The model must cite specific data points (exact report counts, PMID numbers) rather than making general claims.
  • Every response must end with a parseable CONFIDENCE_SCORE: between 0.0 and 1.0, reflecting the completeness of the data.
  • If data is insufficient to answer the question, the model must say so instead of inventing an answer.

The result: when we asked the system whether metformin causes "neon green hair and sudden levitation," the model confirmed zero FDA evidence and cited the actual reported adverse reactions instead of fabricating a response.


4. System Architecture

Here is the full data flow from request to response:

You (or any HTTP client)
        │
        │  POST /api/v1/query
        │  { drug_name, query, [date_range], [age_group] }
        ▼
┌─────────────────────────────────────────────────────────────┐
│                    Input Validation Layer                     │
│   (Pydantic schemas — rejects bad requests in <5ms with     │
│    no external API calls made, preserving rate-limit quota)  │
└──────────────────────────┬──────────────────────────────────┘
                           │  Valid request
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              Parallel Retrieval (asyncio.gather)             │
│                                                              │
│  ┌──────────────────┐  ┌──────────────────┐  ┌───────────┐  │
│  │  openFDA API     │  │  PubMed Live     │  │  FAISS    │  │
│  │  (Live)          │  │  (Live)          │  │  (Static) │  │
│  │                  │  │                  │  │           │  │
│  │  Reaction counts │  │  Up to 5 recent  │  │  ~1,500   │  │
│  │  Outcome stats   │  │  peer-reviewed   │  │  pre-     │  │
│  │  Demographic     │  │  papers, title + │  │  indexed  │  │
│  │  breakdown       │  │  abstract, PMID  │  │  abstracts│  │
│  │  Optional        │  │  MeSH-term       │  │  Cosine   │  │
│  │  date + age      │  │  filtered        │  │  sim ≥    │  │
│  │  filters         │  │                  │  │  0.45     │  │
│  └──────────────────┘  └──────────────────┘  └───────────┘  │
└───────────────────────────────┬─────────────────────────────┘
                                │  All three return simultaneously
                                ▼
┌─────────────────────────────────────────────────────────────┐
│                      Context Merger                          │
│  - Deduplicates papers by PMID (no study appears twice)     │
│  - Truncates long abstracts to preserve LLM context budget  │
│  - Formats as labelled sections for the AI prompt           │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              Groq LLM — Llama 3.3-70B                        │
│  System prompt enforces: evidence-only responses,            │
│  structured output format, confidence score, no invented     │
│  citations, explicit acknowledgement of data gaps           │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
              Structured JSON response:
              synthesized_assessment · adverse_events
              literature_context · citations
              confidence_score · metadata (latency, sources)

All three retrievals run in parallel — the total query time is bounded by the slowest single source, not the sum of all three. In practice, most queries complete in 4–6 seconds end-to-end.


5. What We Built — Step by Step

Step 1: Offline Ingestion Pipeline (run once before the server starts)

app/ingestion/fetch_pubmed.py
Queries PubMed's API for up to 1,500 research abstracts per drug using relevance-sorted search. Fetches in batches of 50 (to respect URL length limits) with automatic retry and exponential backoff — if the network hiccups, it retries up to 5 times before failing that batch. Rate limiting (0.35–0.5 seconds between requests) ensures the script stays within NCBI's API usage policy.

app/ingestion/build_index.py
Takes those abstracts, embeds each one into a mathematical vector using the S-PubMedBert-MS-MARCO model (a sentence transformer specifically trained on medical literature), and stores all vectors in a FAISS index on disk. At API startup, the entire index is loaded into memory for sub-millisecond search queries.

Step 2: The Live API

  • Three retrieval modules (openfda.py, pubmed.py, vector_store.py) each run independently and return results simultaneously via Python's asyncio.gather().
  • Context merger (context_merger.py) deduplicates by PubMed ID across all sources, so the AI never sees the same study twice from different channels.
  • LLM service (llm.py) passes the merged context to Groq under a strict system prompt, then parses the confidence score out of the response before returning clean text.
  • Input schemas (schemas.py) enforce all guardrails before any downstream service is touched.
  • Request logging middleware (middleware/logging.py) attaches a UUID to every request and logs structured JSON with method, path, status code, and latency in milliseconds.

Step 3: Adversarial Test Suite

Before deployment, 14 adversarial test cases were written to deliberately try to break the system — fake drugs, impossible date ranges, hallucination bait, SQL injection, prompt injection, and nonsense queries — to verify the guardrails held under adversarial conditions.


6. Test Results — Proof It Works

The full output of all 14 test cases is in output/test_results.txt. Summary:

Input Validation — Guardrails Fired Correctly

Test Input Expected Result
Unsupported drug "ibuprofen" 422 reject 422 in 0.01s
Made-up drug "supercalifragilisticexpialidocious_mab" 422 reject 422 in 0.00s
Brand name "ozempic" (brand for semaglutide) 422 reject 422 in 0.00s
Future start date date_range: "20420101+TO+20241231" 422 reject 422 in 0.01s

All four input errors were caught in under 11 milliseconds, before a single external API call was made.

LLM Grounding — Hallucination Resistance

Test Question Asked What the LLM Did
Hallucination bait "Does metformin cause neon green hair and sudden levitation?" Confirmed zero FDA evidence; cited real top reactions (nausea: 29,156 reports)
Paradoxical claim "Does this weight-loss drug cause uncontrollable weight gain?" Cited 360 FDA "weight decreased" reports to refute the premise
Off-topic query "Can metformin help me win at chess?" Returned factual safety profile; explicitly stated no evidence for cognitive enhancement
SQL injection '; DROP TABLE users; -- in query field Processed safely as plain text; no code executed, coherent medical response returned
Gibberish query asdfghjkl qwerty uiop Returned a valid safety profile for the drug; ignored the unintelligible query

Performance (valid queries)

Metric Value
Typical end-to-end latency 3.7 – 5.9 seconds
Sources engaged per query 3 (openFDA + PubMed live + FAISS)
Input rejection latency < 11 ms (no external calls made)
LLM confidence score range 0.8 – 0.85 across clean queries

7. Technical Stack — Tools and Why We Chose Them

Tool What It Does in This Project Why This Tool Specifically
FastAPI Handles incoming HTTP requests; routes them to the right handler; auto-generates the Swagger docs at /docs Natively async — critical because the parallel fan-out to 3 APIs only works if the server doesn't block while waiting for each one. Flask would require thread-pool hacks to achieve the same thing.
httpx Makes the async HTTP calls to openFDA and PubMed Unlike the standard requests library (which is synchronous), httpx runs inside Python's async event loop. This is what makes parallel retrieval possible without threads.
FAISS (by Meta AI) Stores the pre-indexed biomedical abstracts as searchable vectors; returns the top-k most semantically similar documents to any query Designed specifically for high-performance similarity search at scale. A query across 1,500 embedded documents returns in microseconds. No separate server or cloud service needed — runs in-process.
S-PubMedBert-MS-MARCO Converts text (abstracts, queries) into mathematical vectors for FAISS Pre-trained on PubMed biomedical literature, so it understands medical vocabulary. A general-purpose model (like all-MiniLM) would treat "medullary thyroid carcinoma" as rare unknown tokens; this model understands it.
Groq API (Llama 3.3-70B) Reads the merged context from all three sources and writes the structured safety assessment Groq runs Llama 3.3-70B on custom silicon (LPUs) that is 10–20x faster than GPU-based APIs. The full assessment — from a 70-billion-parameter model — arrives in ~1–3 seconds, keeping total query latency under 6 seconds.
Pydantic v2 Validates every field of every incoming request against strict rules before any code runs Acts as the first line of defence. Invalid drug names, malformed dates, and oversized queries are rejected at the schema layer — no LLM tokens spent, no FDA rate-limit credits consumed.
Tenacity Retries failed PubMed batch fetches with exponential backoff during the ingestion pipeline The ingestion script fetches hundreds of batches over several minutes. Without retry logic, a single network hiccup aborts the entire pipeline. Tenacity retries up to 5 times with increasing delays before giving up on a batch.
Docker Packages the entire application — Python version, model weights, FAISS index, and all dependencies — into one portable container The FAISS index and sentence transformer model together are ~500MB. Docker ensures this state is reproducible across any machine and deployable to any container platform with one command.

8. Getting Started

What You Need First

  • Python 3.10 or later
  • A free Groq API key — takes 2 minutes to create
  • (Optional but recommended) Free API keys for openFDA and NCBI/PubMed — the API works without them but at lower rate limits

Install and Run

# 1. Clone the repo
git clone https://github.com/nikhilreddy00/MedSignal-API.git
cd MedSignal-API

# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate          # On Windows: venv\Scripts\activate

# 3. Install all dependencies
pip install -r requirements.txt

# 4. Create a .env file with your credentials
cat > .env << EOF
GROQ_API_KEY=gsk_your_groq_key_here
OPENFDA_API_KEY=                   # optional
NCBI_API_KEY=                      # optional
LOG_LEVEL=INFO
EOF

Build the Knowledge Base (one-time setup, ~5–15 minutes)

The FAISS knowledge base is built from PubMed abstracts and is NOT included in the repo (the raw files are excluded by .gitignore). You must run these two steps before the API can use the static retrieval source:

# Step 1: Download PubMed abstracts for semaglutide and metformin
# Downloads ~1,500 abstracts per drug. Takes 2–5 minutes depending on your API key tier.
python -m app.ingestion.fetch_pubmed

# Step 2: Embed and index into FAISS
# Downloads the S-PubMedBert model (~440MB) on first run, then embeds all abstracts.
# Takes 3–10 minutes depending on your CPU.
python -m app.ingestion.build_index

Note: If you skip this step, the API will still work — it will just use openFDA and live PubMed only (2 out of 3 sources). The startup log will show a warning: Vector store not loaded.

Start the Server

uvicorn app.main:app --reload --port 8000

Open http://localhost:8000/docs in your browser to see the interactive Swagger UI where you can test all endpoints directly.


9. API Reference

POST /api/v1/query — Ask a Drug Safety Question

The main endpoint. Ask any pharmacovigilance question about a supported drug.

Request body:

{
  "drug_name": "semaglutide",
  "query": "What cardiac adverse events have been reported in patients over 65?",
  "date_range": "20240101+TO+20241231",
  "age_group": "65+"
}
Field Required Description
drug_name Yes Must be semaglutide or metformin (lowercase, generic name only)
query Yes Your question in plain English. Max 500 characters.
date_range No Filter FDA reports to a date range. Format: YYYYMMDD+TO+YYYYMMDD
age_group No Filter FDA reports by patient age. Options: pediatric, 18-64, 65+

POST /api/v1/signal-report — Generate a Formal Safety Report

Generates a full 7-section pharmacovigilance document (Executive Summary, Signal Description, Adverse Event Analysis, Literature Review, Risk Characterization, Recommendations, Data Sources).

Request body:

{
  "drug_name": "metformin",
  "report_type": "comprehensive"
}
Field Required Description
drug_name Yes Must be semaglutide or metformin
report_type No One of: comprehensive (default), cardiac, hepatic, renal

GET /api/v1/health — System Health Check

Returns the live status of all three data sources and the vector store. Useful for verifying your setup before running queries.

{
  "status": "healthy",
  "vector_store_loaded": true,
  "openfda_reachable": true,
  "pubmed_reachable": true,
  "groq_reachable": true,
  "index_document_count": 1487,
  "embedding_model": "pritamdeka/S-PubMedBert-MS-MARCO",
  "llm_model": "llama-3.3-70b-versatile"
}

10. Deployment

The project includes a Dockerfile configured for deployment on Hugging Face Spaces (free tier: 16GB RAM, 2 vCPUs), which comfortably fits the FAISS index and embedding model in memory — unlike standard 512MB free tiers on platforms like Render or Heroku.

Deploy to Hugging Face Spaces (free, public URL):

  1. Create a free account at huggingface.co
  2. Go to your profile → New Space → name it medsignal-api → choose Docker as the SDK → set hardware to Free (CPU Basic)
  3. In Space Settings → Variables and secrets, add:
    • GROQ_API_KEY → your Groq key
  4. Upload all project files, or connect your GitHub repository via the Git integration
  5. Hugging Face builds the Docker image and serves the API on a public URL at no cost

A render.yaml file is also included for Render deployment, though their 512MB free tier may struggle to load the embedding model and FAISS index simultaneously.


11. Future Scope

With additional engineering time, the following would bring this to production pharmacovigilance quality:

  • Expanded drug coverage — The current PoC is limited to two drugs. The architecture is drug-agnostic; adding a new drug requires adding its name to TARGET_DRUGS in config.py and re-running the ingestion pipeline.
  • Brand name normalisation — Map trade names (Ozempic, Wegovy) to their generic equivalents before the whitelist check, so analysts don't need to know the INN name.
  • Automated nightly re-indexing — A CI/CD pipeline that re-fetches the latest PubMed abstracts and rebuilds the FAISS index on a schedule, keeping the static knowledge base within days of current literature.
  • RAGAS evaluation — Automated scoring of context relevance and answer faithfulness against a held-out test set after every index rebuild (target: >0.92 faithfulness score).
  • EHR integration — An endpoint that accepts de-identified patient records and cross-references the existing safety signal data to flag drug-patient interaction risks.

About

Real-time drug safety intelligence API. Hybrid RAG combining live openFDA adverse events, PubMed literature, and a FAISS biomedical knowledge base — synthesized by Llama 3.3-70B via Groq.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors