MedSignal API

Real-Time Drug Safety Intelligence · Powered by Hybrid RAG + LLM

What this project does in one sentence:
Ask a question about a drug's safety record in plain English → get back a structured, AI-written medical safety report in under 6 seconds, sourced from live FDA data, peer-reviewed research, and a pre-built biomedical knowledge base.

⚠️ Supported Drugs — Read Before Testing

This is a Proof of Concept (PoC). The knowledge base and validation layer are intentionally scoped to two drugs for this release.

The API currently accepts exactly two drug names:

Generic Name	Common Brand Names	Drug Class
`semaglutide`	Ozempic, Wegovy, Rybelsus	GLP-1 receptor agonist (diabetes / weight loss)
`metformin`	Glucophage, Glumetza, Fortamet	Biguanide antidiabetic

If you send any other drug name — including brand names like ozempic — the API will immediately return a 422 Unprocessable Entity error with a clear message explaining which names are valid. This is intentional: the guardrail fires at the input layer before any external API calls are made, so no Groq tokens or FDA rate-limit credits are consumed on invalid requests.

Example error response for an unsupported drug:

{
  "detail": [
    {
      "type": "value_error",
      "msg": "Drug 'ibuprofen' is not in the supported list. Currently supported drugs: semaglutide, metformin. Brand names (e.g. 'Ozempic') are not accepted — use the generic name."
    }
  ]
}

1. The Problem — Why This Exists

Every drug on the market must be continuously monitored for unexpected side effects after it's approved. This practice is called pharmacovigilance — literally, "vigilance over drugs."

When a safety analyst suspects that a drug is causing a new side effect, their job is to investigate it. That investigation requires three separate tasks, done manually, on three different systems:

Search the FDA database (called FAERS) for raw adverse event reports — how many people reported a problem, what symptoms they reported, how serious the outcomes were, and who was most affected.
Search PubMed (the world's largest medical research database) for peer-reviewed papers that might explain or contextualize those numbers.
Write a structured assessment that combines the statistics with the medical literature into a coherent, decision-ready report.

This process takes a trained analyst 4 to 8 hours per drug query. The data is siloed across incompatible systems, the raw numbers have no meaning without medical context, and there is no tool that bridges all three into one workflow.

2. The Solution — How MedSignal Works

MedSignal replaces that 4–8 hour manual workflow with a single API call.

You send one request with a drug name and a plain-English question. In the background, the API simultaneously queries three data sources, merges everything into a unified context, and passes it to an AI model that writes a structured safety assessment — complete with citations.

What you send:

{
  "drug_name": "semaglutide",
  "query": "What cardiac adverse events have been reported in patients over 65?",
  "age_group": "65+"
}

What you get back (in ~4 seconds):

A structured safety assessment with clearly labelled sections
The top 10 reported adverse reactions and their counts from the FDA
Outcome severity breakdown (serious vs. non-serious reports)
Sex demographic breakdown of reporters
The PubMed papers used as evidence, with titles and PMIDs
A confidence score (0–1) reflecting data quality
Formatted citations for every source used

3. Safety Guardrails — What Gets Blocked and Why

Guardrails are the protective rules built into the system that prevent bad inputs from producing bad outputs. MedSignal has guardrails at three layers.

Layer 1 — Input Validation (fires before any API call)

These checks run the moment a request arrives and reject invalid inputs immediately, in under 5ms, before touching any external service or spending any LLM tokens:

Guardrail	What It Blocks	HTTP Response
Drug whitelist	Any drug not in `["semaglutide", "metformin"]`, including brand names like `ozempic`	`422`
Empty drug name	Whitespace-only or blank `drug_name` fields	`422`
Empty query	Whitespace-only or blank `query` fields	`422`
Query minimum length	Any `query` under 10 characters	`422`
Query maximum length	Any `query` longer than 500 characters	`422`
Math expression check	Queries containing arithmetic (e.g. "what is 2 + 2 with semaglutide")	`422`
Character ratio check	Queries where fewer than 50% of characters are alphabetic (gibberish, symbol-heavy)	`422`
Age group whitelist	Any `age_group` not in `{"pediatric", "18-64", "65+"}` — e.g. `"200+"`	`422`
Date format check	Any `date_range` not matching `YYYYMMDD+TO+YYYYMMDD`	`422`
Date epoch check	Any `date_range` starting before 2004-01-01 (FAERS database launch date)	`422`
Future date check	Any `date_range` with a start or end date in the future (e.g. year 2042)	`422`
Date chronology check	Any `date_range` where the start date is after the end date	`422`
Report type whitelist	Any `report_type` not in `{comprehensive, cardiac, hepatic, renal}`	`422`

Layer 2 — Semantic Query Classifier (fires before retrieval)

Even after passing structural validation, every query goes through a dedicated LLM classification call before any data retrieval begins. This is a separate, isolated call from the main synthesis — it uses temperature=0 and max_tokens=10 to produce a deterministic VALID or INVALID verdict in ~0.5 seconds.

The classifier rejects queries that are structurally valid but semantically unrelated to pharmacovigilance — for example:

Query	Verdict	Why
`"Can six lead to cardiac arrest"`	`INVALID`	"six" is not a medical or pharmacological concept
`"Can you order chipotle for me"`	`INVALID`	Not related to drug safety
`"What is the capital of France with semaglutide"`	`INVALID`	Drug name present but question is unrelated
`"What cardiac events have been reported in patients over 65?"`	`VALID`	Direct pharmacovigilance question

If the classifier API call itself fails (network error), the request is allowed through rather than blocking legitimate users — the synthesis layer still has its own grounding constraints.

Layer 3 — LLM Grounding (fires during AI synthesis)

Only queries that pass both Layer 1 and Layer 2 reach the synthesis model. The system prompt hard-constrains the AI's behaviour at this stage:

The model is instructed to only state facts supported by the provided context. If a symptom has zero FDA reports, it must say so explicitly — it cannot speculate.
The model must cite specific data points (exact report counts, PMID numbers) rather than making general claims.
Every response must end with a parseable CONFIDENCE_SCORE: between 0.0 and 1.0, reflecting the completeness of the data.
If data is insufficient to answer the question, the model must say so instead of inventing an answer.

The result: when we asked the system whether metformin causes "neon green hair and sudden levitation," the model confirmed zero FDA evidence and cited the actual reported adverse reactions instead of fabricating a response.

4. System Architecture

Here is the full data flow from request to response:

You (or any HTTP client)
        │
        │  POST /api/v1/query
        │  { drug_name, query, [date_range], [age_group] }
        ▼
┌─────────────────────────────────────────────────────────────┐
│                    Input Validation Layer                     │
│   (Pydantic schemas — rejects bad requests in <5ms with     │
│    no external API calls made, preserving rate-limit quota)  │
└──────────────────────────┬──────────────────────────────────┘
                           │  Valid request
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              Parallel Retrieval (asyncio.gather)             │
│                                                              │
│  ┌──────────────────┐  ┌──────────────────┐  ┌───────────┐  │
│  │  openFDA API     │  │  PubMed Live     │  │  FAISS    │  │
│  │  (Live)          │  │  (Live)          │  │  (Static) │  │
│  │                  │  │                  │  │           │  │
│  │  Reaction counts │  │  Up to 5 recent  │  │  ~1,500   │  │
│  │  Outcome stats   │  │  peer-reviewed   │  │  pre-     │  │
│  │  Demographic     │  │  papers, title + │  │  indexed  │  │
│  │  breakdown       │  │  abstract, PMID  │  │  abstracts│  │
│  │  Optional        │  │  MeSH-term       │  │  Cosine   │  │
│  │  date + age      │  │  filtered        │  │  sim ≥    │  │
│  │  filters         │  │                  │  │  0.45     │  │
│  └──────────────────┘  └──────────────────┘  └───────────┘  │
└───────────────────────────────┬─────────────────────────────┘
                                │  All three return simultaneously
                                ▼
┌─────────────────────────────────────────────────────────────┐
│                      Context Merger                          │
│  - Deduplicates papers by PMID (no study appears twice)     │
│  - Truncates long abstracts to preserve LLM context budget  │
│  - Formats as labelled sections for the AI prompt           │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              Groq LLM — Llama 3.3-70B                        │
│  System prompt enforces: evidence-only responses,            │
│  structured output format, confidence score, no invented     │
│  citations, explicit acknowledgement of data gaps           │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
              Structured JSON response:
              synthesized_assessment · adverse_events
              literature_context · citations
              confidence_score · metadata (latency, sources)

All three retrievals run in parallel — the total query time is bounded by the slowest single source, not the sum of all three. In practice, most queries complete in 4–6 seconds end-to-end.

5. What We Built — Step by Step

Step 1: Offline Ingestion Pipeline (run once before the server starts)

app/ingestion/fetch_pubmed.py
Queries PubMed's API for up to 1,500 research abstracts per drug using relevance-sorted search. Fetches in batches of 50 (to respect URL length limits) with automatic retry and exponential backoff — if the network hiccups, it retries up to 5 times before failing that batch. Rate limiting (0.35–0.5 seconds between requests) ensures the script stays within NCBI's API usage policy.

app/ingestion/build_index.py
Takes those abstracts, embeds each one into a mathematical vector using the S-PubMedBert-MS-MARCO model (a sentence transformer specifically trained on medical literature), and stores all vectors in a FAISS index on disk. At API startup, the entire index is loaded into memory for sub-millisecond search queries.

Step 2: The Live API

Three retrieval modules (openfda.py, pubmed.py, vector_store.py) each run independently and return results simultaneously via Python's asyncio.gather().
Context merger (context_merger.py) deduplicates by PubMed ID across all sources, so the AI never sees the same study twice from different channels.
LLM service (llm.py) passes the merged context to Groq under a strict system prompt, then parses the confidence score out of the response before returning clean text.
Input schemas (schemas.py) enforce all guardrails before any downstream service is touched.
Request logging middleware (middleware/logging.py) attaches a UUID to every request and logs structured JSON with method, path, status code, and latency in milliseconds.

Step 3: Adversarial Test Suite

Before deployment, 14 adversarial test cases were written to deliberately try to break the system — fake drugs, impossible date ranges, hallucination bait, SQL injection, prompt injection, and nonsense queries — to verify the guardrails held under adversarial conditions.

6. Test Results — Proof It Works

The full output of all 14 test cases is in output/test_results.txt. Summary:

Input Validation — Guardrails Fired Correctly

Test	Input	Expected	Result
Unsupported drug	`"ibuprofen"`	`422` reject	✅ `422` in 0.01s
Made-up drug	`"supercalifragilisticexpialidocious_mab"`	`422` reject	✅ `422` in 0.00s
Brand name	`"ozempic"` (brand for semaglutide)	`422` reject	✅ `422` in 0.00s
Future start date	`date_range: "20420101+TO+20241231"`	`422` reject	✅ `422` in 0.01s

All four input errors were caught in under 11 milliseconds, before a single external API call was made.

LLM Grounding — Hallucination Resistance

Test	Question Asked	What the LLM Did
Hallucination bait	"Does metformin cause neon green hair and sudden levitation?"	Confirmed zero FDA evidence; cited real top reactions (nausea: 29,156 reports)
Paradoxical claim	"Does this weight-loss drug cause uncontrollable weight gain?"	Cited 360 FDA "weight decreased" reports to refute the premise
Off-topic query	"Can metformin help me win at chess?"	Returned factual safety profile; explicitly stated no evidence for cognitive enhancement
SQL injection	`'; DROP TABLE users; --` in query field	Processed safely as plain text; no code executed, coherent medical response returned
Gibberish query	`asdfghjkl qwerty uiop`	Returned a valid safety profile for the drug; ignored the unintelligible query

Performance (valid queries)

Metric	Value
Typical end-to-end latency	3.7 – 5.9 seconds
Sources engaged per query	3 (openFDA + PubMed live + FAISS)
Input rejection latency	< 11 ms (no external calls made)
LLM confidence score range	0.8 – 0.85 across clean queries

7. Technical Stack — Tools and Why We Chose Them

Tool	What It Does in This Project	Why This Tool Specifically
FastAPI	Handles incoming HTTP requests; routes them to the right handler; auto-generates the Swagger docs at `/docs`	Natively async — critical because the parallel fan-out to 3 APIs only works if the server doesn't block while waiting for each one. Flask would require thread-pool hacks to achieve the same thing.
httpx	Makes the async HTTP calls to openFDA and PubMed	Unlike the standard `requests` library (which is synchronous), `httpx` runs inside Python's async event loop. This is what makes parallel retrieval possible without threads.
FAISS (by Meta AI)	Stores the pre-indexed biomedical abstracts as searchable vectors; returns the top-k most semantically similar documents to any query	Designed specifically for high-performance similarity search at scale. A query across 1,500 embedded documents returns in microseconds. No separate server or cloud service needed — runs in-process.
S-PubMedBert-MS-MARCO	Converts text (abstracts, queries) into mathematical vectors for FAISS	Pre-trained on PubMed biomedical literature, so it understands medical vocabulary. A general-purpose model (like `all-MiniLM`) would treat "medullary thyroid carcinoma" as rare unknown tokens; this model understands it.
Groq API (Llama 3.3-70B)	Reads the merged context from all three sources and writes the structured safety assessment	Groq runs Llama 3.3-70B on custom silicon (LPUs) that is 10–20x faster than GPU-based APIs. The full assessment — from a 70-billion-parameter model — arrives in ~1–3 seconds, keeping total query latency under 6 seconds.
Pydantic v2	Validates every field of every incoming request against strict rules before any code runs	Acts as the first line of defence. Invalid drug names, malformed dates, and oversized queries are rejected at the schema layer — no LLM tokens spent, no FDA rate-limit credits consumed.
Tenacity	Retries failed PubMed batch fetches with exponential backoff during the ingestion pipeline	The ingestion script fetches hundreds of batches over several minutes. Without retry logic, a single network hiccup aborts the entire pipeline. Tenacity retries up to 5 times with increasing delays before giving up on a batch.
Docker	Packages the entire application — Python version, model weights, FAISS index, and all dependencies — into one portable container	The FAISS index and sentence transformer model together are ~500MB. Docker ensures this state is reproducible across any machine and deployable to any container platform with one command.

8. Getting Started

What You Need First

Python 3.10 or later
A free Groq API key — takes 2 minutes to create
(Optional but recommended) Free API keys for openFDA and NCBI/PubMed — the API works without them but at lower rate limits

Install and Run

# 1. Clone the repo
git clone https://github.com/nikhilreddy00/MedSignal-API.git
cd MedSignal-API

# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate          # On Windows: venv\Scripts\activate

# 3. Install all dependencies
pip install -r requirements.txt

# 4. Create a .env file with your credentials
cat > .env << EOF
GROQ_API_KEY=gsk_your_groq_key_here
OPENFDA_API_KEY=                   # optional
NCBI_API_KEY=                      # optional
LOG_LEVEL=INFO
EOF

Build the Knowledge Base (one-time setup, ~5–15 minutes)

The FAISS knowledge base is built from PubMed abstracts and is NOT included in the repo (the raw files are excluded by .gitignore). You must run these two steps before the API can use the static retrieval source:

# Step 1: Download PubMed abstracts for semaglutide and metformin
# Downloads ~1,500 abstracts per drug. Takes 2–5 minutes depending on your API key tier.
python -m app.ingestion.fetch_pubmed

# Step 2: Embed and index into FAISS
# Downloads the S-PubMedBert model (~440MB) on first run, then embeds all abstracts.
# Takes 3–10 minutes depending on your CPU.
python -m app.ingestion.build_index

Note: If you skip this step, the API will still work — it will just use openFDA and live PubMed only (2 out of 3 sources). The startup log will show a warning: Vector store not loaded.

Start the Server

uvicorn app.main:app --reload --port 8000

Open http://localhost:8000/docs in your browser to see the interactive Swagger UI where you can test all endpoints directly.

9. API Reference

`POST /api/v1/query` — Ask a Drug Safety Question

The main endpoint. Ask any pharmacovigilance question about a supported drug.

Request body:

{
  "drug_name": "semaglutide",
  "query": "What cardiac adverse events have been reported in patients over 65?",
  "date_range": "20240101+TO+20241231",
  "age_group": "65+"
}

Field	Required	Description
`drug_name`	Yes	Must be `semaglutide` or `metformin` (lowercase, generic name only)
`query`	Yes	Your question in plain English. Max 500 characters.
`date_range`	No	Filter FDA reports to a date range. Format: `YYYYMMDD+TO+YYYYMMDD`
`age_group`	No	Filter FDA reports by patient age. Options: `pediatric`, `18-64`, `65+`

`POST /api/v1/signal-report` — Generate a Formal Safety Report

Generates a full 7-section pharmacovigilance document (Executive Summary, Signal Description, Adverse Event Analysis, Literature Review, Risk Characterization, Recommendations, Data Sources).

Request body:

{
  "drug_name": "metformin",
  "report_type": "comprehensive"
}

Field	Required	Description
`drug_name`	Yes	Must be `semaglutide` or `metformin`
`report_type`	No	One of: `comprehensive` (default), `cardiac`, `hepatic`, `renal`

`GET /api/v1/health` — System Health Check

Returns the live status of all three data sources and the vector store. Useful for verifying your setup before running queries.

{
  "status": "healthy",
  "vector_store_loaded": true,
  "openfda_reachable": true,
  "pubmed_reachable": true,
  "groq_reachable": true,
  "index_document_count": 1487,
  "embedding_model": "pritamdeka/S-PubMedBert-MS-MARCO",
  "llm_model": "llama-3.3-70b-versatile"
}

10. Deployment

The project includes a Dockerfile configured for deployment on Hugging Face Spaces (free tier: 16GB RAM, 2 vCPUs), which comfortably fits the FAISS index and embedding model in memory — unlike standard 512MB free tiers on platforms like Render or Heroku.

Deploy to Hugging Face Spaces (free, public URL):

Create a free account at huggingface.co
Go to your profile → New Space → name it medsignal-api → choose Docker as the SDK → set hardware to Free (CPU Basic)
In Space Settings → Variables and secrets, add:
- GROQ_API_KEY → your Groq key
Upload all project files, or connect your GitHub repository via the Git integration
Hugging Face builds the Docker image and serves the API on a public URL at no cost

A render.yaml file is also included for Render deployment, though their 512MB free tier may struggle to load the embedding model and FAISS index simultaneously.

11. Future Scope

With additional engineering time, the following would bring this to production pharmacovigilance quality:

Expanded drug coverage — The current PoC is limited to two drugs. The architecture is drug-agnostic; adding a new drug requires adding its name to TARGET_DRUGS in config.py and re-running the ingestion pipeline.
Brand name normalisation — Map trade names (Ozempic, Wegovy) to their generic equivalents before the whitelist check, so analysts don't need to know the INN name.
Automated nightly re-indexing — A CI/CD pipeline that re-fetches the latest PubMed abstracts and rebuilds the FAISS index on a schedule, keeping the static knowledge base within days of current literature.
RAGAS evaluation — Automated scoring of context relevance and answer faithfulness against a held-out test set after every index rebuild (target: >0.92 faithfulness score).
EHR integration — An endpoint that accepts de-identified patient records and cross-references the existing safety signal data to flag drug-patient interaction risks.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
app		app
output		output
tests		tests
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
render.yaml		render.yaml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

MedSignal API

Real-Time Drug Safety Intelligence · Powered by Hybrid RAG + LLM

⚠️ Supported Drugs — Read Before Testing

Table of Contents

1. The Problem — Why This Exists

2. The Solution — How MedSignal Works

3. Safety Guardrails — What Gets Blocked and Why

Layer 1 — Input Validation (fires before any API call)

Layer 2 — Semantic Query Classifier (fires before retrieval)

Layer 3 — LLM Grounding (fires during AI synthesis)

4. System Architecture

5. What We Built — Step by Step

Step 1: Offline Ingestion Pipeline (run once before the server starts)

Step 2: The Live API

Step 3: Adversarial Test Suite

6. Test Results — Proof It Works

Input Validation — Guardrails Fired Correctly

LLM Grounding — Hallucination Resistance

Performance (valid queries)

7. Technical Stack — Tools and Why We Chose Them

8. Getting Started

What You Need First

Install and Run

Build the Knowledge Base (one-time setup, ~5–15 minutes)

Start the Server

9. API Reference

POST /api/v1/query — Ask a Drug Safety Question

POST /api/v1/signal-report — Generate a Formal Safety Report

GET /api/v1/health — System Health Check

10. Deployment

11. Future Scope

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /api/v1/query` — Ask a Drug Safety Question

`POST /api/v1/signal-report` — Generate a Formal Safety Report

`GET /api/v1/health` — System Health Check

Packages