A tool for analyzing clinical conversations using knowledge graphs. Extracts entities and relationships from mental health interviews, enabling structured analysis of unstructured dialogue.
Clinical interviews contain rich information — symptoms, history, relationships, context — but it's trapped in unstructured text. This project extracts that information into knowledge graphs that can be:
- Queried to find patterns across patients or sessions
- Visualized to see connections between symptoms, triggers, and treatments
- Analyzed to identify structural differences in conversation content
- Compared across conditions, time points, or clinicians
The goal is to make clinical conversation data more accessible for research and analysis.
# 1. Make sure Neo4j and Ollama are running
docker start neo4j
ollama serve
# 2. Activate environment
source venv/bin/activate
# 3. Extract entities from clinical interviews
python extract_clinical.py all
# 4. Analyze graph patterns by disorder
python analyze_graphs.py
# 5. Visualize in Neo4j Browser
open http://localhost:7474
# Run: MATCH (n)-[r]->(m) RETURN n, r, m

STEP 1: Clinical interview transcript
┌─────────────────────────────────────────────────────────────┐
│  "I've been feeling anxious for about 6 months now.         │
│  It started after I lost my job. I worry constantly..."     │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
STEP 2: LLM extracts clinical + semantic entities
┌─────────────────────────────────────────────────────────────┐
│  CLINICAL ENTITIES:                                         │
│    • anxiety (symptom)                                      │
│    • worry (symptom)                                        │
│    • 6 months (criterion - duration)                        │
│    • job loss (trigger)                                     │
│                                                             │
│  SEMANTIC ENTITIES:                                         │
│    • Sarah (person)                                         │
│    • Dr. Grande (person)                                    │
│    • work (place)                                           │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
STEP 3: Store in Neo4j with entity type labels
┌─────────────────────────────────────────────────────────────┐
│  (:Episode {name: 'gad_sarah_001'})                         │
│      │                                                      │
│      ├──MENTIONS──>(:Clinical {name: 'anxiety'})            │
│      ├──MENTIONS──>(:Clinical {name: 'worry'})              │
│      ├──MENTIONS──>(:Semantic {name: 'Sarah'})              │
│      └──MENTIONS──>(:Semantic {name: 'Dr. Grande'})         │
│                                                             │
│  (Sarah)──HAS_SYMPTOM──>(anxiety)                           │
│  (job loss)──TRIGGERS──>(anxiety)                           │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
STEP 4: Query, visualize, and analyze
┌─────────────────────────────────────────────────────────────┐
│  • Visualize in Neo4j Browser                               │
│  • Query patterns across episodes                           │
│  • Export metrics for analysis                              │
└─────────────────────────────────────────────────────────────┘
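
The four steps above can be mirrored in a small in-memory model before anything is written to Neo4j. The following is a minimal sketch, not the project's actual code: the `Entity` and `EpisodeGraph` classes and the `kind` field are hypothetical names introduced here for illustration. Only the labels (`Clinical`, `Semantic`), the relationship types, and the example entities are taken from the diagram.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    label: str  # "Clinical" or "Semantic", as in the diagram
    kind: str   # e.g. "symptom", "trigger", "person" (hypothetical field)


@dataclass
class EpisodeGraph:
    name: str
    entities: dict = field(default_factory=dict)   # entity name -> Entity
    relations: list = field(default_factory=list)  # (source, type, target)

    def mention(self, name, label, kind):
        """Record that the episode MENTIONS an entity."""
        self.entities[name] = Entity(name, label, kind)

    def relate(self, source, rel_type, target):
        """Add a typed edge between two entities already mentioned."""
        for endpoint in (source, target):
            if endpoint not in self.entities:
                raise KeyError(f"unknown entity: {endpoint}")
        self.relations.append((source, rel_type, target))

    def count(self, label):
        """Count mentioned entities carrying the given label."""
        return sum(1 for e in self.entities.values() if e.label == label)


def build_example():
    """Rebuild the gad_sarah_001 example from the diagram."""
    g = EpisodeGraph("gad_sarah_001")
    g.mention("anxiety", "Clinical", "symptom")
    g.mention("worry", "Clinical", "symptom")
    g.mention("6 months", "Clinical", "criterion")
    g.mention("job loss", "Clinical", "trigger")
    g.mention("Sarah", "Semantic", "person")
    g.mention("Dr. Grande", "Semantic", "person")
    g.mention("work", "Semantic", "place")
    g.relate("Sarah", "HAS_SYMPTOM", "anxiety")
    g.relate("job loss", "TRIGGERS", "anxiety")
    return g


if __name__ == "__main__":
    g = build_example()
    print(g.name, g.count("Clinical"), g.count("Semantic"), len(g.relations))
```

On this toy graph the counts are 4 clinical and 3 semantic entities, matching the STEP 2 box, plus 2 relationships.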
| Component | Purpose | How to Run |
|---|---|---|
| Ollama | Local LLM for entity extraction | ollama serve |
| Neo4j | Graph database for storage | docker start neo4j |
| extract_clinical.py | Extract entities from transcripts | python extract_clinical.py |
| analyze_graphs.py | Compare patterns by disorder | python analyze_graphs.py |
Clinical Entities (mental health specific):
- Symptoms: anxiety, worry, fatigue, sleep problems, attention issues
- DSM Criteria: duration ("6 months"), frequency ("more days than not")
- Treatments: medications, therapy types, coping strategies
- Triggers: life events, stressors
Semantic Entities (general concepts):
- People: patient, clinician, family members
- Places: school, work, home, clinic
- Objects: physical items mentioned
- Topics: abstract concepts discussed
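
The two groups above amount to a lookup from a fine-grained entity type to one of the two node labels. A minimal sketch, assuming singular lowercase type names such as `symptom` or `place`; these strings and the `label_for` helper are illustrative and may not match what `extract_clinical.py` uses internally:

```python
# Hypothetical type names derived from the two lists above.
CLINICAL_KINDS = {"symptom", "criterion", "treatment", "trigger"}
SEMANTIC_KINDS = {"person", "place", "object", "topic"}


def label_for(kind: str) -> str:
    """Map an entity type to the node label it would carry in the graph."""
    normalized = kind.strip().lower()
    if normalized in CLINICAL_KINDS:
        return "Clinical"
    if normalized in SEMANTIC_KINDS:
        return "Semantic"
    raise ValueError(f"unrecognized entity type: {kind!r}")


if __name__ == "__main__":
    for kind in ("symptom", "trigger", "person", "place"):
        print(kind, "->", label_for(kind))
```

Raising on unknown types, rather than defaulting to one label, keeps a mislabeled extraction from silently skewing the clinical/semantic counts.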
| Tool | Purpose | Install |
|---|---|---|
| Python 3.10+ | Runtime | python3 --version |
| Docker | Runs Neo4j | Install Docker |
| Ollama | Local LLM | Install Ollama |
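
Before installing, it can help to confirm the prerequisites in the table are present. The sketch below uses only the Python standard library; `check_prerequisites` is a hypothetical helper, not part of this repository, and it only checks that the `docker` and `ollama` executables are on `PATH`, not that their services are running.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("docker", "ollama")):
    """Return a dict mapping each prerequisite to True if it looks available."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```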
git clone https://github.com/povilaskarvelis/chat2graph.git
cd chat2graph
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt

docker run -d \
--name neo4j \
-p 7474:7474 \
-p 7687:7687 \
-e NEO4J_AUTH=neo4j/password123 \
neo4j:latest

ollama serve
ollama pull llama3.1:8b # Or llama3.2

Create .env file:
# Ollama
OLLAMA_MODEL=llama3.1:8b
# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password123

# Extract from all conversations
python extract_clinical.py all
# Or extract one at a time
python extract_clinical.py gad_sarah_001

python analyze_graphs.py

Open http://localhost:7474 and run:
MATCH (n)-[r]->(m) RETURN n, r, m

The empirical_conversations.py file contains transcripts from Dr. Todd Grande's educational videos.
Note: The "patients" are actors portraying scripted scenarios — NOT real patients.
| Conversation | Condition | Meets Criteria? |
|---|---|---|
| gad_sarah_001 | GAD | Yes |
| gad_sarah_002 | GAD (comorbidities) | Yes |
| gad_sarah_003 | GAD (subthreshold) | No |
| adhd_elise_001 | ADHD | No |
| adhd_elise_002 | ADHD Combined | Yes |
| adhd_elise_003 | ADHD | Yes |
| wernickes_aphasia_byron_001 | Wernicke's Aphasia | Yes |
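
The conversation names in the table appear to follow a `<condition>_<speaker>_<number>` pattern. That pattern is inferred from the names alone, not documented by the project, so the parser below is a sketch under that assumption; `ConversationId` and `parse_conversation_id` are hypothetical helpers. Splitting from the right keeps multi-word conditions such as `wernickes_aphasia` intact.

```python
from typing import NamedTuple


class ConversationId(NamedTuple):
    condition: str
    speaker: str
    index: int


def parse_conversation_id(name: str) -> ConversationId:
    """Split a name like 'gad_sarah_001' into condition, speaker, and index."""
    parts = name.rsplit("_", 2)  # split from the right: at most 3 pieces
    if len(parts) != 3:
        raise ValueError(f"unexpected conversation name: {name!r}")
    condition, speaker, index = parts
    return ConversationId(condition, speaker, int(index))


if __name__ == "__main__":
    for name in ("gad_sarah_001", "adhd_elise_002", "wernickes_aphasia_byron_001"):
        print(parse_conversation_id(name))
```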
Source video for gad_sarah_001:
Running python analyze_graphs.py on the example dataset produces:
| Disorder | Clinical Nodes | Semantic Nodes | Clinical Ratio |
|---|---|---|---|
| GAD | 11.3 | 5.0 | 69.4% |
| ADHD | 8.7 | 4.0 | 68.4% |
| Wernicke's Aphasia | 2.0 | 5.0 | 28.6% |
Interpretation:
- High clinical ratio (~69%) = patient describing clear symptoms (GAD, ADHD)
- Low clinical ratio (~29%) = speech without clinical content (Wernicke's)
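
The reported ratios are consistent with dividing clinical nodes by the total of clinical and semantic nodes. That formula is inferred from the table, not taken from `analyze_graphs.py`, so treat the following as a sketch; `clinical_ratio` is a hypothetical helper.

```python
def clinical_ratio(clinical_nodes: float, semantic_nodes: float) -> float:
    """Return the clinical share of entity nodes as a percentage."""
    total = clinical_nodes + semantic_nodes
    if total == 0:
        return 0.0  # no entities extracted; avoid dividing by zero
    return 100.0 * clinical_nodes / total


# (clinical nodes, semantic nodes, reported ratio) copied from the table above.
REPORTED = {
    "GAD": (11.3, 5.0, 69.4),
    "ADHD": (8.7, 4.0, 68.4),
    "Wernicke's Aphasia": (2.0, 5.0, 28.6),
}

if __name__ == "__main__":
    for disorder, (clinical, semantic, reported) in REPORTED.items():
        computed = clinical_ratio(clinical, semantic)
        print(f"{disorder}: computed {computed:.1f}%, reported {reported}%")
```

Computed from the rounded node counts, the values agree with the table to within roughly 0.1 percentage points; the small gaps are what one would expect if the node counts shown are themselves rounded averages.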
Full graph (all conversations):
Graph for gad_sarah_001 (from video above):
Episode nodes (blue) link to Clinical entities (red: symptoms, criteria, treatments) and Semantic entities (beige: people, places, topics).
Results are also exported to results/analysis_latest.json.
| File | Purpose | How to Run |
|---|---|---|
| extract_clinical.py | Extract entities from transcripts | python extract_clinical.py all |
| analyze_graphs.py | Compare graph patterns by disorder | python analyze_graphs.py |
| empirical_conversations.py | Clinical interview data | n/a |
| load_empirical.py | Alternative loader using Graphiti | python load_empirical.py |
View full graph:
MATCH (n)-[r]->(m) RETURN n, r, m

View one conversation:
MATCH (e:Episode {name: 'gad_sarah_001'})-[:MENTIONS]->(n)
OPTIONAL MATCH (n)-[r]->(m)
RETURN e, n, r, m

View only clinical entities:
MATCH (e:Episode)-[:MENTIONS]->(n:Clinical)
RETURN e, n

Compare Wernicke's vs GAD:
MATCH (e:Episode)-[:MENTIONS]->(n)
WHERE e.name IN ['gad_sarah_001', 'wernickes_aphasia_byron_001']
OPTIONAL MATCH (n)-[r]->(m)
RETURN e, n, r, m

Count by entity type:
MATCH (e:Episode)-[:MENTIONS]->(n)
RETURN e.name,
sum(CASE WHEN n:Clinical THEN 1 ELSE 0 END) as clinical,
sum(CASE WHEN n:Semantic THEN 1 ELSE 0 END) as semantic

docker ps # Check if running
docker start neo4j # Start if stopped

ollama serve # Make sure it's running
ollama list # Check available models

# Clear database
python -c "
from neo4j import GraphDatabase
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password123'))
with driver.session() as s: s.run('MATCH (n) DETACH DELETE n')
driver.close()
print('Cleared!')
"
# Re-extract
python extract_clinical.py all

This is a research and development tool. Clinical interview data is from educational demonstrations with actors, not real patients. Any clinical application would require appropriate validation, privacy safeguards, and regulatory compliance.


