Chat2Graph 🧠➡️🕸️

A tool for analyzing clinical conversations using knowledge graphs. Extracts entities and relationships from mental health interviews, enabling structured analysis of unstructured dialogue.

Purpose

Clinical interviews contain rich information (symptoms, history, relationships, context), but it is trapped in unstructured text. This project extracts that information into knowledge graphs that can be:

  • Queried to find patterns across patients or sessions
  • Visualized to see connections between symptoms, triggers, and treatments
  • Analyzed to identify structural differences in conversation content
  • Compared across conditions, time points, or clinicians

The goal is to make clinical conversation data more accessible for research and analysis.


Quick Start

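# 1. Make sure Neo4j and Ollama are running
#    (the neo4j container must already exist; see Setup Guide, Step 2)
docker start neo4j
ollama serve   # runs in the foreground; use a separate terminal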

# 2. Activate environment
source venv/bin/activate

# 3. Extract entities from clinical interviews
python extract_clinical.py all

# 4. Analyze graph patterns by disorder
python analyze_graphs.py

# 5. Visualize in Neo4j Browser (open is the macOS command; on Linux use xdg-open)
open http://localhost:7474
# Then run in the browser: MATCH (n)-[r]->(m) RETURN n, r, m

How It Works

STEP 1: Clinical interview transcript
┌─────────────────────────────────────────────────────────────┐
│  "I've been feeling anxious for about 6 months now.         │
│   It started after I lost my job. I worry constantly..."    │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
STEP 2: LLM extracts clinical + semantic entities
┌─────────────────────────────────────────────────────────────┐
│  CLINICAL ENTITIES:                                         │
│   • anxiety (symptom)                                       │
│   • worry (symptom)                                         │
│   • 6 months (criterion - duration)                         │
│   • job loss (trigger)                                      │
│                                                             │
│  SEMANTIC ENTITIES:                                         │
│   • Sarah (person)                                          │
│   • Dr. Grande (person)                                     │
│   • work (place)                                            │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
STEP 3: Store in Neo4j with entity type labels
┌─────────────────────────────────────────────────────────────┐
│  (:Episode {name: 'gad_sarah_001'})                         │
│       │                                                     │
│       ├──MENTIONS──>(:Clinical {name: 'anxiety'})           │
│       ├──MENTIONS──>(:Clinical {name: 'worry'})             │
│       ├──MENTIONS──>(:Semantic {name: 'Sarah'})             │
│       └──MENTIONS──>(:Semantic {name: 'Dr. Grande'})        │
│                                                             │
│  (Sarah)──HAS_SYMPTOM──>(anxiety)                           │
│  (job loss)──TRIGGERS──>(anxiety)                           │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
STEP 4: Query, visualize, and analyze
┌─────────────────────────────────────────────────────────────┐
│  • Visualize in Neo4j Browser                               │
│  • Query patterns across episodes                           │
│  • Export metrics for analysis                              │
└─────────────────────────────────────────────────────────────┘
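
Step 2 can be sketched as follows. This is a minimal illustration and not the code in extract_clinical.py: the prompt wording, the reply shape (`clinical` and `semantic` lists), and the helper names `build_request`, `parse_entities`, and `extract` are all assumptions. It calls Ollama's `/api/generate` endpoint on the default port with streaming disabled and JSON output requested.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompt; the project's real prompt may differ.
PROMPT_TEMPLATE = (
    "Extract entities from the clinical interview below. "
    'Reply with JSON of the form {{"clinical": [{{"name": ..., "type": ...}}], '
    '"semantic": [{{"name": ..., "type": ...}}]}}.\n\n'
    "Transcript:\n{transcript}"
)


def build_request(transcript: str, model: str = "llama3.1:8b") -> dict:
    """Build the JSON payload for a non-streaming /api/generate call."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(transcript=transcript),
        "stream": False,   # return one JSON object instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }


def parse_entities(response_text: str) -> dict:
    """Parse the model's JSON reply, tolerating missing keys."""
    data = json.loads(response_text)
    return {
        "clinical": list(data.get("clinical", [])),
        "semantic": list(data.get("semantic", [])),
    }


def extract(transcript: str) -> dict:
    """Send the transcript to a running Ollama server and parse the reply."""
    body = json.dumps(build_request(transcript)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return parse_entities(reply["response"])  # generated text is in "response"
```

Keeping the payload builder and the parser separate from the network call makes both easy to check without a running model.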

Architecture

| Component           | Purpose                           | How to Run                 |
|---------------------|-----------------------------------|----------------------------|
| Ollama              | Local LLM for entity extraction   | ollama serve               |
| Neo4j               | Graph database for storage        | docker start neo4j         |
| extract_clinical.py | Extract entities from transcripts | python extract_clinical.py |
| analyze_graphs.py   | Compare patterns by disorder      | python analyze_graphs.py   |

Entity Types

Clinical Entities (mental health specific):

  • Symptoms: anxiety, worry, fatigue, sleep problems, attention issues
  • DSM Criteria: duration ("6 months"), frequency ("more days than not")
  • Treatments: medications, therapy types, coping strategies
  • Triggers: life events, stressors

Semantic Entities (general concepts):

  • People: patient, clinician, family members
  • Places: school, work, home, clinic
  • Objects: physical items mentioned
  • Topics: abstract concepts discussed
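
One way to carry this taxonomy through to the graph is sketched below. The two node labels (Clinical, Semantic) and the MENTIONS relationship come from the diagram in How It Works; the `Entity` class, the `subtype` property, and the `mention_statement` helper are hypothetical names introduced for illustration, not the project's actual schema.

```python
from dataclasses import dataclass

# Node labels shown in the "How It Works" diagram.
LABELS = {"Clinical", "Semantic"}


@dataclass(frozen=True)
class Entity:
    name: str      # e.g. "anxiety"
    label: str     # "Clinical" or "Semantic"
    subtype: str   # e.g. "symptom", "person" (property name is an assumption)

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def mention_statement(episode: str, entity: Entity) -> tuple[str, dict]:
    """Return (cypher, params) linking an Episode to an entity via MENTIONS."""
    # Labels cannot be passed as Cypher parameters, so the validated label
    # is interpolated into the query text; values go through parameters.
    cypher = (
        "MERGE (e:Episode {name: $episode}) "
        f"MERGE (n:{entity.label} {{name: $name}}) "
        "SET n.subtype = $subtype "
        "MERGE (e)-[:MENTIONS]->(n)"
    )
    params = {"episode": episode, "name": entity.name, "subtype": entity.subtype}
    return cypher, params
```

With the neo4j Python driver, the returned pair can be passed to `session.run(cypher, params)`. Using MERGE rather than CREATE means re-running extraction on the same episode does not duplicate nodes.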

Setup Guide

Prerequisites

| Tool         | Purpose    | Install           |
|--------------|------------|-------------------|
| Python 3.10+ | Runtime    | python3 --version |
| Docker       | Runs Neo4j | Install Docker    |
| Ollama       | Local LLM  | Install Ollama    |

Step 1: Clone and Setup

git clone https://github.com/povilaskarvelis/chat2graph.git
cd chat2graph

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Step 2: Start Neo4j

docker run -d \
  --name neo4j \
  -p 7474:7474 \
  -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password123 \
  neo4j:latest

Step 3: Start Ollama

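ollama serve             # keeps running; use a separate terminal for the next command
ollama pull llama3.1:8b  # Or llama3.2 (then set OLLAMA_MODEL to match in Step 4)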

Step 4: Configure

Create .env file:

# Ollama
OLLAMA_MODEL=llama3.1:8b

# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password123
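
How the scripts read this file is not shown here (a package such as python-dotenv is a common choice). As a stdlib-only sketch of the idea, the following parses KEY=VALUE lines and lets real environment variables override the file. `load_env` and `get_setting` are hypothetical helper names.

```python
import os
from pathlib import Path
from typing import Optional


def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def get_setting(name: str, file_values: dict, default: Optional[str] = None) -> Optional[str]:
    """Real environment variables take precedence over the .env file."""
    return os.environ.get(name, file_values.get(name, default))
```

For example, `get_setting("NEO4J_URI", load_env(), "bolt://localhost:7687")` falls back to the local Bolt address when neither the environment nor the file defines it.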

Step 5: Run Extraction

# Extract from all conversations
python extract_clinical.py all

# Or extract one at a time
python extract_clinical.py gad_sarah_001

Step 6: Analyze

python analyze_graphs.py

Step 7: Visualize

Open http://localhost:7474 and run:

MATCH (n)-[r]->(m) RETURN n, r, m

Example Dataset

The empirical_conversations.py file contains transcripts from Dr. Todd Grande's educational videos.

Note: The "patients" are actors portraying scripted scenarios — NOT real patients.

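| Conversation                | Condition           | Meets Criteria? |
|-----------------------------|---------------------|-----------------|
| gad_sarah_001               | GAD                 | Yes             |
| gad_sarah_002               | GAD (comorbidities) | Yes             |
| gad_sarah_003               | GAD (subthreshold)  | No              |
| adhd_elise_001              | ADHD                | No              |
| adhd_elise_002              | ADHD Combined       | Yes             |
| adhd_elise_003              | ADHD                | Yes             |
| wernickes_aphasia_byron_001 | Wernicke's Aphasia  | Yes             |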

Source video for gad_sarah_001:

[Video: GAD Interview - gad_sarah_001]


Example Results

Running python analyze_graphs.py on the example dataset produces:

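| Disorder           | Clinical Nodes | Semantic Nodes | Clinical Ratio |
|--------------------|----------------|----------------|----------------|
| GAD                | 11.3           | 5.0            | 69.4%          |
| ADHD               | 8.7            | 4.0            | 68.4%          |
| Wernicke's Aphasia | 2.0            | 5.0            | 28.6%          |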

Interpretation:

  • High clinical ratio (~69%) = patient describing clear symptoms (GAD, ADHD)
  • Low clinical ratio (~29%) = speech without clinical content (Wernicke's)
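
The table does not define the clinical ratio, but the Wernicke's row matches clinical / (clinical + semantic) exactly, so the sketch below assumes that formula. The function name `clinical_ratio` is hypothetical. Applied to the rounded GAD and ADHD counts it gives 69.3 and 68.5, within 0.1 of the table; that gap is consistent with the displayed node counts being rounded, which is an inference and not something the source states.

```python
def clinical_ratio(clinical_nodes: float, semantic_nodes: float) -> float:
    """Share of extracted nodes that are clinical, as a percentage."""
    total = clinical_nodes + semantic_nodes
    if total == 0:
        return 0.0  # no entities extracted; avoid division by zero
    return 100.0 * clinical_nodes / total


print(round(clinical_ratio(2.0, 5.0), 1))  # Wernicke's row: 28.6
```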

Graph Visualization

Full graph (all conversations):

[Figure: Full graph visualization]

Graph for gad_sarah_001 (from video above):

[Figure: Single conversation graph]

Episode nodes (blue) link to Clinical entities (red: symptoms, criteria, treatments) and Semantic entities (beige: people, places, topics).

Results are also exported to results/analysis_latest.json.


Project Files

| File                       | Purpose                            | How to Run                     |
|----------------------------|------------------------------------|--------------------------------|
| extract_clinical.py        | Extract entities from transcripts  | python extract_clinical.py all |
| analyze_graphs.py          | Compare graph patterns by disorder | python analyze_graphs.py       |
| empirical_conversations.py | Clinical interview data            |                                |
| load_empirical.py          | Alternative loader using Graphiti  | python load_empirical.py       |

Cypher Queries

View full graph:

MATCH (n)-[r]->(m) RETURN n, r, m

View one conversation:

MATCH (e:Episode {name: 'gad_sarah_001'})-[:MENTIONS]->(n)
OPTIONAL MATCH (n)-[r]->(m)
RETURN e, n, r, m

View only clinical entities:

MATCH (e:Episode)-[:MENTIONS]->(n:Clinical)
RETURN e, n

Compare Wernicke's vs GAD:

MATCH (e:Episode)-[:MENTIONS]->(n)
WHERE e.name IN ['gad_sarah_001', 'wernickes_aphasia_byron_001']
OPTIONAL MATCH (n)-[r]->(m)
RETURN e, n, r, m

Count by entity type:

MATCH (e:Episode)-[:MENTIONS]->(n)
RETURN e.name, 
       sum(CASE WHEN n:Clinical THEN 1 ELSE 0 END) as clinical,
       sum(CASE WHEN n:Semantic THEN 1 ELSE 0 END) as semantic
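
The same count can be run from Python instead of the Neo4j Browser. The sketch below is illustrative and may differ from what analyze_graphs.py does. `counts_by_episode` and `main` are hypothetical names, the query adds explicit AS aliases so columns can be read by key, and the connection details are the ones from the Setup Guide.

```python
COUNT_QUERY = """
MATCH (e:Episode)-[:MENTIONS]->(n)
RETURN e.name AS episode,
       sum(CASE WHEN n:Clinical THEN 1 ELSE 0 END) AS clinical,
       sum(CASE WHEN n:Semantic THEN 1 ELSE 0 END) AS semantic
"""


def counts_by_episode(session) -> dict:
    """Run the count query and return {episode: (clinical, semantic)}."""
    return {
        record["episode"]: (record["clinical"], record["semantic"])
        for record in session.run(COUNT_QUERY)
    }


def main() -> None:
    """Print per-episode counts; needs the neo4j package and a running database."""
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(
        "bolt://localhost:7687", auth=("neo4j", "password123")
    )
    try:
        with driver.session() as session:
            for episode, (clinical, semantic) in counts_by_episode(session).items():
                print(episode, clinical, semantic)
    finally:
        driver.close()
```

Call `main()` once Neo4j is up and extraction has run. Passing the session in as an argument keeps `counts_by_episode` usable with any object that offers a compatible `run` method.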

Troubleshooting

Neo4j won't connect

docker ps          # Check if running
docker start neo4j # Start if stopped

Ollama errors

ollama serve       # Make sure it's running
ollama list        # Check available models

Clear and re-extract

# Clear database
python -c "
from neo4j import GraphDatabase
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password123'))
with driver.session() as s: s.run('MATCH (n) DETACH DELETE n')
driver.close()
print('Cleared!')
"

# Re-extract
python extract_clinical.py all

Disclaimer

This is a research and development tool. Clinical interview data is from educational demonstrations with actors, not real patients. Any clinical application would require appropriate validation, privacy safeguards, and regulatory compliance.
