An autonomous multi-agent system for generating safe, empathetic CBT (Cognitive Behavioral Therapy) exercises using LangGraph, PostgreSQL persistence, and Model Context Protocol (MCP) integration.
Cerina is not just a chatbot: it's a clinical foundry powered by autonomous AI agents that:
- Draft CBT exercises based on user queries
- Validate content for safety (no self-harm, medical advice, or triggering content)
- Critique for empathy, clarity, and clinical correctness
- Iterate autonomously until ready for human review
- Pause for human-in-the-loop approval before finalizing
- ✅ Multi-Agent Architecture: Supervisor-Worker pattern with autonomous loops
- ✅ PostgreSQL Checkpointing: Crash-resistant, resume-anywhere execution
- ✅ Real-Time Streaming UI: Watch agents debate and refine in real time
- ✅ Human-in-the-Loop: Edit, approve, or reject drafts before saving
- ✅ MCP Integration: Expose the workflow as a tool for Claude Desktop and other MCP clients
- ✅ Session History: Track all past queries and generated exercises
                 ┌─────────┐
                 │  User   │
                 │  Query  │
                 └────┬────┘
                      │
                      ▼
┌──────────────────────────────────────────────────┐
│                    SUPERVISOR                    │
│    (Routes tasks, decides when "good enough")    │
└────┬────────────────────────────────────────┬────┘
     │                                        ▲
     ▼                                        │
┌──────────┐     ┌──────────┐     ┌──────────┐│
│ DRAFTER  │────▶│  SAFETY  │────▶│  CRITIC  │┘
│          │     │ GUARDIAN │     │ REVIEWER │
└──────────┘     └──────────┘     └──────────┘
     ▲                                  │
     └──────── Loop if failed ──────────┘
                      │
                      ▼
            ┌─────────────────┐
            │  HUMAN APPROVAL │ ◀── Graph pauses here
            │  (Edit/Approve) │
            └────────┬────────┘
                     │
                     ▼
                ┌─────────┐
                │  SAVE   │
                │  TO DB  │
                └─────────┘
- Supervisor: Orchestrates workflow, routes to agents, decides when to halt
- Drafter: Creates CBT exercises, learns from rejected drafts
- Safety Guardian: Checks for unsafe content (self-harm, medical advice)
- Clinical Critic: Reviews empathy, clarity, and CBT correctness
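The supervisor's routing decision can be sketched as plain Python. This is illustrative only: the real project wires these agents into a LangGraph graph, and the iteration cap and field names (taken from the AgentState schema below) are assumptions.

```python
MAX_ITERATIONS = 5  # assumed cap; the real limit lives in the supervisor agent

def route(state: dict) -> str:
    """Decide which node runs next, mirroring the loop in the diagram above."""
    if state["iterations"] >= MAX_ITERATIONS:
        return "human_approval"   # stop looping, escalate the best draft as-is
    if not state["safety_pass"]:
        return "drafter"          # safety check failed: redraft
    if not state["critic_pass"]:
        return "drafter"          # quality review failed: redraft
    return "human_approval"       # both checks passed: pause for a human
```

The key design point is that the graph never saves anything on its own: every path eventually terminates in the human-approval node.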
AgentState:
- user_query: str
- draft: str # Current working draft
- previous_drafts: List[str] # Version history
- safety_notes: List[str] # Safety agent scratchpad
- critic_notes: List[str] # Critic agent scratchpad
- metadata:
- iterations: int
- safety_pass: bool
- critic_pass: bool
- user_rejected: bool
- edited_by_user: bool
- final_output: str # Approved & saved

Prerequisites:

- Python 3.13+
- PostgreSQL database (local or cloud, e.g., Neon)
- API keys: Groq or OpenAI (for LLM)
- Clone the repository
git clone Cerina-Foundry
cd Cerina-Foundry

- Install dependencies
# Using uv (recommended)
uv sync
# OR using pip
pip install -r requirements.txt

- Set up environment variables
Create a .env file:
# Database (PostgreSQL connection string)
DATABASE_URL=postgresql://user:password@host:5432/dbname
# LLM API Keys (choose one)
GROQ_API_KEY=your_groq_api_key
# OR
OPENAI_API_KEY=your_openai_api_key
# LangSmith (optional, for tracing)
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=cerina-foundry

- Initialize the database

The app will auto-create tables on first run, but you can verify:
python db_test.py

- Run the Flask app
python main.py

The dashboard will be available at: http://localhost:5000
- Click "New Session" in the sidebar
- Enter a query (e.g., "Create an exposure hierarchy for social anxiety")
- Click "Start Generation"
- Agent Stream Panel: See real-time logs of each agent's actions
- Status Badges: Monitor iterations, safety checks, and critic reviews
- Clinical Notes Panel: View detailed feedback from Safety and Critic agents
When the draft is ready, the Action Bar appears:
- Reject: Discard draft, agents create a new one from scratch
- Edit: Modify the draft manually, then re-validate through Safety/Critic
- Approve & Save: Finalize and save to database
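The three actions map onto state updates roughly like this (a sketch only; the function name and the exact update logic are assumptions based on the AgentState schema, not the project's actual code):

```python
def apply_human_action(state: dict, action: str, edited_text: str = "") -> dict:
    """Apply the reviewer's decision to the paused graph state."""
    new = dict(state)
    new["previous_drafts"] = list(state["previous_drafts"])  # don't mutate caller
    if action == "reject":
        new["user_rejected"] = True            # drafter starts over from scratch
        new["previous_drafts"].append(new["draft"])
        new["draft"] = ""
    elif action == "edit":
        new["edited_by_user"] = True
        new["previous_drafts"].append(new["draft"])
        new["draft"] = edited_text
        new["safety_pass"] = new["critic_pass"] = False  # re-run Safety/Critic
    elif action == "approve":
        new["final_output"] = new["draft"]     # finalize; ready to save to DB
    return new

state = {"draft": "Grounding exercise v3", "previous_drafts": [],
         "user_rejected": False, "edited_by_user": False,
         "safety_pass": True, "critic_pass": True, "final_output": ""}
approved = apply_human_action(state, "approve")
```

Note that an edit resets both validation flags, which is what forces the edited draft back through the Safety and Critic agents.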
- Click any session in the sidebar to view its history
- Delete sessions with the trash icon
The Model Context Protocol allows AI assistants (like Claude Desktop) to use your LangGraph workflow as a tool.
- Locate the Claude Desktop config file:
  - macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  - Windows: %APPDATA%\Claude\claude_desktop_config.json
- Add the Cerina server:
{
  "mcpServers": {
    "cerina_foundry": {
      "command": "uv",
      "args": [
        "--directory",
        "C:/Users/rouna/PycharmProjects/Cerina",
        "run",
        "python",
        "MCP/cerina_mcp_tools.py"
      ]
    }
  }
}

Use the absolute path of the directory where the project and its requirements are installed. (JSON does not allow comments, so keep the config file comment-free.)
- Restart Claude Desktop
- Test it:
- In Claude Desktop, type: "Ask Cerina Foundry to create a sleep hygiene protocol"
- Claude will invoke your multi-agent workflow and return the generated draft
generate_cbt_exercise(
topic: str, # e.g., "Sleep Hygiene"
instructions: str # Optional details
) -> str # Returns the generated CBT exercise

Cerina/
├── agent/
│   ├── drafter_agent.py      # Creates CBT exercises
│   ├── safety_agent.py       # Safety validation
│   ├── stream_utils.py       # Streams live generation of responses
│   ├── critic_agent.py       # Quality review
│   ├── supervisor_agent.py   # Orchestration
│   ├── prompts.py            # Agent prompts
│   └── llm_client.py         # LLM wrapper
├── MCP/
│   └── cerina_mcp_tools.py   # MCP server
├── db/
│   └── config.py             # Database config
├── templates/
│   └── index.html            # Web dashboard
├── checkpoint_store.py       # PostgreSQL checkpointer
├── graph_builder.py          # LangGraph definition
├── state.py                  # AgentState schema
├── main.py                   # Flask API
├── requirements.txt          # Dependencies
└── README.md                 # This file
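The MCP entry point in MCP/cerina_mcp_tools.py wraps the agent workflow behind the generate_cbt_exercise signature shown earlier. The shape of that wrapper can be sketched as below; run_workflow is a hypothetical stand-in for the real LangGraph invocation, and the query-building rule is an assumption:

```python
def run_workflow(query: str) -> str:
    """Hypothetical stand-in for invoking the LangGraph pipeline
    (drafter -> safety -> critic loop until both checks pass)."""
    return f"[CBT exercise draft for: {query}]"

def generate_cbt_exercise(topic: str, instructions: str = "") -> str:
    """Mirror of the MCP tool contract: topic plus optional instructions
    become a single user query for the agent workflow."""
    query = f"{topic}: {instructions}" if instructions else topic
    return run_workflow(query)

result = generate_cbt_exercise("Sleep Hygiene")
```

The point of keeping the tool this thin is that the MCP client (e.g. Claude Desktop) only sees a simple string-in/string-out contract, while all iteration and validation stays inside the graph.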
- Test the web UI:

python main.py   # Visit http://localhost:5000
- Test MCP integration:
# Run the MCP server standalone
python MCP/cerina_mcp_tools.py
# OR run the test caller
python mcp_caller.py
# Use the mcp-use CLI to test
mcp-use cerina_foundry generate_cbt_exercise --topic "Sleep Hygiene"
- Test database persistence:
python db_test.py
- Start a generation
- Kill the Flask process mid-execution
- Restart Flask
- Load the session; it should resume from the last checkpoint
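Conceptually, the crash-recovery test works because every graph step persists the latest AgentState keyed by thread_id. A minimal stdlib sketch of that idea (a toy file-based stand-in for the PostgreSQL checkpointer; names are illustrative):

```python
import json
import os
import tempfile

class FileCheckpointer:
    """Toy stand-in for the PostgreSQL checkpointer: persists the
    latest state per thread_id so a crashed run can resume."""

    def __init__(self, path: str):
        self.path = path

    def put(self, thread_id: str, state: dict) -> None:
        data = self._load_all()
        data[thread_id] = state
        with open(self.path, "w") as f:
            json.dump(data, f)

    def get(self, thread_id: str):
        return self._load_all().get(thread_id)

    def _load_all(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "checkpoints.json")
cp = FileCheckpointer(path)
cp.put("session-1", {"draft": "v1", "iterations": 2})
# ... process crashes and restarts here ...
resumed = cp.get("session-1")  # state survives the restart
```

The real checkpointer additionally versions each step, which is what lets LangGraph replay from the exact node where execution stopped rather than from the last full state.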
Edit agent/llm_client.py to switch between Groq/OpenAI:
# Current: OpenAI
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")

# Switch to Groq:
# from langchain_groq import ChatGroq
# llm = ChatGroq(model="openai/gpt-oss-120b")

Update DATABASE_URL in .env to use:
- Local PostgreSQL:
postgresql://user:pass@localhost:5432/cerina
- Neon (cloud):
postgresql://user:pass@ep-xxx.neon.tech/cerina?sslmode=require
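A quick way to sanity-check a connection string before wiring it into .env is to parse it with the standard library (an illustrative helper, not part of the project):

```python
from urllib.parse import urlsplit

def check_database_url(url: str) -> dict:
    """Parse a PostgreSQL connection string and return its parts,
    failing early if the scheme is wrong."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,           # default PostgreSQL port
        "database": parts.path.lstrip("/"),
        "sslmode": "sslmode=require" in (parts.query or ""),
    }

info = check_database_url(
    "postgresql://user:pass@ep-xxx.neon.tech/cerina?sslmode=require"
)
```

For cloud providers such as Neon, the sslmode=require query parameter is mandatory; the local form typically omits it.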
Customize agent behavior in agent/prompts.py:
- DRAFTER_PROMPT: How the drafter creates exercises
- SAFETY_PROMPT: What safety checks to perform
- CRITIC_PROMPT: Quality review criteria
Stores graph execution state for crash recovery.
CREATE TABLE saved_exercises (
id SERIAL PRIMARY KEY,
thread_id TEXT,
user_query TEXT,
final_output TEXT,
created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE session_metadata (
id TEXT PRIMARY KEY,
user_query TEXT,
created_at TIMESTAMP DEFAULT NOW()
);

Contents:
- React UI demo: Agents debating, human-in-the-loop approval
- MCP demo: Claude Desktop triggering workflow
- Code walkthrough: State definition and checkpointer logic
This is a sprint assignment project. For production use:
- Add comprehensive error handling
- Implement rate limiting
- Add authentication/authorization
- Write unit and integration tests
- Add monitoring and logging
MIT License
- LangGraph: For the agent orchestration framework
- Model Context Protocol: For AI interoperability standards
- Anthropic: For Claude and MCP documentation
- OpenAI: For LLM inference
- Groq: For fast LLM inference
Name: Rounak Raj
Email: rajrounak366@gmail.com
Built with ❤️ for the Cerina "Agentic Architect" Sprint