
WebSocket Protocol

Daedalus edited this page Dec 9, 2025 · 1 revision

The WebSocket connection provides real-time bidirectional communication between clients and the Cass backend. It handles chat messages, status updates, tool execution, and audio delivery.

Connection

Endpoint

ws://localhost:8000/ws

Authentication

Two authentication methods are supported:

Query Parameter (preferred):

ws://localhost:8000/ws?token=<jwt_access_token>

First Message:

{
  "type": "auth",
  "token": "<jwt_access_token>"
}

Localhost Bypass: Connections from 127.0.0.1, ::1, or localhost automatically use DEFAULT_LOCALHOST_USER_ID if ALLOW_LOCALHOST_BYPASS=true (default).
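For illustration, both methods can be prepared by small client helpers (a sketch; the helper names are assumptions, only the endpoint and the message shape come from this page):

```python
import json
from urllib.parse import urlencode

BASE = "ws://localhost:8000/ws"

def url_with_token(token):
    """Preferred method: embed the JWT in the query string."""
    return f"{BASE}?{urlencode({'token': token})}"

def auth_message(token):
    """Alternative: connect to the bare endpoint, then send this as the first frame."""
    return json.dumps({"type": "auth", "token": token})
```

A client on 127.0.0.1 with the bypass enabled can skip both and connect to BASE directly.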

Connection Response

On successful connection:

{
  "type": "connected",
  "message": "Cass vessel connected",
  "sdk_mode": true,
  "user_id": "user-uuid-here",
  "timestamp": "2025-12-09T10:00:00"
}

Message Types

Client → Server

chat

Send a message to Cass:

{
  "type": "chat",
  "message": "Hello Cass!",
  "conversation_id": "conv-uuid",
  "image": "base64-encoded-data",
  "image_media_type": "image/png"
}

| Field | Required | Description |
|-------|----------|-------------|
| message | Yes | The user's message text |
| conversation_id | No | Conversation to continue (creates new if omitted) |
| image | No | Base64-encoded image data |
| image_media_type | No | MIME type (e.g., "image/png") |
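A chat frame with these fields can be assembled like this (a sketch; the helper name is illustrative, the field names come from the table above):

```python
import json

def make_chat(message, conversation_id=None, image=None, image_media_type=None):
    """Build a chat frame; optional fields are included only when supplied."""
    frame = {"type": "chat", "message": message}
    if conversation_id is not None:
        frame["conversation_id"] = conversation_id
    if image is not None:
        frame["image"] = image
        frame["image_media_type"] = image_media_type or "image/png"
    return json.dumps(frame)
```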

auth

Authenticate after connection:

{
  "type": "auth",
  "token": "jwt_token_here"
}

onboarding_intro

Trigger onboarding introduction:

{
  "type": "onboarding_intro",
  "user_id": "user-uuid",
  "profile": {
    "display_name": "Name",
    "relationship": "researcher"
  }
}

onboarding_demo

Trigger collaborative demo:

{
  "type": "onboarding_demo",
  "user_id": "user-uuid",
  "message": "optional response",
  "profile": {
    "relationship": "researcher",
    "background": {"context": "..."},
    "communication": {"style": "..."}
  }
}

Server → Client

connected

Connection established:

{
  "type": "connected",
  "message": "Cass vessel connected",
  "sdk_mode": true,
  "user_id": "user-uuid",
  "timestamp": "2025-12-09T10:00:00"
}

auth_success

Authentication successful:

{
  "type": "auth_success",
  "user_id": "user-uuid"
}

auth_error

Authentication failed:

{
  "type": "auth_error",
  "message": "Invalid token"
}

thinking

Status update during processing:

{
  "type": "thinking",
  "status": "Retrieving memories...",
  "memories": {
    "summaries_count": 3,
    "details_count": 5,
    "project_docs_count": 2,
    "user_context_count": 4,
    "wiki_pages_count": 1,
    "has_context": true
  },
  "timestamp": "2025-12-09T10:00:01"
}

Status messages progress through:

  1. "Retrieving memories..."
  2. "Generating response (Claude/OpenAI/local model)..."
  3. "Using tool: [tool_name]..." (if tools used)
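On the client side, these server frames can be routed by their "type" field with a small dispatcher (a sketch; the handler arguments are illustrative):

```python
import json

def dispatch(raw, on_status, on_response):
    """Route one incoming frame by its "type" field."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "thinking":
        on_status(msg.get("status", ""))        # progress updates listed above
    elif kind == "response":
        on_response(msg["text"])
    elif kind in ("error", "auth_error"):
        raise RuntimeError(msg.get("message", "unknown error"))
    return kind
```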

response

Cass's response:

{
  "type": "response",
  "text": "Hello! <gesture:wave>",
  "animations": [
    {"type": "gesture", "name": "wave", "position": 7}
  ],
  "input_tokens": 1500,
  "output_tokens": 150,
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "conversation_id": "conv-uuid",
  "timestamp": "2025-12-09T10:00:02"
}

| Field | Description |
|-------|-------------|
| text | Response text with gesture/emote tags |
| animations | Parsed gesture/emote data |
| input_tokens | Tokens used for input |
| output_tokens | Tokens generated |
| provider | "anthropic", "openai", or "local" |
| model | Specific model used |
| conversation_id | ID of the conversation |
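A client that renders the text can strip the gesture/emote tags and recover the cue positions itself (a sketch; the position semantics are assumed from the example above, where position 7 points at the spot the tag occupied in the cleaned text):

```python
import re

TAG_RE = re.compile(r"<(gesture|emote):(\w+)>")

def split_animations(text):
    """Strip gesture/emote tags and record cue positions in the cleaned text."""
    animations, parts, removed, last = [], [], 0, 0
    for m in TAG_RE.finditer(text):
        parts.append(text[last:m.start()])
        animations.append({"type": m.group(1), "name": m.group(2),
                           "position": m.start() - removed})
        removed += m.end() - m.start()   # offset shrinks as tags are removed
        last = m.end()
    parts.append(text[last:])
    return "".join(parts), animations
```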

audio

TTS audio available:

{
  "type": "audio",
  "url": "/audio/abc123.wav",
  "timestamp": "2025-12-09T10:00:03"
}

system

System message:

{
  "type": "system",
  "message": "Memory summarization complete",
  "timestamp": "2025-12-09T10:00:04"
}

title_update

Conversation title changed:

{
  "type": "title_update",
  "conversation_id": "conv-uuid",
  "title": "New conversation title",
  "timestamp": "2025-12-09T10:00:05"
}

debug

Debug information (development):

{
  "type": "debug",
  "message": "[Tool Loop #1] stop_reason=tool_use, tools=['recall_journal']",
  "timestamp": "2025-12-09T10:00:06"
}

error

Error occurred:

{
  "type": "error",
  "message": "Failed to generate response",
  "timestamp": "2025-12-09T10:00:07"
}

Chat Flow

Standard Flow

Client                          Server
  |                               |
  |-- chat (message) ------------>|
  |                               |
  |<-- thinking (memories) -------|
  |<-- thinking (generating) -----|
  |                               |
  |<-- response ------------------|
  |<-- audio (optional) ----------|

With Tool Use

Client                          Server
  |                               |
  |-- chat (message) ------------>|
  |                               |
  |<-- thinking (memories) -------|
  |<-- thinking (generating) -----|
  |<-- thinking (using tool) -----|
  |<-- debug (tool loop) ---------|
  |<-- thinking (continuing) -----|
  |                               |
  |<-- response ------------------|

Tool Loop

When Cass uses tools, the server loops:

  1. LLM returns stop_reason: "tool_use"
  2. Server executes requested tool(s)
  3. Server sends tool results back to LLM
  4. LLM generates next response (may use more tools)
  5. Repeat until stop_reason: "end_turn"
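The loop above can be sketched as follows (illustrative only; call_llm and execute_tool stand in for the backend's provider call and tool runner, which this page does not name):

```python
def run_tool_loop(call_llm, execute_tool, messages, max_iters=10):
    """Iterate LLM turns until the model stops requesting tools.

    call_llm(messages) -> (stop_reason, tool_calls, text)
    execute_tool(call) -> result
    """
    for _ in range(max_iters):
        stop_reason, tool_calls, text = call_llm(messages)
        if stop_reason == "end_turn":
            return text
        # stop_reason == "tool_use": run each requested tool, feed results back
        results = [execute_tool(call) for call in tool_calls]
        messages.append({"role": "tool", "content": results})
    raise RuntimeError("tool loop did not terminate")
```

The max_iters guard is an assumption; some bound is typically needed so a model that keeps requesting tools cannot loop forever.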

Context Building

For each chat message, the server builds context:

  1. Hierarchical Memory - Summaries and recent details
  2. User Context - Profile and observations for the user
  3. Project Context - Documents if conversation is in a project
  4. Self-Model - Cass's self-observations
  5. Wiki Context - Auto-injected relevant wiki pages

Context is formatted and prepended to the conversation.
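The assembly can be pictured as joining the five sources in the order listed, skipping any that are empty (a sketch; the section headings are illustrative, not the backend's exact template):

```python
def build_context(memory="", user_ctx="", project="", self_model="", wiki=""):
    """Join the five context sources in order; empty sections are dropped."""
    sections = [("Hierarchical Memory", memory),
                ("User Context", user_ctx),
                ("Project Context", project),
                ("Self-Model", self_model),
                ("Wiki Context", wiki)]
    return "\n\n".join(f"# {name}\n{body}" for name, body in sections if body)
```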

Connection Management

ConnectionManager

The server maintains active connections:

class ConnectionManager:
    active_connections: List[WebSocket]
    connection_users: Dict[WebSocket, str]  # websocket -> user_id
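The lifecycle around those two fields can be sketched as follows (method names follow common FastAPI connection-manager patterns and are assumptions, not the backend's verified API):

```python
from typing import Dict, List

class ConnectionManager:
    def __init__(self):
        self.active_connections: List = []
        self.connection_users: Dict = {}   # websocket -> user_id

    def register(self, websocket, user_id):
        """Track a newly accepted, authenticated connection."""
        self.active_connections.append(websocket)
        self.connection_users[websocket] = user_id

    def disconnect(self, websocket):
        """Forget a connection; safe to call twice."""
        if websocket in self.active_connections:
            self.active_connections.remove(websocket)
        self.connection_users.pop(websocket, None)
```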

Disconnect Handling

On WebSocketDisconnect, the connection is removed:

except WebSocketDisconnect:
    manager.disconnect(websocket)

Multi-LLM Support

The WebSocket supports multiple LLM providers:

| Provider | Variable | Description |
|----------|----------|-------------|
| anthropic | Claude | Anthropic Claude models |
| openai | OpenAI | OpenAI GPT models |
| local | Ollama | Local Ollama models |

Provider is selected via:

  • /llm TUI command
  • Ctrl+O TUI shortcut
  • API endpoint

Response includes provider and model fields.

Image Support

Images can be sent with messages:

{
  "type": "chat",
  "message": "What's in this image?",
  "image": "base64-encoded-image-data",
  "image_media_type": "image/png"
}

The image is passed to the LLM for multimodal understanding.
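Encoding the raw bytes for the image field is one line with the standard library (standard base64 rather than the URL-safe alphabet is an assumption):

```python
import base64

def encode_image(image_bytes):
    """Return the base64 string to place in the chat frame's "image" field."""
    return base64.b64encode(image_bytes).decode("ascii")
```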

Error Handling

Errors are sent as error type messages:

{
  "type": "error",
  "message": "Detailed error message"
}

Common errors:

  • Authentication failures
  • LLM API errors
  • Tool execution failures
  • User not found

Key Files

  • backend/main_sdk.py:4799-5830 - WebSocket endpoint
  • backend/main_sdk.py:4799-4830 - ConnectionManager class
  • tui-frontend/tui.py - TUI WebSocket client
  • backend/auth.py - Token handling
