Karma-234/llm-consensus

LLM-Consensus

Multi-agent debate orchestration behind an OpenAI-compatible chat endpoint.

LLM-Consensus fans a request out to multiple agents, runs a structured debate loop (draft, critique, synthesize, vote, revise), and returns one final response.

It aims to improve answer quality and reasoning through multi-agent critique and voting. This can reduce some of the hallucinations seen in single-pass generation, but it complements RAG rather than replacing retrieval-grounded answers.

How It Works

Client                        LLM-Consensus                        Providers
  |                                 |                                  |
  | POST /v1/chat/completions       |                                  |
  |-------------------------------->|                                  |
  |                                 | 1) Draft (all agents)            |
  |                                 |--------------------------------->|
  |                                 |<---------------------------------|
  |                                 | 2) Critique (all agents)         |
  |                                 |--------------------------------->|
  |                                 |<---------------------------------|
  |                                 | 3) Synthesize candidate          |
  |                                 |--------------------------------->|
  |                                 |<---------------------------------|
  |                                 | 4) Vote (JSON)                   |
  |                                 |--------------------------------->|
  |                                 |<---------------------------------|
  |                                 | 5) Consensus? revise/fallback    |
  |                                 |                                  |
  | Final response                  |                                  |
  |<--------------------------------|                                  |
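Step 5 can be pictured as a small decision over the parsed votes. The sketch below is illustrative only: the Vote type, its Accept field, and consensusReached are hypothetical names, not taken from debate/consensus.go, and the majority rule (strictly more than half of the agents) is an assumption.

```go
package main

import "fmt"

// Vote is a hypothetical parsed vote; the real JSON schema is not shown in this README.
type Vote struct {
	Agent  string
	Accept bool
}

// consensusReached reports whether the candidate answer is accepted.
// With strictUnanimity every agent must accept; otherwise a strict
// majority (more than half) is assumed to be enough.
func consensusReached(votes []Vote, strictUnanimity bool) bool {
	if len(votes) == 0 {
		return false
	}
	accepted := 0
	for _, v := range votes {
		if v.Accept {
			accepted++
		}
	}
	if strictUnanimity {
		return accepted == len(votes)
	}
	return accepted*2 > len(votes)
}

func main() {
	votes := []Vote{{"analyst", true}, {"critic", true}, {"skeptic", false}}
	fmt.Println(consensusReached(votes, false)) // majority rule: true
	fmt.Println(consensusReached(votes, true))  // unanimity rule: false
}
```

When the check fails, the loop would revise and vote again until max_rounds is used up, then fall back.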

Quick Start

Prerequisites

  • Go 1.25+
  • At least one provider API key

Environment setup

Copy the env template, then edit it:

cp .env.example .env

Example .env values:

OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
GROK_API_KEY=your-grok-key

Start server

make start

Default address from config:

  • http://127.0.0.1:8080

Test request (non-streaming)

curl -X POST http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm-consensus-balanced",
    "messages": [
      {"role": "user", "content": "Give me a 5-step plan to learn Go."}
    ],
    "stream": false
  }'

Test request (streaming)

curl -N http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm-consensus-balanced",
    "messages": [
      {"role": "user", "content": "Explain Go interfaces simply."}
    ],
    "stream": true
  }'
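The same non-streaming request can be sent from Go with only the standard library. This is a minimal sketch, assuming the server runs at the default address; newChatRequest and the struct names are hypothetical helpers written for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
	Stream   bool      `json:"stream"`
}

// newChatRequest builds a non-streaming POST for /v1/chat/completions.
func newChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("http://127.0.0.1:8080", "llm-consensus-balanced", "Give me a 5-step plan to learn Go.")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed (is the server running?):", err)
		return
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read body:", err)
		return
	}
	// X-Debate-ID can be used later to fetch the transcript.
	fmt.Println("debate id:", resp.Header.Get("X-Debate-ID"))
	fmt.Println(string(raw))
}
```

A debate involves several provider round trips, so a real client would likely set a generous timeout on its http.Client rather than rely on the default.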

Presets and Virtual Models

Virtual aliases configured in config.yaml:

  • llm-consensus-fast -> fast
  • llm-consensus-balanced -> balanced
  • llm-consensus-paranoid -> paranoid
  • llm-consensus-debug -> debug

Preset fields:

  • max_rounds: max vote/revise loops
  • strict_unanimity: when true, acceptance requires a unanimous vote; otherwise a majority is enough
  • output_mode: clean, debug, or audit
  • max_total_tokens: token budget across all phases (0 = unlimited); debate halts early and returns best candidate when exceeded
  • max_retries: per-agent retry attempts on transient errors (exponential backoff: 250 ms → 500 ms → 1 s)
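The max_retries backoff schedule can be reproduced by doubling a 250 ms base delay. The following is a rough sketch, not the project's code: backoffDelay, withRetry, and errTransient are hypothetical names, and unlike the real service this version retries on any error instead of only transient ones.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a retryable provider failure (hypothetical).
var errTransient = errors.New("transient provider error")

// backoffDelay returns the wait before retry number attempt (0-based),
// doubling from 250 ms: 250 ms, 500 ms, 1 s.
func backoffDelay(attempt int) time.Duration {
	base := 250 * time.Millisecond
	return base << uint(attempt)
}

// withRetry runs op once and then up to maxRetries more times,
// sleeping with exponential backoff between attempts.
func withRetry(maxRetries int, op func() error) error {
	var err error
	for attempt := 0; ; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt >= maxRetries {
			return err
		}
		time.Sleep(backoffDelay(attempt))
	}
}

func main() {
	for attempt := 0; attempt < 3; attempt++ {
		fmt.Println(backoffDelay(attempt)) // 250ms, 500ms, 1s
	}

	calls := 0
	err := withRetry(3, func() error {
		calls++
		if calls < 2 {
			return errTransient // fail once, then succeed
		}
		return nil
	})
	fmt.Println(calls, err) // 2 <nil>
}
```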

Output Modes

  • clean: final answer text only
  • debug: human-readable phase transcript summary
  • audit: structured JSON transcript

output.default_mode in config.yaml sets the output mode unless a preset overrides it.

API Reference

GET /health

Basic liveness check.

Response:

{ "status": "ok" }

GET /metrics

Prometheus metrics endpoint. Exposes:

Metric                                 Type       Labels
llm_consensus_debate_total             counter    model, status (consensus/fallback/error)
llm_consensus_phase_duration_seconds   histogram  phase
llm_consensus_tokens_total             counter    phase, token_type (prompt/completion)
llm_consensus_active_debates           gauge      -
llm_consensus_consensus_reached_total  counter    -

GET /v1/models

Returns an OpenAI-compatible model list derived from config.yaml. Includes all virtual model aliases and raw agent model names.

GET /v1/debate/{id}/transcript

Retrieves the stored transcript for a completed debate by its ID. The ID is returned in the X-Debate-ID response header of the corresponding /v1/chat/completions call.

Transcripts are held in memory for 1 hour and then evicted.

Returns 404 if the ID is unknown or expired.
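The one-hour retention and the 404 for expired IDs suggest a map with per-entry expiry. The sketch below is one possible shape, not the contents of store/transcript_store.go: TranscriptStore, Put, Get, and defaultTTL are hypothetical names, and eviction here happens lazily on read, which is an assumption.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultTTL mirrors the documented one-hour retention.
const defaultTTL = time.Hour

type entry struct {
	transcript string
	expiresAt  time.Time
}

// TranscriptStore is a hypothetical in-memory store with per-entry expiry.
type TranscriptStore struct {
	mu    sync.Mutex
	ttl   time.Duration
	items map[string]entry
}

func NewTranscriptStore(ttl time.Duration) *TranscriptStore {
	return &TranscriptStore{ttl: ttl, items: make(map[string]entry)}
}

// Put stores a transcript under the debate ID and stamps its expiry.
func (s *TranscriptStore) Put(id, transcript string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[id] = entry{transcript: transcript, expiresAt: time.Now().Add(s.ttl)}
}

// Get returns the transcript, or false when the ID is unknown or expired.
// A handler would translate false into a 404.
func (s *TranscriptStore) Get(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.items[id]
	if !ok {
		return "", false
	}
	if !time.Now().Before(e.expiresAt) {
		delete(s.items, id) // evict lazily on read
		return "", false
	}
	return e.transcript, true
}

func main() {
	store := NewTranscriptStore(defaultTTL)
	store.Put("debate-1", `{"phases":[]}`)

	_, found := store.Get("debate-1")
	fmt.Println(found) // true
	_, found = store.Get("unknown")
	fmt.Println(found) // false
}
```

Lazy eviction keeps the sketch short; a long-running service would probably also sweep expired entries in the background so unread transcripts do not accumulate.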

POST /v1/chat/completions

OpenAI-compatible chat endpoint.

Request fields:

  • model (required)
  • messages (required)
  • stream (optional)
  • callback_url (optional): HTTP/HTTPS URL to receive a POST with the full DebateResult JSON once the debate completes (fired asynchronously, does not affect the response)

Response headers:

  • X-Debate-ID: unique ID for the debate; use with GET /v1/debate/{id}/transcript

Non-streaming returns one completion object.

Streaming returns a sequence of named Server-Sent Events followed by a terminal marker:

event: stage_start
data: {"phase":"draft"}

event: stage_start
data: {"phase":"critique"}

event: stage_start
data: {"phase":"synthesize"}

event: stage_start
data: {"phase":"vote"}

event: stage_complete
data: {"phase":"draft","usage":{"prompt_tokens":120,"completion_tokens":80,"total_tokens":200}}

... (one stage_complete per completed phase)

event: answer_chunk
data: {"index":0,"delta":{"content":"word "}}

... (one answer_chunk per word of the final answer)

event: answer_chunk
data: {"index":0,"delta":{"content":"last "},"finish_reason":"stop"}

event: usage_summary
data: {
  "total":{"prompt_tokens":480,"completion_tokens":320,"total_tokens":800},
  "per_phase":[{"phase":"draft","usage":{...}}, ...],
  "per_agent":[{"agent":"analyst","phase":"draft","usage":{...}}, ...],
  "consensus_confidence":0.75,
  "token_budget_exceeded":false
}

data: [DONE]

Event types:

Event                When emitted                                 Payload
stage_start          Immediately before each debate phase begins  phase name
stage_complete       After each phase completes                   phase name + usage counters
answer_chunk         Each word of the final answer                index, delta.content, optional finish_reason
usage_summary        After all answer chunks                      total, per_phase, per_agent token counts, consensus_confidence (0–1), token_budget_exceeded
error (terminating)  On fatal debate failure                      message (sanitized), followed by [DONE]

Terminal marker:

data: [DONE]
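A client can consume this stream by splitting it into frames on blank lines and stopping at the terminal marker. The following sketch assumes each event's JSON payload arrives on a single data: line (the usage_summary example above is shown pretty-printed); Event, readEvents, parseEvents, answerText, and the sample stream are hypothetical names written for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Event is one parsed Server-Sent Event frame.
type Event struct {
	Name string
	Data string
}

// sample is a shortened stream in the documented shape.
const sample = `event: stage_start
data: {"phase":"draft"}

event: answer_chunk
data: {"index":0,"delta":{"content":"Hello "}}

event: answer_chunk
data: {"index":0,"delta":{"content":"world "},"finish_reason":"stop"}

data: [DONE]
`

// readEvents collects frames until the [DONE] marker or end of input.
func readEvents(r io.Reader) ([]Event, error) {
	var events []Event
	var cur Event
	flush := func() {
		if cur.Name != "" || cur.Data != "" {
			events = append(events, cur)
		}
		cur = Event{}
	}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // a blank line ends the current frame
			flush()
		case strings.HasPrefix(line, "event:"):
			cur.Name = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
			if payload == "[DONE]" {
				flush()
				return events, nil
			}
			if cur.Data != "" {
				cur.Data += "\n"
			}
			cur.Data += payload
		}
	}
	flush()
	return events, sc.Err()
}

// parseEvents is a convenience wrapper for in-memory streams.
func parseEvents(stream string) ([]Event, error) {
	return readEvents(strings.NewReader(stream))
}

// answerText concatenates delta.content from every answer_chunk event.
func answerText(events []Event) string {
	var b strings.Builder
	for _, ev := range events {
		if ev.Name != "answer_chunk" {
			continue
		}
		var c struct {
			Delta struct {
				Content string `json:"content"`
			} `json:"delta"`
		}
		if err := json.Unmarshal([]byte(ev.Data), &c); err != nil {
			continue // skip malformed chunks in this sketch
		}
		b.WriteString(c.Delta.Content)
	}
	return b.String()
}

func main() {
	events, err := parseEvents(sample)
	if err != nil {
		fmt.Println("parse:", err)
		return
	}
	fmt.Println(len(events))              // 3
	fmt.Printf("%q\n", answerText(events)) // "Hello world "
}
```

Against a live server, readEvents would be given resp.Body from a request sent with "stream": true.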

Configuration

All runtime config lives in config.yaml:

  • server: host/port
  • agents: debate participants (name, role, model, provider, base_url, api_key)
  • debate: global defaults (max_rounds, strict_unanimity)
  • output: default output mode
  • virtual_models: alias mapping to presets
  • presets: named behavior profiles (each supports max_total_tokens, max_retries)
  • otel.endpoint: OTLP HTTP endpoint for tracing (e.g. http://localhost:4318); leave empty to disable

Project Structure

cmd/llm-consensus/
  main.go                    # server bootstrap

internal/
  config/config.go           # YAML + env loading, preset resolution
  handler/chat.go            # /health, /v1/models, /v1/chat/completions
  debate/orchestrator.go     # phase orchestration
  debate/prompts.go          # phase prompt templates
  debate/consensus.go        # vote parsing and consensus rules
  debate/transcript.go       # debug/audit transcript formatting
  provider/client.go         # provider factory
  provider/openai.go         # OpenAI-compatible provider
  provider/anthropic.go      # Anthropic provider
  store/transcript_store.go  # TTL in-memory transcript store
  metrics/metrics.go         # Prometheus metric definitions
  telemetry/tracer.go        # OpenTelemetry tracer init
  types/types.go             # shared request/response interfaces

Troubleshooting

Model does not exist

  • Verify the model ID against the models API for your provider account.
  • Update agent model values in config.yaml.
  • Restart the service.

Missing API keys

  • Ensure .env exists and has required keys.
  • Ensure api_key placeholders in config.yaml match variable names.
  • Start via make start so .env is loaded.
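A quick way to spot a mismatched placeholder is to expand it against the environment and see which names come back empty. This sketch assumes placeholders use the ${VAR} form, which the README does not confirm; expandPlaceholder is a hypothetical helper, and the three variable names are the ones from the example .env above.

```go
package main

import (
	"fmt"
	"os"
)

// expandPlaceholder replaces ${VAR} references using lookup and returns
// the expanded value plus the names that resolved to an empty string.
func expandPlaceholder(value string, lookup func(string) string) (string, []string) {
	var missing []string
	expanded := os.Expand(value, func(name string) string {
		v := lookup(name)
		if v == "" {
			missing = append(missing, name)
		}
		return v
	})
	return expanded, missing
}

func main() {
	// Assumed placeholder syntax; adjust to whatever config.yaml actually uses.
	placeholders := []string{"${OPENAI_API_KEY}", "${ANTHROPIC_API_KEY}", "${GROK_API_KEY}"}
	for _, p := range placeholders {
		if _, missing := expandPlaceholder(p, os.Getenv); len(missing) > 0 {
			fmt.Println("unset or empty:", missing)
		} else {
			fmt.Println("ok:", p)
		}
	}
}
```

Running it in the same shell that starts the server shows whether the variables from .env are actually exported.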

Provider 400 during draft phase

  • Validate model name/provider pairing in config.yaml.
  • Check API key permissions and quota.
  • Check server logs for provider-specific error payload.

What This Is Not

  • Not a database-backed system
  • Not a web dashboard
  • Not a tool-execution runtime
  • Not long-term memory across requests

License

MIT
