LLM-Consensus

Multi-agent debate orchestration behind an OpenAI-compatible chat endpoint.

LLM-Consensus fans a request out to multiple agents, runs a structured debate loop (draft, critique, synthesize, vote, revise), and returns one final response.

It aims to improve answer quality and reasoning via multi-agent critique and voting. It can reduce some hallucinations from single-pass generation, but it is complementary to RAG rather than a replacement for retrieval-grounded answers.

How It Works

Client                        LLM-Consensus                        Providers
  |                                 |                                  |
  | POST /v1/chat/completions       |                                  |
  |-------------------------------->|                                  |
  |                                 | 1) Draft (all agents)            |
  |                                 |--------------------------------->|
  |                                 |<---------------------------------|
  |                                 | 2) Critique (all agents)         |
  |                                 |--------------------------------->|
  |                                 |<---------------------------------|
  |                                 | 3) Synthesize candidate          |
  |                                 |--------------------------------->|
  |                                 |<---------------------------------|
  |                                 | 4) Vote (JSON)                   |
  |                                 |--------------------------------->|
  |                                 |<---------------------------------|
  |                                 | 5) Consensus? revise/fallback    |
  |                                 |                                  |
  | Final response                  |                                  |
  |<--------------------------------|                                  |

Quick Start

Prerequisites

Go 1.25+
At least one provider API key

Environment setup

Copy the env template, then edit it:

cp .env.example .env

Example .env values:

OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
GROK_API_KEY=your-grok-key

Start server

make start

Default address from config:

http://127.0.0.1:8080

Test request (non-streaming)

curl -X POST http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm-consensus-balanced",
    "messages": [
      {"role": "user", "content": "Give me a 5-step plan to learn Go."}
    ],
    "stream": false
  }'

Test request (streaming)

curl -N http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm-consensus-balanced",
    "messages": [
      {"role": "user", "content": "Explain Go interfaces simply."}
    ],
    "stream": true
  }'

Presets and Virtual Models

Virtual aliases configured in config.yaml:

llm-consensus-fast -> fast
llm-consensus-balanced -> balanced
llm-consensus-paranoid -> paranoid
llm-consensus-debug -> debug

Preset fields:

max_rounds: maximum number of vote/revise loops
strict_unanimity: whether acceptance requires a unanimous vote or only a majority
output_mode: clean, debug, or audit
max_total_tokens: token budget across all phases (0 = unlimited); when the budget is exceeded, the debate halts early and returns the best candidate so far
max_retries: per-agent retry attempts on transient errors (exponential backoff: 250 ms → 500 ms → 1 s)

Output Modes

clean: final answer text only
debug: human-readable phase transcript summary
audit: structured JSON transcript

output.default_mode in config.yaml controls default behavior unless a preset overrides it.

API Reference

GET /health

Basic liveness check.

Response:

{ "status": "ok" }

GET /metrics

Prometheus metrics endpoint. Exposes:

Metric	Type	Labels
`llm_consensus_debate_total`	counter	`model`, `status` (`consensus`/`fallback`/`error`)
`llm_consensus_phase_duration_seconds`	histogram	`phase`
`llm_consensus_tokens_total`	counter	`phase`, `token_type` (`prompt`/`completion`)
`llm_consensus_active_debates`	gauge	—
`llm_consensus_consensus_reached_total`	counter	—

GET /v1/models

Returns an OpenAI-compatible model list derived from config.yaml. Includes all virtual model aliases and raw agent model names.

GET /v1/debate/{id}/transcript

Retrieves the stored transcript for a completed debate by its ID. The ID is returned in the X-Debate-ID response header of the corresponding /v1/chat/completions call.

Transcripts are held in memory for 1 hour, then evicted.

Returns 404 if the ID is unknown or expired.

POST /v1/chat/completions

OpenAI-compatible chat endpoint.

Request fields:

model (required)
messages (required)
stream (optional)
callback_url (optional): HTTP/HTTPS URL that receives a POST with the full DebateResult JSON once the debate completes; the callback fires asynchronously and does not affect the response

Response headers:

X-Debate-ID: unique ID for the debate; use with GET /v1/debate/{id}/transcript

Non-streaming returns one completion object.

Streaming returns a sequence of named Server-Sent Events followed by a terminal marker:

event: stage_start
data: {"phase":"draft"}

event: stage_start
data: {"phase":"critique"}

event: stage_start
data: {"phase":"synthesize"}

event: stage_start
data: {"phase":"vote"}

event: stage_complete
data: {"phase":"draft","usage":{"prompt_tokens":120,"completion_tokens":80,"total_tokens":200}}

... (one stage_complete per completed phase)

event: answer_chunk
data: {"index":0,"delta":{"content":"word "}}

... (one answer_chunk per word of the final answer)

event: answer_chunk
data: {"index":0,"delta":{"content":"last "},"finish_reason":"stop"}

event: usage_summary
data: {
  "total":{"prompt_tokens":480,"completion_tokens":320,"total_tokens":800},
  "per_phase":[{"phase":"draft","usage":{...}}, ...],
  "per_agent":[{"agent":"analyst","phase":"draft","usage":{...}}, ...],
  "consensus_confidence":0.75,
  "token_budget_exceeded":false
}

data: [DONE]

Event types:

Event	When emitted	Payload
`stage_start`	Immediately before each debate phase begins	`phase` name
`stage_complete`	After each phase completes	`phase` name + `usage` counters
`answer_chunk`	Each word of the final answer	`index`, `delta.content`, optional `finish_reason`
`usage_summary`	After all answer chunks	`total`, `per_phase`, `per_agent` token counts, `consensus_confidence` (0–1), `token_budget_exceeded`
`error` (terminating)	On fatal debate failure	`message` (sanitized), followed by `[DONE]`

Terminal marker:

data: [DONE]

Configuration

All runtime config lives in config.yaml:

server: host/port
agents: debate participants (name, role, model, provider, base_url, api_key)
debate: global defaults (max_rounds, strict_unanimity)
output: default output mode
virtual_models: alias mapping to presets
presets: named behavior profiles (each supports max_total_tokens, max_retries)
otel.endpoint: OTLP HTTP endpoint for tracing (e.g. http://localhost:4318); leave empty to disable

Project Structure

cmd/llm-consensus/
  main.go                    # server bootstrap

internal/
  config/config.go           # YAML + env loading, preset resolution
  handler/chat.go            # /health, /v1/models, /v1/chat/completions
  debate/orchestrator.go     # phase orchestration
  debate/prompts.go          # phase prompt templates
  debate/consensus.go        # vote parsing and consensus rules
  debate/transcript.go       # debug/audit transcript formatting
  provider/client.go         # provider factory
  provider/openai.go         # OpenAI-compatible provider
  provider/anthropic.go      # Anthropic provider
  store/transcript_store.go  # TTL in-memory transcript store
  metrics/metrics.go         # Prometheus metric definitions
  telemetry/tracer.go        # OpenTelemetry tracer init
  types/types.go             # shared request/response interfaces

Troubleshooting

Model does not exist

Verify the model ID against the models API for your provider account.
Update agent model values in config.yaml.
Restart the service.

Missing API keys

Ensure .env exists and contains the required keys.
Ensure the api_key placeholders in config.yaml match the variable names in .env.
Start via make start so that .env is loaded.

Provider 400 during draft phase

Validate model name/provider pairing in config.yaml.
Check API key permissions and quota.
Check server logs for provider-specific error payload.

What This Is Not

Not a database-backed system
Not a web dashboard
Not a tool-execution runtime
Not long-term memory across requests

License

MIT
