Multi-agent debate orchestration behind an OpenAI-compatible chat endpoint.
LLM-Consensus fans a request out to multiple agents, runs a structured debate loop (draft, critique, synthesize, vote, revise), and returns one final response.
It aims to improve answer quality and reasoning via multi-agent critique and voting. It can reduce some hallucinations from single-pass generation, but it is complementary to RAG rather than a replacement for retrieval-grounded answers.
Client                      LLM-Consensus                        Providers
|                                 |                                  |
| POST /v1/chat/completions       |                                  |
|-------------------------------->|                                  |
|                                 | 1) Draft (all agents)            |
|                                 |--------------------------------->|
|                                 |<---------------------------------|
|                                 | 2) Critique (all agents)         |
|                                 |--------------------------------->|
|                                 |<---------------------------------|
|                                 | 3) Synthesize candidate          |
|                                 |--------------------------------->|
|                                 |<---------------------------------|
|                                 | 4) Vote (JSON)                   |
|                                 |--------------------------------->|
|                                 |<---------------------------------|
|                                 | 5) Consensus? revise/fallback    |
|                                 |                                  |
| Final response                  |                                  |
|<--------------------------------|                                  |
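The flow in the diagram can be sketched in Go roughly as follows. This is a minimal illustration, not the project's orchestrator: the `Phases` struct, `accepted`, and `runDebate` are hypothetical names, "majority" is assumed to mean strictly more than half of the votes, and the fallback is assumed to return the last candidate once the round limit is reached.

```go
package main

// Phases bundles one callback per debate phase. All names are hypothetical.
type Phases struct {
	Draft      func(prompt string) []string
	Critique   func(drafts []string) []string
	Synthesize func(drafts, critiques []string) string
	Vote       func(candidate string) []bool
	Revise     func(candidate string, votes []bool) string
}

// accepted applies the consensus rule: every vote must approve when
// strictUnanimity is set, otherwise a strict majority is enough (assumption).
func accepted(votes []bool, strictUnanimity bool) bool {
	if len(votes) == 0 {
		return false
	}
	yes := 0
	for _, v := range votes {
		if v {
			yes++
		}
	}
	if strictUnanimity {
		return yes == len(votes)
	}
	return yes*2 > len(votes)
}

// runDebate runs draft, critique, and synthesize once, then loops over
// vote/revise for at most maxRounds votes. The boolean result reports
// whether consensus was reached (false means the fallback path was taken).
func runDebate(p Phases, prompt string, maxRounds int, strictUnanimity bool) (string, bool) {
	drafts := p.Draft(prompt)
	critiques := p.Critique(drafts)
	candidate := p.Synthesize(drafts, critiques)
	for round := 1; ; round++ {
		votes := p.Vote(candidate)
		if accepted(votes, strictUnanimity) {
			return candidate, true
		}
		if round >= maxRounds {
			return candidate, false // fallback: return the last candidate
		}
		candidate = p.Revise(candidate, votes)
	}
}
```

In a real deployment each callback would fan out to the configured agents; here they are plain functions so the control flow stays visible.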
- Go 1.25+
- At least one provider API key
Copy and edit env template:
cp .env.example .env
Example .env values:
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
GROK_API_KEY=your-grok-key
Start the server:
make start
Default address from config:
http://127.0.0.1:8080
curl -X POST http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llm-consensus-balanced",
"messages": [
{"role": "user", "content": "Give me a 5-step plan to learn Go."}
],
"stream": false
}'
curl -N http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llm-consensus-balanced",
"messages": [
{"role": "user", "content": "Explain Go interfaces simply."}
],
"stream": true
}'
Virtual aliases configured in config.yaml:
- llm-consensus-fast -> fast
- llm-consensus-balanced -> balanced
- llm-consensus-paranoid -> paranoid
- llm-consensus-debug -> debug
Preset fields:
- max_rounds: max vote/revise loops
- strict_unanimity: unanimous or majority acceptance
- output_mode: clean, debug, or audit
- max_total_tokens: token budget across all phases (0 = unlimited); debate halts early and returns best candidate when exceeded
- max_retries: per-agent retry attempts on transient errors (exponential backoff: 250 ms → 500 ms → 1 s)
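The retry schedule for max_retries can be sketched as below. This is an illustration under assumptions, not the project's code: `callWithRetry`, `retryDelayMillis`, and `errTransient` are hypothetical names, max_retries is assumed to count attempts after the first call, and the delay is assumed to keep doubling beyond 1 s if more than three retries are configured.

```go
package main

import "errors"

// errTransient stands in for a retryable provider failure (hypothetical).
var errTransient = errors.New("transient provider error")

// retryDelayMillis returns the wait before the given retry (1-based):
// 250 ms, 500 ms, 1000 ms, doubling each time.
func retryDelayMillis(retry int) int64 {
	if retry < 1 {
		return 0
	}
	return int64(250) << (retry - 1)
}

// callWithRetry invokes call once, then retries up to maxRetries times
// while the error is transient. sleep is injected so the schedule can be
// observed without actually waiting.
func callWithRetry(maxRetries int, sleep func(ms int64), call func() error) error {
	err := call()
	for retry := 1; err != nil && errors.Is(err, errTransient) && retry <= maxRetries; retry++ {
		sleep(retryDelayMillis(retry))
		err = call()
	}
	return err
}
```

In production the sleep argument would typically wrap `time.Sleep(time.Duration(ms) * time.Millisecond)`; non-transient errors are returned immediately without a retry.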
- clean: final answer text only
- debug: human-readable phase transcript summary
- audit: structured JSON transcript
output.default_mode in config.yaml controls default behavior unless a preset overrides it.
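A rough Go sketch of how an alias, its preset, and the default output mode might fit together. The `Preset` struct and the function names are illustrative only; it also assumes that an empty output_mode means "not overridden" and that the budget counts as exceeded only when usage goes above max_total_tokens.

```go
package main

// Preset mirrors the preset fields listed above; Go field names are illustrative.
type Preset struct {
	MaxRounds       int
	StrictUnanimity bool
	OutputMode      string // "clean", "debug", "audit", or "" when not set
	MaxTotalTokens  int    // 0 = unlimited
	MaxRetries      int
}

// virtualModels maps the documented aliases to preset names.
var virtualModels = map[string]string{
	"llm-consensus-fast":     "fast",
	"llm-consensus-balanced": "balanced",
	"llm-consensus-paranoid": "paranoid",
	"llm-consensus-debug":    "debug",
}

// presetName resolves a requested model name to a preset name, if it is an alias.
func presetName(model string) (string, bool) {
	name, ok := virtualModels[model]
	return name, ok
}

// effectiveOutputMode prefers the preset's mode and falls back to
// output.default_mode otherwise.
func effectiveOutputMode(p Preset, defaultMode string) string {
	if p.OutputMode != "" {
		return p.OutputMode
	}
	return defaultMode
}

// budgetExceeded reports whether the debate should halt early.
func budgetExceeded(p Preset, usedTokens int) bool {
	return p.MaxTotalTokens > 0 && usedTokens > p.MaxTotalTokens
}
```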
Basic liveness check.
Response:
{ "status": "ok" }
Prometheus metrics endpoint. Exposes:
| Metric | Type | Labels |
|---|---|---|
| llm_consensus_debate_total | counter | model, status (consensus/fallback/error) |
| llm_consensus_phase_duration_seconds | histogram | phase |
| llm_consensus_tokens_total | counter | phase, token_type (prompt/completion) |
| llm_consensus_active_debates | gauge | none |
| llm_consensus_consensus_reached_total | counter | none |
Returns an OpenAI-compatible model list derived from config.yaml. Includes all virtual model aliases and raw agent model names.
Retrieves the stored transcript for a completed debate by its ID. The ID is returned in the X-Debate-ID response header of the corresponding /v1/chat/completions call.
Transcripts are held in memory for 1 hour then evicted.
Returns 404 if the ID is unknown or expired.
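The expiry behavior described above could be modeled with a small TTL map. The sketch below is an assumption-laden illustration rather than the contents of store/transcript_store.go: the type and method names are hypothetical, time is tracked in Unix seconds through an injected clock, and expired entries are removed lazily on read.

```go
package main

import "sync"

// entry holds one transcript and the Unix second at which it expires.
type entry struct {
	transcript string
	expiresAt  int64
}

// TranscriptStore is an in-memory map with a fixed time-to-live per entry.
type TranscriptStore struct {
	mu         sync.Mutex
	ttlSeconds int64
	now        func() int64 // injected clock, e.g. time.Now().Unix
	items      map[string]entry
}

// NewTranscriptStore creates a store; pass 3600 for the documented 1 hour TTL.
func NewTranscriptStore(ttlSeconds int64, now func() int64) *TranscriptStore {
	return &TranscriptStore{ttlSeconds: ttlSeconds, now: now, items: map[string]entry{}}
}

// Put stores a transcript under the debate ID.
func (s *TranscriptStore) Put(id, transcript string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[id] = entry{transcript: transcript, expiresAt: s.now() + s.ttlSeconds}
}

// Get returns the transcript, or false when the ID is unknown or expired.
// A handler would translate false into a 404 response.
func (s *TranscriptStore) Get(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.items[id]
	if !ok {
		return "", false
	}
	if s.now() >= e.expiresAt {
		delete(s.items, id) // evict lazily on read
		return "", false
	}
	return e.transcript, true
}
```

A background sweeper would be needed to free memory for transcripts that are never read again; it is left out here to keep the sketch short.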
OpenAI-compatible chat endpoint.
Request fields:
- model (required)
- messages (required)
- stream (optional)
- callback_url (optional): HTTP/HTTPS URL to receive a POST with the full DebateResult JSON once the debate completes (fired asynchronously, does not affect the response)
Response headers:
- X-Debate-ID: unique ID for the debate; use with GET /v1/debate/{id}/transcript
Non-streaming returns one completion object.
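For Go callers, a non-streaming request could be assembled along these lines. The helper names (`newChatRequest`, `debateID`, `transcriptURL`) are hypothetical, and the request struct covers only the fields documented above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest carries the documented request fields.
type chatRequest struct {
	Model       string        `json:"model"`
	Messages    []chatMessage `json:"messages"`
	Stream      bool          `json:"stream"`
	CallbackURL string        `json:"callback_url,omitempty"`
}

// newChatRequest builds a non-streaming POST for a single user prompt.
func newChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// debateID reads the debate identifier from a completion response.
func debateID(resp *http.Response) string {
	return resp.Header.Get("X-Debate-ID")
}

// transcriptURL builds the transcript lookup URL for a debate ID.
func transcriptURL(baseURL, id string) string {
	return baseURL + "/v1/debate/" + id + "/transcript"
}
```

Send the request with `http.DefaultClient.Do(req)`, pass the response to `debateID`, and fetch `transcriptURL(baseURL, id)` within the retention window to retrieve the transcript.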
Streaming returns a sequence of named Server-Sent Events followed by a terminal marker:
event: stage_start
data: {"phase":"draft"}
event: stage_start
data: {"phase":"critique"}
event: stage_start
data: {"phase":"synthesize"}
event: stage_start
data: {"phase":"vote"}
event: stage_complete
data: {"phase":"draft","usage":{"prompt_tokens":120,"completion_tokens":80,"total_tokens":200}}
... (one stage_complete per completed phase)
event: answer_chunk
data: {"index":0,"delta":{"content":"word "}}
... (one answer_chunk per word of the final answer)
event: answer_chunk
data: {"index":0,"delta":{"content":"last "},"finish_reason":"stop"}
event: usage_summary
data: {
"total":{"prompt_tokens":480,"completion_tokens":320,"total_tokens":800},
"per_phase":[{"phase":"draft","usage":{...}}, ...],
"per_agent":[{"agent":"analyst","phase":"draft","usage":{...}}, ...],
"consensus_confidence":0.75,
"token_budget_exceeded":false
}
data: [DONE]
Event types:
| Event | When emitted | Payload |
|---|---|---|
| stage_start | Immediately before each debate phase begins | phase name |
| stage_complete | After each phase completes | phase name + usage counters |
| answer_chunk | Each word of the final answer | index, delta.content, optional finish_reason |
| usage_summary | After all answer chunks | total, per_phase, per_agent token counts, consensus_confidence (0–1), token_budget_exceeded |
| error (terminating) | On fatal debate failure | message (sanitized), followed by [DONE] |
Terminal marker:
data: [DONE]
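A client can consume the stream with a small line-oriented reader. The sketch below assumes standard Server-Sent Events framing (a blank line ends each event, and each data payload fits on one line) and stops at the terminal marker; `sseEvent`, `readSSE`, and `readSSEString` are hypothetical names.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// sseEvent is one named event with its raw data payload.
type sseEvent struct {
	Name string
	Data string
}

// readSSE collects events until the "data: [DONE]" marker or end of input.
func readSSE(r io.Reader) ([]sseEvent, error) {
	var events []sseEvent
	var name string
	var data []string
	flush := func() {
		if len(data) > 0 {
			events = append(events, sseEvent{Name: name, Data: strings.Join(data, "\n")})
		}
		name, data = "", nil
	}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "":
			flush() // a blank line dispatches the pending event
		case strings.HasPrefix(line, "event:"):
			name = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			// Strip the field name and at most one leading space.
			payload := strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " ")
			if payload == "[DONE]" {
				flush()
				return events, nil
			}
			data = append(data, payload)
		}
	}
	flush()
	return events, sc.Err()
}

// readSSEString is a convenience wrapper for in-memory input.
func readSSEString(s string) ([]sseEvent, error) {
	return readSSE(strings.NewReader(s))
}
```

Pass `resp.Body` from a streaming request to `readSSE`, then decode each `Data` value with `encoding/json` according to the event name. Note that `bufio.Scanner` rejects lines longer than 64 KiB by default, so a large usage_summary payload may need a bigger buffer via `sc.Buffer`.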
All runtime config lives in config.yaml:
- server: host/port
- agents: debate participants (name, role, model, provider, base_url, api_key)
- debate: global defaults (max_rounds, strict_unanimity)
- output: default output mode
- virtual_models: alias mapping to presets
- presets: named behavior profiles (each supports max_total_tokens, max_retries)
- otel.endpoint: OTLP HTTP endpoint for tracing (e.g. http://localhost:4318); leave empty to disable
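Agent api_key values refer to environment variables through placeholders. The exact placeholder syntax is not documented here, so the sketch below assumes shell-style `${VAR}` references and uses the standard library's `os.Expand`; `expandAPIKey` is a hypothetical helper, not the project's loader.

```go
package main

import (
	"fmt"
	"os"
)

// expandAPIKey replaces ${VAR} (or $VAR) references in raw using lookup and
// reports an error when a referenced variable is empty or unset.
// The ${VAR} placeholder syntax is an assumption for this sketch.
func expandAPIKey(raw string, lookup func(string) string) (string, error) {
	var missing []string
	out := os.Expand(raw, func(name string) string {
		v := lookup(name)
		if v == "" {
			missing = append(missing, name)
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unset environment variables: %v", missing)
	}
	return out, nil
}
```

Calling `expandAPIKey("${OPENAI_API_KEY}", os.Getenv)` resolves the key from the process environment. Failing fast on an unset variable surfaces a placeholder/name mismatch at startup instead of as a provider authentication error later. A literal key that contains `$` would be altered by this approach.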
cmd/llm-consensus/
  main.go                      # server bootstrap
internal/
  config/config.go             # YAML + env loading, preset resolution
  handler/chat.go              # /health, /v1/models, /v1/chat/completions
  debate/orchestrator.go       # phase orchestration
  debate/prompts.go            # phase prompt templates
  debate/consensus.go          # vote parsing and consensus rules
  debate/transcript.go         # debug/audit transcript formatting
  provider/client.go           # provider factory
  provider/openai.go           # OpenAI-compatible provider
  provider/anthropic.go        # Anthropic provider
  store/transcript_store.go    # TTL in-memory transcript store
  metrics/metrics.go           # Prometheus metric definitions
  telemetry/tracer.go          # OpenTelemetry tracer init
  types/types.go               # shared request/response interfaces
- Verify the model ID against the provider account's models API.
- Update agent model values in config.yaml.
- Restart the service.
- Ensure .env exists and has the required keys.
- Ensure api_key placeholders in config.yaml match variable names.
- Start via make start so .env is loaded.
- Validate the model name/provider pairing in config.yaml.
- Check API key permissions and quota.
- Check server logs for the provider-specific error payload.
- Not a database-backed system
- Not a web dashboard
- Not a tool-execution runtime
- Not long-term memory across requests
MIT