feat(prompt): runtime.prompt.inline_schemas flag — unblock local Ollama audits#1468

Draft
nextlevelshit wants to merge 2 commits into main from feat/prompt-inline-schemas-flag

Conversation

@nextlevelshit
Collaborator

Summary

buildContractPrompt inlines the full json_schema body (~50KB, roughly 12-15K tokens) into every step prompt. Frontier models tolerate it; local Ollama models capped at 32K context choke. Concrete observation: audit-doc-scan on opencode-glm hangs at 100% GPU utilization, emitting 0 tokens for 60+ minutes before timing out, regardless of timeout settings — the model never returns a streamable completion when the prompt is that heavy.

The same function builds a required-fields skeleton + property hints right after the schema dump. The skeleton alone is enough for any model that handled the YAML pipeline at all.

Change

  • New manifest knob runtime.prompt.inline_schemas: bool. Defaults to true (zero-change for existing repos). Set to false to drop the full schema body and keep only the schema-path reference + the skeleton.
  • buildContractPrompt takes a third *PromptConfig arg; nil preserves historical behavior so test callers stay valid.
  • This repo's wave.yaml flips the flag to false and bumps the local timeouts to 90m so live runs on glm-4.7-flash and qwen3.5:27b actually complete.
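A minimal sketch of the nil-preserving lookup, assuming a `PromptConfig` shaped like the manifest knob. Only the third `*PromptConfig` arg and the default-true semantics come from this PR; the field and method names here are illustrative:

```go
package main

import "fmt"

// PromptConfig mirrors the hypothetical runtime.prompt manifest section.
type PromptConfig struct {
	// InlineSchemas is a *bool so an absent key is distinguishable
	// from an explicit false; nil means "use the default" (true).
	InlineSchemas *bool
}

// inlineSchemas resolves the flag with the historical default.
// A nil receiver is legal here because we check before dereferencing,
// so test callers passing nil keep the old behavior unchanged.
func (c *PromptConfig) inlineSchemas() bool {
	if c == nil || c.InlineSchemas == nil {
		return true
	}
	return *c.InlineSchemas
}

func main() {
	var absent *PromptConfig // nil third arg, e.g. from a test caller
	off := false
	fmt.Println(absent.inlineSchemas())                               // true: historical behavior
	fmt.Println((&PromptConfig{InlineSchemas: &off}).inlineSchemas()) // false: slim prompt
}
```

Using `*bool` rather than `bool` is what lets "key not set in the manifest" and "explicitly set to false" stay distinct after unmarshalling.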

Side bumps in this repo's wave.yaml

| Setting | Before | After |
| --- | --- | --- |
| `runtime.default_timeout_minutes` | 30 | 90 |
| `runtime.timeouts.step_default_minutes` | 15 | 90 |
| `runtime.stall_timeout` | 10m | 90m |
| `runtime.prompt.inline_schemas` | true (default) | false |
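In wave.yaml terms, the bumped section would look roughly like this (key nesting inferred from the setting names above, not copied from the repo):

```yaml
runtime:
  default_timeout_minutes: 90   # was 30
  timeouts:
    step_default_minutes: 90    # was 15
  stall_timeout: 90m            # was 10m
  prompt:
    inline_schemas: false       # default: true
```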

Test plan

  • go test ./... — full suite green
  • golangci-lint run ./... — 0 issues
  • New unit test TestContractPrompt_InlineSchemasDisabled covers the slimmed prompt path: skeleton kept, schema reference kept, full body dropped.
  • go run ./cmd/wave validate --all — clean against current pipelines
  • Live verification on audit-doc-scan against opencode-glm — pending (will post the run ID + result in a comment)

Why default-true matters

Frontier models benefit from having the full schema visible, and no degradation has been observed in the existing pipeline runs on Claude with inlining left on. The flag exists for the local-model envelope, not as a global behavior change.

The contract prompt builder dumps the full json_schema body inline
(`buildContractPrompt`, executor.go:~4940). On a 50KB shared schema,
that consumes ~12-15K tokens — fine for frontier models, fatal for
local Ollama models capped at 32K context. With the dump in place,
glm-4.7-flash hangs on `audit-doc-scan` indefinitely (GPU 100%, 0
tokens emitted).

The skeleton + required-fields hint that the same function generates
right after is enough for any model that handled the YAML pipeline at
all. The full schema is the wasteful part.

- Add `runtime.prompt.inline_schemas *bool` to the manifest. Default
  true (no behavior change for existing repos). Set false to drop the
  full dump and keep only the schema-path reference + skeleton.
- Pass `&execution.Manifest.Runtime.Prompt` through to
  `buildContractPrompt`. The function takes a third `*PromptConfig`
  arg; nil preserves historical behavior so callers in tests stay
  untouched.
- New table-style test `TestContractPrompt_InlineSchemasDisabled`
  asserts: skeleton + required-fields kept, schema reference kept,
  full schema body dropped.

Three knobs aligned for local-Ollama runs on this host:

- runtime.default_timeout_minutes: 30 -> 90
- runtime.timeouts.step_default_minutes: 15 -> 90
- runtime.stall_timeout: 10m -> 90m

GLM-4.7-flash and qwen3.5:27b take longer than frontier models on
real audit prompts. The previous defaults treated even a successful
local run as a failure.

Plus runtime.prompt.inline_schemas: false (the new flag from the
companion commit) — frees ~12-15K tokens per step that GLM was
spending on the schema dump alone.

Together these unblock audit-* and impl-* on local; without them
local was effectively limited to single-step local-* pipelines.
