feat(prompt): runtime.prompt.inline_schemas flag — unblock local Ollama audits #1468
Draft
nextlevelshit wants to merge 2 commits into
Conversation
The contract prompt builder dumps the full json_schema body inline (`buildContractPrompt`, executor.go:~4940). On a 50KB shared schema, that consumes ~12-15K tokens — fine for frontier models, fatal for local Ollama models capped at 32K context. With the dump in place, glm-4.7-flash hangs on `audit-doc-scan` indefinitely (GPU 100%, 0 tokens emitted). The skeleton + required-fields hint that the same function generates right after is enough for any model that handled the YAML pipeline at all. The full schema is the wasteful part.

- Add `runtime.prompt.inline_schemas *bool` to the manifest. Default true (no behavior change for existing repos). Set false to drop the full dump and keep only the schema-path reference + skeleton.
- Pass `&execution.Manifest.Runtime.Prompt` through to `buildContractPrompt`. The function takes a third `*PromptConfig` arg; nil preserves historical behavior so callers in tests stay untouched.
- New table-style test `TestContractPrompt_InlineSchemasDisabled` asserts: skeleton + required-fields kept, schema reference kept, full schema body dropped.
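In Go terms, the nil-preserving default described above can be sketched roughly as follows. Everything beyond the `buildContractPrompt` / `PromptConfig` names is an assumption — field names, helper, and prompt wording are illustrative; the real function lives in executor.go:

```go
package main

import "fmt"

// PromptConfig mirrors the manifest's runtime.prompt block. InlineSchemas is a
// *bool so an absent key is distinguishable from an explicit false.
type PromptConfig struct {
	InlineSchemas *bool
}

// inlineSchemasEnabled treats both a nil config and a nil field as true,
// preserving historical behavior for callers (e.g. tests) that pass nil.
func inlineSchemasEnabled(cfg *PromptConfig) bool {
	if cfg == nil || cfg.InlineSchemas == nil {
		return true
	}
	return *cfg.InlineSchemas
}

// buildContractPrompt (simplified): the schema-path reference and the
// required-fields skeleton are always emitted; the full schema body only
// when inlining is enabled.
func buildContractPrompt(schemaPath, schemaBody, skeleton string, cfg *PromptConfig) string {
	out := "Output must conform to the schema at " + schemaPath + "\n"
	if inlineSchemasEnabled(cfg) {
		out += "Full schema:\n" + schemaBody + "\n"
	}
	out += "Required-fields skeleton:\n" + skeleton + "\n"
	return out
}

func main() {
	off := false
	slim := buildContractPrompt("schemas/audit.json", "{...50KB...}", `{"findings":[]}`, &PromptConfig{InlineSchemas: &off})
	full := buildContractPrompt("schemas/audit.json", "{...50KB...}", `{"findings":[]}`, nil)
	fmt.Println(len(slim) < len(full)) // prints "true": the slim path drops only the schema body
}
```

The three-valued `*bool` is what makes "default true, no behavior change" cheap: omitted key, explicit true, and explicit false all map to distinct states without a migration.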
Three knobs aligned for local-Ollama runs on this host:

- runtime.default_timeout_minutes: 30 -> 90
- runtime.timeouts.step_default_minutes: 15 -> 90
- runtime.stall_timeout: 10m -> 90m

GLM-4.7-flash and qwen3.5:27b take longer than frontier models on real audit prompts. The previous defaults treated even a successful local run as a failure.

Plus runtime.prompt.inline_schemas: false (the new flag from the companion commit) — frees ~12-15K tokens per step that GLM was spending on the schema dump alone. Together these unblock audit-* and impl-* on local; without them local was effectively limited to single-step local-* pipelines.
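Taken together, the knobs above land in wave.yaml roughly like this (key paths are taken from the commit message; the surrounding nesting is an assumption about the manifest layout):

```yaml
runtime:
  default_timeout_minutes: 90   # was 30
  timeouts:
    step_default_minutes: 90    # was 15
  stall_timeout: 90m            # was 10m
  prompt:
    inline_schemas: false       # new flag; treated as true when omitted
```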
Summary
`buildContractPrompt` inlines the full json_schema body (~50KB / ~12-15K tokens) into every step prompt. Frontier models tolerate it; local Ollama models at 32K context choke. Concrete observation: `audit-doc-scan` on `opencode-glm` hangs at GPU 100%, 0 tokens for 60+ min before timing out, regardless of timeout settings — the model never returns a streamable completion when the prompt is that heavy. The same function builds a required-fields skeleton + property hints right after the schema dump. The skeleton alone is enough for any model that handled the YAML pipeline at all.
Change
`runtime.prompt.inline_schemas: bool`. Defaults to `true` (zero-change for existing repos). Set to `false` to drop the full schema body and keep only the schema-path reference + the skeleton. `buildContractPrompt` takes a third `*PromptConfig` arg; nil preserves historical behavior so test callers stay valid. `wave.yaml` flips the flag to `false` and bumps the local timeouts to 90m so live runs on glm-4.7-flash and qwen3.5:27b actually complete.

Side bumps in this repo's wave.yaml

- runtime.default_timeout_minutes
- runtime.timeouts.step_default_minutes
- runtime.stall_timeout
- runtime.prompt.inline_schemas

Test plan
- `go test ./...` — full suite green
- `golangci-lint run ./...` — 0 issues
- `TestContractPrompt_InlineSchemasDisabled` covers the slimmed prompt path: skeleton kept, schema reference kept, full body dropped.
- `go run ./cmd/wave validate --all` — clean against current pipelines
- `audit-doc-scan` against `opencode-glm` — pending (will post the run ID + result in a comment)

Why default-true matters
Frontier models perform measurably better with the full schema visible (no degradation observed in the existing pipeline runs on Claude). The flag exists for the local-model envelope, not as a global behavior change.