Proposal: eval_defaults on Step
Problem
Steps with dynamic prompts (`system input[:system_message]`) force evals to provide the full `system_message` in `default_input`. This creates two sources of truth: the production method that builds the `system_message` (in a service/concern) and a hardcoded copy in the eval. When the prompt changes, the eval drifts silently.
Example incident: an eval for a link-insertion step had a stripped-down `system_message` missing the "when to skip" rules. The model always inserted links, even in unrelated comments. Live `optimize` reported 0.00 while manual tests with the production prompt passed 5/5. The root cause took 30 minutes to find: the eval prompt had drifted from production.
Proposed API
```ruby
class InsertLink < RubyLLM::Contract::Step::Base
  prompt do |input|
    system input[:system_message]
    user input[:prompt_text]
  end

  eval_defaults do
    { system_message: MyApp::Prompts.link_insertion_system_message }
  end
end
```
Eval definitions inherit `eval_defaults`, merged into `default_input`:
```ruby
InsertLink.define_eval("smoke") do
  # system_message is provided automatically by eval_defaults, so no duplication
  default_input({
    prompt_text: "[ORIGINAL COMMENT]\n...",
    original_comment: "...",
    allowed_urls: ["https://example.com/page"]
  })
  sample_response({ comment: "...", link_inserted: true, ... })
  verify "link inserted", expect: ->(o) { o[:link_inserted] }
end
```
An eval can still override `system_message` in its own `default_input` if needed (explicit wins over default).
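The override rule follows from how Ruby's `Hash#merge` resolves duplicate keys: the argument's value replaces the receiver's. A minimal sketch of that precedence, using placeholder prompt strings that are assumptions for illustration and not taken from any real project:

```ruby
# Placeholder values; only the merge order mirrors the proposal.
step_defaults = { system_message: "production prompt" }     # what eval_defaults would return
eval_input    = { prompt_text: "[ORIGINAL COMMENT]\n..." }  # what an eval passes to default_input

# Defaults are the receiver and the eval's input is the argument,
# so keys set explicitly by the eval win and missing keys are inherited.
inherited  = step_defaults.merge(eval_input)
overridden = step_defaults.merge(eval_input.merge(system_message: "experimental prompt"))

puts inherited[:system_message]
puts overridden[:system_message]
```

The first eval inherits the step-level prompt untouched; the second keeps its explicit value, which is useful when an eval deliberately tests a prompt variant.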
When this helps
- Step has `system input[:system_message]` — prompt comes from a service, not from the step itself. The service builds it from persona, language, voice rules, etc. Eval needs the same prompt but has no access to the service.
- Multiple evals per step — each eval would otherwise duplicate the same system_message. With eval_defaults, it's defined once on the step.
- Prompt iteration — when you change the production prompt, evals automatically pick up the change. No manual sync.
When this is unnecessary
- Step has a static prompt — `system "You classify tickets..."` or `system RUBRIC_CONSTANT`. The prompt lives on the step, not in external services. Eval already tests the real prompt without needing eval_defaults.
- Step has `prompt "Classify: {input}"` — simple string prompt, no system_message in input. Nothing to default.
- One eval per step — the duplication cost is low. A support module (current workaround) is fine.
Data from a real project
11 steps total. Prompt patterns:
| pattern | count | eval_defaults needed? |
| --- | --- | --- |
| `system input[:system_message]` (dynamic from service) | 4 | yes |
| `system <<~SYS` (inline static) | 3 | no |
| `system CONSTANT` | 2 | no |
| `system "string"` (one-liner static) | 1 | no |
4/11 steps would benefit. The 3 that already have evals use a workaround — a support module that includes the production prompts concern and delegates. It works but is boilerplate that eval_defaults would eliminate.
Current workaround
```ruby
module EvalSupport
  # Host class that mixes in the production prompts concern so evals can reuse it.
  class PromptHost
    include MyApp::Prompts

    def self.system_message
      new.system_message_for_link_insertion
    end
  end
end

# In eval:
InsertLink.define_eval("smoke") do
  default_input({ system_message: EvalSupport::PromptHost.system_message, ... })
end
```
Works, but eval authors must know to use it instead of hardcoding. Easy to forget — as the incident showed.
Implementation sketch
```ruby
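# In Step::Base
def self.eval_defaults(&block)
  @eval_defaults_block = block
end

def self.resolved_eval_defaults
  # Call the block on every resolution; fall back to an empty hash when none is defined.
  @eval_defaults_block&.call || {}
end

# In EvalDefinition#build_dataset
def effective_default_input
  # Explicit default_input keys override the step-level defaults.
  step.resolved_eval_defaults.merge(@default_input || {})
end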
```
Lazy evaluation (block, not hash) so production methods are called at eval time, not at class load time.
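To make the difference concrete, here is a self-contained sketch that can run outside the library. `SketchStep`, `SketchInsertLink`, and `FakePrompts` are hypothetical stand-ins (not the real `RubyLLM::Contract::Step::Base` or `MyApp::Prompts`); only the two class methods are copied from the sketch above.

```ruby
# Stand-in for Step::Base; NOT the real RubyLLM::Contract class.
class SketchStep
  def self.eval_defaults(&block)
    @eval_defaults_block = block
  end

  def self.resolved_eval_defaults
    @eval_defaults_block&.call || {}
  end
end

# Hypothetical production prompt source whose output changes after load.
module FakePrompts
  class << self
    attr_accessor :version
  end
  self.version = 1

  def self.link_insertion_system_message
    "link rules v#{version}"
  end
end

class SketchInsertLink < SketchStep
  eval_defaults do
    { system_message: FakePrompts.link_insertion_system_message }
  end
end

# An eager hash captures the prompt as it was when this line ran.
eager_copy = { system_message: FakePrompts.link_insertion_system_message }

# Simulate the production prompt changing later.
FakePrompts.version = 2

# The block is only called now, so it sees the current prompt.
lazy_now = SketchInsertLink.resolved_eval_defaults.merge(prompt_text: "...")

puts eager_copy[:system_message]
puts lazy_now[:system_message]
```

The eager hash still holds the old prompt, which is the same kind of drift the incident describes, while the block-based default reflects the current one. A step that never calls `eval_defaults` resolves to an empty hash, so existing evals would behave as before.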
Decision
Not blocking — workaround exists and is used in production. Consider for 0.7 if more projects report the same drift issue.