# Proposal: eval_defaults on Step

## Problem

Steps with dynamic prompts (`system input[:system_message]`) force evals to provide the full `system_message` in `default_input`. This creates two sources of truth: the production method that builds the `system_message` (in a service or concern) and a hardcoded copy in the eval. When the production prompt changes, the eval drifts silently.

**Example incident:** an eval for a link-insertion step had a stripped-down `system_message` that was missing the "when to skip" rules. The model always inserted links, even in unrelated comments. Live `optimize` reported 0.00 while manual tests with the production prompt passed 5/5. The root cause took 30 minutes to find: the eval prompt had drifted from production.

## Proposed API

```ruby
class InsertLink < RubyLLM::Contract::Step::Base
  prompt do |input|
    system input[:system_message]
    user input[:prompt_text]
  end

  eval_defaults do
    { system_message: MyApp::Prompts.link_insertion_system_message }
  end
end
```

Eval definitions inherit `eval_defaults`: the step's defaults are merged into each eval's `default_input`.

```ruby
InsertLink.define_eval("smoke") do
  # system_message is provided automatically by eval_defaults, so nothing is duplicated here
  default_input({
    prompt_text: "[ORIGINAL COMMENT]\n...",
    original_comment: "...",
    allowed_urls: ["https://example.com/page"]
  })

  sample_response({ comment: "...", link_inserted: true, ... })
  verify "link inserted", expect: ->(o) { o[:link_inserted] }
end
```

An eval can still override `system_message` in its own `default_input` if needed (an explicit value wins over the default).
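A minimal sketch of that precedence rule, assuming the merge order used in the implementation sketch later in this proposal (step defaults first, eval input second). The helper name `merged_eval_input` and the sample values are illustrative only and are not part of the library:

```ruby
# Illustrative helper (not library API): step defaults are merged first,
# so any key the eval sets explicitly replaces the default.
def merged_eval_input(step_defaults, eval_default_input)
  step_defaults.merge(eval_default_input || {})
end

step_defaults = { system_message: "production prompt" }

# An eval that omits system_message inherits it from the step.
inherited = merged_eval_input(step_defaults, { prompt_text: "hello" })

# An eval that sets system_message explicitly keeps its own value.
overridden = merged_eval_input(
  step_defaults,
  { system_message: "ablation prompt", prompt_text: "hello" }
)

puts inherited[:system_message]
puts overridden[:system_message]
```

The first eval ends up with the step's prompt and the second keeps its own, which is useful for ablation-style evals that deliberately test a different prompt.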

## When this helps

- **Step has `system input[:system_message]`**: the prompt comes from a service, not from the step itself. The service builds it from persona, language, voice rules, etc. The eval needs the same prompt but has no access to the service.
- **Multiple evals per step**: each eval would otherwise duplicate the same `system_message`. With `eval_defaults`, it is defined once on the step.
- **Prompt iteration**: when you change the production prompt, evals automatically pick up the change. No manual sync.

## When this is unnecessary

- **Step has a static prompt**: `system "You classify tickets..."` or `system RUBRIC_CONSTANT`. The prompt lives on the step, not in external services. The eval already tests the real prompt without needing `eval_defaults`.
- **Step has `prompt "Classify: {input}"`**: a simple string prompt with no `system_message` in the input. Nothing to default.
- **One eval per step**: the duplication cost is low. A support module (the current workaround) is fine.

## Data from a real project

11 steps total. Prompt patterns:

| pattern | count | eval_defaults needed? |
|---|---|---|
| `system input[:system_message]` (dynamic from service) | 4 | yes |
| `system <<~SYS` (inline static) | 3 | no |
| `system CONSTANT` | 2 | no |
| `system "string"` (one-liner static) | 1 | no |

4 of the 11 steps would benefit. The 3 that already have evals use a workaround: a support module that includes the production prompts concern and delegates to it. It works, but it is boilerplate that `eval_defaults` would eliminate.

## Current workaround

```ruby
module EvalSupport
  class PromptHost
    include MyApp::Prompts

    def self.system_message
      new.system_message_for_link_insertion
    end
  end
end

# In eval:
InsertLink.define_eval("smoke") do
  default_input({ system_message: EvalSupport::PromptHost.system_message, ... })
end
```

This works, but eval authors must know to use it instead of hardcoding the prompt. That is easy to forget, as the incident showed.
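While the workaround remains opt-in, one possible interim guard is a check that flags evals whose hardcoded prompt no longer matches production. The following is only a rough sketch: it assumes eval inputs can be collected as plain hashes keyed by eval name, and `drifted_evals` plus the sample data are hypothetical, not library API.

```ruby
# Hypothetical helper: returns the names of evals whose system_message
# is present but differs from the current production prompt.
# `evals` maps eval name => default_input hash.
def drifted_evals(evals, production_prompt)
  evals.select do |_name, input|
    input.key?(:system_message) && input[:system_message] != production_prompt
  end.keys
end

production_prompt = "Insert a link only when relevant. Skip unrelated comments."

evals = {
  "smoke"     => { system_message: production_prompt, prompt_text: "..." },
  "hardcoded" => { system_message: "Insert a link.", prompt_text: "..." },
  "no_system" => { prompt_text: "..." }
}

puts drifted_evals(evals, production_prompt).inspect
```

In this sample only the `"hardcoded"` eval is flagged; evals that do not set `system_message` at all are ignored because there is nothing to compare.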

## Implementation sketch

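```ruby
# In Step::Base
# Store the block without calling it.
def self.eval_defaults(&block)
  @eval_defaults_block = block
end

# Call the block on demand; fall back to an empty hash when none is defined.
def self.resolved_eval_defaults
  @eval_defaults_block&.call || {}
end

# In EvalDefinition#build_dataset
# Step defaults come first, so explicit default_input keys win.
def effective_default_input
  step.resolved_eval_defaults.merge(@default_input || {})
end
```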

Evaluation is lazy (a block, not a hash), so production methods are called at eval time, not at class load time.
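The difference can be seen in a self-contained sketch. Everything here is a stand-in: `FakePrompts` plays the role of the production prompt source, `SketchStep` mimics the two class methods proposed above, and `EagerStep` exists only for contrast. None of these names exist in the library.

```ruby
# Stand-in for the production prompt source (hypothetical).
module FakePrompts
  @version = 1

  class << self
    attr_accessor :version

    def link_insertion_system_message
      "prompt v#{version}"
    end
  end
end

# Minimal stand-in for Step::Base with the proposed class methods.
class SketchStep
  def self.eval_defaults(&block)
    @eval_defaults_block = block
  end

  def self.resolved_eval_defaults
    @eval_defaults_block&.call || {}
  end
end

# Lazy: the block runs each time defaults are resolved.
class LazyStep < SketchStep
  eval_defaults { { system_message: FakePrompts.link_insertion_system_message } }
end

# Eager, for contrast: the hash is built once, when the class body loads.
class EagerStep < SketchStep
  EAGER_DEFAULTS = { system_message: FakePrompts.link_insertion_system_message }.freeze

  def self.resolved_eval_defaults
    EAGER_DEFAULTS
  end
end

# The production prompt changes after both classes are loaded.
FakePrompts.version = 2

puts LazyStep.resolved_eval_defaults[:system_message]
puts EagerStep.resolved_eval_defaults[:system_message]
```

With these stand-ins, `LazyStep` reflects the changed prompt while `EagerStep` still returns the value captured at load time, which is the kind of stale copy the block form is meant to avoid.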

## Decision

Not blocking: a workaround exists and is used in production. Consider for 0.7 if more projects report the same drift issue.
