Skip to content

feat: add --dotenv flag to override model profile with .env values#59

Open
haroldship wants to merge 12 commits into
mainfrom
feat/dotenv-model-override
Open

feat: add --dotenv flag to override model profile with .env values#59
haroldship wants to merge 12 commits into
mainfrom
feat/dotenv-model-override

Conversation

@haroldship

@haroldship haroldship commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Adds --dotenv flag to all eval.sh and compare.sh scripts (top-level dispatchers and all four per-benchmark scripts).
  • When --dotenv is set, after applying the named model profile, the entire .env is re-read and force-exported — so .env always wins over the profile for model configuration.
  • If no --model-profile is given alongside --dotenv, defaults to gpt-oss as the base profile.
  • CLI overrides (--model-name, --openai-base-url) still take highest precedence.

Precedence order (lowest → highest):

load_env.sh (.env, no-override)
→ apply_model_profile (force-exports profile vars)
→ apply_dotenv_model_overrides (.env, force-exports ALL vars)   ← new, only when --dotenv
→ CLI overrides (--model-name, --openai-base-url)

Example usage:

# .env entirely controls model config (gpt-oss as default base)
./scripts/eval.sh --benchmark bpo --dotenv

# gpt4o as base, .env overrides specific vars on top
./scripts/eval.sh --benchmark bpo --model-profile gpt4o --dotenv

# Existing behaviour unchanged
./scripts/eval.sh --benchmark bpo --model-profile gpt-oss

Closes #58

Test plan

  • 8/8 bash unit tests pass (benchmarks/helpers/tests/test_model_config.sh)
    • apply_dotenv_model_overrides: overrides vars, no-op when .env missing, strips quotes/comments, handles export-prefixed lines
    • apply_model_config: USE_DOTENV=false behaves like profile-only; USE_DOTENV=true with profile → .env wins; .env omits var → profile value kept; no profile + USE_DOTENV=true → defaults to gpt-oss
  • Dry-run smoke test: --dotenv alone shows gpt-oss profile applied then .env force-exports on top
  • Dry-run smoke test: --model-profile gpt4o --dotenv shows gpt4o profile applied first, then .env overrides MODEL_NAME
  • Dry-run smoke test: --model-profile gpt-oss (no --dotenv) shows no .env overrides section — existing behaviour unchanged

Summary by CodeRabbit

  • New Features

    • Added --dotenv command-line flag to eval and comparison scripts for using .env file values to override model profile configuration.
  • Documentation

    • Updated README with --dotenv flag documentation and usage examples.
  • Tests

    • Added test suite for model configuration logic, including .env override behavior.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@haroldship, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 13 minutes and 23 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1ebfb4c1-c93d-4e8c-a83c-c286bbf5c3d7

📥 Commits

Reviewing files that changed from the base of the PR and between 6c0c8e9 and 6f4be7f.

📒 Files selected for processing (1)
  • benchmarks/helpers/tests/test_model_config.sh
📝 Walkthrough

Walkthrough

This PR implements a --dotenv flag that allows .env file variables to override model profile settings in benchmark evaluation and comparison flows. The feature adds a new precedence layer: .env can override profile defaults while still allowing CLI arguments to take final precedence. When no explicit profile is provided with --dotenv, the system defaults to the gpt-oss base profile.

Changes

--dotenv Feature Implementation

Layer / File(s) Summary
Core .env override infrastructure and tests
benchmarks/helpers/common.sh, benchmarks/helpers/tests/test_model_config.sh
Introduces USE_DOTENV global variable, apply_dotenv_model_overrides() function that parses .env files (supporting export KEY=VALUE syntax, quote stripping, inline comment removal), and updates apply_model_config() to layer .env overrides after profile application. Tests validate parsing logic and precedence order (profile → .env → CLI).
Top-level script integration
scripts/eval.sh, scripts/compare.sh
Both scripts add --dotenv to help text and export USE_DOTENV variable for downstream use.
Per-benchmark compare.sh integration
benchmarks/appworld/compare.sh, benchmarks/bpo/compare.sh, benchmarks/m3/compare.sh, benchmarks/oak_health_insurance/compare.sh
Each benchmark's compare script adds USE_DOTENV variable initialization, --dotenv flag parsing, and replaces apply_model_profile "$model" with apply_model_config "$model" to activate the new override logic.
Documentation
README.md
Documents --dotenv flag behavior and provides usage examples showing interaction with --model-profile and .env value precedence.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a --dotenv flag to override model profiles with .env values, which is the primary feature across all modified scripts.
Linked Issues check ✅ Passed The PR fully implements all coding requirements from issue #58: new functions (apply_dotenv_model_overrides, apply_model_config), --dotenv flag parsing in all scripts, proper precedence handling, per-benchmark script updates, and comprehensive unit tests.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the --dotenv feature: helper functions, argument parsing, per-benchmark updates, tests, and documentation. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/dotenv-model-override

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@haroldship haroldship requested a review from Sergey-Zeltyn June 9, 2026 14:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
benchmarks/helpers/tests/test_model_config.sh (1)

30-30: 💤 Low value

Consider single-quoting trap commands to defer variable expansion.

Lines 30, 50, 60, 83, 94, and 105 use trap "rm -f $TMP" EXIT, which expands $TMP at trap-set time. While functionally correct here (since $TMP doesn't change), shellcheck SC2064 recommends trap 'rm -f "$TMP"' EXIT to make the deferred expansion explicit and avoid potential pitfalls if the variable were modified later.

♻️ Recommended fix for shell hygiene
-    TMP=$(mktemp); trap "rm -f $TMP" EXIT
+    TMP=$(mktemp); trap 'rm -f "$TMP"' EXIT

Apply this pattern to all six occurrences.

Also applies to: 50-50, 60-60, 83-83, 94-94, 105-105

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/helpers/tests/test_model_config.sh` at line 30, The trap commands
use double quotes causing $TMP to be expanded immediately; update each
occurrence of TMP created via mktemp where you call trap "rm -f $TMP" EXIT to
use single-quoted deferred expansion (e.g., change to trap 'rm -f "$TMP"' EXIT)
so the removal happens with the variable evaluated at trap execution time—apply
this change for all six occurrences that reference TMP.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@benchmarks/helpers/tests/test_model_config.sh`:
- Line 30: The trap commands use double quotes causing $TMP to be expanded
immediately; update each occurrence of TMP created via mktemp where you call
trap "rm -f $TMP" EXIT to use single-quoted deferred expansion (e.g., change to
trap 'rm -f "$TMP"' EXIT) so the removal happens with the variable evaluated at
trap execution time—apply this change for all six occurrences that reference
TMP.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bce075a7-17f0-49d1-b06c-8b6ead0a4486

📥 Commits

Reviewing files that changed from the base of the PR and between 142c29a and 6c0c8e9.

📒 Files selected for processing (9)
  • README.md
  • benchmarks/appworld/compare.sh
  • benchmarks/bpo/compare.sh
  • benchmarks/helpers/common.sh
  • benchmarks/helpers/tests/test_model_config.sh
  • benchmarks/m3/compare.sh
  • benchmarks/oak_health_insurance/compare.sh
  • scripts/compare.sh
  • scripts/eval.sh

@haroldship

Copy link
Copy Markdown
Collaborator Author

Fixed in 6f4be7f — all six trap commands now use single-quoted deferred expansion (trap 'rm -f "$TMP"' EXIT). 8/8 tests still pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Add --dotenv flag to layer .env overrides on top of model profiles

1 participant