[Feature]: Add --dotenv flag to layer .env overrides on top of model profiles #58

## Feature Request

Add a `--dotenv` flag to `eval.sh` and `compare.sh` (both top-level and per-benchmark) that, after applying a named model profile, re-reads `.env` and force-exports every variable found there — so `.env` always takes precedence over the profile for model configuration.

## Motivation / Problem

`apply_model_profile` unconditionally force-exports `AGENT_SETTING_CONFIG`, `MODEL_NAME`, `OPENAI_BASE_URL`, `OPENAI_API_VERSION`, etc., overwriting any values loaded from `.env` by `load_env.sh`. There is currently no way to say "use my `.env` values for model config" — and no way to use a named profile as a default while letting `.env` override specific variables (e.g. to point a standard profile at a different endpoint or API key).

## Use Case

A developer working against a local vLLM instance or a non-standard LiteLLM/WatsonX endpoint wants to:

1. Run `./scripts/eval.sh --benchmark bpo --dotenv` — use whatever `MODEL_NAME`, `OPENAI_BASE_URL`, `WATSONX_PROJECT_ID`, etc. are in `.env`, without specifying a profile at all (defaults to `gpt-oss` as the base).
2. Run `./scripts/eval.sh --benchmark bpo --model-profile gpt4o --dotenv` — use the `gpt4o` profile for everything it sets, but let `.env` override specific variables (e.g. `OPENAI_BASE_URL` to point to a different gateway).
3. Have this work consistently in both `eval.sh` and `compare.sh` flows, including per-benchmark compare loops that call `apply_model_profile` directly.

This is especially useful when adding new inference services (Groq, LiteLLM, WatsonX) or testing against non-standard endpoints without modifying `model_profiles.sh`.

## Proposed Solution

**New flag:** `--dotenv` (recognized by `parse_common_args` in `benchmarks/helpers/common.sh` and by each per-benchmark `compare.sh`)
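
A minimal sketch of how the flag could be recognized. `parse_common_args_sketch` is a simplified, hypothetical stand-in for `parse_common_args`, and `MODEL_PROFILE` is an assumed variable name; only `--dotenv`, `--model-profile`, and `USE_DOTENV` appear in this proposal.

```bash
#!/usr/bin/env bash

# Hypothetical, simplified stand-in for parse_common_args. The real function
# in benchmarks/helpers/common.sh handles more options than shown here.
parse_common_args_sketch() {
  USE_DOTENV=false   # default: existing behaviour, profile only
  MODEL_PROFILE=""   # assumed variable name, for illustration
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --dotenv)
        USE_DOTENV=true
        shift
        ;;
      --model-profile)
        MODEL_PROFILE="${2:-}"
        # Guard against a missing value so the loop always makes progress.
        if [ "$#" -ge 2 ]; then shift 2; else shift; fi
        ;;
      *)
        # Options this sketch does not model are skipped.
        shift
        ;;
    esac
  done
}

parse_common_args_sketch --benchmark bpo --model-profile gpt4o --dotenv
echo "USE_DOTENV=$USE_DOTENV MODEL_PROFILE=$MODEL_PROFILE"
```

Keeping `--dotenv` a plain boolean, rather than giving it a value, lets it combine freely with `--model-profile`.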

**Precedence order (lowest → highest):**
```
load_env.sh (.env, no-override)
→ apply_model_profile (force-exports profile vars)
→ apply_dotenv_model_overrides (.env, force-exports ALL vars)   ← new, only when --dotenv
→ CLI overrides (--model-name, --openai-base-url)
```
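
The ordering above can be sketched with stub layers. Every function name ending in `_sketch`, the `CLI_MODEL_NAME` variable, and all values are hypothetical; only the order of the four layers is taken from this proposal.

```bash
#!/usr/bin/env bash

# Layer 1: load_env.sh semantics, set a variable only if it is not already set.
load_env_no_override_sketch() {
  : "${MODEL_NAME:=env-model}"
  : "${OPENAI_BASE_URL:=http://localhost:8000/v1}"
  export MODEL_NAME OPENAI_BASE_URL
}

# Layer 2: a profile force-exports its values, clobbering layer 1.
apply_model_profile_sketch() {
  export MODEL_NAME="profile-model"
  export OPENAI_BASE_URL="https://profile.example.invalid/v1"
}

# Layer 3 (only with --dotenv): force-export the .env values again.
apply_dotenv_overrides_sketch() {
  export MODEL_NAME="env-model"
  export OPENAI_BASE_URL="http://localhost:8000/v1"
}

# Layer 4: an explicit CLI value (e.g. --model-name) beats everything else.
apply_cli_overrides_sketch() {
  if [ -n "${CLI_MODEL_NAME:-}" ]; then
    export MODEL_NAME="$CLI_MODEL_NAME"
  fi
}

resolve_model_config_sketch() {
  local use_dotenv="$1"
  unset MODEL_NAME OPENAI_BASE_URL
  load_env_no_override_sketch
  apply_model_profile_sketch
  if [ "$use_dotenv" = "true" ]; then
    apply_dotenv_overrides_sketch
  fi
  apply_cli_overrides_sketch
}

resolve_model_config_sketch false
echo "without --dotenv:  $MODEL_NAME"   # profile value wins

resolve_model_config_sketch true
echo "with --dotenv:     $MODEL_NAME"   # .env value wins

CLI_MODEL_NAME="cli-model"
resolve_model_config_sketch true
echo "with --model-name: $MODEL_NAME"   # CLI value wins
unset CLI_MODEL_NAME
```

Layer 1 alone cannot win against a profile because it never overwrites, which is why a second, forced pass over `.env` is needed.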

**New functions in `benchmarks/helpers/common.sh`:**

- `apply_dotenv_model_overrides([env_file])` — re-reads `.env` with override semantics, force-exporting every variable found. Accepts an optional path argument for testability; defaults to `<project_root>/.env` derived from `BASH_SOURCE[0]`.
- `apply_model_config(profile, [env_file])` — wraps `apply_model_profile` + `apply_dotenv_model_overrides`. Defaults to `gpt-oss` when `USE_DOTENV=true` and no profile is given.
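
One possible shape for the two functions, as a sketch rather than the final implementation. The `apply_model_profile` stub and its values are hypothetical, the default path is simplified to `./.env` instead of being derived from `BASH_SOURCE[0]`, and the behaviour when no profile is given without `USE_DOTENV=true` (do nothing) is an assumption.

```bash
#!/usr/bin/env bash

# Hypothetical stub: the real apply_model_profile sets more variables.
apply_model_profile() {
  export MODEL_NAME="stub-$1"
  export OPENAI_BASE_URL="https://profile.example.invalid/v1"
}

# Re-read a .env file and force-export every KEY=VALUE pair found in it.
apply_dotenv_model_overrides() {
  local env_file="${1:-.env}"  # simplification: proposal uses <project_root>/.env
  local line key value dq='"' sq="'"
  [ -f "$env_file" ] || return 0  # no-op when the file is missing

  while IFS= read -r line || [ -n "$line" ]; do
    line="${line#"${line%%[![:space:]]*}"}"    # trim leading whitespace
    case "$line" in ''|'#'*) continue ;; esac  # skip blanks and comment lines
    line="${line#export }"                     # accept export-prefixed lines
    case "$line" in *=*) ;; *) continue ;; esac
    key="${line%%=*}"
    value="${line#*=}"
    # Skip anything whose key is not a plain shell identifier.
    case "$key" in ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;; esac
    case "$value" in
      "$dq"*) value="${value#"$dq"}"; value="${value%%"$dq"*}" ;;
      "$sq"*) value="${value#"$sq"}"; value="${value%%"$sq"*}" ;;
      *)
        value="${value%%[[:space:]]#*}"             # drop " # inline comment"
        value="${value%"${value##*[![:space:]]}"}"  # trim trailing whitespace
        ;;
    esac
    export "$key=$value"
  done < "$env_file"
}

# Apply the profile first, then layer .env on top when USE_DOTENV=true.
apply_model_config() {
  local profile="${1:-}" env_file="${2:-}"
  if [ -z "$profile" ] && [ "${USE_DOTENV:-false}" = "true" ]; then
    profile="gpt-oss"
  fi
  if [ -n "$profile" ]; then
    apply_model_profile "$profile"
  fi
  if [ "${USE_DOTENV:-false}" = "true" ]; then
    apply_dotenv_model_overrides ${env_file:+"$env_file"}
  fi
}
```

Parsing the file line by line, instead of sourcing it, avoids executing arbitrary shell from `.env`. The trade-off is that this sketch does not handle multi-line values or escaped quotes.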

**Updated `finalize_model_config`** delegates to `apply_model_config`.

**Per-benchmark `compare.sh` scripts** (bpo, m3, appworld, oak_health_insurance) replace their bare `apply_model_profile "$model"` calls with `apply_model_config "$model"`.

**Examples:**
```bash
# Use .env entirely (gpt-oss as default base)
./scripts/eval.sh --benchmark bpo --dotenv

# gpt4o profile as base, .env overrides on top
./scripts/eval.sh --benchmark bpo --model-profile gpt4o --dotenv

# Existing behaviour unchanged (no --dotenv)
./scripts/eval.sh --benchmark bpo --model-profile gpt-oss
```

## Alternatives Considered

- **Hard-coded list of model-config vars to override** — more surgical but requires maintenance every time a new service (Groq, WatsonX, etc.) is added. Rejected in favour of re-reading all `.env` vars so future service vars work automatically.
- **`dotenv` as a pseudo-profile name** — doesn't support the merge case (`--model-profile gpt4o --dotenv`) and is confusing alongside real profile names.
- **Modifying `apply_model_profile` to respect pre-set vars** — invasive change to internals; less transparent.

## Priority

High - Important for my workflow

## Additional Context

**Test plan:**
- Bash unit tests for `apply_dotenv_model_overrides`: overrides existing vars, no-op when `.env` missing, strips quotes/inline comments, handles `export`-prefixed lines.
- Bash unit tests for `apply_model_config`: `USE_DOTENV=false` behaves like profile only; `USE_DOTENV=true` with profile → `.env` wins; `.env` omits var → profile value kept; no profile + `USE_DOTENV=true` → defaults to `gpt-oss`.
- Smoke tests: `--dotenv` alone, `--model-profile gpt4o --dotenv`, and existing behaviour without `--dotenv`.
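
A sketch of how the `apply_model_config` cases could be exercised without a test framework. The three functions under test are hypothetical stand-ins defined inline so the harness runs on its own (in the repo they would be sourced from `benchmarks/helpers/common.sh`), and `assert_eq` and `run_case` are illustrative helper names.

```bash
#!/usr/bin/env bash

# Hypothetical stand-ins for the functions under test.
apply_model_profile() {
  export MODEL_NAME="profile-$1"
  export OPENAI_API_VERSION="profile-version"
}
apply_dotenv_model_overrides() {
  [ -f "${1:-}" ] || return 0
  set -a; . "$1"; set +a  # simplification: sources the file as shell syntax
}
apply_model_config() {
  local profile="${1:-}"
  if [ -z "$profile" ] && [ "${USE_DOTENV:-false}" = "true" ]; then profile="gpt-oss"; fi
  if [ -n "$profile" ]; then apply_model_profile "$profile"; fi
  if [ "${USE_DOTENV:-false}" = "true" ]; then apply_dotenv_model_overrides "${2:-}"; fi
}

failures=0
assert_eq() {  # assert_eq <label> <expected> <actual>
  if [ "$2" = "$3" ]; then
    echo "ok   - $1"
  else
    echo "FAIL - $1 (expected '$2', got '$3')"
    failures=$((failures + 1))
  fi
}

# Run one case in a subshell so exported variables cannot leak between cases.
run_case() {  # run_case <use_dotenv> <profile> <env_file> <var_to_print>
  (
    unset MODEL_NAME OPENAI_API_VERSION
    USE_DOTENV="$1"
    apply_model_config "$2" "$3"
    printenv "$4" || true
  )
}

tmp_dir="$(mktemp -d)"
printf 'MODEL_NAME=env-model\n' > "$tmp_dir/model.env"
: > "$tmp_dir/empty.env"

assert_eq "USE_DOTENV=false behaves like profile only" \
  "profile-gpt4o" "$(run_case false gpt4o "$tmp_dir/model.env" MODEL_NAME)"
assert_eq "USE_DOTENV=true with profile: .env wins" \
  "env-model" "$(run_case true gpt4o "$tmp_dir/model.env" MODEL_NAME)"
assert_eq ".env omits var: profile value kept" \
  "profile-version" "$(run_case true gpt4o "$tmp_dir/model.env" OPENAI_API_VERSION)"
assert_eq "no profile + USE_DOTENV=true defaults to gpt-oss" \
  "profile-gpt-oss" "$(run_case true "" "$tmp_dir/empty.env" MODEL_NAME)"

rm -rf "$tmp_dir"
echo "failures: $failures"
```

Passing the env file path as an argument is what makes these cases possible without touching the developer's real `.env`. A real test file would typically finish by exiting non-zero when `failures` is greater than zero.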

**Files affected:** `benchmarks/helpers/common.sh`, `scripts/eval.sh`, `scripts/compare.sh`, `benchmarks/bpo/compare.sh`, `benchmarks/m3/compare.sh`, `benchmarks/appworld/compare.sh`, `benchmarks/oak_health_insurance/compare.sh`, `README.md`.

