Rename --reasoning → --reasoning-packs (disambiguate pack-set vs thinking-mode)

## Problem

The `run` subcommand has two conceptually different axes that currently share confusable naming:

- **Pack-set selectors** (the *what*): `--quick` / `--medium` / `--full` / `--reasoning`
- **Thinking-mode flags** (the *how*): `--enable-thinking` / `--no-thinking`

`--reasoning` is a **pack-set selector**: it runs the reasoning-heavy suite (HE+, LCB v6, GPQA, GSM-Symbolic). But its name reads like a *mode* ("evaluate with reasoning"), and it sits right next to `--enable-thinking`/`--no-thinking`. In practice this is genuinely ambiguous: to compare "with reasoning" against "without reasoning" you actually want `--full --enable-thinking` vs `--full --no-thinking`, **not** `--reasoning`. The shared word makes that easy to get wrong.

## Proposal

Rename the flag to make it clearly a *pack category*, parallel to `--full`:

- Add **`--reasoning-packs`** as the primary flag.
- Keep **`--reasoning`** as a **hidden, deprecated alias** for back-compat, so anything scripted against it keeps working. It drops out of the flag listing; the only mention left in `--help` is a one-line deprecation note.
- Update `--help` so the pack-set group reads `--quick | --medium | --full | --reasoning-packs` and stays visually separate from the `--enable-thinking`/`--no-thinking` mode group.

This keeps the mental model crisp: **pick the packs, then pick the mode.** The canonical quality eval stays `--full --no-thinking` + `--full --enable-thinking`; `--reasoning-packs` is the separate, deliberate deep-dive.
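
A minimal sketch of how the rename could look, assuming the CLI is built on Python's `argparse` (the issue does not say what the parser actually uses). The `DeprecatedAlias` class, the `pack_set` and `thinking` destinations, the group titles, and the help strings are all hypothetical names chosen for illustration. The sketch also does not enforce that only one pack-set flag is given; if several are passed, the last one wins.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Behaves like store_const, but warns that this spelling is deprecated."""

    def __init__(self, option_strings, dest, const=None, replacement=None, **kwargs):
        self.replacement = replacement
        # nargs=0 makes this a bare flag that takes no value.
        super().__init__(option_strings, dest, nargs=0, const=const, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # FutureWarning is shown to end users by default.
        warnings.warn(
            f"{option_string} is deprecated; use {self.replacement}",
            FutureWarning,
        )
        setattr(namespace, self.dest, self.const)


def build_parser():
    # allow_abbrev=False keeps a prefix such as --reason from being
    # treated as an ambiguous abbreviation of the two spellings.
    parser = argparse.ArgumentParser(prog="run", allow_abbrev=False)

    packs = parser.add_argument_group("pack set (what to run)")
    for name in ("quick", "medium", "full"):
        packs.add_argument(f"--{name}", dest="pack_set", action="store_const",
                           const=name, help=f"run the {name} pack set")
    packs.add_argument("--reasoning-packs", dest="pack_set", action="store_const",
                       const="reasoning",
                       help="run the reasoning-heavy packs "
                            "(replaces the deprecated --reasoning)")
    # Hidden alias: still parsed, but omitted from the flag listing.
    packs.add_argument("--reasoning", dest="pack_set", action=DeprecatedAlias,
                       const="reasoning", replacement="--reasoning-packs",
                       help=argparse.SUPPRESS)

    mode = parser.add_argument_group("thinking mode (how to run)")
    mode.add_argument("--enable-thinking", dest="thinking",
                      action="store_true", default=None)
    mode.add_argument("--no-thinking", dest="thinking",
                      action="store_false", default=None)
    return parser
```

With this shape, `--reasoning` and `--reasoning-packs` select the same pack set, and the two argument groups give `--help` the visual separation between selectors and mode flags described above.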

Downstream: the `quality-test.sh` wrapper (in `noonghunna/club-3090`) forwards the same token, so the rename needs to be mirrored there, including in its usage block.

## Optional (bigger, later)

If we ever want the split to be structural rather than by-convention, make the axes explicit: `--packs {quick,medium,full,reasoning}` + `--thinking {on,off}`. That's a breaking change, though. The `--reasoning` → `--reasoning-packs` rename (with the old name kept as an alias) gets most of the clarity for a fraction of the churn, so start there.
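
For comparison, the structural variant might look roughly like this, again assuming `argparse`. Only the two option names and their choices come from the text above; the function names and the absence of defaults are assumptions for the sketch.

```python
import argparse


def build_structural_parser():
    """One option per axis: --packs picks what to run, --thinking picks how."""
    parser = argparse.ArgumentParser(prog="run", allow_abbrev=False)
    parser.add_argument("--packs",
                        choices=["quick", "medium", "full", "reasoning"],
                        help="which pack set to run")
    parser.add_argument("--thinking", choices=["on", "off"],
                        help="whether thinking mode is enabled")
    return parser


def canonical_quality_eval():
    """Argument lists for the canonical eval: full packs in both thinking modes."""
    return [["--packs", "full", "--thinking", mode] for mode in ("off", "on")]
```

Here the two axes cannot be confused because each value is namespaced by its option, at the cost of breaking every existing invocation.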

## Context

Hit this while running a KV-cache quality A/B: reached for `--medium` + `--sandboxed-only` (which, separately, means *all* sandboxed packs incl. the slow HE+/LCB) when the right call was simply `--full` in both thinking modes. The naming nudged toward the wrong flag.


Rename --reasoning → --reasoning-packs (disambiguate pack-set vs thinking-mode) #65

Problem

Proposal

Optional (bigger, later)

Context

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Rename --reasoning → --reasoning-packs (disambiguate pack-set vs thinking-mode) #65

Description

Problem

Proposal

Optional (bigger, later)

Context

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions