[Feature]: Add vakra small training data and make it the default --m3-data source #61

@haroldship

Feature Request

Bundle the vakra small M3 training data with the repository and make it the default data source when --m3-data is passed without a path argument.

Motivation / Problem

Today, ./benchmarks/m3/eval.sh --m3-data requires an explicit path to a zip or directory:

if [[ -z "${2:-}" || "$2" == --* ]]; then
    echo "Error: --m3-data requires a path (zip file or directory)" >&2
    exit 2
fi

There is no out-of-the-box default, so every user must know where the M3 data lives before they can run --m3-data mode. The comment in m3_registry_m3_data.yaml already anticipates a "default zip" invocation (./benchmarks/m3/eval.sh --m3-data # default zip), but no such default exists yet.

Use Case

  • Developers and CI jobs want to run ./benchmarks/m3/eval.sh --m3-data against a well-known, representative dataset without specifying an external path every time.
  • The vakra small dataset is compact enough to ship alongside the repo and covers the capability domains already listed in m3_registry_m3_data.yaml.
  • Having a canonical default lowers the barrier to entry for new contributors running M3 evals locally.

Proposed Solution

  1. Add the vakra small data — commit (or reference via a just download-m3-data task) the vakra small zip/directory under benchmarks/m3/data/vakra_small/ (or as a tracked zip artifact).

  2. Wire it as the default — in eval.sh, change the --m3-data argument parsing so that omitting a path falls back to the bundled vakra small data:

    --m3-data)
        M3_DATA=true
        if [[ -z "${2:-}" || "$2" == --* ]]; then
            # No path supplied — use bundled vakra small data
            M3_DATA_PATH="$SCRIPT_DIR/data/vakra_small"
        else
            M3_DATA_PATH="$2"
            shift
        fi
        shift
        ;;

  3. Update help text — document that --m3-data (with no path) runs against the bundled vakra small dataset.

  4. Update m3_registry_m3_data.yaml — align the domains lists with whatever capabilities and domains vakra small actually contains.
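
To try the fallback from step 2 outside of eval.sh, the parsing can be wrapped in a small standalone function. This is only a sketch under assumptions: parse_m3_args, OTHER_ARGS, and the --verbose flag are hypothetical names used for illustration, and SCRIPT_DIR is assumed to point at benchmarks/m3 as it presumably does in eval.sh.

```bash
#!/usr/bin/env bash

# Assumption: SCRIPT_DIR is the directory that holds eval.sh (benchmarks/m3).
# Here it falls back to this script's own directory if not already set.
SCRIPT_DIR="${SCRIPT_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)}"

# Hypothetical helper: handles only --m3-data and passes every other
# argument through untouched in OTHER_ARGS.
parse_m3_args() {
    M3_DATA=false
    M3_DATA_PATH=""
    OTHER_ARGS=()
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --m3-data)
                M3_DATA=true
                if [[ -z "${2:-}" || "$2" == --* ]]; then
                    # No path supplied: use the bundled vakra small data.
                    M3_DATA_PATH="$SCRIPT_DIR/data/vakra_small"
                else
                    # Explicit path supplied: consume it as well.
                    M3_DATA_PATH="$2"
                    shift
                fi
                shift
                ;;
            *)
                OTHER_ARGS+=("$1")
                shift
                ;;
        esac
    done
}

# --verbose is a made-up flag, used only to show that a following
# option is not mistaken for a path.
parse_m3_args --m3-data --verbose
echo "default:  $M3_DATA_PATH"

parse_m3_args --m3-data /tmp/custom.zip
echo "explicit: $M3_DATA_PATH"
```

One consequence of the `"$2" == --*` check is that a data path starting with `--` can never be passed explicitly; that seems an acceptable trade-off for the implicit default, but it may be worth a line in the help text.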

Alternatives Considered

  • Keeping the required-path behavior and adding a separate --m3-data-default flag. Rejected: adds flag surface with no benefit; the implicit default is cleaner.
  • Downloading the data at eval time via setup_m3.sh. This works but requires network access and makes the no-arg invocation slower and less reproducible.

Priority

Medium

Additional Context

  • Related files: benchmarks/m3/eval.sh, benchmarks/m3/config/m3_registry_m3_data.yaml, benchmarks/m3/m3_data_loader.py
  • The vakra scoring pipeline (benchmarks/m3/m3_vakra_score.py) already consumes the data shape that M3DataLoader produces, so no scoring-side changes are expected.

Metadata

    Projects

    Status
    Backlog
