[Feature]: Add Vakra LangGraph agent as a comparison target in compare.sh

## Feature Request

Add Vakra's built-in simple LangGraph agent as a first-class comparison target in `compare.sh` (and the per-benchmark `compare.sh` scripts), alongside the existing `cuga` and `react` agents.

**Parent epic:** cuga-project/cuga-agent#239

---

## Motivation / Problem

Today, `compare.sh` accepts only `--agent cuga` or `--agent react` (enforced by a hard validation check in `scripts/compare.sh`). Evaluators who want to benchmark cuga-agent against Vakra's own LangGraph-based agent must run Vakra separately and reconcile results by hand. There is no automated, reproducible head-to-head comparison between cuga and the Vakra LangGraph agent inside the cuga-eval harness.

---

## Use Case

An evaluator running the M3 or BPO benchmark wants to understand where cuga-agent adds value over a plain LangGraph ReAct loop. They run:

```bash
./scripts/compare.sh --benchmark m3 --agents cuga,langgraph --runs 5
```

and get a single comparison report that shows pass rate, token usage, and tool-call breakdown for both agents side by side, exactly as today's `cuga` vs `react` comparison does.
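
As a rough illustration of the side-by-side layout, the sketch below renders one Markdown table column per agent. The function name `render_comparison`, the metric keys, and the numbers in the demo are placeholders invented for this example. They are not measured results, and this is not the actual interface of `benchmarks/helpers/compare_report.py`.

```python
from typing import Dict, List


def render_comparison(results: Dict[str, Dict[str, float]], metrics: List[str]) -> str:
    """Render a Markdown table with one row per metric and one column per agent."""
    agents = list(results)  # column order follows insertion order
    header = "| Metric | " + " | ".join(agents) + " |"
    divider = "|---" * (len(agents) + 1) + "|"
    rows = []
    for metric in metrics:
        # Missing metrics render as "n/a" instead of raising.
        cells = [str(results[agent].get(metric, "n/a")) for agent in agents]
        rows.append(f"| {metric} | " + " | ".join(cells) + " |")
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    # Placeholder values only, chosen to show the layout.
    demo = {
        "cuga": {"pass_rate": 0.5, "tokens": 1000},
        "langgraph": {"pass_rate": 0.25, "tokens": 2000},
    }
    print(render_comparison(demo, ["pass_rate", "tokens"]))
```

Because the columns are derived from the keys of `results`, a third agent would show up as a third column without changes to the renderer, provided the real report code is structured in a similar way.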

---

## Proposed Solution

1. **Extend agent validation** — remove the hard `cuga|react` guard in `scripts/compare.sh` (line ~68) and replace it with a list that includes `langgraph`. Propagate this to the per-benchmark `compare.sh` scripts (`benchmarks/bpo/compare.sh`, `benchmarks/m3/compare.sh`, etc.).

2. **Add a LangGraph agent runner** — create `benchmarks/helpers/langgraph_agent.py` (analogous to `benchmarks/helpers/react_agent.py`) that wraps Vakra's simple LangGraph agent and exposes the same interface used by the existing eval scripts (`eval_m3.py`, `eval_bench_sdk_react.py`, etc.).

3. **Wire into eval scripts** — add `langgraph` as a valid `--agent` choice in `benchmarks/*/eval_*.py` (currently hard-coded to `choices=["cuga", "react"]`) and route to the new runner.

4. **Update `--compare-agents` shorthand** — decide whether `--compare-agents` expands to `cuga,react,langgraph` or stays as `cuga,react` with a separate `--compare-all-agents` flag. A new `--compare-all-agents` flag is the lower-risk option.

5. **Setup / dependencies** — Vakra is already cloned and vendored by `setup_m3.sh`. Document that the langgraph agent target requires `setup_m3.sh` to have been run first.

6. **Report integration** — ensure `benchmarks/helpers/compare_report.py` renders a langgraph column in the comparison Markdown table.
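
One way to keep steps 1 to 3 from drifting apart is a single registry of agent names that both the argparse `choices` and the comma-separated agent list validate against. The following is a minimal sketch under that assumption. `AGENT_RUNNERS`, `parse_agent_list`, `build_parser`, `run`, and the stub runner functions are hypothetical names, not existing code in the repository, and the stubs return placeholders where the real runners would invoke the agents. The equivalent check in `scripts/compare.sh` would be written in shell.

```python
import argparse
from typing import Callable, Dict, List


# Hypothetical stub runners. The real ones would live in
# benchmarks/helpers/react_agent.py and the proposed langgraph_agent.py.
def run_cuga(task: str) -> dict:
    return {"agent": "cuga", "task": task}


def run_react(task: str) -> dict:
    return {"agent": "react", "task": task}


def run_langgraph(task: str) -> dict:
    # Placeholder: the real runner would call Vakra's simple LangGraph agent.
    return {"agent": "langgraph", "task": task}


# Assumed design: one registry is the source of truth for valid agent names.
AGENT_RUNNERS: Dict[str, Callable[[str], dict]] = {
    "cuga": run_cuga,
    "react": run_react,
    "langgraph": run_langgraph,
}


def parse_agent_list(value: str) -> List[str]:
    """Validate a comma-separated agent list such as 'cuga,langgraph'."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in AGENT_RUNNERS]
    if unknown:
        raise ValueError(f"unknown agent(s): {', '.join(unknown)}")
    return names


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Eval script sketch")
    # choices come from the registry instead of a hard-coded list.
    parser.add_argument("--agent", choices=sorted(AGENT_RUNNERS), default="cuga")
    return parser


def run(argv: List[str], task: str) -> dict:
    """Parse arguments and route the task to the selected runner."""
    args = build_parser().parse_args(argv)
    return AGENT_RUNNERS[args.agent](task)
```

With this shape, adding another agent later means adding one registry entry rather than editing each `choices=[...]` list separately.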

---

## Alternatives Considered

- **Keep Vakra comparison out-of-band** — evaluators continue to run Vakra separately. Rejected because it prevents automated, reproducible, apples-to-apples comparisons inside cuga-eval's reproducibility bundles.
- **Reuse `react` agent label** — map `langgraph` internally to the react harness. Rejected because Vakra's LangGraph agent differs from the bare `react_agent.py` implementation and conflating them would obscure meaningful performance differences.

---

## Priority

Medium

---

## Additional Context

- Vakra is already cloned and installed by `setup_m3.sh` into `vendor/vakra`; the M3 benchmark uses it as a scorer/judge today, not as an agent under test.
- Existing analysis comparing cuga vs react on M3 (Vakra): `docs/m3-vakra-analysis-20260428/cuga_vs_react_full_analysis.md`
- Relevant files to modify:
  - `scripts/compare.sh` — agent validation list
  - `benchmarks/helpers/common.sh` — `--compare-agents` expansion
  - `benchmarks/m3/compare.sh`, `benchmarks/bpo/compare.sh`, etc.
  - `benchmarks/helpers/react_agent.py` — reference implementation
  - `benchmarks/*/eval_*.py` — `choices=["cuga", "react"]` argparse args
