Standalone native G-Eval benchmark package for Eval Protocol.
This repo includes:
- native G-Eval reward implementation (`geval/reward.py`)
- SummEval-aligned benchmark entry (`benchmarks/test_geval.py`)
- SummEval sample dataset (`data/summeval_sample_100.jsonl`)
- dataset preparation script (`geval/prepare_data.py`)
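As a rough illustration of working with the sample dataset, a JSONL file like `data/summeval_sample_100.jsonl` can be loaded one JSON object per line. This loader is a generic sketch, not the package's actual data-loading code, and it makes no assumption about the row schema:

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Load one JSON object per line from a JSONL file; blank lines are skipped."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rows.append(json.loads(line))
    return rows
```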
Install dependencies:

```bash
uv sync --extra dev
```

Set one key:

```bash
export OPENAI_API_KEY=...
# or Fireworks (OpenAI-compatible)
export FIREWORKS_API_KEY=...
```

Run the benchmark:

```bash
uv run ep local-test --entry benchmarks/test_geval.py::test_geval_benchmark --ignore-docker
```

or

```bash
uv run pytest benchmarks/test_geval.py::test_geval_benchmark -q -s
```

Configuration (environment variables):

- `GEVAL_DATA_PATH`: path to JSONL dataset (default: `data/summeval_sample_100.jsonl`)
- `GEVAL_DIMENSION`: `coherence|consistency|relevance|fluency` (default: `consistency`)
- `GEVAL_JUDGE_MODEL`: judge model id (default: `gpt-4.1-mini`)
- `GEVAL_TOP_LOGPROBS`: top logprobs count (default: `5`)
- `GEVAL_SAMPLING_FALLBACK_N`: sampling fallback count when logprobs are unsupported (default: `20`)
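For reference, the environment variables above with their documented defaults could be read like this. This is an illustrative sketch, not the package's actual configuration code; the real implementation may parse them differently:

```python
import os


def geval_config():
    """Read G-Eval settings from the environment, applying the documented defaults."""
    return {
        "data_path": os.getenv("GEVAL_DATA_PATH", "data/summeval_sample_100.jsonl"),
        "dimension": os.getenv("GEVAL_DIMENSION", "consistency"),
        "judge_model": os.getenv("GEVAL_JUDGE_MODEL", "gpt-4.1-mini"),
        "top_logprobs": int(os.getenv("GEVAL_TOP_LOGPROBS", "5")),
        "sampling_fallback_n": int(os.getenv("GEVAL_SAMPLING_FALLBACK_N", "20")),
    }
```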
Rebuild the dataset sample:

```bash
uv run geval-build-data --limit 100 --output data/summeval_sample_100.jsonl
```

Notes:

- The reward computes a discrete score and, when possible, a probability-weighted expected score.
- Probability weighting prefers token logprobs and falls back to sampling-based estimation.
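The probability-weighted expected score can be sketched as follows. Assuming the judge emits a discrete score on a 1-5 scale, the expected score is the probability-weighted mean over the score tokens in the judge's top logprobs; when logprobs are unavailable, the fallback is to average scores across repeated samples. Function names here are illustrative, not the names in `geval/reward.py`:

```python
import math


def expected_score_from_logprobs(top_logprobs, min_score=1, max_score=5):
    """Compute E[score] from the judge's top-token logprobs.

    top_logprobs: mapping of token text -> logprob for the score position.
    Only tokens that parse to an in-range integer are counted; their
    probabilities are renormalized before taking the weighted mean.
    """
    weights = {}
    for token, lp in top_logprobs.items():
        tok = token.strip()
        if tok.isdigit() and min_score <= int(tok) <= max_score:
            weights[int(tok)] = weights.get(int(tok), 0.0) + math.exp(lp)
    total = sum(weights.values())
    if total == 0:
        return None  # no usable score tokens; caller falls back to sampling
    return sum(s * w for s, w in weights.items()) / total


def expected_score_from_samples(sampled_scores):
    """Sampling fallback: average N discrete judge scores."""
    return sum(sampled_scores) / len(sampled_scores)
```

The logprob route needs only one judge call, which is why it is preferred; the sampling fallback trades extra calls for an estimate on providers that do not expose token logprobs.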