lilac adapter by shreymodi1 · Pull Request #389 · eval-protocol/python-sdk

shreymodi1 · 2025-12-29T18:11:58Z

Note

Adds pandas-based data interchange for evaluation rows and serialization helpers.

New adapters/dataframe.py with evaluation_rows_to_dataframe and dataframe_to_evaluation_rows converting via EvaluationRow.to_dict()/from_dict()
EvaluationRow gains to_dict (JSON under data_json plus convenience fields) and from_dict (validates/deserializes) for stable round-tripping
Exposes DataFrame adapter functions in adapters/__init__.py behind optional pandas import

^{Written by Cursor Bugbot for commit e529f7f. This will update automatically on new commits. Configure here.}

eval_protocol/adapters/lilac.py

dphuang2 · 2025-12-29T19:22:55Z

this PR isn't specific to lilac in any way—its an adapter for pandas dataframe. Can we rename the module to reflect that?

dphuang2 · 2025-12-29T19:24:10Z

eval_protocol/adapters/lilac.py

+def _evaluation_row_to_dict(row: EvaluationRow) -> dict[str, Any]:
+    """Convert a single EvaluationRow to a dictionary.
+
+    The output contains JSON-serialized fields that can be reconstructed back
+    to EvaluationRow. Users can add their own text columns for clustering.
+    """
+    return {
+        # Identifiers
+        "row_id": row.input_metadata.row_id if row.input_metadata else None,
+        # Full data as JSON (for reconstruction)
+        "messages_json": json.dumps([_serialize_message(m) for m in row.messages]),
+        "tools_json": json.dumps(row.tools) if row.tools else None,
+        "ground_truth_json": json.dumps(row.ground_truth) if row.ground_truth else None,
+        "input_metadata_json": row.input_metadata.model_dump_json() if row.input_metadata else None,
+        "execution_metadata_json": row.execution_metadata.model_dump_json() if row.execution_metadata else None,
+        "evaluation_result_json": row.evaluation_result.model_dump_json() if row.evaluation_result else None,
+        # Scalar fields for filtering
+        "score": row.evaluation_result.score if row.evaluation_result else None,
+        "message_count": len(row.messages),
+        "has_tools": bool(row.tools),
+    }


should this helper just live on the EvaluationRow for discovery? Also, how was the final set of keys decided? Why don't we just generically serialize the entire EvaluationRow?

lilac adapter

c3203f5

cursor bot reviewed Dec 29, 2025

View reviewed changes

eval_protocol/adapters/lilac.py Outdated Show resolved Hide resolved

eval_protocol/adapters/lilac.py Outdated Show resolved Hide resolved

updated adapter

74cca5d

cursor bot reviewed Dec 29, 2025

View reviewed changes

eval_protocol/adapters/lilac.py Outdated Show resolved Hide resolved

dphuang2 reviewed Dec 29, 2025

View reviewed changes

updated adapters

e529f7f

shreymodi1 merged commit 3322e5f into main Jan 5, 2026
31 of 32 checks passed

shreymodi1 deleted the shrey/lilac branch January 5, 2026 19:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lilac adapter#389

lilac adapter#389
shreymodi1 merged 3 commits intomainfrom
shrey/lilac

shreymodi1 commented Dec 29, 2025 •

edited by cursor bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dphuang2 commented Dec 29, 2025

Uh oh!

dphuang2 Dec 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shreymodi1 commented Dec 29, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dphuang2 commented Dec 29, 2025

Uh oh!

dphuang2 Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shreymodi1 commented Dec 29, 2025 •

edited by cursor bot

Loading