[data] feat: add support for swe rebench v2#56
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for the nebius/SWE-rebench-V2 dataset (Python slice), including a preprocessing script, a dedicated reward spec, and test-log parsers. Feedback on these changes highlights several robustness issues: a potential NameError in the preprocessing script when no languages are specified, potential KeyError and IndexError exceptions when parsing metadata, and a potential AttributeError if a patch is not a string. Additionally, immediately stripping whitespace from command observations in env.py could strip crucial formatting like indentation or tracebacks, so it is recommended to only strip during the empty check.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if languages: | ||
| wanted = {lang.lower() for lang in languages} | ||
| dataset = dataset.filter(lambda ex: (ex.get("language") or "").lower() in wanted) | ||
| dataset = dataset.filter(lambda ex: ex["instance_id"] not in SKIP_SAMPLES) | ||
|
|
||
| print(f"Kept {len(dataset)} instances after language filter {sorted(wanted)}", flush=True) |
There was a problem hiding this comment.
If languages is falsy or None, the wanted variable is never defined. This will raise a NameError: name 'wanted' is not defined when printing the log on line 194. Defining wanted beforehand or conditionally formatting the log message resolves this issue.
wanted = None
if languages:
wanted = {lang.lower() for lang in languages}
dataset = dataset.filter(lambda ex: (ex.get("language") or "").lower() in wanted)
dataset = dataset.filter(lambda ex: ex["instance_id"] not in SKIP_SAMPLES)
filter_desc = sorted(wanted) if wanted else "None"
print(f"Kept {len(dataset)} instances after language filter {filter_desc}", flush=True)| "log_parser": install_config["log_parser"], | ||
| "test_cmd": install_config["test_cmd"], |
There was a problem hiding this comment.
If example["install_config"] is None, install_config defaults to an empty dictionary {}. Accessing install_config["log_parser"] and install_config["test_cmd"] directly will then raise a KeyError. Using .get() is safer.
| "log_parser": install_config["log_parser"], | |
| "test_cmd": install_config["test_cmd"], | |
| "log_parser": install_config.get("log_parser"), | |
| "test_cmd": install_config.get("test_cmd"), |
| # Upstream clones into /{repo.split('/')[1]} (see combine.Dockerfile.j2). | ||
| self.repo_dir = f"/{self.repo.split('/')[1]}" |
There was a problem hiding this comment.
If self.repo is not in the expected owner/repo format (e.g., if it is just a single repository name), self.repo.split('/') will only have one element, causing an IndexError when accessing index 1. Adding a defensive check prevents potential crashes.
| # Upstream clones into /{repo.split('/')[1]} (see combine.Dockerfile.j2). | |
| self.repo_dir = f"/{self.repo.split('/')[1]}" | |
| # Upstream clones into /{repo.split('/')[1]} (see combine.Dockerfile.j2). | |
| repo_parts = self.repo.split("/") | |
| self.repo_dir = f"/{repo_parts[1]}" if len(repo_parts) > 1 else f"/{self.repo}" |
| def _get_modified_files(patch: str) -> list[str]: | ||
| """Target (``b/``) paths touched by a unified diff, order-preserved & deduped. | ||
|
|
||
| A tiny self-contained replacement for ``swebench.harness.utils.get_modified_files`` | ||
| (only used to reset the test files before/after applying the test patch). | ||
| Files deleted by the patch (``+++ /dev/null``) are skipped. | ||
| """ | ||
| files: list[str] = [] | ||
| for line in patch.split("\n"): |
There was a problem hiding this comment.
If patch is None or not a string, calling patch.split("\n") will raise an AttributeError. Adding a defensive type check ensures the function handles unexpected or empty inputs gracefully.
| def _get_modified_files(patch: str) -> list[str]: | |
| """Target (``b/``) paths touched by a unified diff, order-preserved & deduped. | |
| A tiny self-contained replacement for ``swebench.harness.utils.get_modified_files`` | |
| (only used to reset the test files before/after applying the test patch). | |
| Files deleted by the patch (``+++ /dev/null``) are skipped. | |
| """ | |
| files: list[str] = [] | |
| for line in patch.split("\n"): | |
| def _get_modified_files(patch: str) -> list[str]: | |
| """Target (b/) paths touched by a unified diff, order-preserved & deduped. | |
| A tiny self-contained replacement for swebench.harness.utils.get_modified_files | |
| (only used to reset the test files before/after applying the test patch). | |
| Files deleted by the patch (+++ /dev/null) are skipped. | |
| """ | |
| if not patch or not isinstance(patch, str): | |
| return [] | |
| files: list[str] = [] | |
| for line in patch.split("\n"): |
| observation = re.sub(r"\x1b\[[0-9;]*m|\r", "", observation).strip() | ||
| if observation == "": |
There was a problem hiding this comment.
Calling .strip() immediately on the observation modifies the returned command output by removing all leading and trailing whitespace/newlines. This can affect formatting (e.g., indentation in file content or traceback outputs) when presented to the agent. It is safer to preserve the original whitespace and only strip it for the empty check.
| observation = re.sub(r"\x1b\[[0-9;]*m|\r", "", observation).strip() | |
| if observation == "": | |
| observation = re.sub(r"\x1b\[[0-9;]*m|\r", "", observation) | |
| if observation.strip() == "": |
What does this PR do?
as title
Checklist Before Starting
[{modules}] {type}: {description}(checked by CI){modules}may includecore,interaction,model,env,tools,deployment,reward,dashboard,docs,examples,data,train,ci,build,deps,misc,like[interaction, tools, docs]{type}must be one offeat,fix,refactor,chore,test[BREAKING]to the beginning of the title[1/N][BREAKING][deployment, docs] feat: simplify runtime env configurationTest
API and Usage Example
# Add a short example here when the PR changes public behaviorDesign & Code Changes
Checklist Before Submitting
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always