fix: FINAL() callable in REPL; parser ignores FINAL/FINAL_VAR inside code fences #115
jkbrooks wants to merge 1 commit
…code fences

Two bugs caused the RLM to return wrong answers when a model called FINAL() or FINAL_VAR() inside a ```repl``` code block:

1. Parser bug (utils/parsing.py): find_final_answer() ran its regex over the raw response including fenced code blocks, so FINAL(x) in a repl block was parsed as the terminal answer, returning the literal string "final_answer" instead of its value. Fix: strip all fenced code blocks before running the FINAL/FINAL_VAR regex. FINAL/FINAL_VAR in prose still work as before.
2. Runtime bug (environments/local_repl.py, environments/base_env.py): the system prompt advertises FINAL(value) as option 1 for submitting a final answer, but FINAL was never injected into the REPL globals; only FINAL_VAR was. Models calling FINAL() inside a repl block got a NameError. Fix: add a _final() method to LocalREPL (mirrors _final_var for direct values), inject it as globals["FINAL"], restore it in _restore_scaffold, and add "FINAL" to RESERVED_TOOL_NAMES.

Tests added to TestFindFinalAnswer:
- test_final_inside_repl_code_block_not_parsed_as_terminal
- test_final_var_inside_repl_code_block_not_parsed_as_terminal
- test_final_in_prose_still_works_alongside_repl_block
- test_final_callable_in_repl_environment

Co-authored-by: Cursor <cursoragent@cursor.com>
Pull request overview
Fixes incorrect termination behavior when models emit FINAL(...) / FINAL_VAR(...) inside fenced REPL blocks, and makes FINAL(...) callable inside the local REPL to align runtime behavior with the system prompt.
Changes:
- Update `find_final_answer()` to ignore `FINAL(...)` / `FINAL_VAR(...)` that appear inside fenced code blocks.
- Inject `FINAL` into `LocalREPL` globals and restore it after executions; reserve `FINAL` as a non-overridable tool name.
- Add parsing + REPL runtime tests covering the regressions.
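The runtime change can be sketched roughly as follows. This is a minimal illustration, not the project's code: `TinyREPL`, its `execute()` and `register_tool()` methods, and the contents of the reserved-name set are hypothetical stand-ins, and storing the value unchanged in `final_answer` is an assumption.

```python
# Assumption: the real RESERVED_TOOL_NAMES may contain other names as well.
RESERVED_TOOL_NAMES = {"FINAL", "FINAL_VAR"}


class TinyREPL:
    """Hypothetical stand-in for LocalREPL, reduced to the FINAL scaffolding."""

    def __init__(self):
        self.final_answer = None
        self.globals = {}
        self._restore_scaffold()

    def _final(self, value):
        # Record a directly supplied value as the final answer.
        self.final_answer = value
        return value

    def _final_var(self, name):
        # Record the current value of a named REPL variable as the final answer.
        self.final_answer = self.globals[name]
        return self.final_answer

    def _restore_scaffold(self):
        # Re-inject the helpers so executed code cannot permanently shadow them.
        self.globals["FINAL"] = self._final
        self.globals["FINAL_VAR"] = self._final_var

    def register_tool(self, name, fn):
        # Custom tools may not take over a reserved name.
        if name in RESERVED_TOOL_NAMES:
            raise ValueError(f"{name!r} is a reserved tool name")
        self.globals[name] = fn

    def execute(self, code):
        exec(code, self.globals)
        self._restore_scaffold()


repl = TinyREPL()
repl.execute("x = 6 * 7\nFINAL(x)")
print(repl.final_answer)
```

With `FINAL` present in the globals, the call inside the executed code resolves to `_final()` instead of raising `NameError`, and reassigning `FINAL` in user code only lasts until the next scaffold restore.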
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_parsing.py | Adds regression tests ensuring fenced FINAL/FINAL_VAR are ignored and FINAL() is callable in LocalREPL. |
| rlm/utils/parsing.py | Strips fenced code blocks before scanning for terminal FINAL/FINAL_VAR markers. |
| rlm/environments/local_repl.py | Adds _final() helper and injects/restores FINAL in REPL globals. |
| rlm/environments/base_env.py | Adds FINAL to RESERVED_TOOL_NAMES to prevent override by custom tools/user code. |
    """
    # Remove fenced code blocks first so FINAL()/FINAL_VAR() inside ```repl``` code
    # does not get parsed as a terminal answer.
    text_no_code = re.sub(r"```[\s\S]*?```", "", text)
text_no_code = re.sub(r"```[\s\S]*?```", "", text) strips all fenced code blocks from the entire response before searching for FINAL(...). This can silently corrupt a legitimate final answer if the payload inside FINAL(...) includes a markdown code fence (e.g., returning a code snippet), because the fenced content will be removed before extraction. Consider narrowing the stripping to only repl blocks (the ones the runtime executes), or performing code-fence removal only outside the matched FINAL(...)/FINAL_VAR(...) span so the final payload is preserved verbatim.
Suggested change:

    - text_no_code = re.sub(r"```[\s\S]*?```", "", text)
    + text_no_code = re.sub(r"```repl[\s\S]*?```", "", text)
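The difference between the two patterns can be illustrated with a small sketch. The sample response is invented for illustration, and the fence marker is built with `"`" * 3` only to keep literal triple backticks out of the example.

```python
import re

FENCE = "`" * 3  # the three-backtick fence marker

# Pattern from the PR: removes every fenced block.
strip_all = re.compile(FENCE + r"[\s\S]*?" + FENCE)
# Pattern from the suggestion: removes only blocks opened with the repl tag.
strip_repl = re.compile(FENCE + r"repl[\s\S]*?" + FENCE)

# Invented response: an executed repl block, then a prose-level FINAL(...)
# whose payload itself contains a fenced Python snippet.
response = (
    f"{FENCE}repl\nFINAL(final_answer)\n{FENCE}\n"
    f"FINAL(Use this:\n{FENCE}python\nprint('hi')\n{FENCE}\n)"
)

without_any_fence = strip_all.sub("", response)
without_repl_fence = strip_repl.sub("", response)

print("print('hi')" in without_any_fence)   # payload snippet removed
print("print('hi')" in without_repl_fence)  # payload snippet preserved
```

One caveat with the narrower pattern, as written, is that it also matches any info string that merely starts with `repl`; anchoring the tag to the end of the opening line would be stricter.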
Problem
Two bugs caused the RLM to silently return wrong answers when a model chose to call `FINAL()` or `FINAL_VAR()` inside a fenced `repl` code block, a natural pattern given the system prompt examples.

Bug 1 – Parser (`utils/parsing.py`)

`find_final_answer()` ran its `FINAL(...)`/`FINAL_VAR(...)` regex over the raw assistant response, including inside fenced code blocks. So when the model wrote:

```repl
FINAL(final_answer)
```

the parser matched `FINAL(final_answer)` and returned the literal string `"final_answer"` as the completion response instead of the variable's value. No error was raised; the RLM just silently returned a wrong string.

Bug 2 – Runtime (`environments/local_repl.py`, `environments/base_env.py`)

The system prompt offers `FINAL(value)` as option 1 for submitting a final answer, but `FINAL` was never injected into the REPL globals; only `FINAL_VAR` was. So when the model called `FINAL(x)` inside a repl block it got a `NameError`, and the REPL stderr was shown to the model, wasting an iteration. The only reason the run didn't always fail is that the parser still picked up `FINAL(...)` from the response text, leading to bug 1 above.

Fix

- `utils/parsing.py`: strip all fenced code blocks from the response before running the `FINAL`/`FINAL_VAR` regex. This ensures only prose-level `FINAL(...)` signals termination.
- `environments/local_repl.py`: add a `_final(value)` method (mirrors `_final_var` for direct values), inject it as `globals["FINAL"]`, and restore it in `_restore_scaffold()`.
- `environments/base_env.py`: add `"FINAL"` to `RESERVED_TOOL_NAMES` so it can't be overwritten by user code.

Tests added (`tests/test_parsing.py`)

| Test | Covers |
|---|---|
| `test_final_inside_repl_code_block_not_parsed_as_terminal` | `FINAL()` in code fence |
| `test_final_var_inside_repl_code_block_not_parsed_as_terminal` | `FINAL_VAR()` in code fence |
| `test_final_in_prose_still_works_alongside_repl_block` | `FINAL()` still terminates after code fence |
| `test_final_callable_in_repl_environment` | `FINAL(x)` callable in REPL sets `final_answer` |

All 16 `TestFindFinalAnswer` tests pass.

How to reproduce (before fix)
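As a rough illustration of bug 1, the sketch below uses a simplified stand-in for the parser. The function names `find_final_before` and `find_final_after` and the `FINAL\((.*?)\)` pattern are assumptions made for this example; the project's actual regex and return shape may differ.

```python
import re

FENCE = "`" * 3  # the three-backtick fence marker

# Assumption: a simplified FINAL(...) pattern, not the project's real regex.
FINAL_RE = re.compile(r"FINAL\((.*?)\)", re.DOTALL)
CODE_BLOCK_RE = re.compile(FENCE + r"[\s\S]*?" + FENCE)


def find_final_before(text):
    # Pre-fix behaviour: scan the raw response, fenced code included.
    match = FINAL_RE.search(text)
    return match.group(1).strip() if match else None


def find_final_after(text):
    # Post-fix behaviour: drop fenced blocks, then scan what is left.
    return find_final_before(CODE_BLOCK_RE.sub("", text))


response = f"Submitting now.\n{FENCE}repl\nFINAL(final_answer)\n{FENCE}\n"

print(find_final_before(response))  # final_answer (the variable name, not its value)
print(find_final_after(response))   # None (left for the REPL to handle)
```

In the post-fix path the fenced call is no longer treated as a terminal answer, so it is up to the REPL execution of `FINAL(final_answer)` to record the actual value.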
Made with Cursor