fix: FINAL() callable in REPL; parser ignores FINAL/FINAL_VAR inside code fences#115

Open
jkbrooks wants to merge 1 commit into
alexzhang13:mainfrom
jkbrooks:fix/final-in-repl-block

@jkbrooks

Problem

Two bugs caused the RLM to silently return wrong answers when a model chose to call FINAL() or FINAL_VAR() inside a ```repl ``` code block — a natural pattern given the system prompt examples.

Bug 1 – Parser (utils/parsing.py)

find_final_answer() ran its FINAL(...) / FINAL_VAR(...) regex over the raw assistant response, including inside fenced code blocks. So when the model wrote:

```repl
FINAL(final_answer)
```

the parser matched FINAL(final_answer) and returned the literal string "final_answer" as the completion response instead of the variable's value. No error was raised — the RLM just silently returned a wrong string.
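The failure mode can be reproduced in isolation with a minimal sketch. `FINAL_RE` and `naive_find_final_answer` are hypothetical stand-ins for the pre-fix matcher, so the actual regex in utils/parsing.py may differ:

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in markdown

# Hypothetical stand-in for the pre-fix matcher (assumption, not the repo's regex).
FINAL_RE = re.compile(r"FINAL\((.*?)\)", re.DOTALL)


def naive_find_final_answer(text: str):
    """Scan the raw response, fenced code included, for FINAL(...)."""
    match = FINAL_RE.search(text)
    return match.group(1).strip() if match else None


response = (
    "I will compute the sum and submit it.\n"
    f"{FENCE}repl\n"
    "final_answer = sum(range(1, 11))\n"
    "FINAL(final_answer)\n"
    f"{FENCE}\n"
)

# The variable *name* comes back, not its value.
print(naive_find_final_answer(response))  # prints: final_answer
```

Because the match succeeds, nothing signals that the returned string is an identifier rather than an answer.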

Bug 2 – Runtime (environments/local_repl.py, environments/base_env.py)

The system prompt offers FINAL(value) as option 1 for submitting a final answer:

  1. Use FINAL(your final answer here) to provide the answer directly

But FINAL was never injected into the REPL globals — only FINAL_VAR was. So when the model called FINAL(x) inside a repl block it got a NameError, and the REPL stderr was shown to the model, wasting an iteration. The only reason the run didn't always fail is that the parser still picked up FINAL(...) from the response text — leading to bug 1 above.
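A rough illustration of the runtime side, assuming the REPL executes model code with `exec` against a globals dict. `run_block` and the lambda standing in for FINAL_VAR are hypothetical and are not the LocalREPL implementation:

```python
import traceback


def run_block(code: str, repl_globals: dict) -> str:
    """Execute code and return the traceback text, or '' on success."""
    try:
        exec(code, repl_globals)
    except Exception:
        return traceback.format_exc()
    return ""


submitted = []
repl_globals = {}
# Only FINAL_VAR is injected, mirroring the pre-fix state described above.
repl_globals["FINAL_VAR"] = lambda name: submitted.append(str(repl_globals[name]))

err = run_block("x = sum(range(1, 11))\nFINAL(x)", repl_globals)
print(err.strip().splitlines()[-1])  # last traceback line reports a NameError for FINAL

# The same value submitted through the injected helper works.
run_block("FINAL_VAR('x')", repl_globals)
print(submitted)  # ['55']
```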

Fix

utils/parsing.py — strip all fenced code blocks from the response before running the FINAL/FINAL_VAR regex. This ensures only prose-level FINAL(...) signals termination.
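A minimal sketch of the strip-then-scan approach. The fence-stripping pattern is the one shown in the review thread for rlm/utils/parsing.py; `FINAL_RE` and `find_final_answer_sketch` are hypothetical simplifications of the real matcher:

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in markdown

# Same pattern as the PR's re.sub call: any fenced block, non-greedy.
FENCED_BLOCK_RE = re.compile(FENCE + r"[\s\S]*?" + FENCE)
# Hypothetical simplified matcher (assumption).
FINAL_RE = re.compile(r"FINAL\((.*?)\)", re.DOTALL)


def find_final_answer_sketch(text: str):
    """Drop fenced code first, then look for a prose-level FINAL(...)."""
    prose_only = FENCED_BLOCK_RE.sub("", text)
    match = FINAL_RE.search(prose_only)
    return match.group(1).strip() if match else None


in_fence = f"Working on it.\n{FENCE}repl\nFINAL(final_answer)\n{FENCE}\n"
in_prose = f"{FENCE}repl\nprint(sum(range(1, 11)))\n{FENCE}\nFINAL(55)"

print(find_final_answer_sketch(in_fence))  # None: the turn is not terminal
print(find_final_answer_sketch(in_prose))  # 55
```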

environments/local_repl.py — add _final(value) method (mirrors _final_var for direct values), inject it as globals["FINAL"], and restore it in _restore_scaffold().
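The shape of this change can be sketched with a toy class. `MiniREPL` is hypothetical and far simpler than LocalREPL; storing the answer as `str(value)` is an assumption made for illustration:

```python
class MiniREPL:
    """Toy REPL showing FINAL/FINAL_VAR injection and scaffold restoration."""

    def __init__(self):
        self.final_answer = None
        self.globals = {}
        self._restore_scaffold()

    def _final(self, value):
        # Direct-value counterpart to _final_var (str() is an assumption).
        self.final_answer = str(value)
        return self.final_answer

    def _final_var(self, name):
        self.final_answer = str(self.globals[name.strip()])
        return self.final_answer

    def _restore_scaffold(self):
        # Re-inject the helpers so executed code cannot permanently shadow them.
        self.globals["FINAL"] = self._final
        self.globals["FINAL_VAR"] = self._final_var

    def execute(self, code: str) -> None:
        try:
            exec(code, self.globals)
        finally:
            self._restore_scaffold()


repl = MiniREPL()
repl.execute("x = sum(range(1, 11))\nFINAL(x)")
print(repl.final_answer)  # 55

repl.execute("FINAL = None")  # user code clobbers the name...
print(callable(repl.globals["FINAL"]))  # True: ...and the scaffold restores it
```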

environments/base_env.py — add "FINAL" to RESERVED_TOOL_NAMES so it can't be overwritten by user code.
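One way a reserved-name list might be enforced, shown only as a sketch. The set contents and `validate_custom_tools` are assumptions; the real RESERVED_TOOL_NAMES in base_env.py may hold other names and be checked differently:

```python
# Assumed subset of the reserved names (hypothetical contents).
RESERVED_TOOL_NAMES = frozenset({"FINAL", "FINAL_VAR"})


def validate_custom_tools(tools: dict) -> None:
    """Reject custom tools whose names collide with reserved REPL helpers."""
    clashes = sorted(RESERVED_TOOL_NAMES.intersection(tools))
    if clashes:
        raise ValueError(f"reserved tool names cannot be overridden: {clashes}")


validate_custom_tools({"search": print})  # accepted

try:
    validate_custom_tools({"FINAL": print})
except ValueError as exc:
    print(exc)
```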

Tests added (tests/test_parsing.py)

| Test | Covers |
| --- | --- |
| `test_final_inside_repl_code_block_not_parsed_as_terminal` | Parser ignores FINAL() in code fence |
| `test_final_var_inside_repl_code_block_not_parsed_as_terminal` | Parser ignores FINAL_VAR() in code fence |
| `test_final_in_prose_still_works_alongside_repl_block` | Prose FINAL() still terminates after code fence |
| `test_final_callable_in_repl_environment` | FINAL(x) callable in REPL sets final_answer |

All 16 TestFindFinalAnswer tests pass.

How to reproduce (before fix)

from rlm import RLM
rlm = RLM(backend="openai", backend_kwargs={"model_name": "gpt-4o-mini"}, environment="local", max_depth=1)
result = rlm.completion("Compute the sum of integers from 1 to 10 and print only the number.")
print(result.response)  # prints "final_answer" instead of "55"

Made with Cursor

…code fences

Two bugs caused the RLM to return wrong answers when a model called
FINAL() or FINAL_VAR() inside a ```repl``` code block:

1. Parser bug (utils/parsing.py): find_final_answer() ran its regex
   over the raw response including fenced code blocks, so FINAL(x) in
   a repl block was parsed as the terminal answer, returning the literal
   string "final_answer" instead of its value.
   Fix: strip all fenced code blocks before running the FINAL/FINAL_VAR
   regex. FINAL/FINAL_VAR in prose still work as before.

2. Runtime bug (environments/local_repl.py, environments/base_env.py):
   The system prompt advertises FINAL(value) as option 1 for submitting
   a final answer, but FINAL was never injected into the REPL globals -
   only FINAL_VAR was. Models calling FINAL() inside a repl block got a
   NameError.
   Fix: add _final() method to LocalREPL (mirrors _final_var for direct
   values), inject it as globals["FINAL"], restore it in _restore_scaffold,
   and add "FINAL" to RESERVED_TOOL_NAMES.

Tests added to TestFindFinalAnswer:
- test_final_inside_repl_code_block_not_parsed_as_terminal
- test_final_var_inside_repl_code_block_not_parsed_as_terminal
- test_final_in_prose_still_works_alongside_repl_block
- test_final_callable_in_repl_environment

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings February 19, 2026 05:05

Copilot AI left a comment

Pull request overview

Fixes incorrect termination behavior when models emit FINAL(...) / FINAL_VAR(...) inside fenced REPL blocks, and makes FINAL(...) callable inside the local REPL to align runtime behavior with the system prompt.

Changes:

  • Update find_final_answer() to ignore FINAL(...) / FINAL_VAR(...) that appear inside fenced code blocks.
  • Inject FINAL into LocalREPL globals and restore it after executions; reserve FINAL as a non-overridable tool name.
  • Add parsing + REPL runtime tests covering the regressions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `tests/test_parsing.py` | Adds regression tests ensuring fenced FINAL/FINAL_VAR are ignored and FINAL() is callable in LocalREPL. |
| `rlm/utils/parsing.py` | Strips fenced code blocks before scanning for terminal FINAL/FINAL_VAR markers. |
| `rlm/environments/local_repl.py` | Adds _final() helper and injects/restores FINAL in REPL globals. |
| `rlm/environments/base_env.py` | Adds FINAL to RESERVED_TOOL_NAMES to prevent override by custom tools/user code. |

Comment thread rlm/utils/parsing.py
"""
# Remove fenced code blocks first so FINAL()/FINAL_VAR() inside ```repl``` code
# does not get parsed as a terminal answer.
text_no_code = re.sub(r"```[\s\S]*?```", "", text)

Copilot AI Feb 19, 2026

text_no_code = re.sub(r"```[\s\S]*?```", "", text) strips all fenced code blocks from the entire response before searching for FINAL(...). This can silently corrupt a legitimate final answer if the payload inside FINAL(...) includes a markdown code fence (e.g., returning a code snippet), because the fenced content will be removed before extraction. Consider narrowing the stripping to only repl blocks (the ones the runtime executes), or performing code-fence removal only outside the matched FINAL(...)/FINAL_VAR(...) span so the final payload is preserved verbatim.

Suggested change:
- text_no_code = re.sub(r"```[\s\S]*?```", "", text)
+ text_no_code = re.sub(r"```repl[\s\S]*?```", "", text)
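The difference between the two patterns can be checked on a response whose final payload itself contains a fenced snippet. This is only a sketch of the stripping step; the sample response is invented for illustration, and payload extraction is left out:

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in markdown

STRIP_ALL = re.compile(FENCE + r"[\s\S]*?" + FENCE)        # pattern in the PR
STRIP_REPL = re.compile(FENCE + r"repl[\s\S]*?" + FENCE)   # suggested pattern

response = (
    f"{FENCE}repl\nFINAL(draft)\n{FENCE}\n"
    "FINAL(Use this snippet:\n"
    f"{FENCE}python\nprint('hi')\n{FENCE}\n)"
)

all_stripped = STRIP_ALL.sub("", response)
repl_stripped = STRIP_REPL.sub("", response)

# Both patterns remove the executed repl block...
print("FINAL(draft)" in all_stripped, "FINAL(draft)" in repl_stripped)  # False False
# ...but only the narrowed one keeps the snippet inside the final payload.
print("print('hi')" in all_stripped)   # False
print("print('hi')" in repl_stripped)  # True
```

A trade-off worth noting: with the narrowed pattern, a FINAL(...) written inside a fence tagged with another language (for example python) would remain visible to the parser.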

2 participants