feat(peek): vendor peek-ai with score-decay patch, tracing, and bug fix#34
feat(peek): vendor peek-ai with score-decay patch, tracing, and bug fix#34apenab wants to merge 2 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 40 minutes and 35 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (28)
📝 WalkthroughWalkthroughThis PR adds optional PEEK (MIT CSAIL) integration to pyrlm-runtime, enabling persistent context caching across RLM runs. It includes a vendored peek-ai library, PeekSession wrapper for ModelAdapter, 412 integration tests, two benchmark scripts (703 and 453 lines), and comprehensive experimental documentation showing REJECT verdict on oolong-synth dataset. ChangesOptional PEEK Orientation-Cache Integration
🎯 4 (Complex) | ⏱️ ~60 minutes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (1)
src/pyrlm_runtime/peek_integration.py (1)
307-328: 💤 Low value
load()dropstrace_dircapability.The
load()classmethod doesn't accept or forward atrace_dirparameter, so loaded sessions cannot produce per-update JSON snapshots. If this is intentional (tracing only for fresh sessions), consider documenting it; otherwise, add the parameter for parity withcreate().Proposed fix to add trace_dir support
`@classmethod` def load( cls, path: str | Path, adapter: ModelAdapter, *, max_tokens: int = 2048, + trace_dir: str | Path | None = None, ) -> PeekSession: """Load a previously saved session. Args: path: Path to the JSON file created by ``save``. adapter: ModelAdapter for future Distiller/Cartographer calls. max_tokens: Max completion tokens for those calls. + trace_dir: If set, write a JSON trace per ``update_from_run`` call. """ _require_peek() from peek import CachePolicy lm_client = _PeekLMClientAdapter(adapter, max_tokens=max_tokens) policy = CachePolicy.load(path, client=lm_client) - return cls(policy) + return cls(policy, trace_dir=trace_dir)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pyrlm_runtime/peek_integration.py` around lines 307 - 328, The load() classmethod currently omits the trace_dir option so reloaded PeekSession instances cannot emit per-update JSON traces; update PeekSession.load to accept a trace_dir: str | Path | None parameter (matching create()), pass it through when constructing the session/policy, and ensure CachePolicy.load(...) or the returned policy is wired to use trace_dir (e.g., by forwarding trace_dir into CachePolicy.load or into cls(policy) initialization); reference the existing symbols load(), create(), PeekSession, CachePolicy.load, _PeekLMClientAdapter, and cls(policy) when making the change so behavior matches create() and tracing is preserved for loaded sessions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_peek_integration.py`:
- Around line 17-22: The current PEEK_AVAILABLE check only tests import peek and
misses that peek's optional dependency tiktoken may be absent; update the check
in tests/test_peek_integration.py so PEEK_AVAILABLE is only True if both import
peek and import tiktoken succeed (e.g., try: import peek; import tiktoken;
PEEK_AVAILABLE = True except ImportError: PEEK_AVAILABLE = False), referencing
the PEEK_AVAILABLE symbol and the import block to locate where to change it.
In `@vendor/peek/__init__.py`:
- Around line 1-3: Add the future annotations import at the very top of the
module so type annotations are postponed; specifically insert "from __future__
import annotations" before the module docstring and before the existing import
block (the line starting with "from peek.core import (") in
vendor/peek/__init__.py to satisfy the repo convention.
In `@vendor/peek/_io.py`:
- Around line 14-33: The extract_json function currently returns whatever
json.loads yields (which can be lists or scalars) despite its declared return
type dict[str, Any] | None; update extract_json to validate the parsed value is
a dict before returning: in the initial json.loads(s) attempt and in each
successful json.loads(...) inside the loops over _scan_fenced_blocks and
_scan_balanced_braces, only return the value if isinstance(result, dict),
otherwise continue scanning and ultimately return None if no dict is found;
reference the extract_json function and the helper generators
_scan_fenced_blocks and _scan_balanced_braces when making this change.
In `@vendor/peek/core/__init__.py`:
- Line 1: Add the required future annotations import at the top of the module:
insert "from __future__ import annotations" as the very first statement in
vendor/peek/core/__init__.py (before the existing "from peek.core.cartographer
import Cartographer") so the module-level type annotations use postponed
evaluation per project guidelines.
In `@vendor/peek/core/cartographer.py`:
- Around line 62-63: Update the guard that checks inputs so whitespace-only
strings are rejected: change the checks around the variables section and content
(the conditional currently using "if not isinstance(section, str) or not
isinstance(content, str) or not content") to also verify content.strip() is
non-empty before proceeding; apply the same change to the similar check at the
later occurrence (the block around lines handling section/content at the other
location). This ensures you skip creating edits for content that is only
whitespace while still allowing valid non-empty strings.
In `@vendor/peek/core/policy.py`:
- Around line 26-30: _cache_policy currently calls _default_tokenizer() which
unconditionally imports tiktoken and can raise ModuleNotFoundError when the
optional dependency isn't installed; modify _default_tokenizer to catch
ImportError (and any tiktoken import/usage errors) and return a safe fallback
TokenCounter (e.g., a function that returns len(s) or len(s.encode("utf-8")) as
an approximate token count) so CachePolicy.__init__ (where self.token_counter =
_default_tokenizer() is set when token_counter is None) will not crash if
tiktoken is absent. Ensure the function still uses
tiktoken.get_encoding("o200k_base") when available, but falls back cleanly on
exception.
In `@vendor/peek/llm/__init__.py`:
- Line 1: Add the future annotations import as the very first line of this
module: insert "from __future__ import annotations" before the existing "from
peek.llm.base import LMClient" import in the peek.llm.__init__ module so all
type annotations are deferred.
In `@vendor/peek/llm/anthropic_client.py`:
- Around line 43-50: The code currently only sets self._last when resp.usage is
present, leaving stale values otherwise; update the response-handling in
anthropic_client.py (look around the usage = getattr(resp, "usage", None) block
and the return path of the function that constructs text_blocks) to explicitly
clear or reset self._last when resp.usage is None (e.g., set self._last = None
or a Usage with zeroed tokens) so last_usage() cannot return previous-call
counts after a response omits usage metadata.
In `@vendor/peek/llm/gemini_client.py`:
- Around line 59-65: The code currently leaves self._last unchanged when
getattr(resp, "usage_metadata", None) is None, causing prior token counts to
leak; update the block in the gemini response handling so that if usage is None
you explicitly reset self._last = Usage(input_tokens=0, output_tokens=0) (or an
equivalent empty Usage) instead of skipping assignment; keep using getattr(resp,
"usage_metadata", None), the Usage class and self._last identifier so the latest
completion always reflects zeroed usage when metadata is absent and return
resp.text or "" as before.
In `@vendor/peek/llm/openai_client.py`:
- Around line 46-52: The code currently leaves self._last set to prior values
when resp.usage is None; update the call handling in openai_client.py (the
resp/usage branch around getattr(resp, "usage", None)) so that when usage is
None you explicitly reset self._last to an empty Usage (e.g., input_tokens=0,
output_tokens=0) instead of leaving the previous value, ensuring last_usage()
reflects the current call; keep the existing branch that populates self._last
when usage is present (and leave the final return of
resp.choices[0].message.content unchanged).
---
Nitpick comments:
In `@src/pyrlm_runtime/peek_integration.py`:
- Around line 307-328: The load() classmethod currently omits the trace_dir
option so reloaded PeekSession instances cannot emit per-update JSON traces;
update PeekSession.load to accept a trace_dir: str | Path | None parameter
(matching create()), pass it through when constructing the session/policy, and
ensure CachePolicy.load(...) or the returned policy is wired to use trace_dir
(e.g., by forwarding trace_dir into CachePolicy.load or into cls(policy)
initialization); reference the existing symbols load(), create(), PeekSession,
CachePolicy.load, _PeekLMClientAdapter, and cls(policy) when making the change
so behavior matches create() and tracing is preserved for loaded sessions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 83b6d173-13d0-4f66-8e4c-eadc0d4a1c8a
📒 Files selected for processing (29)
.gitignoreCLAUDE.mddocs/peek-bench/PEEK-DIAGNOSIS.mddocs/peek-bench/PEEK-EXPERIMENTS.mdexamples/peek_bench/__init__.pyexamples/peek_bench/analyze_peek_trace.pyexamples/peek_bench/run_oolong_rlm_vs_peek.pypyproject.tomlsrc/pyrlm_runtime/peek_integration.pytests/test_peek_integration.pyvendor/peek/LICENSEvendor/peek/VENDOR.mdvendor/peek/__init__.pyvendor/peek/_io.pyvendor/peek/core/__init__.pyvendor/peek/core/cartographer.pyvendor/peek/core/context_map.pyvendor/peek/core/distiller.pyvendor/peek/core/evictor.pyvendor/peek/core/policy.pyvendor/peek/core/types.pyvendor/peek/llm/__init__.pyvendor/peek/llm/anthropic_client.pyvendor/peek/llm/base.pyvendor/peek/llm/gemini_client.pyvendor/peek/llm/openai_client.pyvendor/peek/prompts/cartographer.txtvendor/peek/prompts/distiller.txtvendor/peek/prompts/initial_context_map.txt
| try: | ||
| import peek # noqa: F401 | ||
|
|
||
| PEEK_AVAILABLE = True | ||
| except ImportError: | ||
| PEEK_AVAILABLE = False |
There was a problem hiding this comment.
PEEK_AVAILABLE check is insufficient — CI fails because tiktoken is missing.
The pipeline failures show that import peek succeeds but CachePolicy(...) raises ModuleNotFoundError: No module named 'tiktoken'. The tiktoken package is part of the peek extra but isn't installed in the base test environment, so all tests decorated with @pytest.mark.skipif(not PEEK_AVAILABLE, ...) run and then fail.
Either extend the availability check to include tiktoken, or add tiktoken to the test dependencies.
Proposed fix: extend PEEK_AVAILABLE to include tiktoken
try:
import peek # noqa: F401
+ import tiktoken # noqa: F401
PEEK_AVAILABLE = True
except ImportError:
PEEK_AVAILABLE = False📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| import peek # noqa: F401 | |
| PEEK_AVAILABLE = True | |
| except ImportError: | |
| PEEK_AVAILABLE = False | |
| try: | |
| import peek # noqa: F401 | |
| import tiktoken # noqa: F401 | |
| PEEK_AVAILABLE = True | |
| except ImportError: | |
| PEEK_AVAILABLE = False |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_peek_integration.py` around lines 17 - 22, The current
PEEK_AVAILABLE check only tests import peek and misses that peek's optional
dependency tiktoken may be absent; update the check in
tests/test_peek_integration.py so PEEK_AVAILABLE is only True if both import
peek and import tiktoken succeed (e.g., try: import peek; import tiktoken;
PEEK_AVAILABLE = True except ImportError: PEEK_AVAILABLE = False), referencing
the PEEK_AVAILABLE symbol and the import block to locate where to change it.
| """PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents.""" | ||
|
|
||
| from peek.core import ( |
There was a problem hiding this comment.
Add from __future__ import annotations at module top.
Required by repo Python conventions, but currently missing.
Suggested fix
+from __future__ import annotations
+
"""PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""As per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents.""" | |
| from peek.core import ( | |
| from __future__ import annotations | |
| """PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents.""" | |
| from peek.core import ( |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@vendor/peek/__init__.py` around lines 1 - 3, Add the future annotations
import at the very top of the module so type annotations are postponed;
specifically insert "from __future__ import annotations" before the module
docstring and before the existing import block (the line starting with "from
peek.core import (") in vendor/peek/__init__.py to satisfy the repo convention.
| def extract_json(text: str) -> dict[str, Any] | None: | ||
| """Best-effort JSON extraction from an LLM response.""" | ||
| s = text.strip() | ||
| try: | ||
| return json.loads(s) | ||
| except json.JSONDecodeError: | ||
| pass | ||
|
|
||
| for block in _scan_fenced_blocks(text): | ||
| try: | ||
| return json.loads(block.strip()) | ||
| except json.JSONDecodeError: | ||
| continue | ||
|
|
||
| for blob in _scan_balanced_braces(text): | ||
| try: | ||
| return json.loads(blob) | ||
| except json.JSONDecodeError: | ||
| continue | ||
| return None |
There was a problem hiding this comment.
Enforce dict-only parsing to match extract_json’s declared return type.
json.loads may return lists/scalars, but this function promises dict[str, Any] | None. Returning non-dicts can break callers expecting mapping semantics.
Suggested fix
def extract_json(text: str) -> dict[str, Any] | None:
"""Best-effort JSON extraction from an LLM response."""
+ def _parse_object(candidate: str) -> dict[str, Any] | None:
+ try:
+ parsed = json.loads(candidate)
+ except json.JSONDecodeError:
+ return None
+ return parsed if isinstance(parsed, dict) else None
+
s = text.strip()
- try:
- return json.loads(s)
- except json.JSONDecodeError:
- pass
+ parsed = _parse_object(s)
+ if parsed is not None:
+ return parsed
for block in _scan_fenced_blocks(text):
- try:
- return json.loads(block.strip())
- except json.JSONDecodeError:
- continue
+ parsed = _parse_object(block.strip())
+ if parsed is not None:
+ return parsed
for blob in _scan_balanced_braces(text):
- try:
- return json.loads(blob)
- except json.JSONDecodeError:
- continue
+ parsed = _parse_object(blob)
+ if parsed is not None:
+ return parsed
return None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def extract_json(text: str) -> dict[str, Any] | None: | |
| """Best-effort JSON extraction from an LLM response.""" | |
| s = text.strip() | |
| try: | |
| return json.loads(s) | |
| except json.JSONDecodeError: | |
| pass | |
| for block in _scan_fenced_blocks(text): | |
| try: | |
| return json.loads(block.strip()) | |
| except json.JSONDecodeError: | |
| continue | |
| for blob in _scan_balanced_braces(text): | |
| try: | |
| return json.loads(blob) | |
| except json.JSONDecodeError: | |
| continue | |
| return None | |
| def extract_json(text: str) -> dict[str, Any] | None: | |
| """Best-effort JSON extraction from an LLM response.""" | |
| def _parse_object(candidate: str) -> dict[str, Any] | None: | |
| try: | |
| parsed = json.loads(candidate) | |
| except json.JSONDecodeError: | |
| return None | |
| return parsed if isinstance(parsed, dict) else None | |
| s = text.strip() | |
| parsed = _parse_object(s) | |
| if parsed is not None: | |
| return parsed | |
| for block in _scan_fenced_blocks(text): | |
| parsed = _parse_object(block.strip()) | |
| if parsed is not None: | |
| return parsed | |
| for blob in _scan_balanced_braces(text): | |
| parsed = _parse_object(blob) | |
| if parsed is not None: | |
| return parsed | |
| return None |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@vendor/peek/_io.py` around lines 14 - 33, The extract_json function currently
returns whatever json.loads yields (which can be lists or scalars) despite its
declared return type dict[str, Any] | None; update extract_json to validate the
parsed value is a dict before returning: in the initial json.loads(s) attempt
and in each successful json.loads(...) inside the loops over _scan_fenced_blocks
and _scan_balanced_braces, only return the value if isinstance(result, dict),
otherwise continue scanning and ultimately return None if no dict is found;
reference the extract_json function and the helper generators
_scan_fenced_blocks and _scan_balanced_braces when making this change.
| @@ -0,0 +1,32 @@ | |||
| from peek.core.cartographer import Cartographer | |||
There was a problem hiding this comment.
Add from __future__ import annotations at module top.
This file misses the required future-annotations import.
Suggested fix
+from __future__ import annotations
+
from peek.core.cartographer import Cartographer
from peek.core.context_map import ContextMap, normalize_sectionAs per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from peek.core.cartographer import Cartographer | |
| from __future__ import annotations | |
| from peek.core.cartographer import Cartographer |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@vendor/peek/core/__init__.py` at line 1, Add the required future annotations
import at the top of the module: insert "from __future__ import annotations" as
the very first statement in vendor/peek/core/__init__.py (before the existing
"from peek.core.cartographer import Cartographer") so the module-level type
annotations use postponed evaluation per project guidelines.
| def _default_tokenizer() -> TokenCounter: | ||
| import tiktoken | ||
|
|
||
| enc = tiktoken.get_encoding("o200k_base") | ||
| return lambda s: len(enc.encode(s)) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter = _default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter = _default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter = _default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py
echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.tomlRepository: apenab/pyrlm-runtime
Length of output: 1046
Guard tiktoken in _default_tokenizer() to prevent CachePolicy() init crashes
In vendor/peek/core/policy.py, _default_tokenizer() unconditionally imports tiktoken (around line 27) and CachePolicy sets self.token_counter = _default_tokenizer() when token_counter is None (around line 74). Since tiktoken is only declared under pyproject.toml’s optional peek extra, constructing CachePolicy() without that extra can raise ModuleNotFoundError.
💡 Proposed fix
def _default_tokenizer() -> TokenCounter:
- import tiktoken
-
- enc = tiktoken.get_encoding("o200k_base")
- return lambda s: len(enc.encode(s))
+ try:
+ import tiktoken
+ except ModuleNotFoundError:
+ # Keep optional integration usable without hard dependency.
+ return lambda s: len(s.split())
+
+ enc = tiktoken.get_encoding("o200k_base")
+ return lambda s: len(enc.encode(s))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _default_tokenizer() -> TokenCounter: | |
| import tiktoken | |
| enc = tiktoken.get_encoding("o200k_base") | |
| return lambda s: len(enc.encode(s)) | |
| def _default_tokenizer() -> TokenCounter: | |
| try: | |
| import tiktoken | |
| except ModuleNotFoundError: | |
| # Keep optional integration usable without hard dependency. | |
| return lambda s: len(s.split()) | |
| enc = tiktoken.get_encoding("o200k_base") | |
| return lambda s: len(enc.encode(s)) |
🧰 Tools
🪛 GitHub Actions: Tests / 0_test (3.13).txt
[error] 27-27: Python test run failed due to missing dependency. _default_tokenizer() attempts to import 'tiktoken' and raises ModuleNotFoundError: No module named 'tiktoken'.
🪛 GitHub Actions: Tests / test (3.12)
[error] 27-27: CachePolicy initialization failed: ModuleNotFoundError: No module named 'tiktoken' (import tiktoken in _default_tokenizer).
🪛 GitHub Actions: Tests / test (3.13)
[error] 27-27: ModuleNotFoundError: No module named 'tiktoken' while initializing CachePolicy. In _default_tokenizer(): import tiktoken.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@vendor/peek/core/policy.py` around lines 26 - 30, _cache_policy currently
calls _default_tokenizer() which unconditionally imports tiktoken and can raise
ModuleNotFoundError when the optional dependency isn't installed; modify
_default_tokenizer to catch ImportError (and any tiktoken import/usage errors)
and return a safe fallback TokenCounter (e.g., a function that returns len(s) or
len(s.encode("utf-8")) as an approximate token count) so CachePolicy.__init__
(where self.token_counter = _default_tokenizer() is set when token_counter is
None) will not crash if tiktoken is absent. Ensure the function still uses
tiktoken.get_encoding("o200k_base") when available, but falls back cleanly on
exception.
| @@ -0,0 +1,20 @@ | |||
| from peek.llm.base import LMClient | |||
There was a problem hiding this comment.
Add from __future__ import annotations at the top of this module.
Line 1 starts with a regular import, but this repository requires from __future__ import annotations in all Python files.
Suggested patch
+from __future__ import annotations
+
from peek.llm.base import LMClientAs per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from peek.llm.base import LMClient | |
| from __future__ import annotations | |
| from peek.llm.base import LMClient |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@vendor/peek/llm/__init__.py` at line 1, Add the future annotations import as
the very first line of this module: insert "from __future__ import annotations"
before the existing "from peek.llm.base import LMClient" import in the
peek.llm.__init__ module so all type annotations are deferred.
| usage = getattr(resp, "usage", None) | ||
| if usage is not None: | ||
| self._last = Usage( | ||
| input_tokens=getattr(usage, "input_tokens", 0) or 0, | ||
| output_tokens=getattr(usage, "output_tokens", 0) or 0, | ||
| ) | ||
| text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"] | ||
| return "".join(text_blocks) |
There was a problem hiding this comment.
Clear stale usage when resp.usage is missing.
Without a reset path, last_usage() can return previous-call token counts after a call that omitted usage metadata.
Suggested fix
usage = getattr(resp, "usage", None)
if usage is not None:
self._last = Usage(
input_tokens=getattr(usage, "input_tokens", 0) or 0,
output_tokens=getattr(usage, "output_tokens", 0) or 0,
)
+ else:
+ self._last = Usage()
text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| usage = getattr(resp, "usage", None) | |
| if usage is not None: | |
| self._last = Usage( | |
| input_tokens=getattr(usage, "input_tokens", 0) or 0, | |
| output_tokens=getattr(usage, "output_tokens", 0) or 0, | |
| ) | |
| text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"] | |
| return "".join(text_blocks) | |
| usage = getattr(resp, "usage", None) | |
| if usage is not None: | |
| self._last = Usage( | |
| input_tokens=getattr(usage, "input_tokens", 0) or 0, | |
| output_tokens=getattr(usage, "output_tokens", 0) or 0, | |
| ) | |
| else: | |
| self._last = Usage() | |
| text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"] | |
| return "".join(text_blocks) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@vendor/peek/llm/anthropic_client.py` around lines 43 - 50, The code currently
only sets self._last when resp.usage is present, leaving stale values otherwise;
update the response-handling in anthropic_client.py (look around the usage =
getattr(resp, "usage", None) block and the return path of the function that
constructs text_blocks) to explicitly clear or reset self._last when resp.usage
is None (e.g., set self._last = None or a Usage with zeroed tokens) so
last_usage() cannot return previous-call counts after a response omits usage
metadata.
| usage = getattr(resp, "usage_metadata", None) | ||
| if usage is not None: | ||
| self._last = Usage( | ||
| input_tokens=getattr(usage, "prompt_token_count", 0) or 0, | ||
| output_tokens=getattr(usage, "candidates_token_count", 0) or 0, | ||
| ) | ||
| return resp.text or "" |
There was a problem hiding this comment.
Reset usage when Gemini usage metadata is absent.
last_usage() should represent the latest completion; current behavior can leak prior-call token counts.
Suggested fix
usage = getattr(resp, "usage_metadata", None)
if usage is not None:
self._last = Usage(
input_tokens=getattr(usage, "prompt_token_count", 0) or 0,
output_tokens=getattr(usage, "candidates_token_count", 0) or 0,
)
+ else:
+ self._last = Usage()
return resp.text or ""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@vendor/peek/llm/gemini_client.py` around lines 59 - 65, The code currently
leaves self._last unchanged when getattr(resp, "usage_metadata", None) is None,
causing prior token counts to leak; update the block in the gemini response
handling so that if usage is None you explicitly reset self._last =
Usage(input_tokens=0, output_tokens=0) (or an equivalent empty Usage) instead of
skipping assignment; keep using getattr(resp, "usage_metadata", None), the Usage
class and self._last identifier so the latest completion always reflects zeroed
usage when metadata is absent and return resp.text or "" as before.
| usage = getattr(resp, "usage", None) | ||
| if usage is not None: | ||
| self._last = Usage( | ||
| input_tokens=getattr(usage, "prompt_tokens", 0) or 0, | ||
| output_tokens=getattr(usage, "completion_tokens", 0) or 0, | ||
| ) | ||
| return resp.choices[0].message.content or "" |
There was a problem hiding this comment.
Reset usage on calls where provider usage is absent.
When resp.usage is missing, self._last currently keeps prior values, so last_usage() can report stale data instead of the most recent call.
Suggested fix
usage = getattr(resp, "usage", None)
if usage is not None:
self._last = Usage(
input_tokens=getattr(usage, "prompt_tokens", 0) or 0,
output_tokens=getattr(usage, "completion_tokens", 0) or 0,
)
+ else:
+ self._last = Usage()
return resp.choices[0].message.content or ""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@vendor/peek/llm/openai_client.py` around lines 46 - 52, The code currently
leaves self._last set to prior values when resp.usage is None; update the call
handling in openai_client.py (the resp/usage branch around getattr(resp,
"usage", None)) so that when usage is None you explicitly reset self._last to an
empty Usage (e.g., input_tokens=0, output_tokens=0) instead of leaving the
previous value, ensuring last_usage() reflects the current call; keep the
existing branch that populates self._last when usage is present (and leave the
final return of resp.choices[0].message.content unchanged).
be9e19b to
53f8b52
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (6)
vendor/peek/llm/__init__.py (1)
1-1:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd the required future-annotations import at the top of this module.
The file starts with a normal import and misses
from __future__ import annotations, which is required for all Python files in this repo.Suggested patch
+from __future__ import annotations + from peek.llm.base import LMClientAs per coding guidelines:
**/*.py: Includefrom __future__ import annotationsin all Python files.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/llm/__init__.py` at line 1, Add the required future-annotations import to this module by inserting "from __future__ import annotations" as the very first statement in the file (before the existing "from peek.llm.base import LMClient") so that forward references in type hints are treated as postponed for the entire module; update the top of vendor/peek/llm/__init__.py to include this import above the LMClient import.vendor/peek/__init__.py (1)
1-3:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd the required future-annotations import at module top.
from __future__ import annotationsis missing here and should be the first statement in the module.Suggested patch
+from __future__ import annotations + """PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""As per coding guidelines:
**/*.py: Includefrom __future__ import annotationsin all Python files.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/__init__.py` around lines 1 - 3, Add the missing future annotations import as the very first statement in the module: insert "from __future__ import annotations" at the top of vendor/peek/__init__.py before any docstring or other code so that the module-level forward-reference behavior applies; ensure it appears above the existing module docstring and the subsequent "from peek.core import (" import to comply with the project's Python coding guidelines.vendor/peek/llm/anthropic_client.py (1)
43-50:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReset cached usage when the response omits usage metadata.
If
resp.usageis absent,_lastkeeps previous-call values, solast_usage()can report stale token counts.Suggested patch
usage = getattr(resp, "usage", None) if usage is not None: self._last = Usage( input_tokens=getattr(usage, "input_tokens", 0) or 0, output_tokens=getattr(usage, "output_tokens", 0) or 0, ) + else: + self._last = Usage() text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"] return "".join(text_blocks)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/llm/anthropic_client.py` around lines 43 - 50, The code only updates self._last when resp.usage exists, leaving stale values otherwise; change the logic so that when getattr(resp, "usage", None) is None you explicitly reset self._last to a new Usage with zeroed fields (e.g., Usage(input_tokens=0, output_tokens=0)); update the branch around usage = getattr(resp, "usage", None) to set self._last = Usage(0,0) in the else path so last_usage() returns zeros instead of previous-call values.vendor/peek/core/policy.py (1)
26-30:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winGuard optional
tiktokenimport in_default_tokenizer()to prevent init crashes.
CachePolicy()currently crashes whentiktokenis not installed, even though it’s configured as an optional extra; this is already showing up in CI failures.Suggested patch
def _default_tokenizer() -> TokenCounter: - import tiktoken - - enc = tiktoken.get_encoding("o200k_base") - return lambda s: len(enc.encode(s)) + try: + import tiktoken + except ModuleNotFoundError: + # Fallback keeps optional integration usable without hard dependency. + return lambda s: len(s.encode("utf-8")) + + enc = tiktoken.get_encoding("o200k_base") + return lambda s: len(enc.encode(s))Also applies to: 73-74
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/core/policy.py` around lines 26 - 30, _guard the optional tiktoken import inside _default_tokenizer so importing the module never raises when tiktoken is absent: wrap the import tiktoken in a try/except ImportError and return a safe fallback TokenCounter (e.g., a simple character/word-count lambda) if import fails; update the same pattern where tiktoken is imported again (lines referenced around the second occurrence) and ensure CachePolicy or other callers still work with the fallback tokenizer rather than causing an import-time crash.vendor/peek/core/__init__.py (1)
1-1:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd the required future-annotations import at module top.
This module is missing
from __future__ import annotations, so it currently violates the repository Python-file convention.Suggested patch
+from __future__ import annotations + from peek.core.cartographer import Cartographer from peek.core.context_map import ContextMap, normalize_section from peek.core.distiller import DistillerAs per coding guidelines:
**/*.py: Includefrom __future__ import annotationsin all Python files.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/core/__init__.py` at line 1, Add the required future import at the top of the module: insert "from __future__ import annotations" as the first import in vendor/peek/core/__init__.py so the module conforms to the repository convention; ensure it appears before the existing "from peek.core.cartographer import Cartographer" import and that no other code precedes it.tests/test_peek_integration.py (1)
17-22:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winExtend
PEEK_AVAILABLEto includetiktoken.
import peekalone is insufficient here; CI showsCachePolicy(...)still fails withModuleNotFoundError: No module named 'tiktoken', so these tests run when they should skip.Suggested fix
try: import peek # noqa: F401 + import tiktoken # noqa: F401 PEEK_AVAILABLE = True except ImportError: PEEK_AVAILABLE = False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_peek_integration.py` around lines 17 - 22, PEEK_AVAILABLE currently only checks for import peek but CI fails due to missing tiktoken; update the availability check so PEEK_AVAILABLE is True only if both peek and tiktoken are importable (e.g., in the try block attempt to import peek and import tiktoken, or use importlib.util.find_spec('tiktoken') alongside importing peek) so tests that depend on CachePolicy/tiktoken are correctly skipped when tiktoken is absent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/peek-bench/PEEK-DIAGNOSIS.md`:
- Around line 83-87: The fenced code block containing the message "[cu-00010]
Corpus-level label balance is exact..." is missing a language tag which triggers
markdownlint MD040; update that fenced block (the triple-backtick block that
wraps that message) to include a language identifier such as text (i.e., change
``` to ```text) so the block is explicitly marked as plain text and the linter
warning is resolved.
In `@examples/peek_bench/analyze_peek_trace.py`:
- Around line 232-237: The code is truncating CachePolicy scores to ints when
appending to score history; in the loop that iterates t["map_after"]["items"]
and updates life["score_history"] (using scores = t["map_after"].get("scores",
{}), by_id, item["id"], and q), remove the int(...) cast and append the raw
float score (e.g., scores.get(item["id"], 0.0)) so the full floating-point
decay/penalty signal is preserved.
- Around line 441-443: The loop calls load_context_traces(contexts[cid]) and
immediately passes the result to print_context_report(cid, loaded[cid],
top_items=args.top_items) without checking for empty or None trace data; if a
ctx_* folder has no q*.json files print_context_report will crash on
aggregate/lifetime access. Fix by guarding after loading: check that loaded[cid]
(the result of load_context_traces) is not empty/None and contains the expected
trace entries before calling print_context_report; if empty, skip reporting for
that cid or emit a clear message. Ensure you reference the variables/functions
involved (loaded, load_context_traces, print_context_report, contexts, cid,
args.top_items) when making the change.
In `@tests/test_peek_integration.py`:
- Around line 265-267: The current test allows an empty supplement because it
uses "assert supp == "" or 'Context Map' in supp"; change the assertion in
tests/test_peek_integration.py to require a non-empty supplement and that it
contains the expected marker: ensure the test checks that supp is truthy
(non-empty) and that "Context Map" is present (e.g., replace the existing
condition with a combined requirement such as "supp and 'Context Map' in supp"),
so the test fails when the supplement is empty or missing the expected content.
---
Duplicate comments:
In `@tests/test_peek_integration.py`:
- Around line 17-22: PEEK_AVAILABLE currently only checks for import peek but CI
fails due to missing tiktoken; update the availability check so PEEK_AVAILABLE
is True only if both peek and tiktoken are importable (e.g., in the try block
attempt to import peek and import tiktoken, or use
importlib.util.find_spec('tiktoken') alongside importing peek) so tests that
depend on CachePolicy/tiktoken are correctly skipped when tiktoken is absent.
In `@vendor/peek/__init__.py`:
- Around line 1-3: Add the missing future annotations import as the very first
statement in the module: insert "from __future__ import annotations" at the top
of vendor/peek/__init__.py before any docstring or other code so that the
module-level forward-reference behavior applies; ensure it appears above the
existing module docstring and the subsequent "from peek.core import (" import to
comply with the project's Python coding guidelines.
In `@vendor/peek/core/__init__.py`:
- Line 1: Add the required future import at the top of the module: insert "from
__future__ import annotations" as the first import in
vendor/peek/core/__init__.py so the module conforms to the repository
convention; ensure it appears before the existing "from peek.core.cartographer
import Cartographer" import and that no other code precedes it.
In `@vendor/peek/core/policy.py`:
- Around line 26-30: _guard the optional tiktoken import inside
_default_tokenizer so importing the module never raises when tiktoken is absent:
wrap the import tiktoken in a try/except ImportError and return a safe fallback
TokenCounter (e.g., a simple character/word-count lambda) if import fails;
update the same pattern where tiktoken is imported again (lines referenced
around the second occurrence) and ensure CachePolicy or other callers still work
with the fallback tokenizer rather than causing an import-time crash.
In `@vendor/peek/llm/__init__.py`:
- Line 1: Add the required future-annotations import to this module by inserting
"from __future__ import annotations" as the very first statement in the file
(before the existing "from peek.llm.base import LMClient") so that forward
references in type hints are treated as postponed for the entire module; update
the top of vendor/peek/llm/__init__.py to include this import above the LMClient
import.
In `@vendor/peek/llm/anthropic_client.py`:
- Around line 43-50: The code only updates self._last when resp.usage exists,
leaving stale values otherwise; change the logic so that when getattr(resp,
"usage", None) is None you explicitly reset self._last to a new Usage with
zeroed fields (e.g., Usage(input_tokens=0, output_tokens=0)); update the branch
around usage = getattr(resp, "usage", None) to set self._last = Usage(0,0) in
the else path so last_usage() returns zeros instead of previous-call values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39e0efc3-ef6f-4791-8115-2a0ebea1f852
📒 Files selected for processing (29)
.gitignoreCLAUDE.mddocs/peek-bench/PEEK-DIAGNOSIS.mddocs/peek-bench/PEEK-EXPERIMENTS.mdexamples/peek_bench/__init__.pyexamples/peek_bench/analyze_peek_trace.pyexamples/peek_bench/run_oolong_rlm_vs_peek.pypyproject.tomlsrc/pyrlm_runtime/peek_integration.pytests/test_peek_integration.pyvendor/peek/LICENSEvendor/peek/VENDOR.mdvendor/peek/__init__.pyvendor/peek/_io.pyvendor/peek/core/__init__.pyvendor/peek/core/cartographer.pyvendor/peek/core/context_map.pyvendor/peek/core/distiller.pyvendor/peek/core/evictor.pyvendor/peek/core/policy.pyvendor/peek/core/types.pyvendor/peek/llm/__init__.pyvendor/peek/llm/anthropic_client.pyvendor/peek/llm/base.pyvendor/peek/llm/gemini_client.pyvendor/peek/llm/openai_client.pyvendor/peek/prompts/cartographer.txtvendor/peek/prompts/distiller.txtvendor/peek/prompts/initial_context_map.txt
✅ Files skipped from review due to trivial changes (5)
- .gitignore
- vendor/peek/LICENSE
- vendor/peek/prompts/initial_context_map.txt
- CLAUDE.md
- vendor/peek/VENDOR.md
| ``` | ||
| [cu-00010] Corpus-level label balance is exact: correct = 39,060 and | ||
| incorrect = 39,060. | ||
| ``` | ||
|
|
There was a problem hiding this comment.
Add a language to the fenced code block.
This triggers markdownlint MD040 and is trivial to fix.
Suggested fix
-```
+```text
[cu-00010] Corpus-level label balance is exact: correct = 39,060 and
incorrect = 39,060.</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 83-83: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @docs/peek-bench/PEEK-DIAGNOSIS.md around lines 83 - 87, The fenced code
block containing the message "[cu-00010] Corpus-level label balance is exact..."
is missing a language tag which triggers markdownlint MD040; update that fenced
block (the triple-backtick block that wraps that message) to include a language
identifier such as text (i.e., change totext) so the block is explicitly
marked as plain text and the linter warning is resolved.
</details>
<!-- fingerprinting:phantom:poseidon:hawk -->
<!-- This is an auto-generated comment by CodeRabbit -->
| scores = t["map_after"].get("scores", {}) | ||
| for item in t["map_after"]["items"]: | ||
| life = by_id.get(item["id"]) | ||
| if life is not None: | ||
| life["score_history"].append((q, int(scores.get(item["id"], 0)))) | ||
| # Death detection: items evicted at this query |
There was a problem hiding this comment.
Do not truncate score history to integers.
CachePolicy scores are floats; converting to int loses decay/penalty signal and can misclassify “sticky-zero” behavior.
Suggested fix
- life["score_history"].append((q, int(scores.get(item["id"], 0))))
+ life["score_history"].append((q, float(scores.get(item["id"], 0.0))))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| scores = t["map_after"].get("scores", {}) | |
| for item in t["map_after"]["items"]: | |
| life = by_id.get(item["id"]) | |
| if life is not None: | |
| life["score_history"].append((q, int(scores.get(item["id"], 0)))) | |
| # Death detection: items evicted at this query | |
| scores = t["map_after"].get("scores", {}) | |
| for item in t["map_after"]["items"]: | |
| life = by_id.get(item["id"]) | |
| if life is not None: | |
| life["score_history"].append((q, float(scores.get(item["id"], 0.0)))) | |
| # Death detection: items evicted at this query |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/peek_bench/analyze_peek_trace.py` around lines 232 - 237, The code
is truncating CachePolicy scores to ints when appending to score history; in the
loop that iterates t["map_after"]["items"] and updates life["score_history"]
(using scores = t["map_after"].get("scores", {}), by_id, item["id"], and q),
remove the int(...) cast and append the raw float score (e.g.,
scores.get(item["id"], 0.0)) so the full floating-point decay/penalty signal is
preserved.
| # After update, the Cartographer added a roadmap entry — map is non-empty | ||
| assert supp == "" or "Context Map" in supp | ||
|
|
There was a problem hiding this comment.
Make the non-empty supplement test strict.
This assertion passes even when the supplement is empty, which hides regressions.
Suggested fix
- assert supp == "" or "Context Map" in supp
+ assert "Context Map" in supp🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_peek_integration.py` around lines 265 - 267, The current test
allows an empty supplement because it uses "assert supp == "" or 'Context Map'
in supp"; change the assertion in tests/test_peek_integration.py to require a
non-empty supplement and that it contains the expected marker: ensure the test
checks that supp is truthy (non-empty) and that "Context Map" is present (e.g.,
replace the existing condition with a combined requirement such as "supp and
'Context Map' in supp"), so the test fails when the supplement is empty or
missing the expected content.
53f8b52 to
78e24f0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (10)
vendor/peek/core/__init__.py (1)
1-1:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd required future-annotations import at module top.
This module is missing
from __future__ import annotations.Suggested fix
+from __future__ import annotations + from peek.core.cartographer import Cartographer from peek.core.context_map import ContextMap, normalize_sectionAs per coding guidelines:
**/*.py: Usefrom __future__ import annotationsin all Python files.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/core/__init__.py` at line 1, The module is missing the required future annotations import; add the line "from __future__ import annotations" at the very top of this module before any other code so that type annotations are deferred, ensuring compliance with the project's guideline for files like the one that imports Cartographer (see the import "from peek.core.cartographer import Cartographer").vendor/peek/llm/openai_client.py (1)
46-52:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReset
_lastwhen usage metadata is missing.Without a reset path,
last_usage()can return stale token counts from a previous call.Suggested fix
usage = getattr(resp, "usage", None) if usage is not None: self._last = Usage( input_tokens=getattr(usage, "prompt_tokens", 0) or 0, output_tokens=getattr(usage, "completion_tokens", 0) or 0, ) + else: + self._last = Usage() return resp.choices[0].message.content or ""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/llm/openai_client.py` around lines 46 - 52, The code currently only sets self._last when resp.usage exists, leaving stale values for last_usage(); update the handler in the method that reads resp to reset self._last to None (or Usage.zero) when usage metadata is missing so last_usage() does not return previous counts—modify the block that checks getattr(resp, "usage", None) to explicitly assign self._last = None (or a zeroed Usage) in the else/missing case; reference symbols: self._last, Usage, and last_usage().vendor/peek/llm/anthropic_client.py (1)
43-50:⚠️ Potential issue | 🟠 Major | ⚡ Quick winClear
_lastwhenresp.usageis absent.Current flow can keep stale usage from prior responses.
Suggested fix
usage = getattr(resp, "usage", None) if usage is not None: self._last = Usage( input_tokens=getattr(usage, "input_tokens", 0) or 0, output_tokens=getattr(usage, "output_tokens", 0) or 0, ) + else: + self._last = Usage() text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/llm/anthropic_client.py` around lines 43 - 50, The code currently only updates self._last when resp.usage exists, which can leave a stale Usage instance; modify the response handling in the method that processes resp (the block referencing resp, usage, and constructing Usage) so that if getattr(resp, "usage", None) is None you explicitly set self._last = None, otherwise set self._last = Usage(input_tokens=..., output_tokens=...); keep the existing text_blocks extraction/return logic unchanged so only the _last clearing behavior is added.vendor/peek/_io.py (1)
14-33:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEnforce dict-only returns in
extract_json.
json.loadscan return non-dict JSON (e.g., list/scalar), but this function promisesdict[str, Any] | None.Suggested fix
def extract_json(text: str) -> dict[str, Any] | None: """Best-effort JSON extraction from an LLM response.""" + def _parse_object(candidate: str) -> dict[str, Any] | None: + try: + parsed = json.loads(candidate) + except json.JSONDecodeError: + return None + return parsed if isinstance(parsed, dict) else None + s = text.strip() - try: - return json.loads(s) - except json.JSONDecodeError: - pass + parsed = _parse_object(s) + if parsed is not None: + return parsed for block in _scan_fenced_blocks(text): - try: - return json.loads(block.strip()) - except json.JSONDecodeError: - continue + parsed = _parse_object(block.strip()) + if parsed is not None: + return parsed for blob in _scan_balanced_braces(text): - try: - return json.loads(blob) - except json.JSONDecodeError: - continue + parsed = _parse_object(blob) + if parsed is not None: + return parsed return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/_io.py` around lines 14 - 33, The extract_json function currently returns whatever json.loads yields (which may be list or scalar) but its signature promises dict[str, Any] | None; change the logic in extract_json to only return a value when the parsed result is an instance of dict and otherwise continue searching; specifically, after each json.loads call in the initial parse, in the loop over _scan_fenced_blocks, and in the loop over _scan_balanced_braces, check isinstance(parsed, dict) and return parsed only when True, otherwise treat it as a JSONDecode/invalid type and continue; keep the overall return None if no dict is found.vendor/peek/llm/gemini_client.py (1)
59-65:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReset usage state when Gemini omits
usage_metadata.
last_usage()should always reflect the latest completion, including zeroed usage when metadata is absent.Suggested fix
usage = getattr(resp, "usage_metadata", None) if usage is not None: self._last = Usage( input_tokens=getattr(usage, "prompt_token_count", 0) or 0, output_tokens=getattr(usage, "candidates_token_count", 0) or 0, ) + else: + self._last = Usage() return resp.text or ""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/llm/gemini_client.py` around lines 59 - 65, The current logic in Gemini client only updates self._last when resp.usage_metadata exists, so last_usage() can incorrectly return stale data when the API omits usage_metadata; inside the response handling block (the code setting usage = getattr(resp, "usage_metadata", None) in gemini_client.py) ensure that when usage is None you explicitly reset self._last = Usage(input_tokens=0, output_tokens=0) (i.e., assign a zeroed Usage object) so last_usage() always reflects the most recent completion; update the branch that currently only sets self._last on non-None usage to also set the zeroed Usage when usage is None.vendor/peek/core/cartographer.py (1)
62-63:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStrip whitespace-only
contentbefore accepting ADD/REPLACE.
contentcurrently passes when it is only spaces, so empty-value edits can still be emitted.Suggested patch
if kind == "ADD": section = op.get("section") content = op.get("content") - if not isinstance(section, str) or not isinstance(content, str) or not content: + if not isinstance(section, str) or not isinstance(content, str): continue + content = content.strip() + if not content: + continue section_norm = normalize_section(section) if section_norm not in _ALLOWED_SECTIONS: continue out.append(Operation(type="ADD", section=section_norm, content=content)) @@ elif kind == "REPLACE": item_id = op.get("item_id") or op.get("bullet_id") content = op.get("content") - if isinstance(item_id, str) and item_id and isinstance(content, str) and content: - out.append(Operation(type="REPLACE", item_id=item_id, content=content)) + if isinstance(item_id, str) and item_id and isinstance(content, str): + content = content.strip() + if content: + out.append(Operation(type="REPLACE", item_id=item_id, content=content))Also applies to: 75-76
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/core/cartographer.py` around lines 62 - 63, The guard that skips non-string or empty edits should also treat whitespace-only strings as empty: replace the current checks that use "not content" with a check on "not content.strip()" so content consisting only of spaces is rejected; update both occurrences that reference "section" and "content" (the IF that currently reads "if not isinstance(section, str) or not isinstance(content, str) or not content: continue" and the similar check at the other occurrence) to use content.strip() instead.vendor/peek/core/policy.py (1)
26-30:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winGuard optional
tiktokenimport in the default tokenizer path.Line 27 unconditionally imports
tiktoken, soCachePolicy()fails at Line 74 when the optional extra is absent (already reflected in CI failures).Suggested patch
def _default_tokenizer() -> TokenCounter: - import tiktoken - - enc = tiktoken.get_encoding("o200k_base") - return lambda s: len(enc.encode(s)) + try: + import tiktoken + + enc = tiktoken.get_encoding("o200k_base") + return lambda s: len(enc.encode(s)) + except Exception: + # Optional dependency fallback. + return lambda s: len(s.encode("utf-8"))#!/bin/bash set -euo pipefail echo "=== policy tokenizer call path ===" rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter = _default_tokenizer' vendor/peek/core/policy.py echo echo "=== dependency declaration context ===" rg -n -C3 'optional-dependencies|peek|tiktoken' pyproject.tomlAlso applies to: 74-74
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/core/policy.py` around lines 26 - 30, The _default_tokenizer currently does an unconditional import of tiktoken which causes CachePolicy initialization (where self.token_counter = _default_tokenizer) to fail if the optional tiktoken dependency is missing; update _default_tokenizer to guard the import (try/except ImportError) and return a safe fallback tokenizer (e.g., a simple len(s) or whitespace-based counter) when tiktoken is not available, ensuring CachePolicy and any call sites using token_counter work without the optional extra.vendor/peek/__init__.py (1)
1-3:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd postponed-annotations import at module top.
This module is missing
from __future__ import annotations.Suggested patch
+from __future__ import annotations + """PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""As per coding guidelines:
**/*.py: Usefrom __future__ import annotationsin all Python files.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vendor/peek/__init__.py` around lines 1 - 3, Add the future annotations import at the top of the module: insert "from __future__ import annotations" as the first import in vendor/peek/__init__.py (above the module docstring or immediately after a shebang/encoding if present) so all forward references in this module (e.g., types used in the public API that involve peek.core imports) are postponed; ensure it appears before the existing "from peek.core import (" block.tests/test_peek_integration.py (2)
261-266:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winWeak assertion allows empty supplement to pass.
The test name is
test_system_prompt_supplement_nonempty_after_update, but the assertionsupp == "" or "Context Map" in supppasses whensuppis empty. Since the stub's_CARTOGRAPHER_RESPONSEadds a roadmap entry, the map should be non-empty after update.Proposed fix
- assert supp == "" or "Context Map" in supp + assert "Context Map" in supp🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_peek_integration.py` around lines 261 - 266, The test test_system_prompt_supplement_nonempty_after_update currently allows an empty supplement because it asserts supp == "" or "Context Map" in supp; change this to assert the supplement is non-empty and includes the expected roadmap marker produced by the stub (e.g., replace the assertion with assert "Context Map" in supp or assert supp != "" and "Context Map" in supp) so that system_prompt_supplement (examined after session.update_from_run with _make_simple_trace and the stubbed _CARTOGRAPHER_RESPONSE) must contain the roadmap entry.
17-22:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
PEEK_AVAILABLEcheck must includetiktoken— allCachePolicytests are failing.The pipeline failures confirm that
import peeksucceeds butCachePolicy(...)raisesModuleNotFoundError: No module named 'tiktoken'. Thetiktokenpackage is a runtime dependency ofCachePolicy.__post_init__()(via_default_tokenizer()), so tests skip the wrong condition.Proposed fix
try: import peek # noqa: F401 + import tiktoken # noqa: F401 PEEK_AVAILABLE = True except ImportError: PEEK_AVAILABLE = False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_peek_integration.py` around lines 17 - 22, The test's PEEK_AVAILABLE check only tries to import peek but must also ensure runtime tokenizer dependency is present; update the test (PEEK_AVAILABLE) to attempt importing both peek and tiktoken (or otherwise call CachePolicy.__post_init__ dependency path) so that ModuleNotFoundError from CachePolicy.__post_init__/_default_tokenizer is caught and PEEK_AVAILABLE is set False when tiktoken is missing; locate the variable PEEK_AVAILABLE in tests/test_peek_integration.py and change the try/except to import peek and import tiktoken (or perform a lightweight check that _default_tokenizer will succeed) so CachePolicy tests are skipped when tiktoken is not installed.
🧹 Nitpick comments (1)
tests/test_peek_integration.py (1)
322-327: 💤 Low valueTest name and body are mismatched.
test_load_raises_without_peekdoesn't test theImportErrorpath — it just creates a session normally viaPeekSession.create(). The test duplicatestest_create_factory_uses_model_adapterand the name is misleading.Consider either renaming to reflect what it actually tests, or replacing with a test that patches
_require_peekto raise and verifies the exception propagates.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_peek_integration.py` around lines 322 - 327, The test named test_load_raises_without_peek is misleading because it does not exercise the ImportError path; either rename it to reflect that it verifies PeekSession.create succeeds when peek is importable (e.g., test_create_session_with_peek_present) or change its body to patch the helper _require_peek to raise ImportError and assert that PeekSession.create(...) propagates that error; locate the test function by name test_load_raises_without_peek and update either the name or replace the body to use monkeypatch to make _require_peek raise and assert the expected exception from PeekSession.create.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/peek_bench/analyze_peek_trace.py`:
- Around line 380-391: The aggregation helper agg(rows: list[dict[str, Any]])
can divide by len(rows) and call max()/rows[-1] on empty lists; update agg to
first check if not rows and return a safe default dict (e.g., zeros for
avg_items/avg_added/avg_evicted/avg_ops/helpful_tags/harmful_tags/neutral_tags/max_dup_pairs/final_items
or None where appropriate) so it never raises ZeroDivisionError or ValueError,
and ensure compare_contexts continues to call agg for each context without
additional checks. Use the existing symbol agg to implement this empty-list
guard and populate the same keys as the current return shape.
---
Duplicate comments:
In `@tests/test_peek_integration.py`:
- Around line 261-266: The test
test_system_prompt_supplement_nonempty_after_update currently allows an empty
supplement because it asserts supp == "" or "Context Map" in supp; change this
to assert the supplement is non-empty and includes the expected roadmap marker
produced by the stub (e.g., replace the assertion with assert "Context Map" in
supp or assert supp != "" and "Context Map" in supp) so that
system_prompt_supplement (examined after session.update_from_run with
_make_simple_trace and the stubbed _CARTOGRAPHER_RESPONSE) must contain the
roadmap entry.
- Around line 17-22: The test's PEEK_AVAILABLE check only tries to import peek
but must also ensure runtime tokenizer dependency is present; update the test
(PEEK_AVAILABLE) to attempt importing both peek and tiktoken (or otherwise call
CachePolicy.__post_init__ dependency path) so that ModuleNotFoundError from
CachePolicy.__post_init__/_default_tokenizer is caught and PEEK_AVAILABLE is set
False when tiktoken is missing; locate the variable PEEK_AVAILABLE in
tests/test_peek_integration.py and change the try/except to import peek and
import tiktoken (or perform a lightweight check that _default_tokenizer will
succeed) so CachePolicy tests are skipped when tiktoken is not installed.
In `@vendor/peek/__init__.py`:
- Around line 1-3: Add the future annotations import at the top of the module:
insert "from __future__ import annotations" as the first import in
vendor/peek/__init__.py (above the module docstring or immediately after a
shebang/encoding if present) so all forward references in this module (e.g.,
types used in the public API that involve peek.core imports) are postponed;
ensure it appears before the existing "from peek.core import (" block.
In `@vendor/peek/_io.py`:
- Around line 14-33: The extract_json function currently returns whatever
json.loads yields (which may be list or scalar) but its signature promises
dict[str, Any] | None; change the logic in extract_json to only return a value
when the parsed result is an instance of dict and otherwise continue searching;
specifically, after each json.loads call in the initial parse, in the loop over
_scan_fenced_blocks, and in the loop over _scan_balanced_braces, check
isinstance(parsed, dict) and return parsed only when True, otherwise treat it as
a JSONDecode/invalid type and continue; keep the overall return None if no dict
is found.
In `@vendor/peek/core/__init__.py`:
- Line 1: The module is missing the required future annotations import; add the
line "from __future__ import annotations" at the very top of this module before
any other code so that type annotations are deferred, ensuring compliance with
the project's guideline for files like the one that imports Cartographer (see
the import "from peek.core.cartographer import Cartographer").
In `@vendor/peek/core/cartographer.py`:
- Around line 62-63: The guard that skips non-string or empty edits should also
treat whitespace-only strings as empty: replace the current checks that use "not
content" with a check on "not content.strip()" so content consisting only of
spaces is rejected; update both occurrences that reference "section" and
"content" (the IF that currently reads "if not isinstance(section, str) or not
isinstance(content, str) or not content: continue" and the similar check at the
other occurrence) to use content.strip() instead.
In `@vendor/peek/core/policy.py`:
- Around line 26-30: The _default_tokenizer currently does an unconditional
import of tiktoken which causes CachePolicy initialization (where
self.token_counter = _default_tokenizer) to fail if the optional tiktoken
dependency is missing; update _default_tokenizer to guard the import (try/except
ImportError) and return a safe fallback tokenizer (e.g., a simple len(s) or
whitespace-based counter) when tiktoken is not available, ensuring CachePolicy
and any call sites using token_counter work without the optional extra.
In `@vendor/peek/llm/anthropic_client.py`:
- Around line 43-50: The code currently only updates self._last when resp.usage
exists, which can leave a stale Usage instance; modify the response handling in
the method that processes resp (the block referencing resp, usage, and
constructing Usage) so that if getattr(resp, "usage", None) is None you
explicitly set self._last = None, otherwise set self._last =
Usage(input_tokens=..., output_tokens=...); keep the existing text_blocks
extraction/return logic unchanged so only the _last clearing behavior is added.
In `@vendor/peek/llm/gemini_client.py`:
- Around line 59-65: The current logic in Gemini client only updates self._last
when resp.usage_metadata exists, so last_usage() can incorrectly return stale
data when the API omits usage_metadata; inside the response handling block (the
code setting usage = getattr(resp, "usage_metadata", None) in gemini_client.py)
ensure that when usage is None you explicitly reset self._last =
Usage(input_tokens=0, output_tokens=0) (i.e., assign a zeroed Usage object) so
last_usage() always reflects the most recent completion; update the branch that
currently only sets self._last on non-None usage to also set the zeroed Usage
when usage is None.
In `@vendor/peek/llm/openai_client.py`:
- Around line 46-52: The code currently only sets self._last when resp.usage
exists, leaving stale values for last_usage(); update the handler in the method
that reads resp to reset self._last to None (or Usage.zero) when usage metadata
is missing so last_usage() does not return previous counts—modify the block that
checks getattr(resp, "usage", None) to explicitly assign self._last = None (or a
zeroed Usage) in the else/missing case; reference symbols: self._last, Usage,
and last_usage().
---
Nitpick comments:
In `@tests/test_peek_integration.py`:
- Around line 322-327: The test named test_load_raises_without_peek is
misleading because it does not exercise the ImportError path; either rename it
to reflect that it verifies PeekSession.create succeeds when peek is importable
(e.g., test_create_session_with_peek_present) or change its body to patch the
helper _require_peek to raise ImportError and assert that
PeekSession.create(...) propagates that error; locate the test function by name
test_load_raises_without_peek and update either the name or replace the body to
use monkeypatch to make _require_peek raise and assert the expected exception
from PeekSession.create.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 07a9469a-64c3-42b3-aead-bea9302bc71b
📒 Files selected for processing (28)
CLAUDE.mddocs/peek-bench/PEEK-DIAGNOSIS.mddocs/peek-bench/PEEK-EXPERIMENTS.mdexamples/peek_bench/__init__.pyexamples/peek_bench/analyze_peek_trace.pyexamples/peek_bench/run_oolong_rlm_vs_peek.pypyproject.tomlsrc/pyrlm_runtime/peek_integration.pytests/test_peek_integration.pyvendor/peek/LICENSEvendor/peek/VENDOR.mdvendor/peek/__init__.pyvendor/peek/_io.pyvendor/peek/core/__init__.pyvendor/peek/core/cartographer.pyvendor/peek/core/context_map.pyvendor/peek/core/distiller.pyvendor/peek/core/evictor.pyvendor/peek/core/policy.pyvendor/peek/core/types.pyvendor/peek/llm/__init__.pyvendor/peek/llm/anthropic_client.pyvendor/peek/llm/base.pyvendor/peek/llm/gemini_client.pyvendor/peek/llm/openai_client.pyvendor/peek/prompts/cartographer.txtvendor/peek/prompts/distiller.txtvendor/peek/prompts/initial_context_map.txt
✅ Files skipped from review due to trivial changes (4)
- vendor/peek/prompts/initial_context_map.txt
- vendor/peek/VENDOR.md
- vendor/peek/LICENSE
- CLAUDE.md
| def agg(rows: list[dict[str, Any]]) -> dict[str, float]: | ||
| return { | ||
| "avg_items": sum(r["items_after"] for r in rows) / len(rows), | ||
| "avg_added": sum(r["added"] for r in rows) / len(rows), | ||
| "avg_evicted": sum(r["evicted"] for r in rows) / len(rows), | ||
| "avg_ops": sum(r["ops_applied"] for r in rows) / len(rows), | ||
| "helpful_tags": sum(r["tag_helpful"] for r in rows), | ||
| "harmful_tags": sum(r["tag_harmful"] for r in rows), | ||
| "neutral_tags": sum(r["tag_neutral"] for r in rows), | ||
| "max_dup_pairs": max(r["duplicate_pairs"] for r in rows), | ||
| "final_items": rows[-1]["items_after"], | ||
| } |
There was a problem hiding this comment.
Guard against empty trace lists in compare_contexts.
If either context has no traces, agg() will raise ZeroDivisionError on division by len(rows).
Suggested fix
def agg(rows: list[dict[str, Any]]) -> dict[str, float]:
+ if not rows:
+ return {
+ "avg_items": 0.0, "avg_added": 0.0, "avg_evicted": 0.0,
+ "avg_ops": 0.0, "helpful_tags": 0.0, "harmful_tags": 0.0,
+ "neutral_tags": 0.0, "max_dup_pairs": 0.0, "final_items": 0.0,
+ }
return {
"avg_items": sum(r["items_after"] for r in rows) / len(rows),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/peek_bench/analyze_peek_trace.py` around lines 380 - 391, The
aggregation helper agg(rows: list[dict[str, Any]]) can divide by len(rows) and
call max()/rows[-1] on empty lists; update agg to first check if not rows and
return a safe default dict (e.g., zeros for
avg_items/avg_added/avg_evicted/avg_ops/helpful_tags/harmful_tags/neutral_tags/max_dup_pairs/final_items
or None where appropriate) so it never raises ZeroDivisionError or ValueError,
and ensure compare_contexts continues to call agg for each context without
additional checks. Use the existing symbol agg to implement this empty-list
guard and populate the same keys as the current return shape.
Documents the pyrlm-runtime project's structure, code style, testing conventions, REPL backends, design constraints (the "library is generic" rule), and the empirical research methodology that gates benchmark-driven work.
Adds PEEK orientation-cache (arXiv:2605.19932) support to pyrlm-runtime,
vendored at vendor/peek/ from zhuohangu/peek commit 57de91ac (Apache-2.0
preserved with VENDOR.md). PeekSession bridges our ModelAdapter to peek's
LMClient protocol and injects the map through the existing system_prompt
_supplement slot on RLM — no changes to the core API.
What's included:
- vendor/peek/: vendored upstream copy with two patches applied inline,
documented with `# peek-patch` comments:
* vendor/peek/core/evictor.py — score-decay patch (C3): existing scores
are multiplied by SCORE_DECAY=0.85 before each tagging pass, and
neutral tags contribute -NEUTRAL_PENALTY=-0.5 instead of 0. Constants
1.0/0.0 recover upstream behaviour exactly.
* vendor/peek/_io.py — extract_json now scans fenced code blocks with
an O(n) state machine instead of `re.findall(r"```...(.*?)\s*```",
re.DOTALL | re.IGNORECASE)`. The regex catastrophically backtracks on
LLM output with an unclosed fence; reproduced as a 45-minute hung
benchmark run.
- src/pyrlm_runtime/peek_integration.py: PeekSession, _PeekLMClientAdapter,
and trace_to_peek_trajectory. Optional trace_dir writes per-update JSON
snapshots (query, trajectory, map before/after with item scores, parsed
Distiller and Cartographer outputs, diff) for offline diagnosis.
- examples/peek_bench/run_oolong_rlm_vs_peek.py: benchmark harness comparing
RLM vs RLM+PEEK on oolongbench/oolong-synth (the only OOLONG dataset on
HuggingFace as of 2026-05-25). Supports explicit context selection,
fully-online evolution, and per-query JSON tracing via --trace.
- examples/peek_bench/analyze_peek_trace.py: diagnostic analyzer reporting
per-query churn, Distiller tag distribution, Cartographer operation
breakdown, per-item lifetimes, and duplicate rate.
- tests/test_peek_integration.py: 24 tests (FakeAdapter / stub LMClient
driven), including regression tests for the trace_dir behaviour and
the catastrophic-backtracking bug fix.
- docs/peek-bench/PEEK-EXPERIMENTS.md: full experiment log — pre-committed
hypothesis and decision rule, Pilot, Phase 2 (N=27), Phase 3 (N=5 large
contexts), Phase 4 (vendoring + C3 patch + C5' ablation + bug fix).
- docs/peek-bench/PEEK-DIAGNOSIS.md: trace-based diagnosis of PEEK
upstream failure modes — silent-failures audit (refuted), Distiller
blindness (confirmed and addressed by C3), hallucination, etc.
Phase 4.C.1 (C3 score decay) result: +5.7pp aggregate improvement vs
upstream in an in-session A/B (n=5 large contexts). agnews +18.3pp,
multinli +7.4pp, metaphors +4.0pp; imdb and negation within the LLM
noise floor. Trace-level mechanism confirmed (metaphors neutral-tag
rate dropped 29% -> 4%).
Phase 4.C.2 (C5' section-weighted decay + rr-* birth bonus) result:
REJECTED across 3 paired A/B trials (mean -3.2pp vs C3 alone, dominated
by per-sub-dataset noise of ±5–20pp on N=1). Mechanism kept in vendored
library only via the validated C3 globals; per-section dicts removed
from the working tree. Full ablation documented in PEEK-EXPERIMENTS.md.
peek-ai vendored — no PyPI install needed. tiktoken>=0.7.0 added as the
`peek` extra. pyproject.toml updated to import vendor/peek/ as `peek`.
78e24f0 to
169853e
Compare
Summary
Integrates PEEK (Gu et al., MIT CSAIL)
into pyrlm-runtime as a vendored, instrumented, and empirically improved
orientation cache.
This PR supersedes #33 — that one added the bare integration with
upstream peek-ai installed from pip; this one vendors peek-ai locally
and applies a validated patch on top.
What's in the box
PeekSessionwrapping peek-ai'sCachePolicy; map injected via existingsystem_prompt_supplementslot onRLMSCORE_DECAY=0.85,NEUTRAL_PENALTY=−0.5) invendor/peek/core/evictor.pyto remove the "score-zero stickiness" failure modevendor/peek/_io.py:extract_jsonreplaced regex-based fenced-block scan (which catastrophically backtracked on LLM output with an unclosed fence) with an O(n) state machinePeekSession(trace_dir=…)writes per-update JSON snapshots (query, trajectory, map before/after with item scores, parsed Distiller + Cartographer output, diff)examples/peek_bench/run_oolong_rlm_vs_peek.py(RLM vs RLM+PEEK on oolongbench/oolong-synth, with--trace)examples/peek_bench/analyze_peek_trace.py(per-query churn, Distiller tags, Cartographer ops, item lifetimes)Ablation that was tested and rejected (documented)
C5' — section-weighted decay +
rr-*birth bonus. Hypothesised thatconcrete computed answers (
reusable_results) should decay slower thanabstract claims (
context_understanding). Three paired A/B trials vsC3 alone gave a mean −3.2pp patch effect, dominated by per-sub-dataset
LLM noise of ±5–20pp at N=1. Documented as a negative ablation in
PEEK-EXPERIMENTS.mdPhase 4.C.2 with the full data + decision-ruletrail. Code removed from the working tree; only the validated C3 globals
remain in the vendored library.
Vendoring
vendor/peek/mirrorszhuohangu/peekat commit57de91acwithApache-2.0 license preserved.
VENDOR.mddocuments the commit andpatch policy (every diff vs upstream marked with a
# peek-patch:inline comment).
Files
src/pyrlm_runtime/peek_integration.py—PeekSession, adapter,trace serializer
vendor/peek/— vendored upstream + C3 patch + bug fixexamples/peek_bench/— benchmark harness + analyzerdocs/peek-bench/PEEK-EXPERIMENTS.md— full experiment logdocs/peek-bench/PEEK-DIAGNOSIS.md— trace-based diagnosis findingstests/test_peek_integration.py— 24 tests (FakeAdapter + stub LMClient)pyproject.toml—peekextra now contains onlytiktoken>=0.7.0Test plan
uv sync --extra peek— installs tiktoken; vendored peek imports OKuv run pytest tests/test_peek_integration.py -v— 24/24 passuv run pytest— full suite greenuv run ruff check src/ vendor/peek/ examples/peek_bench/ tests/— cleanPEEK-EXPERIMENTS.mdend-to-end for the empirical story🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Tests