Skip to content

feat(peek): vendor peek-ai with score-decay patch, tracing, and bug fix#34

Open
apenab wants to merge 2 commits into
mainfrom
feat/peek-improvement
Open

feat(peek): vendor peek-ai with score-decay patch, tracing, and bug fix#34
apenab wants to merge 2 commits into
mainfrom
feat/peek-improvement

Conversation

@apenab

@apenab apenab commented May 25, 2026

Copy link
Copy Markdown
Owner

Summary

Integrates PEEK (Gu et al., MIT CSAIL)
into pyrlm-runtime as a vendored, instrumented, and empirically improved
orientation cache.

This PR supersedes #33 — that one added the bare integration with
upstream peek-ai installed from pip; this one vendors peek-ai locally
and applies a validated patch on top.

What's in the box

Layer What Validated?
Integration PeekSession wrapping peek-ai's CachePolicy; map injected via existing system_prompt_supplement slot on RLM API audited vs upstream — no critical issues
Patch C3 Score decay (SCORE_DECAY=0.85, NEUTRAL_PENALTY=−0.5) in vendor/peek/core/evictor.py to remove the "score-zero stickiness" failure mode +5.7pp aggregate vs upstream in an in-session A/B (N=5 large oolong-synth contexts, fully-online evolution). agnews +18.3pp, multinli +7.4pp, metaphors +4.0pp; imdb and negation inside the LLM noise floor. Trace-level mechanism confirmed (metaphors neutral-tag rate 29% → 4%).
Bug fix vendor/peek/_io.py:extract_json replaced regex-based fenced-block scan (which catastrophically backtracked on LLM output with an unclosed fence) with an O(n) state machine Regression test added: a 100KB unclosed fence completes in <1s; pre-fix hung at 100% CPU. Reproducible bug — caused a 45-minute hung benchmark run during this work. Upstream-contributable.
Tracing PeekSession(trace_dir=…) writes per-update JSON snapshots (query, trajectory, map before/after with item scores, parsed Distiller + Cartographer output, diff) Used to diagnose the C3 mechanism and to catch the regex bug
Bench harness examples/peek_bench/run_oolong_rlm_vs_peek.py (RLM vs RLM+PEEK on oolongbench/oolong-synth, with --trace) Used for all the empirical work below
Analyzer examples/peek_bench/analyze_peek_trace.py (per-query churn, Distiller tags, Cartographer ops, item lifetimes) Used for the diagnosis

Ablation that was tested and rejected (documented)

C5' — section-weighted decay + rr-* birth bonus. Hypothesised that
concrete computed answers (reusable_results) should decay slower than
abstract claims (context_understanding). Three paired A/B trials vs
C3 alone gave a mean −3.2pp patch effect, dominated by per-sub-dataset
LLM noise of ±5–20pp at N=1. Documented as a negative ablation in
PEEK-EXPERIMENTS.md Phase 4.C.2 with the full data + decision-rule
trail. Code removed from the working tree; only the validated C3 globals
remain in the vendored library.

Vendoring

vendor/peek/ mirrors zhuohangu/peek at commit 57de91ac with
Apache-2.0 license preserved. VENDOR.md documents the commit and
patch policy (every diff vs upstream marked with a # peek-patch:
inline comment).

Files

  • src/pyrlm_runtime/peek_integration.pyPeekSession, adapter,
    trace serializer
  • vendor/peek/ — vendored upstream + C3 patch + bug fix
  • examples/peek_bench/ — benchmark harness + analyzer
  • docs/peek-bench/PEEK-EXPERIMENTS.md — full experiment log
  • docs/peek-bench/PEEK-DIAGNOSIS.md — trace-based diagnosis findings
  • tests/test_peek_integration.py — 24 tests (FakeAdapter + stub LMClient)
  • pyproject.tomlpeek extra now contains only tiktoken>=0.7.0

Test plan

  • uv sync --extra peek — installs tiktoken; vendored peek imports OK
  • uv run pytest tests/test_peek_integration.py -v — 24/24 pass
  • uv run pytest — full suite green
  • uv run ruff check src/ vendor/peek/ examples/peek_bench/ tests/ — clean
  • Review PEEK-EXPERIMENTS.md end-to-end for the empirical story
  • Decide whether to close feat(peek): add optional PEEK orientation-cache integration #33 in favor of this PR

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added an optional persistent context-orientation cache that maintains and evolves a map of understanding across multiple RLM runs
    • Included benchmark suite for performance evaluation and experiment tracking against baseline runs
    • Added trace-based analysis tools for inspecting cache dynamics and item lifecycle metrics
  • Tests

    • Added comprehensive integration tests for cache operations and trajectory conversion

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@apenab, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 40 minutes and 35 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef22c1c5-0e85-4255-93c9-1d16763d1433

📥 Commits

Reviewing files that changed from the base of the PR and between 78e24f0 and 169853e.

📒 Files selected for processing (28)
  • CLAUDE.md
  • docs/peek-bench/PEEK-DIAGNOSIS.md
  • docs/peek-bench/PEEK-EXPERIMENTS.md
  • examples/peek_bench/__init__.py
  • examples/peek_bench/analyze_peek_trace.py
  • examples/peek_bench/run_oolong_rlm_vs_peek.py
  • pyproject.toml
  • src/pyrlm_runtime/peek_integration.py
  • tests/test_peek_integration.py
  • vendor/peek/LICENSE
  • vendor/peek/VENDOR.md
  • vendor/peek/__init__.py
  • vendor/peek/_io.py
  • vendor/peek/core/__init__.py
  • vendor/peek/core/cartographer.py
  • vendor/peek/core/context_map.py
  • vendor/peek/core/distiller.py
  • vendor/peek/core/evictor.py
  • vendor/peek/core/policy.py
  • vendor/peek/core/types.py
  • vendor/peek/llm/__init__.py
  • vendor/peek/llm/anthropic_client.py
  • vendor/peek/llm/base.py
  • vendor/peek/llm/gemini_client.py
  • vendor/peek/llm/openai_client.py
  • vendor/peek/prompts/cartographer.txt
  • vendor/peek/prompts/distiller.txt
  • vendor/peek/prompts/initial_context_map.txt
📝 Walkthrough

Walkthrough

This PR adds optional PEEK (MIT CSAIL) integration to pyrlm-runtime, enabling persistent context caching across RLM runs. It includes a vendored peek-ai library, PeekSession wrapper for ModelAdapter, 412 integration tests, two benchmark scripts (703 and 453 lines), and comprehensive experimental documentation showing REJECT verdict on oolong-synth dataset.

Changes

Optional PEEK Orientation-Cache Integration

Layer / File(s) Summary
Project scope and dependencies
CLAUDE.md, pyproject.toml
Minor documentation update to project conventions; pyproject.toml adds optional peek dependency with tiktoken>=0.7.0 and updates pythonpath and wheel build targets for vendored code.
PEEK core types, I/O utilities, and LM client protocol
vendor/peek/core/types.py, vendor/peek/_io.py, vendor/peek/llm/base.py
Core data structures (ItemTag, OpType, SECTIONS, Item, Operation, Usage, DistillerOutput, CartographerOutput); JSON extraction with O(n) fenced-block scanner and brace-matching fallback; LMClient protocol specification.
LM client implementations
vendor/peek/llm/{openai,anthropic,gemini}_client.py, vendor/peek/llm/__init__.py
Concrete OpenAIClient, AnthropicClient, and GeminiClient implementations handling API-specific completion and usage tracking; lazy import resolution in llm package.
Context map abstraction
vendor/peek/core/context_map.py
ContextMap dataclass for parsing, storing, and applying section-indexed operations (ADD/DELETE/REPLACE) with stable item identifiers and sequential ID generation.
Distiller and Cartographer components
vendor/peek/core/{distiller,cartographer}.py, vendor/peek/prompts/{distiller,cartographer}.txt
Distiller evaluates trajectories and produces item tags; Cartographer generates map edits under token constraints; both parse JSON outputs and use LM clients; prompt templates included.
Scoring and eviction
vendor/peek/core/evictor.py
Score decay and tag-driven adjustments (helpful/harmful/neutral/stale); priority-based eviction ordered by score and item age to fit token budgets.
CachePolicy orchestration
vendor/peek/core/policy.py
Main control flow coordinating distillation, scoring, cartographer edits, and eviction; step counting and evolution freezing; save/load persistence for cache state.
PEEK package initialization
vendor/peek/__init__.py, vendor/peek/core/__init__.py
Version tracking and re-exported core API; lazy LM client attribute resolution.
Vendoring metadata
vendor/peek/{VENDOR.md,LICENSE}
Vendoring reference documenting source commit 57de91ac, patching policy, and Apache 2.0 license.
PeekSession integration API
src/pyrlm_runtime/peek_integration.py
Public PeekSession wrapping peek's CachePolicy; _PeekLMClientAdapter bridges ModelAdapter to LMClient protocol; trace_to_peek_trajectory serializer for RLM Trace → PEEK trajectory format conversion.
Integration tests
tests/test_peek_integration.py
412 lines covering trajectory formatting, adapter passthrough, PeekSession state management, evolve-step freezing, save/load roundtrip, and regression tests for JSON extraction robustness (unclosed fences, valid/invalid blocks).
Benchmark harness
examples/peek_bench/run_oolong_rlm_vs_peek.py
Complete evaluation script comparing baseline RLM vs RLM+PEEK on oolong-synth; baseline/peek/compare modes; per-query and per-context aggregation; dry-run token estimation; pre-committed decision thresholds.
Trace analysis tools
examples/peek_bench/analyze_peek_trace.py
Diagnostic script for post-run trace analysis: per-query metrics (operations, duplicates, tag counts), per-item lifetime tracking (birth/death/score history), and side-by-side context comparison.
Experimental methodology
docs/peek-bench/PEEK-EXPERIMENTS.md
Pre-committed hypothesis and decision rules; scorer validation against oolong; Phase 2/3 results (REJECT verdict); cross-run diagnostics; Phase 4 improvements including Patch C3, Patch C5' revert, and JSON extraction fix.
Phase B diagnosis findings
docs/peek-bench/PEEK-DIAGNOSIS.md
Trace-based behavioral audit across 5 sub-datasets; audit hypotheses validation; failure mechanism analysis; proposed improvements C3, C4', and C5' with execution order recommendations.

🎯 4 (Complex) | ⏱️ ~60 minutes

🐰 A cache for clarity, a map for the mind,
PEEK brings structure to queries unkind,
With Distiller's wisdom and Cartographer's sight,
Reusable knowledge flows through the night.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.76% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: vendoring peek-ai with a score-decay patch, tracing support, and a bug fix—all central aspects of the changeset.
Linked Issues check ✅ Passed All coding objectives from issue #33 are met: PeekSession wraps CachePolicy, adapts ModelAdapter to LMClient, injects map via system_prompt_supplement, includes benchmark harness and trace analysis tooling, adds 24 integration tests with graceful fallback, and enables optional installation.
Out of Scope Changes check ✅ Passed All changes directly support the linked issue objectives: peek vendoring + integration, benchmarking/analysis tooling, documentation, tests, and packaging updates. The CLAUDE.md edit removes unwanted advisory text unrelated to core behavior.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/peek-improvement

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (1)
src/pyrlm_runtime/peek_integration.py (1)

307-328: 💤 Low value

load() drops trace_dir capability.

The load() classmethod doesn't accept or forward a trace_dir parameter, so loaded sessions cannot produce per-update JSON snapshots. If this is intentional (tracing only for fresh sessions), consider documenting it; otherwise, add the parameter for parity with create().

Proposed fix to add trace_dir support
     `@classmethod`
     def load(
         cls,
         path: str | Path,
         adapter: ModelAdapter,
         *,
         max_tokens: int = 2048,
+        trace_dir: str | Path | None = None,
     ) -> PeekSession:
         """Load a previously saved session.

         Args:
             path: Path to the JSON file created by ``save``.
             adapter: ModelAdapter for future Distiller/Cartographer calls.
             max_tokens: Max completion tokens for those calls.
+            trace_dir: If set, write a JSON trace per ``update_from_run`` call.
         """
         _require_peek()
         from peek import CachePolicy

         lm_client = _PeekLMClientAdapter(adapter, max_tokens=max_tokens)
         policy = CachePolicy.load(path, client=lm_client)
-        return cls(policy)
+        return cls(policy, trace_dir=trace_dir)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pyrlm_runtime/peek_integration.py` around lines 307 - 328, The load()
classmethod currently omits the trace_dir option so reloaded PeekSession
instances cannot emit per-update JSON traces; update PeekSession.load to accept
a trace_dir: str | Path | None parameter (matching create()), pass it through
when constructing the session/policy, and ensure CachePolicy.load(...) or the
returned policy is wired to use trace_dir (e.g., by forwarding trace_dir into
CachePolicy.load or into cls(policy) initialization); reference the existing
symbols load(), create(), PeekSession, CachePolicy.load, _PeekLMClientAdapter,
and cls(policy) when making the change so behavior matches create() and tracing
is preserved for loaded sessions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_peek_integration.py`:
- Around line 17-22: The current PEEK_AVAILABLE check only tests import peek and
misses that peek's optional dependency tiktoken may be absent; update the check
in tests/test_peek_integration.py so PEEK_AVAILABLE is only True if both import
peek and import tiktoken succeed (e.g., try: import peek; import tiktoken;
PEEK_AVAILABLE = True except ImportError: PEEK_AVAILABLE = False), referencing
the PEEK_AVAILABLE symbol and the import block to locate where to change it.

In `@vendor/peek/__init__.py`:
- Around line 1-3: Add the future annotations import at the very top of the
module so type annotations are postponed; specifically insert "from __future__
import annotations" before the module docstring and before the existing import
block (the line starting with "from peek.core import (") in
vendor/peek/__init__.py to satisfy the repo convention.

In `@vendor/peek/_io.py`:
- Around line 14-33: The extract_json function currently returns whatever
json.loads yields (which can be lists or scalars) despite its declared return
type dict[str, Any] | None; update extract_json to validate the parsed value is
a dict before returning: in the initial json.loads(s) attempt and in each
successful json.loads(...) inside the loops over _scan_fenced_blocks and
_scan_balanced_braces, only return the value if isinstance(result, dict),
otherwise continue scanning and ultimately return None if no dict is found;
reference the extract_json function and the helper generators
_scan_fenced_blocks and _scan_balanced_braces when making this change.

In `@vendor/peek/core/__init__.py`:
- Line 1: Add the required future annotations import at the top of the module:
insert "from __future__ import annotations" as the very first statement in
vendor/peek/core/__init__.py (before the existing "from peek.core.cartographer
import Cartographer") so the module-level type annotations use postponed
evaluation per project guidelines.

In `@vendor/peek/core/cartographer.py`:
- Around line 62-63: Update the guard that checks inputs so whitespace-only
strings are rejected: change the checks around the variables section and content
(the conditional currently using "if not isinstance(section, str) or not
isinstance(content, str) or not content") to also verify content.strip() is
non-empty before proceeding; apply the same change to the similar check at the
later occurrence (the block around lines handling section/content at the other
location). This ensures you skip creating edits for content that is only
whitespace while still allowing valid non-empty strings.

In `@vendor/peek/core/policy.py`:
- Around line 26-30: _cache_policy currently calls _default_tokenizer() which
unconditionally imports tiktoken and can raise ModuleNotFoundError when the
optional dependency isn't installed; modify _default_tokenizer to catch
ImportError (and any tiktoken import/usage errors) and return a safe fallback
TokenCounter (e.g., a function that returns len(s) or len(s.encode("utf-8")) as
an approximate token count) so CachePolicy.__init__ (where self.token_counter =
_default_tokenizer() is set when token_counter is None) will not crash if
tiktoken is absent. Ensure the function still uses
tiktoken.get_encoding("o200k_base") when available, but falls back cleanly on
exception.

In `@vendor/peek/llm/__init__.py`:
- Line 1: Add the future annotations import as the very first line of this
module: insert "from __future__ import annotations" before the existing "from
peek.llm.base import LMClient" import in the peek.llm.__init__ module so all
type annotations are deferred.

In `@vendor/peek/llm/anthropic_client.py`:
- Around line 43-50: The code currently only sets self._last when resp.usage is
present, leaving stale values otherwise; update the response-handling in
anthropic_client.py (look around the usage = getattr(resp, "usage", None) block
and the return path of the function that constructs text_blocks) to explicitly
clear or reset self._last when resp.usage is None (e.g., set self._last = None
or a Usage with zeroed tokens) so last_usage() cannot return previous-call
counts after a response omits usage metadata.

In `@vendor/peek/llm/gemini_client.py`:
- Around line 59-65: The code currently leaves self._last unchanged when
getattr(resp, "usage_metadata", None) is None, causing prior token counts to
leak; update the block in the gemini response handling so that if usage is None
you explicitly reset self._last = Usage(input_tokens=0, output_tokens=0) (or an
equivalent empty Usage) instead of skipping assignment; keep using getattr(resp,
"usage_metadata", None), the Usage class and self._last identifier so the latest
completion always reflects zeroed usage when metadata is absent and return
resp.text or "" as before.

In `@vendor/peek/llm/openai_client.py`:
- Around line 46-52: The code currently leaves self._last set to prior values
when resp.usage is None; update the call handling in openai_client.py (the
resp/usage branch around getattr(resp, "usage", None)) so that when usage is
None you explicitly reset self._last to an empty Usage (e.g., input_tokens=0,
output_tokens=0) instead of leaving the previous value, ensuring last_usage()
reflects the current call; keep the existing branch that populates self._last
when usage is present (and leave the final return of
resp.choices[0].message.content unchanged).

---

Nitpick comments:
In `@src/pyrlm_runtime/peek_integration.py`:
- Around line 307-328: The load() classmethod currently omits the trace_dir
option so reloaded PeekSession instances cannot emit per-update JSON traces;
update PeekSession.load to accept a trace_dir: str | Path | None parameter
(matching create()), pass it through when constructing the session/policy, and
ensure CachePolicy.load(...) or the returned policy is wired to use trace_dir
(e.g., by forwarding trace_dir into CachePolicy.load or into cls(policy)
initialization); reference the existing symbols load(), create(), PeekSession,
CachePolicy.load, _PeekLMClientAdapter, and cls(policy) when making the change
so behavior matches create() and tracing is preserved for loaded sessions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 83b6d173-13d0-4f66-8e4c-eadc0d4a1c8a

📥 Commits

Reviewing files that changed from the base of the PR and between d4d34d3 and be9e19b.

📒 Files selected for processing (29)
  • .gitignore
  • CLAUDE.md
  • docs/peek-bench/PEEK-DIAGNOSIS.md
  • docs/peek-bench/PEEK-EXPERIMENTS.md
  • examples/peek_bench/__init__.py
  • examples/peek_bench/analyze_peek_trace.py
  • examples/peek_bench/run_oolong_rlm_vs_peek.py
  • pyproject.toml
  • src/pyrlm_runtime/peek_integration.py
  • tests/test_peek_integration.py
  • vendor/peek/LICENSE
  • vendor/peek/VENDOR.md
  • vendor/peek/__init__.py
  • vendor/peek/_io.py
  • vendor/peek/core/__init__.py
  • vendor/peek/core/cartographer.py
  • vendor/peek/core/context_map.py
  • vendor/peek/core/distiller.py
  • vendor/peek/core/evictor.py
  • vendor/peek/core/policy.py
  • vendor/peek/core/types.py
  • vendor/peek/llm/__init__.py
  • vendor/peek/llm/anthropic_client.py
  • vendor/peek/llm/base.py
  • vendor/peek/llm/gemini_client.py
  • vendor/peek/llm/openai_client.py
  • vendor/peek/prompts/cartographer.txt
  • vendor/peek/prompts/distiller.txt
  • vendor/peek/prompts/initial_context_map.txt

Comment on lines +17 to +22
try:
import peek # noqa: F401

PEEK_AVAILABLE = True
except ImportError:
PEEK_AVAILABLE = False

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

PEEK_AVAILABLE check is insufficient — CI fails because tiktoken is missing.

The pipeline failures show that import peek succeeds but CachePolicy(...) raises ModuleNotFoundError: No module named 'tiktoken'. The tiktoken package is part of the peek extra but isn't installed in the base test environment, so all tests decorated with @pytest.mark.skipif(not PEEK_AVAILABLE, ...) run and then fail.

Either extend the availability check to include tiktoken, or add tiktoken to the test dependencies.

Proposed fix: extend PEEK_AVAILABLE to include tiktoken
 try:
     import peek  # noqa: F401
+    import tiktoken  # noqa: F401

     PEEK_AVAILABLE = True
 except ImportError:
     PEEK_AVAILABLE = False
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
import peek # noqa: F401
PEEK_AVAILABLE = True
except ImportError:
PEEK_AVAILABLE = False
try:
import peek # noqa: F401
import tiktoken # noqa: F401
PEEK_AVAILABLE = True
except ImportError:
PEEK_AVAILABLE = False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_peek_integration.py` around lines 17 - 22, The current
PEEK_AVAILABLE check only tests import peek and misses that peek's optional
dependency tiktoken may be absent; update the check in
tests/test_peek_integration.py so PEEK_AVAILABLE is only True if both import
peek and import tiktoken succeed (e.g., try: import peek; import tiktoken;
PEEK_AVAILABLE = True except ImportError: PEEK_AVAILABLE = False), referencing
the PEEK_AVAILABLE symbol and the import block to locate where to change it.

Comment thread vendor/peek/__init__.py
Comment on lines +1 to +3
"""PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""

from peek.core import (

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add from __future__ import annotations at module top.

Required by repo Python conventions, but currently missing.

Suggested fix
+from __future__ import annotations
+
 """PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""

As per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"""PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""
from peek.core import (
from __future__ import annotations
"""PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""
from peek.core import (
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/__init__.py` around lines 1 - 3, Add the future annotations
import at the very top of the module so type annotations are postponed;
specifically insert "from __future__ import annotations" before the module
docstring and before the existing import block (the line starting with "from
peek.core import (") in vendor/peek/__init__.py to satisfy the repo convention.

Comment thread vendor/peek/_io.py
Comment on lines +14 to +33
def extract_json(text: str) -> dict[str, Any] | None:
"""Best-effort JSON extraction from an LLM response."""
s = text.strip()
try:
return json.loads(s)
except json.JSONDecodeError:
pass

for block in _scan_fenced_blocks(text):
try:
return json.loads(block.strip())
except json.JSONDecodeError:
continue

for blob in _scan_balanced_braces(text):
try:
return json.loads(blob)
except json.JSONDecodeError:
continue
return None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce dict-only parsing to match extract_json’s declared return type.

json.loads may return lists/scalars, but this function promises dict[str, Any] | None. Returning non-dicts can break callers expecting mapping semantics.

Suggested fix
 def extract_json(text: str) -> dict[str, Any] | None:
     """Best-effort JSON extraction from an LLM response."""
+    def _parse_object(candidate: str) -> dict[str, Any] | None:
+        try:
+            parsed = json.loads(candidate)
+        except json.JSONDecodeError:
+            return None
+        return parsed if isinstance(parsed, dict) else None
+
     s = text.strip()
-    try:
-        return json.loads(s)
-    except json.JSONDecodeError:
-        pass
+    parsed = _parse_object(s)
+    if parsed is not None:
+        return parsed
 
     for block in _scan_fenced_blocks(text):
-        try:
-            return json.loads(block.strip())
-        except json.JSONDecodeError:
-            continue
+        parsed = _parse_object(block.strip())
+        if parsed is not None:
+            return parsed
 
     for blob in _scan_balanced_braces(text):
-        try:
-            return json.loads(blob)
-        except json.JSONDecodeError:
-            continue
+        parsed = _parse_object(blob)
+        if parsed is not None:
+            return parsed
     return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def extract_json(text: str) -> dict[str, Any] | None:
"""Best-effort JSON extraction from an LLM response."""
s = text.strip()
try:
return json.loads(s)
except json.JSONDecodeError:
pass
for block in _scan_fenced_blocks(text):
try:
return json.loads(block.strip())
except json.JSONDecodeError:
continue
for blob in _scan_balanced_braces(text):
try:
return json.loads(blob)
except json.JSONDecodeError:
continue
return None
def extract_json(text: str) -> dict[str, Any] | None:
"""Best-effort JSON extraction from an LLM response."""
def _parse_object(candidate: str) -> dict[str, Any] | None:
try:
parsed = json.loads(candidate)
except json.JSONDecodeError:
return None
return parsed if isinstance(parsed, dict) else None
s = text.strip()
parsed = _parse_object(s)
if parsed is not None:
return parsed
for block in _scan_fenced_blocks(text):
parsed = _parse_object(block.strip())
if parsed is not None:
return parsed
for blob in _scan_balanced_braces(text):
parsed = _parse_object(blob)
if parsed is not None:
return parsed
return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/_io.py` around lines 14 - 33, The extract_json function currently
returns whatever json.loads yields (which can be lists or scalars) despite its
declared return type dict[str, Any] | None; update extract_json to validate the
parsed value is a dict before returning: in the initial json.loads(s) attempt
and in each successful json.loads(...) inside the loops over _scan_fenced_blocks
and _scan_balanced_braces, only return the value if isinstance(result, dict),
otherwise continue scanning and ultimately return None if no dict is found;
reference the extract_json function and the helper generators
_scan_fenced_blocks and _scan_balanced_braces when making this change.

@@ -0,0 +1,32 @@
from peek.core.cartographer import Cartographer

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add from __future__ import annotations at module top.

This file misses the required future-annotations import.

Suggested fix
+from __future__ import annotations
+
 from peek.core.cartographer import Cartographer
 from peek.core.context_map import ContextMap, normalize_section

As per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from peek.core.cartographer import Cartographer
from __future__ import annotations
from peek.core.cartographer import Cartographer
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/core/__init__.py` at line 1, Add the required future annotations
import at the top of the module: insert "from __future__ import annotations" as
the very first statement in vendor/peek/core/__init__.py (before the existing
"from peek.core.cartographer import Cartographer") so the module-level type
annotations use postponed evaluation per project guidelines.

Comment thread vendor/peek/core/cartographer.py
Comment on lines +26 to +30
def _default_tokenizer() -> TokenCounter:
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
return lambda s: len(enc.encode(s))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter = _default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter = _default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter = _default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Call sites in policy ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|token_counter\s*=\s*_default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== Dependency declarations ==="
rg -n -C3 'tiktoken|optional-dependencies|peek' pyproject.toml

Repository: apenab/pyrlm-runtime

Length of output: 1046


Guard tiktoken in _default_tokenizer() to prevent CachePolicy() init crashes

In vendor/peek/core/policy.py, _default_tokenizer() unconditionally imports tiktoken (around line 27) and CachePolicy sets self.token_counter = _default_tokenizer() when token_counter is None (around line 74). Since tiktoken is only declared under pyproject.toml’s optional peek extra, constructing CachePolicy() without that extra can raise ModuleNotFoundError.

💡 Proposed fix
 def _default_tokenizer() -> TokenCounter:
-    import tiktoken
-
-    enc = tiktoken.get_encoding("o200k_base")
-    return lambda s: len(enc.encode(s))
+    try:
+        import tiktoken
+    except ModuleNotFoundError:
+        # Keep optional integration usable without hard dependency.
+        return lambda s: len(s.split())
+
+    enc = tiktoken.get_encoding("o200k_base")
+    return lambda s: len(enc.encode(s))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _default_tokenizer() -> TokenCounter:
import tiktoken
enc = tiktoken.get_encoding("o200k_base")
return lambda s: len(enc.encode(s))
def _default_tokenizer() -> TokenCounter:
try:
import tiktoken
except ModuleNotFoundError:
# Keep optional integration usable without hard dependency.
return lambda s: len(s.split())
enc = tiktoken.get_encoding("o200k_base")
return lambda s: len(enc.encode(s))
🧰 Tools
🪛 GitHub Actions: Tests / 0_test (3.13).txt

[error] 27-27: Python test run failed due to missing dependency. _default_tokenizer() attempts to import 'tiktoken' and raises ModuleNotFoundError: No module named 'tiktoken'.

🪛 GitHub Actions: Tests / test (3.12)

[error] 27-27: CachePolicy initialization failed: ModuleNotFoundError: No module named 'tiktoken' (import tiktoken in _default_tokenizer).

🪛 GitHub Actions: Tests / test (3.13)

[error] 27-27: ModuleNotFoundError: No module named 'tiktoken' while initializing CachePolicy. In _default_tokenizer(): import tiktoken.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/core/policy.py` around lines 26 - 30, _cache_policy currently
calls _default_tokenizer() which unconditionally imports tiktoken and can raise
ModuleNotFoundError when the optional dependency isn't installed; modify
_default_tokenizer to catch ImportError (and any tiktoken import/usage errors)
and return a safe fallback TokenCounter (e.g., a function that returns len(s) or
len(s.encode("utf-8")) as an approximate token count) so CachePolicy.__init__
(where self.token_counter = _default_tokenizer() is set when token_counter is
None) will not crash if tiktoken is absent. Ensure the function still uses
tiktoken.get_encoding("o200k_base") when available, but falls back cleanly on
exception.

@@ -0,0 +1,20 @@
from peek.llm.base import LMClient

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add from __future__ import annotations at the top of this module.

Line 1 starts with a regular import, but this repository requires from __future__ import annotations in all Python files.

Suggested patch
+from __future__ import annotations
+
 from peek.llm.base import LMClient

As per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from peek.llm.base import LMClient
from __future__ import annotations
from peek.llm.base import LMClient
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/__init__.py` at line 1, Add the future annotations import as
the very first line of this module: insert "from __future__ import annotations"
before the existing "from peek.llm.base import LMClient" import in the
peek.llm.__init__ module so all type annotations are deferred.

Comment on lines +43 to +50
usage = getattr(resp, "usage", None)
if usage is not None:
self._last = Usage(
input_tokens=getattr(usage, "input_tokens", 0) or 0,
output_tokens=getattr(usage, "output_tokens", 0) or 0,
)
text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"]
return "".join(text_blocks)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear stale usage when resp.usage is missing.

Without a reset path, last_usage() can return previous-call token counts after a call that omitted usage metadata.

Suggested fix
         usage = getattr(resp, "usage", None)
         if usage is not None:
             self._last = Usage(
                 input_tokens=getattr(usage, "input_tokens", 0) or 0,
                 output_tokens=getattr(usage, "output_tokens", 0) or 0,
             )
+        else:
+            self._last = Usage()
         text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
usage = getattr(resp, "usage", None)
if usage is not None:
self._last = Usage(
input_tokens=getattr(usage, "input_tokens", 0) or 0,
output_tokens=getattr(usage, "output_tokens", 0) or 0,
)
text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"]
return "".join(text_blocks)
usage = getattr(resp, "usage", None)
if usage is not None:
self._last = Usage(
input_tokens=getattr(usage, "input_tokens", 0) or 0,
output_tokens=getattr(usage, "output_tokens", 0) or 0,
)
else:
self._last = Usage()
text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"]
return "".join(text_blocks)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/anthropic_client.py` around lines 43 - 50, The code currently
only sets self._last when resp.usage is present, leaving stale values otherwise;
update the response-handling in anthropic_client.py (look around the usage =
getattr(resp, "usage", None) block and the return path of the function that
constructs text_blocks) to explicitly clear or reset self._last when resp.usage
is None (e.g., set self._last = None or a Usage with zeroed tokens) so
last_usage() cannot return previous-call counts after a response omits usage
metadata.

Comment on lines +59 to +65
usage = getattr(resp, "usage_metadata", None)
if usage is not None:
self._last = Usage(
input_tokens=getattr(usage, "prompt_token_count", 0) or 0,
output_tokens=getattr(usage, "candidates_token_count", 0) or 0,
)
return resp.text or ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset usage when Gemini usage metadata is absent.

last_usage() should represent the latest completion; current behavior can leak prior-call token counts.

Suggested fix
         usage = getattr(resp, "usage_metadata", None)
         if usage is not None:
             self._last = Usage(
                 input_tokens=getattr(usage, "prompt_token_count", 0) or 0,
                 output_tokens=getattr(usage, "candidates_token_count", 0) or 0,
             )
+        else:
+            self._last = Usage()
         return resp.text or ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/gemini_client.py` around lines 59 - 65, The code currently
leaves self._last unchanged when getattr(resp, "usage_metadata", None) is None,
causing prior token counts to leak; update the block in the gemini response
handling so that if usage is None you explicitly reset self._last =
Usage(input_tokens=0, output_tokens=0) (or an equivalent empty Usage) instead of
skipping assignment; keep using getattr(resp, "usage_metadata", None), the Usage
class and self._last identifier so the latest completion always reflects zeroed
usage when metadata is absent and return resp.text or "" as before.

Comment on lines +46 to +52
usage = getattr(resp, "usage", None)
if usage is not None:
self._last = Usage(
input_tokens=getattr(usage, "prompt_tokens", 0) or 0,
output_tokens=getattr(usage, "completion_tokens", 0) or 0,
)
return resp.choices[0].message.content or ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset usage on calls where provider usage is absent.

When resp.usage is missing, self._last currently keeps prior values, so last_usage() can report stale data instead of the most recent call.

Suggested fix
         usage = getattr(resp, "usage", None)
         if usage is not None:
             self._last = Usage(
                 input_tokens=getattr(usage, "prompt_tokens", 0) or 0,
                 output_tokens=getattr(usage, "completion_tokens", 0) or 0,
             )
+        else:
+            self._last = Usage()
         return resp.choices[0].message.content or ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/openai_client.py` around lines 46 - 52, The code currently
leaves self._last set to prior values when resp.usage is None; update the call
handling in openai_client.py (the resp/usage branch around getattr(resp,
"usage", None)) so that when usage is None you explicitly reset self._last to an
empty Usage (e.g., input_tokens=0, output_tokens=0) instead of leaving the
previous value, ensuring last_usage() reflects the current call; keep the
existing branch that populates self._last when usage is present (and leave the
final return of resp.choices[0].message.content unchanged).

@apenab apenab force-pushed the feat/peek-improvement branch from be9e19b to 53f8b52 Compare May 25, 2026 17:33

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (6)
vendor/peek/llm/__init__.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the required future-annotations import at the top of this module.

The file starts with a normal import and misses from __future__ import annotations, which is required for all Python files in this repo.

Suggested patch
+from __future__ import annotations
+
 from peek.llm.base import LMClient

As per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/__init__.py` at line 1, Add the required future-annotations
import to this module by inserting "from __future__ import annotations" as the
very first statement in the file (before the existing "from peek.llm.base import
LMClient") so that forward references in type hints are treated as postponed for
the entire module; update the top of vendor/peek/llm/__init__.py to include this
import above the LMClient import.
vendor/peek/__init__.py (1)

1-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the required future-annotations import at module top.

from __future__ import annotations is missing here and should be the first statement in the module.

Suggested patch
+from __future__ import annotations
+
 """PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""

As per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/__init__.py` around lines 1 - 3, Add the missing future
annotations import as the very first statement in the module: insert "from
__future__ import annotations" at the top of vendor/peek/__init__.py before any
docstring or other code so that the module-level forward-reference behavior
applies; ensure it appears above the existing module docstring and the
subsequent "from peek.core import (" import to comply with the project's Python
coding guidelines.
vendor/peek/llm/anthropic_client.py (1)

43-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset cached usage when the response omits usage metadata.

If resp.usage is absent, _last keeps previous-call values, so last_usage() can report stale token counts.

Suggested patch
         usage = getattr(resp, "usage", None)
         if usage is not None:
             self._last = Usage(
                 input_tokens=getattr(usage, "input_tokens", 0) or 0,
                 output_tokens=getattr(usage, "output_tokens", 0) or 0,
             )
+        else:
+            self._last = Usage()
         text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"]
         return "".join(text_blocks)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/anthropic_client.py` around lines 43 - 50, The code only
updates self._last when resp.usage exists, leaving stale values otherwise;
change the logic so that when getattr(resp, "usage", None) is None you
explicitly reset self._last to a new Usage with zeroed fields (e.g.,
Usage(input_tokens=0, output_tokens=0)); update the branch around usage =
getattr(resp, "usage", None) to set self._last = Usage(0,0) in the else path so
last_usage() returns zeros instead of previous-call values.
vendor/peek/core/policy.py (1)

26-30: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard optional tiktoken import in _default_tokenizer() to prevent init crashes.

CachePolicy() currently crashes when tiktoken is not installed, even though it’s configured as an optional extra; this is already showing up in CI failures.

Suggested patch
 def _default_tokenizer() -> TokenCounter:
-    import tiktoken
-
-    enc = tiktoken.get_encoding("o200k_base")
-    return lambda s: len(enc.encode(s))
+    try:
+        import tiktoken
+    except ModuleNotFoundError:
+        # Fallback keeps optional integration usable without hard dependency.
+        return lambda s: len(s.encode("utf-8"))
+
+    enc = tiktoken.get_encoding("o200k_base")
+    return lambda s: len(enc.encode(s))

Also applies to: 73-74

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/core/policy.py` around lines 26 - 30, _guard the optional
tiktoken import inside _default_tokenizer so importing the module never raises
when tiktoken is absent: wrap the import tiktoken in a try/except ImportError
and return a safe fallback TokenCounter (e.g., a simple character/word-count
lambda) if import fails; update the same pattern where tiktoken is imported
again (lines referenced around the second occurrence) and ensure CachePolicy or
other callers still work with the fallback tokenizer rather than causing an
import-time crash.
vendor/peek/core/__init__.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the required future-annotations import at module top.

This module is missing from __future__ import annotations, so it currently violates the repository Python-file convention.

Suggested patch
+from __future__ import annotations
+
 from peek.core.cartographer import Cartographer
 from peek.core.context_map import ContextMap, normalize_section
 from peek.core.distiller import Distiller

As per coding guidelines: **/*.py: Include from __future__ import annotations in all Python files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/core/__init__.py` at line 1, Add the required future import at
the top of the module: insert "from __future__ import annotations" as the first
import in vendor/peek/core/__init__.py so the module conforms to the repository
convention; ensure it appears before the existing "from peek.core.cartographer
import Cartographer" import and that no other code precedes it.
tests/test_peek_integration.py (1)

17-22: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Extend PEEK_AVAILABLE to include tiktoken.

import peek alone is insufficient here; CI shows CachePolicy(...) still fails with ModuleNotFoundError: No module named 'tiktoken', so these tests run when they should skip.

Suggested fix
 try:
     import peek  # noqa: F401
+    import tiktoken  # noqa: F401
 
     PEEK_AVAILABLE = True
 except ImportError:
     PEEK_AVAILABLE = False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_peek_integration.py` around lines 17 - 22, PEEK_AVAILABLE
currently only checks for import peek but CI fails due to missing tiktoken;
update the availability check so PEEK_AVAILABLE is True only if both peek and
tiktoken are importable (e.g., in the try block attempt to import peek and
import tiktoken, or use importlib.util.find_spec('tiktoken') alongside importing
peek) so tests that depend on CachePolicy/tiktoken are correctly skipped when
tiktoken is absent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/peek-bench/PEEK-DIAGNOSIS.md`:
- Around line 83-87: The fenced code block containing the message "[cu-00010]
Corpus-level label balance is exact..." is missing a language tag which triggers
markdownlint MD040; update that fenced block (the triple-backtick block that
wraps that message) to include a language identifier such as text (i.e., change
``` to ```text) so the block is explicitly marked as plain text and the linter
warning is resolved.

In `@examples/peek_bench/analyze_peek_trace.py`:
- Around line 232-237: The code is truncating CachePolicy scores to ints when
appending to score history; in the loop that iterates t["map_after"]["items"]
and updates life["score_history"] (using scores = t["map_after"].get("scores",
{}), by_id, item["id"], and q), remove the int(...) cast and append the raw
float score (e.g., scores.get(item["id"], 0.0)) so the full floating-point
decay/penalty signal is preserved.
- Around line 441-443: The loop calls load_context_traces(contexts[cid]) and
immediately passes the result to print_context_report(cid, loaded[cid],
top_items=args.top_items) without checking for empty or None trace data; if a
ctx_* folder has no q*.json files print_context_report will crash on
aggregate/lifetime access. Fix by guarding after loading: check that loaded[cid]
(the result of load_context_traces) is not empty/None and contains the expected
trace entries before calling print_context_report; if empty, skip reporting for
that cid or emit a clear message. Ensure you reference the variables/functions
involved (loaded, load_context_traces, print_context_report, contexts, cid,
args.top_items) when making the change.

In `@tests/test_peek_integration.py`:
- Around line 265-267: The current test allows an empty supplement because it
uses "assert supp == "" or 'Context Map' in supp"; change the assertion in
tests/test_peek_integration.py to require a non-empty supplement and that it
contains the expected marker: ensure the test checks that supp is truthy
(non-empty) and that "Context Map" is present (e.g., replace the existing
condition with a combined requirement such as "supp and 'Context Map' in supp"),
so the test fails when the supplement is empty or missing the expected content.

---

Duplicate comments:
In `@tests/test_peek_integration.py`:
- Around line 17-22: PEEK_AVAILABLE currently only checks for import peek but CI
fails due to missing tiktoken; update the availability check so PEEK_AVAILABLE
is True only if both peek and tiktoken are importable (e.g., in the try block
attempt to import peek and import tiktoken, or use
importlib.util.find_spec('tiktoken') alongside importing peek) so tests that
depend on CachePolicy/tiktoken are correctly skipped when tiktoken is absent.

In `@vendor/peek/__init__.py`:
- Around line 1-3: Add the missing future annotations import as the very first
statement in the module: insert "from __future__ import annotations" at the top
of vendor/peek/__init__.py before any docstring or other code so that the
module-level forward-reference behavior applies; ensure it appears above the
existing module docstring and the subsequent "from peek.core import (" import to
comply with the project's Python coding guidelines.

In `@vendor/peek/core/__init__.py`:
- Line 1: Add the required future import at the top of the module: insert "from
__future__ import annotations" as the first import in
vendor/peek/core/__init__.py so the module conforms to the repository
convention; ensure it appears before the existing "from peek.core.cartographer
import Cartographer" import and that no other code precedes it.

In `@vendor/peek/core/policy.py`:
- Around line 26-30: _guard the optional tiktoken import inside
_default_tokenizer so importing the module never raises when tiktoken is absent:
wrap the import tiktoken in a try/except ImportError and return a safe fallback
TokenCounter (e.g., a simple character/word-count lambda) if import fails;
update the same pattern where tiktoken is imported again (lines referenced
around the second occurrence) and ensure CachePolicy or other callers still work
with the fallback tokenizer rather than causing an import-time crash.

In `@vendor/peek/llm/__init__.py`:
- Line 1: Add the required future-annotations import to this module by inserting
"from __future__ import annotations" as the very first statement in the file
(before the existing "from peek.llm.base import LMClient") so that forward
references in type hints are treated as postponed for the entire module; update
the top of vendor/peek/llm/__init__.py to include this import above the LMClient
import.

In `@vendor/peek/llm/anthropic_client.py`:
- Around line 43-50: The code only updates self._last when resp.usage exists,
leaving stale values otherwise; change the logic so that when getattr(resp,
"usage", None) is None you explicitly reset self._last to a new Usage with
zeroed fields (e.g., Usage(input_tokens=0, output_tokens=0)); update the branch
around usage = getattr(resp, "usage", None) to set self._last = Usage(0,0) in
the else path so last_usage() returns zeros instead of previous-call values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 39e0efc3-ef6f-4791-8115-2a0ebea1f852

📥 Commits

Reviewing files that changed from the base of the PR and between be9e19b and 53f8b52.

📒 Files selected for processing (29)
  • .gitignore
  • CLAUDE.md
  • docs/peek-bench/PEEK-DIAGNOSIS.md
  • docs/peek-bench/PEEK-EXPERIMENTS.md
  • examples/peek_bench/__init__.py
  • examples/peek_bench/analyze_peek_trace.py
  • examples/peek_bench/run_oolong_rlm_vs_peek.py
  • pyproject.toml
  • src/pyrlm_runtime/peek_integration.py
  • tests/test_peek_integration.py
  • vendor/peek/LICENSE
  • vendor/peek/VENDOR.md
  • vendor/peek/__init__.py
  • vendor/peek/_io.py
  • vendor/peek/core/__init__.py
  • vendor/peek/core/cartographer.py
  • vendor/peek/core/context_map.py
  • vendor/peek/core/distiller.py
  • vendor/peek/core/evictor.py
  • vendor/peek/core/policy.py
  • vendor/peek/core/types.py
  • vendor/peek/llm/__init__.py
  • vendor/peek/llm/anthropic_client.py
  • vendor/peek/llm/base.py
  • vendor/peek/llm/gemini_client.py
  • vendor/peek/llm/openai_client.py
  • vendor/peek/prompts/cartographer.txt
  • vendor/peek/prompts/distiller.txt
  • vendor/peek/prompts/initial_context_map.txt
✅ Files skipped from review due to trivial changes (5)
  • .gitignore
  • vendor/peek/LICENSE
  • vendor/peek/prompts/initial_context_map.txt
  • CLAUDE.md
  • vendor/peek/VENDOR.md

Comment on lines +83 to +87
```
[cu-00010] Corpus-level label balance is exact: correct = 39,060 and
incorrect = 39,060.
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to the fenced code block.

This triggers markdownlint MD040 and is trivial to fix.

Suggested fix
-```
+```text
 [cu-00010] Corpus-level label balance is exact: correct = 39,060 and
             incorrect = 39,060.
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 83-83: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/peek-bench/PEEK-DIAGNOSIS.md around lines 83 - 87, The fenced code
block containing the message "[cu-00010] Corpus-level label balance is exact..."
is missing a language tag which triggers markdownlint MD040; update that fenced
block (the triple-backtick block that wraps that message) to include a language
identifier such as text (i.e., change totext) so the block is explicitly
marked as plain text and the linter warning is resolved.


</details>

<!-- fingerprinting:phantom:poseidon:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +232 to +237
scores = t["map_after"].get("scores", {})
for item in t["map_after"]["items"]:
life = by_id.get(item["id"])
if life is not None:
life["score_history"].append((q, int(scores.get(item["id"], 0))))
# Death detection: items evicted at this query

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not truncate score history to integers.

CachePolicy scores are floats; converting to int loses decay/penalty signal and can misclassify “sticky-zero” behavior.

Suggested fix
-                life["score_history"].append((q, int(scores.get(item["id"], 0))))
+                life["score_history"].append((q, float(scores.get(item["id"], 0.0))))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
scores = t["map_after"].get("scores", {})
for item in t["map_after"]["items"]:
life = by_id.get(item["id"])
if life is not None:
life["score_history"].append((q, int(scores.get(item["id"], 0))))
# Death detection: items evicted at this query
scores = t["map_after"].get("scores", {})
for item in t["map_after"]["items"]:
life = by_id.get(item["id"])
if life is not None:
life["score_history"].append((q, float(scores.get(item["id"], 0.0))))
# Death detection: items evicted at this query
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/peek_bench/analyze_peek_trace.py` around lines 232 - 237, The code
is truncating CachePolicy scores to ints when appending to score history; in the
loop that iterates t["map_after"]["items"] and updates life["score_history"]
(using scores = t["map_after"].get("scores", {}), by_id, item["id"], and q),
remove the int(...) cast and append the raw float score (e.g.,
scores.get(item["id"], 0.0)) so the full floating-point decay/penalty signal is
preserved.

Comment thread examples/peek_bench/analyze_peek_trace.py
Comment on lines +265 to +267
# After update, the Cartographer added a roadmap entry — map is non-empty
assert supp == "" or "Context Map" in supp

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the non-empty supplement test strict.

This assertion passes even when the supplement is empty, which hides regressions.

Suggested fix
-        assert supp == "" or "Context Map" in supp
+        assert "Context Map" in supp
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_peek_integration.py` around lines 265 - 267, The current test
allows an empty supplement because it uses "assert supp == "" or 'Context Map'
in supp"; change the assertion in tests/test_peek_integration.py to require a
non-empty supplement and that it contains the expected marker: ensure the test
checks that supp is truthy (non-empty) and that "Context Map" is present (e.g.,
replace the existing condition with a combined requirement such as "supp and
'Context Map' in supp"), so the test fails when the supplement is empty or
missing the expected content.

@apenab apenab force-pushed the feat/peek-improvement branch from 53f8b52 to 78e24f0 Compare May 25, 2026 18:38

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (10)
vendor/peek/core/__init__.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add required future-annotations import at module top.

This module is missing from __future__ import annotations.

Suggested fix
+from __future__ import annotations
+
 from peek.core.cartographer import Cartographer
 from peek.core.context_map import ContextMap, normalize_section

As per coding guidelines: **/*.py: Use from __future__ import annotations in all Python files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/core/__init__.py` at line 1, The module is missing the required
future annotations import; add the line "from __future__ import annotations" at
the very top of this module before any other code so that type annotations are
deferred, ensuring compliance with the project's guideline for files like the
one that imports Cartographer (see the import "from peek.core.cartographer
import Cartographer").
vendor/peek/llm/openai_client.py (1)

46-52: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset _last when usage metadata is missing.

Without a reset path, last_usage() can return stale token counts from a previous call.

Suggested fix
         usage = getattr(resp, "usage", None)
         if usage is not None:
             self._last = Usage(
                 input_tokens=getattr(usage, "prompt_tokens", 0) or 0,
                 output_tokens=getattr(usage, "completion_tokens", 0) or 0,
             )
+        else:
+            self._last = Usage()
         return resp.choices[0].message.content or ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/openai_client.py` around lines 46 - 52, The code currently
only sets self._last when resp.usage exists, leaving stale values for
last_usage(); update the handler in the method that reads resp to reset
self._last to None (or Usage.zero) when usage metadata is missing so
last_usage() does not return previous counts—modify the block that checks
getattr(resp, "usage", None) to explicitly assign self._last = None (or a zeroed
Usage) in the else/missing case; reference symbols: self._last, Usage, and
last_usage().
vendor/peek/llm/anthropic_client.py (1)

43-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear _last when resp.usage is absent.

Current flow can keep stale usage from prior responses.

Suggested fix
         usage = getattr(resp, "usage", None)
         if usage is not None:
             self._last = Usage(
                 input_tokens=getattr(usage, "input_tokens", 0) or 0,
                 output_tokens=getattr(usage, "output_tokens", 0) or 0,
             )
+        else:
+            self._last = Usage()
         text_blocks = [b.text for b in resp.content if getattr(b, "type", "") == "text"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/anthropic_client.py` around lines 43 - 50, The code currently
only updates self._last when resp.usage exists, which can leave a stale Usage
instance; modify the response handling in the method that processes resp (the
block referencing resp, usage, and constructing Usage) so that if getattr(resp,
"usage", None) is None you explicitly set self._last = None, otherwise set
self._last = Usage(input_tokens=..., output_tokens=...); keep the existing
text_blocks extraction/return logic unchanged so only the _last clearing
behavior is added.
vendor/peek/_io.py (1)

14-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce dict-only returns in extract_json.

json.loads can return non-dict JSON (e.g., list/scalar), but this function promises dict[str, Any] | None.

Suggested fix
 def extract_json(text: str) -> dict[str, Any] | None:
     """Best-effort JSON extraction from an LLM response."""
+    def _parse_object(candidate: str) -> dict[str, Any] | None:
+        try:
+            parsed = json.loads(candidate)
+        except json.JSONDecodeError:
+            return None
+        return parsed if isinstance(parsed, dict) else None
+
     s = text.strip()
-    try:
-        return json.loads(s)
-    except json.JSONDecodeError:
-        pass
+    parsed = _parse_object(s)
+    if parsed is not None:
+        return parsed
 
     for block in _scan_fenced_blocks(text):
-        try:
-            return json.loads(block.strip())
-        except json.JSONDecodeError:
-            continue
+        parsed = _parse_object(block.strip())
+        if parsed is not None:
+            return parsed
 
     for blob in _scan_balanced_braces(text):
-        try:
-            return json.loads(blob)
-        except json.JSONDecodeError:
-            continue
+        parsed = _parse_object(blob)
+        if parsed is not None:
+            return parsed
     return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/_io.py` around lines 14 - 33, The extract_json function currently
returns whatever json.loads yields (which may be list or scalar) but its
signature promises dict[str, Any] | None; change the logic in extract_json to
only return a value when the parsed result is an instance of dict and otherwise
continue searching; specifically, after each json.loads call in the initial
parse, in the loop over _scan_fenced_blocks, and in the loop over
_scan_balanced_braces, check isinstance(parsed, dict) and return parsed only
when True, otherwise treat it as a JSONDecode/invalid type and continue; keep
the overall return None if no dict is found.
vendor/peek/llm/gemini_client.py (1)

59-65: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset usage state when Gemini omits usage_metadata.

last_usage() should always reflect the latest completion, including zeroed usage when metadata is absent.

Suggested fix
         usage = getattr(resp, "usage_metadata", None)
         if usage is not None:
             self._last = Usage(
                 input_tokens=getattr(usage, "prompt_token_count", 0) or 0,
                 output_tokens=getattr(usage, "candidates_token_count", 0) or 0,
             )
+        else:
+            self._last = Usage()
         return resp.text or ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/llm/gemini_client.py` around lines 59 - 65, The current logic in
Gemini client only updates self._last when resp.usage_metadata exists, so
last_usage() can incorrectly return stale data when the API omits
usage_metadata; inside the response handling block (the code setting usage =
getattr(resp, "usage_metadata", None) in gemini_client.py) ensure that when
usage is None you explicitly reset self._last = Usage(input_tokens=0,
output_tokens=0) (i.e., assign a zeroed Usage object) so last_usage() always
reflects the most recent completion; update the branch that currently only sets
self._last on non-None usage to also set the zeroed Usage when usage is None.
vendor/peek/core/cartographer.py (1)

62-63: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Strip whitespace-only content before accepting ADD/REPLACE.

content currently passes when it is only spaces, so empty-value edits can still be emitted.

Suggested patch
         if kind == "ADD":
             section = op.get("section")
             content = op.get("content")
-            if not isinstance(section, str) or not isinstance(content, str) or not content:
+            if not isinstance(section, str) or not isinstance(content, str):
                 continue
+            content = content.strip()
+            if not content:
+                continue
             section_norm = normalize_section(section)
             if section_norm not in _ALLOWED_SECTIONS:
                 continue
             out.append(Operation(type="ADD", section=section_norm, content=content))
@@
         elif kind == "REPLACE":
             item_id = op.get("item_id") or op.get("bullet_id")
             content = op.get("content")
-            if isinstance(item_id, str) and item_id and isinstance(content, str) and content:
-                out.append(Operation(type="REPLACE", item_id=item_id, content=content))
+            if isinstance(item_id, str) and item_id and isinstance(content, str):
+                content = content.strip()
+                if content:
+                    out.append(Operation(type="REPLACE", item_id=item_id, content=content))

Also applies to: 75-76

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/core/cartographer.py` around lines 62 - 63, The guard that skips
non-string or empty edits should also treat whitespace-only strings as empty:
replace the current checks that use "not content" with a check on "not
content.strip()" so content consisting only of spaces is rejected; update both
occurrences that reference "section" and "content" (the IF that currently reads
"if not isinstance(section, str) or not isinstance(content, str) or not content:
continue" and the similar check at the other occurrence) to use content.strip()
instead.
vendor/peek/core/policy.py (1)

26-30: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard optional tiktoken import in the default tokenizer path.

Line 27 unconditionally imports tiktoken, so CachePolicy() fails at Line 74 when the optional extra is absent (already reflected in CI failures).

Suggested patch
 def _default_tokenizer() -> TokenCounter:
-    import tiktoken
-
-    enc = tiktoken.get_encoding("o200k_base")
-    return lambda s: len(enc.encode(s))
+    try:
+        import tiktoken
+
+        enc = tiktoken.get_encoding("o200k_base")
+        return lambda s: len(enc.encode(s))
+    except Exception:
+        # Optional dependency fallback.
+        return lambda s: len(s.encode("utf-8"))
#!/bin/bash
set -euo pipefail

echo "=== policy tokenizer call path ==="
rg -n -C2 'def _default_tokenizer|import tiktoken|self\.token_counter = _default_tokenizer' vendor/peek/core/policy.py

echo
echo "=== dependency declaration context ==="
rg -n -C3 'optional-dependencies|peek|tiktoken' pyproject.toml

Also applies to: 74-74

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/core/policy.py` around lines 26 - 30, The _default_tokenizer
currently does an unconditional import of tiktoken which causes CachePolicy
initialization (where self.token_counter = _default_tokenizer) to fail if the
optional tiktoken dependency is missing; update _default_tokenizer to guard the
import (try/except ImportError) and return a safe fallback tokenizer (e.g., a
simple len(s) or whitespace-based counter) when tiktoken is not available,
ensuring CachePolicy and any call sites using token_counter work without the
optional extra.
vendor/peek/__init__.py (1)

1-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add postponed-annotations import at module top.

This module is missing from __future__ import annotations.

Suggested patch
+from __future__ import annotations
+
 """PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents."""

As per coding guidelines: **/*.py: Use from __future__ import annotations in all Python files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vendor/peek/__init__.py` around lines 1 - 3, Add the future annotations
import at the top of the module: insert "from __future__ import annotations" as
the first import in vendor/peek/__init__.py (above the module docstring or
immediately after a shebang/encoding if present) so all forward references in
this module (e.g., types used in the public API that involve peek.core imports)
are postponed; ensure it appears before the existing "from peek.core import ("
block.
tests/test_peek_integration.py (2)

261-266: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Weak assertion allows empty supplement to pass.

The test name is test_system_prompt_supplement_nonempty_after_update, but the assertion supp == "" or "Context Map" in supp passes when supp is empty. Since the stub's _CARTOGRAPHER_RESPONSE adds a roadmap entry, the map should be non-empty after update.

Proposed fix
-        assert supp == "" or "Context Map" in supp
+        assert "Context Map" in supp
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_peek_integration.py` around lines 261 - 266, The test
test_system_prompt_supplement_nonempty_after_update currently allows an empty
supplement because it asserts supp == "" or "Context Map" in supp; change this
to assert the supplement is non-empty and includes the expected roadmap marker
produced by the stub (e.g., replace the assertion with assert "Context Map" in
supp or assert supp != "" and "Context Map" in supp) so that
system_prompt_supplement (examined after session.update_from_run with
_make_simple_trace and the stubbed _CARTOGRAPHER_RESPONSE) must contain the
roadmap entry.

17-22: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

PEEK_AVAILABLE check must include tiktoken — all CachePolicy tests are failing.

The pipeline failures confirm that import peek succeeds but CachePolicy(...) raises ModuleNotFoundError: No module named 'tiktoken'. The tiktoken package is a runtime dependency of CachePolicy.__post_init__() (via _default_tokenizer()), so tests skip the wrong condition.

Proposed fix
 try:
     import peek  # noqa: F401
+    import tiktoken  # noqa: F401

     PEEK_AVAILABLE = True
 except ImportError:
     PEEK_AVAILABLE = False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_peek_integration.py` around lines 17 - 22, The test's
PEEK_AVAILABLE check only tries to import peek but must also ensure runtime
tokenizer dependency is present; update the test (PEEK_AVAILABLE) to attempt
importing both peek and tiktoken (or otherwise call CachePolicy.__post_init__
dependency path) so that ModuleNotFoundError from
CachePolicy.__post_init__/_default_tokenizer is caught and PEEK_AVAILABLE is set
False when tiktoken is missing; locate the variable PEEK_AVAILABLE in
tests/test_peek_integration.py and change the try/except to import peek and
import tiktoken (or perform a lightweight check that _default_tokenizer will
succeed) so CachePolicy tests are skipped when tiktoken is not installed.
🧹 Nitpick comments (1)
tests/test_peek_integration.py (1)

322-327: 💤 Low value

Test name and body are mismatched.

test_load_raises_without_peek doesn't test the ImportError path — it just creates a session normally via PeekSession.create(). The test duplicates test_create_factory_uses_model_adapter and the name is misleading.

Consider either renaming to reflect what it actually tests, or replacing with a test that patches _require_peek to raise and verifies the exception propagates.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_peek_integration.py` around lines 322 - 327, The test named
test_load_raises_without_peek is misleading because it does not exercise the
ImportError path; either rename it to reflect that it verifies
PeekSession.create succeeds when peek is importable (e.g.,
test_create_session_with_peek_present) or change its body to patch the helper
_require_peek to raise ImportError and assert that PeekSession.create(...)
propagates that error; locate the test function by name
test_load_raises_without_peek and update either the name or replace the body to
use monkeypatch to make _require_peek raise and assert the expected exception
from PeekSession.create.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/peek_bench/analyze_peek_trace.py`:
- Around line 380-391: The aggregation helper agg(rows: list[dict[str, Any]])
can divide by len(rows) and call max()/rows[-1] on empty lists; update agg to
first check if not rows and return a safe default dict (e.g., zeros for
avg_items/avg_added/avg_evicted/avg_ops/helpful_tags/harmful_tags/neutral_tags/max_dup_pairs/final_items
or None where appropriate) so it never raises ZeroDivisionError or ValueError,
and ensure compare_contexts continues to call agg for each context without
additional checks. Use the existing symbol agg to implement this empty-list
guard and populate the same keys as the current return shape.

---

Duplicate comments:
In `@tests/test_peek_integration.py`:
- Around line 261-266: The test
test_system_prompt_supplement_nonempty_after_update currently allows an empty
supplement because it asserts supp == "" or "Context Map" in supp; change this
to assert the supplement is non-empty and includes the expected roadmap marker
produced by the stub (e.g., replace the assertion with assert "Context Map" in
supp or assert supp != "" and "Context Map" in supp) so that
system_prompt_supplement (examined after session.update_from_run with
_make_simple_trace and the stubbed _CARTOGRAPHER_RESPONSE) must contain the
roadmap entry.
- Around line 17-22: The test's PEEK_AVAILABLE check only tries to import peek
but must also ensure runtime tokenizer dependency is present; update the test
(PEEK_AVAILABLE) to attempt importing both peek and tiktoken (or otherwise call
CachePolicy.__post_init__ dependency path) so that ModuleNotFoundError from
CachePolicy.__post_init__/_default_tokenizer is caught and PEEK_AVAILABLE is set
False when tiktoken is missing; locate the variable PEEK_AVAILABLE in
tests/test_peek_integration.py and change the try/except to import peek and
import tiktoken (or perform a lightweight check that _default_tokenizer will
succeed) so CachePolicy tests are skipped when tiktoken is not installed.

In `@vendor/peek/__init__.py`:
- Around line 1-3: Add the future annotations import at the top of the module:
insert "from __future__ import annotations" as the first import in
vendor/peek/__init__.py (above the module docstring or immediately after a
shebang/encoding if present) so all forward references in this module (e.g.,
types used in the public API that involve peek.core imports) are postponed;
ensure it appears before the existing "from peek.core import (" block.

In `@vendor/peek/_io.py`:
- Around line 14-33: The extract_json function currently returns whatever
json.loads yields (which may be list or scalar) but its signature promises
dict[str, Any] | None; change the logic in extract_json to only return a value
when the parsed result is an instance of dict and otherwise continue searching;
specifically, after each json.loads call in the initial parse, in the loop over
_scan_fenced_blocks, and in the loop over _scan_balanced_braces, check
isinstance(parsed, dict) and return parsed only when True, otherwise treat it as
a JSONDecode/invalid type and continue; keep the overall return None if no dict
is found.

In `@vendor/peek/core/__init__.py`:
- Line 1: The module is missing the required future annotations import; add the
line "from __future__ import annotations" at the very top of this module before
any other code so that type annotations are deferred, ensuring compliance with
the project's guideline for files like the one that imports Cartographer (see
the import "from peek.core.cartographer import Cartographer").

In `@vendor/peek/core/cartographer.py`:
- Around line 62-63: The guard that skips non-string or empty edits should also
treat whitespace-only strings as empty: replace the current checks that use "not
content" with a check on "not content.strip()" so content consisting only of
spaces is rejected; update both occurrences that reference "section" and
"content" (the IF that currently reads "if not isinstance(section, str) or not
isinstance(content, str) or not content: continue" and the similar check at the
other occurrence) to use content.strip() instead.

In `@vendor/peek/core/policy.py`:
- Around line 26-30: The _default_tokenizer currently does an unconditional
import of tiktoken which causes CachePolicy initialization (where
self.token_counter = _default_tokenizer) to fail if the optional tiktoken
dependency is missing; update _default_tokenizer to guard the import (try/except
ImportError) and return a safe fallback tokenizer (e.g., a simple len(s) or
whitespace-based counter) when tiktoken is not available, ensuring CachePolicy
and any call sites using token_counter work without the optional extra.

In `@vendor/peek/llm/anthropic_client.py`:
- Around line 43-50: The code currently only updates self._last when resp.usage
exists, which can leave a stale Usage instance; modify the response handling in
the method that processes resp (the block referencing resp, usage, and
constructing Usage) so that if getattr(resp, "usage", None) is None you
explicitly set self._last = None, otherwise set self._last =
Usage(input_tokens=..., output_tokens=...); keep the existing text_blocks
extraction/return logic unchanged so only the _last clearing behavior is added.

In `@vendor/peek/llm/gemini_client.py`:
- Around line 59-65: The current logic in Gemini client only updates self._last
when resp.usage_metadata exists, so last_usage() can incorrectly return stale
data when the API omits usage_metadata; inside the response handling block (the
code setting usage = getattr(resp, "usage_metadata", None) in gemini_client.py)
ensure that when usage is None you explicitly reset self._last =
Usage(input_tokens=0, output_tokens=0) (i.e., assign a zeroed Usage object) so
last_usage() always reflects the most recent completion; update the branch that
currently only sets self._last on non-None usage to also set the zeroed Usage
when usage is None.

In `@vendor/peek/llm/openai_client.py`:
- Around line 46-52: The code currently only sets self._last when resp.usage
exists, leaving stale values for last_usage(); update the handler in the method
that reads resp to reset self._last to None (or Usage.zero) when usage metadata
is missing so last_usage() does not return previous counts—modify the block that
checks getattr(resp, "usage", None) to explicitly assign self._last = None (or a
zeroed Usage) in the else/missing case; reference symbols: self._last, Usage,
and last_usage().

---

Nitpick comments:
In `@tests/test_peek_integration.py`:
- Around line 322-327: The test named test_load_raises_without_peek is
misleading because it does not exercise the ImportError path; either rename it
to reflect that it verifies PeekSession.create succeeds when peek is importable
(e.g., test_create_session_with_peek_present) or change its body to patch the
helper _require_peek to raise ImportError and assert that
PeekSession.create(...) propagates that error; locate the test function by name
test_load_raises_without_peek and update either the name or replace the body to
use monkeypatch to make _require_peek raise and assert the expected exception
from PeekSession.create.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 07a9469a-64c3-42b3-aead-bea9302bc71b

📥 Commits

Reviewing files that changed from the base of the PR and between 53f8b52 and 78e24f0.

📒 Files selected for processing (28)
  • CLAUDE.md
  • docs/peek-bench/PEEK-DIAGNOSIS.md
  • docs/peek-bench/PEEK-EXPERIMENTS.md
  • examples/peek_bench/__init__.py
  • examples/peek_bench/analyze_peek_trace.py
  • examples/peek_bench/run_oolong_rlm_vs_peek.py
  • pyproject.toml
  • src/pyrlm_runtime/peek_integration.py
  • tests/test_peek_integration.py
  • vendor/peek/LICENSE
  • vendor/peek/VENDOR.md
  • vendor/peek/__init__.py
  • vendor/peek/_io.py
  • vendor/peek/core/__init__.py
  • vendor/peek/core/cartographer.py
  • vendor/peek/core/context_map.py
  • vendor/peek/core/distiller.py
  • vendor/peek/core/evictor.py
  • vendor/peek/core/policy.py
  • vendor/peek/core/types.py
  • vendor/peek/llm/__init__.py
  • vendor/peek/llm/anthropic_client.py
  • vendor/peek/llm/base.py
  • vendor/peek/llm/gemini_client.py
  • vendor/peek/llm/openai_client.py
  • vendor/peek/prompts/cartographer.txt
  • vendor/peek/prompts/distiller.txt
  • vendor/peek/prompts/initial_context_map.txt
✅ Files skipped from review due to trivial changes (4)
  • vendor/peek/prompts/initial_context_map.txt
  • vendor/peek/VENDOR.md
  • vendor/peek/LICENSE
  • CLAUDE.md

Comment on lines +380 to +391
def agg(rows: list[dict[str, Any]]) -> dict[str, float]:
return {
"avg_items": sum(r["items_after"] for r in rows) / len(rows),
"avg_added": sum(r["added"] for r in rows) / len(rows),
"avg_evicted": sum(r["evicted"] for r in rows) / len(rows),
"avg_ops": sum(r["ops_applied"] for r in rows) / len(rows),
"helpful_tags": sum(r["tag_helpful"] for r in rows),
"harmful_tags": sum(r["tag_harmful"] for r in rows),
"neutral_tags": sum(r["tag_neutral"] for r in rows),
"max_dup_pairs": max(r["duplicate_pairs"] for r in rows),
"final_items": rows[-1]["items_after"],
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against empty trace lists in compare_contexts.

If either context has no traces, agg() will raise ZeroDivisionError on division by len(rows).

Suggested fix
     def agg(rows: list[dict[str, Any]]) -> dict[str, float]:
+        if not rows:
+            return {
+                "avg_items": 0.0, "avg_added": 0.0, "avg_evicted": 0.0,
+                "avg_ops": 0.0, "helpful_tags": 0.0, "harmful_tags": 0.0,
+                "neutral_tags": 0.0, "max_dup_pairs": 0.0, "final_items": 0.0,
+            }
         return {
             "avg_items": sum(r["items_after"] for r in rows) / len(rows),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/peek_bench/analyze_peek_trace.py` around lines 380 - 391, The
aggregation helper agg(rows: list[dict[str, Any]]) can divide by len(rows) and
call max()/rows[-1] on empty lists; update agg to first check if not rows and
return a safe default dict (e.g., zeros for
avg_items/avg_added/avg_evicted/avg_ops/helpful_tags/harmful_tags/neutral_tags/max_dup_pairs/final_items
or None where appropriate) so it never raises ZeroDivisionError or ValueError,
and ensure compare_contexts continues to call agg for each context without
additional checks. Use the existing symbol agg to implement this empty-list
guard and populate the same keys as the current return shape.

github-actions Bot added 2 commits June 3, 2026 16:35
Documents the pyrlm-runtime project's structure, code style,
testing conventions, REPL backends, design constraints (the
"library is generic" rule), and the empirical research
methodology that gates benchmark-driven work.
Adds PEEK orientation-cache (arXiv:2605.19932) support to pyrlm-runtime,
vendored at vendor/peek/ from zhuohangu/peek commit 57de91ac (Apache-2.0
preserved with VENDOR.md). PeekSession bridges our ModelAdapter to peek's
LMClient protocol and injects the map through the existing system_prompt
_supplement slot on RLM — no changes to the core API.

What's included:

- vendor/peek/: vendored upstream copy with two patches applied inline,
  documented with `# peek-patch` comments:
  * vendor/peek/core/evictor.py — score-decay patch (C3): existing scores
    are multiplied by SCORE_DECAY=0.85 before each tagging pass, and
    neutral tags contribute -NEUTRAL_PENALTY=-0.5 instead of 0. Constants
    1.0/0.0 recover upstream behaviour exactly.
  * vendor/peek/_io.py — extract_json now scans fenced code blocks with
    an O(n) state machine instead of `re.findall(r"```...(.*?)\s*```",
    re.DOTALL | re.IGNORECASE)`. The regex catastrophically backtracks on
    LLM output with an unclosed fence; reproduced as a 45-minute hung
    benchmark run.

- src/pyrlm_runtime/peek_integration.py: PeekSession, _PeekLMClientAdapter,
  and trace_to_peek_trajectory. Optional trace_dir writes per-update JSON
  snapshots (query, trajectory, map before/after with item scores, parsed
  Distiller and Cartographer outputs, diff) for offline diagnosis.

- examples/peek_bench/run_oolong_rlm_vs_peek.py: benchmark harness comparing
  RLM vs RLM+PEEK on oolongbench/oolong-synth (the only OOLONG dataset on
  HuggingFace as of 2026-05-25). Supports explicit context selection,
  fully-online evolution, and per-query JSON tracing via --trace.

- examples/peek_bench/analyze_peek_trace.py: diagnostic analyzer reporting
  per-query churn, Distiller tag distribution, Cartographer operation
  breakdown, per-item lifetimes, and duplicate rate.

- tests/test_peek_integration.py: 24 tests (FakeAdapter / stub LMClient
  driven), including regression tests for the trace_dir behaviour and
  the catastrophic-backtracking bug fix.

- docs/peek-bench/PEEK-EXPERIMENTS.md: full experiment log — pre-committed
  hypothesis and decision rule, Pilot, Phase 2 (N=27), Phase 3 (N=5 large
  contexts), Phase 4 (vendoring + C3 patch + C5' ablation + bug fix).

- docs/peek-bench/PEEK-DIAGNOSIS.md: trace-based diagnosis of PEEK
  upstream failure modes — silent-failures audit (refuted), Distiller
  blindness (confirmed and addressed by C3), hallucination, etc.

Phase 4.C.1 (C3 score decay) result: +5.7pp aggregate improvement vs
upstream in an in-session A/B (n=5 large contexts). agnews +18.3pp,
multinli +7.4pp, metaphors +4.0pp; imdb and negation within the LLM
noise floor. Trace-level mechanism confirmed (metaphors neutral-tag
rate dropped 29% -> 4%).

Phase 4.C.2 (C5' section-weighted decay + rr-* birth bonus) result:
REJECTED across 3 paired A/B trials (mean -3.2pp vs C3 alone, dominated
by per-sub-dataset noise of ±5–20pp on N=1). Mechanism kept in vendored
library only via the validated C3 globals; per-section dicts removed
from the working tree. Full ablation documented in PEEK-EXPERIMENTS.md.

peek-ai vendored — no PyPI install needed. tiktoken>=0.7.0 added as the
`peek` extra. pyproject.toml updated to import vendor/peek/ as `peek`.
@apenab apenab force-pushed the feat/peek-improvement branch from 78e24f0 to 169853e Compare June 3, 2026 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant