feat(tlon): add native management tool#3
Draft
wca4a wants to merge 2 commits into
Draft
Conversation
This was referenced May 15, 2026
wca4a
pushed a commit
that referenced
this pull request
May 22, 2026
Three issues flagged by the Copilot review on this PR: 1. Double JSON emit on stage failure (Copilot #1, #2). When -Stage <name> ran a worker that threw, Invoke-Stage's finally emitted a JSON result frame AND the entry-point catch emitted a second error frame -- producing two concatenated JSON objects on stdout and breaking the one-line-per-invocation contract that drivers parse against. Same issue applied to -Json mode on a full install (every stage's finally plus a final error frame missing duration_ms/skipped). Fix: Invoke-Stage's finally now sets $script:_StageEmittedErrorFrame when it emits a failure frame; the entry-point catch checks the flag and skips its own emit, still exit 1. 2. $prevEAP uninitialized on early try-block throw (Copilot #3). In Install-Uv, Test-Python, Test-Node's winget fallback, _Run-NpmInstall, and the playwright block, '$prevEAP = $ErrorActionPreference' lived as the first statement INSIDE the try. If anything between 'try {' and that line threw (Write-Info on an unusual host, the npx-finding loop, etc.), the catch's 'if ($prevEAP) { ... }' restore was a no-op and EAP could remain relaxed. Fix: hoist '$prevEAP = $ErrorActionPreference' to the line immediately before 'try {' in all five sites. Catch's restore is now always meaningful regardless of where in the try the throw originated. No change to Invoke-Stage's success path or to the four lint-clean EAP sites (Test-Node was the only winget-related catch). All 19 metadata smoke tests still pass.
wca4a
pushed a commit
that referenced
this pull request
May 22, 2026
…ogging
The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.
Three changes:
1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
System prompt now byte-stable for the full day. The model can still
query exact time via tools when it actually needs it. Credit:
@iamfoz (PR NousResearch#20451).
2. Loud logging on session DB write failures. The update_system_prompt
call used to log at DEBUG, hiding disk-full / locked-database / schema
drift behind a silent fall-through that forced fresh rebuilds on
every subsequent turn. Now WARN with the session id and exception so
persistent issues show up in agent.log without verbose mode.
3. Three-way stored-state distinction on read. The previous
'session_row.get("system_prompt") or None' collapsed three states
into one (missing row / null column / empty string). Now we tell them
apart and WARN when a continuing session lands on null/empty (which
means the previous turn's write never persisted — every subsequent
turn rebuilds and the prefix cache misses every time).
The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.
E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.
Tests:
- tests/agent/test_system_prompt_restore.py (10 new tests)
- tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
test_datetime_is_date_only_not_minute_precision
Closes NousResearch#20451 (date-only), NousResearch#18547 (prefix stabilization),
NousResearch#8689 (stabilize timestamp across compression), NousResearch#15866 (timestamp
caching question), NousResearch#8687 (compression timestamp), NousResearch#27339
(claim #3: live timestamp in cached system prompt).
Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #2. Split from NousResearch#26300.
Scope:
tools/tlon_tool.pyfor Tlon groups, channels, invites, roles/admins, contacts, settings, history, activity, expose, hooks, notebook posts, uploads, and raw scry/poke/thread operations.group_create_owned/group_create_with_adminsadmin assignment tests and toolset wiring.Validation:
scripts/run_tests.sh tests/tools/test_tlon_tool.py tests/test_toolsets.pyResult: 35 passed.