Merged
30 changes: 30 additions & 0 deletions .env.solo.example
Expand Up @@ -18,3 +18,33 @@ SLOPOMETRY_ENABLE_COMPLEXITY_FEEDBACK=true

# Include development guidelines from CLAUDE.md in feedback
SLOPOMETRY_FEEDBACK_DEV_GUIDELINES=false

# Memory Extraction (for solo find-memories and show-memories)
# LLM endpoint for extracting memories from transcripts
SLOPOMETRY_MEMORY_LLM_ENDPOINT=https://your-llm-endpoint.com/v1
SLOPOMETRY_MEMORY_LLM_MODEL=your-model-name
SLOPOMETRY_MEMORY_LLM_API_KEY=your-api-key-here

# Memory Embedding (for uniqueness scoring)
# Embedding endpoint for computing memory similarity
SLOPOMETRY_MEMORY_EMBEDDING_ENDPOINT=https://your-embedding-endpoint.com/v1
SLOPOMETRY_MEMORY_EMBEDDING_MODEL=your-embedding-model
SLOPOMETRY_MEMORY_EMBEDDING_API_KEY=your-embedding-api-key-here

# Memory Freshness & Staleness
# Similarity thresholds for reconciliation (dedupe/merge/supersede)
# SLOPOMETRY_FRESHNESS_THRESHOLD_FLOOR=0.45
# SLOPOMETRY_FRESHNESS_THRESHOLD_CEILING=0.95

# Max memories loaded for freshness validation and staleness audit
# SLOPOMETRY_MEMORY_QUERY_LIMIT=200

# Max chars of transcript sent to LLM for staleness audit
# SLOPOMETRY_MEMORY_TRANSCRIPT_TRUNCATION_CHARS=15000

# Number of recent transcripts used as context for prune-memories
# SLOPOMETRY_MEMORY_PRUNE_TRANSCRIPT_WINDOW=3

# Max tokens for LLM responses
# SLOPOMETRY_MEMORY_RECONCILIATION_MAX_TOKENS=200
# SLOPOMETRY_MEMORY_STALENESS_AUDIT_MAX_TOKENS=1000
30 changes: 17 additions & 13 deletions .env.summoner.example
Expand Up @@ -11,20 +11,13 @@ SLOPOMETRY_ENABLE_COMPLEXITY_FEEDBACK=true
SLOPOMETRY_FEEDBACK_DEV_GUIDELINES=false

# LLM Integration (required for userstorify and AI features)
# Set offline_mode=false to enable external LLM requests
# Set offline_mode=false to enable external LLM requests.
# Single agent (MiniMax-M3, served via vLLM with the OpenAI-compatible API).
# The endpoint below is the public ingress for the in-cluster MXFP4 deployment.
SLOPOMETRY_OFFLINE_MODE=false
SLOPOMETRY_LLM_PROXY_URL=https://your-proxy.example.com
SLOPOMETRY_LLM_PROXY_API_KEY=your-api-key
SLOPOMETRY_LLM_RESPONSES_URL=https://your-proxy.example.com/responses

# User Story Generation
# Available agents: gpt_oss_120b, gemini, minimax
SLOPOMETRY_USER_STORY_AGENT=gpt_oss_120b

# Anthropic Provider (e.g. sglang with MiniMax-M2.1)
# Provides access to MiniMax models via custom Anthropic-compatible endpoints
SLOPOMETRY_ANTHROPIC_URL=https://your-sglang-endpoint.example.com
SLOPOMETRY_ANTHROPIC_API_KEY=your-anthropic-api-key
SLOPOMETRY_LLM_PROXY_URL=https://llm2.droidcraft.org/minimax-m3-mxfp4-vllm/v1
SLOPOMETRY_LLM_PROXY_API_KEY=your-vllm-api-key
SLOPOMETRY_LLM_MODEL_NAME=olka-fi/MiniMax-M3-MXFP4

# Interactive Rating for Dataset Quality Control
# Prompts you to rate generated user stories (1-5)
Expand All @@ -40,3 +33,14 @@ SLOPOMETRY_HF_DEFAULT_REPO=username/slopometry-dataset
SLOPOMETRY_MAX_PARALLEL_WORKERS=6
# Maximum commits to analyze for baseline computation
SLOPOMETRY_BASELINE_MAX_COMMITS=100

# Memory Extraction (for solo find-memories, prune-memories, show-memories)
# Uses the same LLM proxy endpoint as summoner features
SLOPOMETRY_MEMORY_LLM_ENDPOINT=https://llm2.droidcraft.org/minimax-m3-mxfp4-vllm/v1
SLOPOMETRY_MEMORY_LLM_MODEL=olka-fi/MiniMax-M3-MXFP4
SLOPOMETRY_MEMORY_LLM_API_KEY=your-vllm-api-key

# Embedding endpoint (for memory similarity and uniqueness scoring)
SLOPOMETRY_MEMORY_EMBEDDING_ENDPOINT=https://your-embedding-endpoint.com/v1
SLOPOMETRY_MEMORY_EMBEDDING_MODEL=your-embedding-model
SLOPOMETRY_MEMORY_EMBEDDING_API_KEY=your-embedding-api-key
8 changes: 6 additions & 2 deletions CLAUDE.md
Expand Up @@ -45,8 +45,9 @@ uv tool install . --reinstall
- **CLI** (`src/slopometry/cli.py`): Hybrid CLI with flat core commands (install, uninstall, status, latest, shell-completion) and persona subcommands (solo, summoner)
- **Database** (`src/slopometry/core/database.py`): SQLite storage with platform-specific default locations
- **Hook Handler** (`src/slopometry/core/hook_handler.py`): Script invoked by Claude Code hooks to capture events
- **Models** (`src/slopometry/core/models.py`): Pydantic models for HookEvent, SessionStatistics
- **Models** (`src/slopometry/core/models/`): Pydantic models for HookEvent, SessionStatistics, MemoryEntry
- **Settings** (`src/slopometry/core/settings.py`): Pydantic-settings configuration with .env support
- **Memory Freshness** (`src/slopometry/solo/services/memory_freshness.py`): LLM-driven reconciliation (keep_both/merge/supersede/dedupe) and staleness audit for memory candidates
- **LLM Wrapper** (`src/slopometry/summoner/services/llm_wrapper.py`): AI agents for analyzing git diffs and generating user stories

### How It Works
Expand Down Expand Up @@ -127,7 +128,7 @@ echo '{"session_id": "test123", "transcript_path": "/tmp/transcript.jsonl", "too

## Adding New Tool Types

1. Add to `ToolType` enum in models.py
1. Add to `ToolType` enum in `src/slopometry/core/models/core.py`
2. Update `TOOL_TYPE_MAP` in hook_handler.py
3. No database migration needed (sqlite-utils handles schema)
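As a rough illustration of steps 1 and 2, the sketch below adds one member to a `ToolType` enum and registers it in `TOOL_TYPE_MAP`. Only the names `ToolType` and `TOOL_TYPE_MAP` come from the steps above; the enum members, the `"NotebookEdit"` tool name, the fallback member, and the `resolve_tool_type` helper are hypothetical and may not match the real definitions in `core.py` or `hook_handler.py`.

```python
from enum import Enum


class ToolType(str, Enum):
    # Hypothetical members; the real enum lives in src/slopometry/core/models/core.py.
    BASH = "bash"
    NOTEBOOK_EDIT = "notebook_edit"  # step 1: the newly added tool type
    OTHER = "other"


# Step 2: map the tool name reported by the hook payload to the enum member.
TOOL_TYPE_MAP: dict[str, ToolType] = {
    "Bash": ToolType.BASH,
    "NotebookEdit": ToolType.NOTEBOOK_EDIT,
}


def resolve_tool_type(tool_name: str) -> ToolType:
    """Look up a tool name, falling back to OTHER for unknown tools (assumed behavior)."""
    return TOOL_TYPE_MAP.get(tool_name, ToolType.OTHER)
```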

Expand All @@ -148,6 +149,9 @@ The experiment tracking feature includes:
- `solo ls`: List recent sessions
- `solo show <session-id>`: Show detailed session statistics
- `latest`: Show latest session statistics
- `solo find-memories`: Scan transcripts, extract memory candidates, run freshness validation, and save
- `solo prune-memories`: Audit existing memories for staleness and retire stale ones
- `solo show-memories`: List and manage memories for a project

### Key Components
- **CLI Calculator**: Measures "Completeness Likelihood Improval" (0-1.0 scale)
Expand Down
90 changes: 74 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ A tool that lurks in the shadows, tracks and analyzes Claude Code sessions provi

**NEWS:**

* **Jun 2026: Dropping support for closed-source models for all Summoner features.** Since there is now a precedent for silent sabotage by providers, based on flavor-of-the-week media posture, we can no longer rely on closed systems for features that require meta-reasoning or need to run reliably. We appreciate Anthropic being up-front about this in the model card though!

* **April 2026: Behavioral pattern detection.** Sessions are now scanned for ownership dodging ("pre-existing", "not introduced by") and simple workaround ("simplest", "for now", "quick fix") phrases in assistant output, reported as per-minute rates. Rates are persisted per-repo and `current-impact` shows rolling average trends. Display reordered: plans, token impact, and behavioral patterns now appear first. Also: newly written files no longer incorrectly flagged as blind spots, and single-method class detection skips data classes with only `@property` methods.

* **February 2026: OpenCode 1.2.10+ now supported for solo features, including stop hook feedback! See [plugin doc](plugins/opencode/README.md).**
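
The per-minute rates mentioned in the April 2026 entry could be computed roughly as follows. This is a minimal sketch, not slopometry's detector: the phrase lists are only the examples quoted in that entry, and the function name, the category keys, and the case-insensitive substring matching are assumptions.

```python
import re

# Example phrases taken from the news entry; the real lists may be longer.
PATTERNS: dict[str, tuple[str, ...]] = {
    "ownership_dodging": ("pre-existing", "not introduced by"),
    "simple_workaround": ("simplest", "for now", "quick fix"),
}


def phrase_rates_per_minute(messages: list[str], duration_seconds: float) -> dict[str, float]:
    """Count phrase hits in assistant messages and normalize by session length in minutes."""
    minutes = duration_seconds / 60
    if minutes <= 0:
        raise ValueError("duration_seconds must be positive")
    rates: dict[str, float] = {}
    for category, phrases in PATTERNS.items():
        # One alternation per category; re.escape keeps hyphens and spaces literal.
        regex = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
        hits = sum(len(regex.findall(message)) for message in messages)
        rates[category] = hits / minutes
    return rates
```

Normalizing by session length keeps long and short sessions comparable, which is what makes a rolling per-repo average meaningful.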
Expand Down Expand Up @@ -44,7 +46,7 @@ Worst offenders and overall slop at a glance
**See more examples and FAQ in details below**:
<details>

### Q: I don't need to verify when my tests are passing, right?
### Q: I don't need to verify when my tests are passing, right?

A: lmao

Expand All @@ -53,9 +55,22 @@ What clevery ways you ask? Silent exception swallowing upstream ofc!

Slopometry forces agents to state the purpose of swallowed exceptions and skipped tests, this is a simple LLM-as-judge call for your RL pipeline (you're welcome)

A handler only counts as *swallowed* if it does **no processing of any kind** — only `pass`/`continue`/`break`/`...`. Recovering a fallback value (`except ImportError: torch = None`) or counting the failure (`errors += 1`) is real handling and is not flagged. When a silent handler is genuinely correct, mark it `# slopometry: allow-silent` to acknowledge it — but slopometry counts those markers per file and **blocks on any increase**, so an agent can't reward-hack by mass-suppressing real swallows.
A handler only counts as *swallowed* if it does **no processing of any kind** — only `pass`/`continue`/`break`/`...`. Recovering a fallback value (`except ImportError: torch = None`) or counting the failure (`errors += 1`) is real handling and is not flagged.
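
The rule above can be approximated with Python's standard `ast` module. The sketch below is an illustration under stated assumptions, not slopometry's actual detector: the function names are hypothetical, and it treats a handler as swallowed only when every statement in its body is `pass`, `continue`, `break`, or a bare `...`.

```python
import ast

SILENT_NODES = (ast.Pass, ast.Continue, ast.Break)


def is_silent_statement(stmt: ast.stmt) -> bool:
    """True for pass/continue/break or a bare `...` expression statement."""
    if isinstance(stmt, SILENT_NODES):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def find_swallowed_handlers(source: str) -> list[int]:
    """Return line numbers of except handlers whose body does no processing."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
        and all(is_silent_statement(stmt) for stmt in node.body)
    ]


EXAMPLE = '''
try:
    import torch
except ImportError:
    torch = None

try:
    risky()
except Exception:
    pass
'''

print(find_swallowed_handlers(EXAMPLE))  # prints [9]
```

The `ImportError` handler assigns a fallback, so it is not reported; only the second handler, whose body is a lone `pass`, is.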

#### Acknowledging Silent Handlers

When a silent handler is genuinely correct (e.g., context manager cleanup that always succeeds), mark it with `# slopometry: allow-silent`:

```python
try:
    acquire_lock()
except Exception:
    pass  # slopometry: allow-silent - lock already released on context exit
```

Slopometry counts those markers per file and **blocks on any increase**, so an agent can't reward-hack by mass-suppressing real swallows. If you see a blocking increase, review the NEW markers and confirm each is justified.
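
A per-file marker count with a block-on-increase check might look like the sketch below. It assumes a plain regex match on the marker comment and a before/after snapshot of file contents; the function names and the snapshot format are hypothetical, and the real check may compare against a different baseline.

```python
import re

# Matches the acknowledgement marker with flexible whitespace.
MARKER = re.compile(r"#\s*slopometry:\s*allow-silent")


def count_markers(files: dict[str, str]) -> dict[str, int]:
    """Map each file path to the number of allow-silent markers in its text."""
    return {path: len(MARKER.findall(text)) for path, text in files.items()}


def marker_increases(before: dict[str, int], after: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {path: (old_count, new_count)} for every file whose count went up."""
    return {
        path: (before.get(path, 0), count)
        for path, count in after.items()
        if count > before.get(path, 0)
    }


def should_block(before: dict[str, int], after: dict[str, int]) -> bool:
    """Block as soon as any single file gains a marker."""
    return bool(marker_increases(before, after))
```

Comparing per file rather than in total means a marker removed in one file cannot offset a new one added elsewhere. A regex over raw text also counts markers inside string literals, which a tokenizer-based check would avoid.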

Here is Opus 4.5, which is writing 90% of your production code by 2026:
Here is Opus 4.5, which is writing 90% of your production code by 2026:
![silent-errors](assets/force-review-silent-errors.png)
![silent-errors2](assets/force-review-silent-errors-2.png)

Expand Down Expand Up @@ -107,6 +122,8 @@ A: There are advanced features for temporal and cross-project measurement of slo

Seriously, please do not open PRs with support for any kind of unserious languages. Just fork and pretend you made it. We are ok with that. Thank you.

**Concurrent sessions**: Stop hook feedback is designed for a single active session per project. Running two OpenCode or Claude Code sessions in the same project directory simultaneously may cause feedback suppression (shared per-project cache), dropped stop events (per-project lock contention), and incorrect `edited_files` scoping between sessions.

# Installation

Both Anthropic models and MiniMax-M2 are fully supported as the `claude code` drivers.
Expand All @@ -129,11 +146,9 @@ uv tool update-shell
```

# Restart your terminal or run:
```bash
source ~/.zshrc # for zsh
# or: source ~/.bashrc # for bash

# After making code changes, reinstall to update the global tool
uv tool install . --reinstall --find-links "https://github.com/Droidcraft/rust-code-analysis/releases/expanded_assets/python-2026.1.31"
```

## Quick Start
Expand All @@ -159,6 +174,15 @@ slopometry latest
# Save session artifacts (transcript, plans, tasks) to .slopometry/<session_id>/
slopometry solo save-transcript # latest
slopometry solo save-transcript <session_id>

# Memory extraction: scan transcripts and extract durable facts (requires LLM)
slopometry solo find-memories

# Audit existing memories for staleness — fixed bugs, completed work (requires LLM)
slopometry solo prune-memories

# Browse and manage memories
slopometry solo show-memories
```

![slopometry-roles.png](assets/slopometry-roles.png)
Expand Down Expand Up @@ -218,7 +242,45 @@ curl -o ~/.config/slopometry/.env https://raw.githubusercontent.com/TensorTempla
```


### Development Installation
Core settings:

- `SLOPOMETRY_DATABASE_PATH`: Custom database location (optional)
  - Default locations:
    - Linux: `~/.local/share/slopometry/slopometry.db` (or `$XDG_DATA_HOME/slopometry/slopometry.db` if set)
    - macOS: `~/Library/Application Support/slopometry/slopometry.db`
    - Windows: `%LOCALAPPDATA%\slopometry\slopometry.db`
- `SLOPOMETRY_ENABLE_COMPLEXITY_ANALYSIS`: Collect complexity metrics (default: `true`)
- `SLOPOMETRY_ENABLE_COMPLEXITY_FEEDBACK`: Provide feedback to Claude (default: `false`)
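
The default-location rules above can be sketched as a small resolver. This is an illustration only: the function name and parameters are hypothetical, and the Windows fallback to `~/AppData/Local` when `LOCALAPPDATA` is unset is an assumption not stated in the list.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping
from pathlib import Path


def default_database_path(
    platform: str = sys.platform,
    env: Mapping[str, str] | None = None,
    home: Path | None = None,
) -> Path:
    """Resolve the database path: explicit override first, then the per-platform default."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    override = env.get("SLOPOMETRY_DATABASE_PATH")
    if override:
        return Path(override)

    if platform == "win32":
        # Assumed fallback when LOCALAPPDATA is missing.
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Linux and other Unix-likes: honor XDG_DATA_HOME when it is set.
        base = Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")

    return base / "slopometry" / "slopometry.db"
```

Passing `platform`, `env`, and `home` explicitly makes the resolver easy to exercise for every platform from a single machine.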

### LLM-dependent features

By default, slopometry runs in **offline mode** (`SLOPOMETRY_OFFLINE_MODE=true`), which disables all external LLM calls. The following features require an LLM endpoint and will refuse to run until you set `SLOPOMETRY_OFFLINE_MODE=false` and configure endpoints:

- **`solo find-memories`** — scans transcripts, extracts memory candidates via LLM, runs freshness reconciliation against existing memories, and retires stale ones
- **`solo prune-memories`** — audits existing memories for staleness against recent transcripts
- **`summoner userstorify`** — generates user stories from git diffs
- **`summoner user-story-export --upload-to-hf`** — uploads dataset to Hugging Face

To enable:

```bash
# Disable offline mode
SLOPOMETRY_OFFLINE_MODE=false

# Chat LLM endpoint (OpenAI-compatible API)
SLOPOMETRY_MEMORY_LLM_ENDPOINT=https://your-llm-endpoint.com/v1
SLOPOMETRY_MEMORY_LLM_MODEL=your-model-name
SLOPOMETRY_MEMORY_LLM_API_KEY=your-api-key

# Embedding endpoint (for memory similarity and uniqueness scoring)
SLOPOMETRY_MEMORY_EMBEDDING_ENDPOINT=https://your-embedding-endpoint.com/v1
SLOPOMETRY_MEMORY_EMBEDDING_MODEL=your-embedding-model
SLOPOMETRY_MEMORY_EMBEDDING_API_KEY=your-embedding-api-key
```
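
Before running an LLM-dependent command, it can help to confirm that the variables above are actually set. The following is a hedged preflight sketch, not part of slopometry: the function name is hypothetical, and the set of strings accepted as "false" is an assumption about how the setting is parsed.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

REQUIRED_VARS = (
    "SLOPOMETRY_MEMORY_LLM_ENDPOINT",
    "SLOPOMETRY_MEMORY_LLM_MODEL",
    "SLOPOMETRY_MEMORY_LLM_API_KEY",
    "SLOPOMETRY_MEMORY_EMBEDDING_ENDPOINT",
    "SLOPOMETRY_MEMORY_EMBEDDING_MODEL",
    "SLOPOMETRY_MEMORY_EMBEDDING_API_KEY",
)


def llm_config_problems(env: Mapping[str, str] | None = None) -> list[str]:
    """Return human-readable problems; an empty list means the config looks complete."""
    env = os.environ if env is None else env
    problems: list[str] = []

    # Offline mode defaults to true, so an unset variable still blocks LLM features.
    offline = env.get("SLOPOMETRY_OFFLINE_MODE", "true").strip().lower()
    if offline not in ("false", "0", "no", "off"):
        problems.append("SLOPOMETRY_OFFLINE_MODE must be set to false")

    problems.extend(f"{name} is not set" for name in REQUIRED_VARS if not env.get(name))
    return problems


if __name__ == "__main__":
    for problem in llm_config_problems():
        print(problem)
```

This only inspects the process environment; values supplied through a `.env` file would need to be loaded first.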

# Development

For working on slopometry itself (not just installing it):

```bash
git clone https://github.com/TensorTemplar/slopometry
Expand All @@ -227,15 +289,11 @@ uv sync --extra dev
uv run pytest
```

Customize via `.env` file or environment variables:
After making code changes, reinstall to update the global tool:

- `SLOPOMETRY_DATABASE_PATH`: Custom database location (optional)
  - Default locations:
    - Linux: `~/.local/share/slopometry/slopometry.db` (or `$XDG_DATA_HOME/slopometry/slopometry.db` if set)
    - macOS: `~/Library/Application Support/slopometry/slopometry.db`
    - Windows: `%LOCALAPPDATA%\slopometry\slopometry.db`
- `SLOPOMETRY_ENABLE_COMPLEXITY_ANALYSIS`: Collect complexity metrics (default: `true`)
- `SLOPOMETRY_ENABLE_COMPLEXITY_FEEDBACK`: Provide feedback to Claude (default: `false`)
```bash
uv tool install . --reinstall --find-links "https://github.com/Droidcraft/rust-code-analysis/releases/expanded_assets/python-2026.1.31"
```

# Cite

Expand All @@ -256,5 +314,5 @@ Customize via `.env` file or environment variables:
[x] - Add plan evolution log based on claude's todo shenanigans
[ ] - Rename the readme.md to wontreadme.md because it takes more than 15 seconds or whatever the attention span is nowadays to read it all. Maybe make it all one giant picture? Anyway, stop talking to yourself in the roadmap.
[ ] - Finish git worktree-based [NFP-CLI](https://tensortemplar.substack.com/p/humans-are-no-longer-embodied-amortization) (TM) training objective implementation so complexity metrics can be used as additional process reward for training code agents
[ ] - Extend stop hook feedback with LLM-as-Judge to support guiding agents based on smells and style guide
[x] - Memory extraction with LLM-driven freshness reconciliation and staleness auditing
[ ] - Not go bankrupt from having to maintain open source in my free time, no wait...