
fix: prevent rtk read from corrupting JSON/data files (#464) #522

Open
ousamabenyounes wants to merge 1 commit into rtk-ai:develop from ousamabenyounes:fix/read-json-corruption

@ousamabenyounes
Contributor

Summary

Fixes #464: rtk read package.json corrupts JSON files when string values contain /* or */.

Root cause: .json files were classified as Language::Unknown, which treats /* and */ as block comment delimiters. The string "packages/*" was interpreted as opening a block comment and "**/package.json" as closing it, so everything in between was silently deleted.
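The failure mode can be reproduced with a small sketch. This is not the code from src/filter.rs; `strip_block_comments` is a hypothetical helper that assumes the stripper does a plain substring search for the delimiters with no awareness of string literals, which is consistent with the behaviour described above.

```rust
/// Hypothetical naive stripper (an assumption, not rtk's actual code):
/// removes everything between `open` and `close`, ignoring string literals.
fn strip_block_comments(input: &str, open: &str, close: &str) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find(open) {
        // Keep the text before the opening delimiter.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + open.len()..];
        match after_open.find(close) {
            // Skip past the closing delimiter and continue scanning.
            Some(end) => rest = &after_open[end + close.len()..],
            None => {
                // Unterminated "comment": the remainder is dropped.
                rest = "";
                break;
            }
        }
    }
    out.push_str(rest);
    out
}

/// Shortened, made-up fixture in the spirit of the issue's package.json.
fn corrupted_example() -> String {
    let json = r#"{"packages": ["packages/*"], "scripts": {"build": "x"}, "lint-staged": {"**/package.json": ["sort-package-json"]}}"#;
    strip_block_comments(json, "/*", "*/")
}
```

Calling `corrupted_example()` returns a string in which the `scripts` key has disappeared and `sort-package-json` from `lint-staged` ends up inside the `packages` array, mirroring the BEFORE output below.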

Fix: Add Language::Data variant with no comment patterns. JSON, YAML, TOML, XML, CSV, Markdown, and other data formats skip all comment stripping and code filtering entirely.
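A minimal sketch of what such a variant could look like follows. Only `Language::Data` and `Language::Unknown` are named in this PR; the `Rust` variant, the `CommentPatterns` struct, the `from_extension` and `comment_patterns` methods, and the `//` line pattern for `Unknown` are assumptions made for illustration and may not match src/filter.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Rust,    // illustrative code language (assumption)
    Data,    // variant added by this PR: no comment patterns at all
    Unknown, // fallback that uses /* ... */ as block comment delimiters
}

/// Hypothetical container for comment delimiters.
#[derive(Debug, PartialEq, Eq)]
struct CommentPatterns {
    line: Option<&'static str>,
    block: Option<(&'static str, &'static str)>,
}

impl Language {
    /// Hypothetical detection by file extension (without the leading dot).
    fn from_extension(ext: &str) -> Language {
        match ext.to_ascii_lowercase().as_str() {
            "rs" => Language::Rust,
            // Extension list taken from "Affected extensions" in this PR.
            "json" | "jsonc" | "json5" | "yaml" | "yml" | "toml" | "xml"
            | "csv" | "tsv" | "graphql" | "gql" | "sql" | "md" | "markdown"
            | "txt" | "env" | "lock" => Language::Data,
            _ => Language::Unknown,
        }
    }

    /// Data formats return no patterns, so nothing can be mistaken for a comment.
    fn comment_patterns(self) -> CommentPatterns {
        match self {
            Language::Rust | Language::Unknown => CommentPatterns {
                line: Some("//"), // assumed line-comment marker
                block: Some(("/*", "*/")),
            },
            Language::Data => CommentPatterns { line: None, block: None },
        }
    }
}
```

With this shape, a stripper that consults `comment_patterns()` has nothing to match for `package.json`, so strings such as "packages/*" are left alone.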

Before/After

# BEFORE: rtk read corrupts the JSON structure
$ rtk read package.json
{
  "name": "my-monorepo",
  "workspaces": {
    "packages": [
      "sort-package-json",                    # <-- from lint-staged!
      "biome check --write --no-errors-on-unmatched"
    ]
  }
}
# scripts: MISSING, lint-staged: MISSING, catalog: MISSING

# AFTER: JSON structure fully preserved
$ rtk read package.json
{
  "name": "my-monorepo",
  "workspaces": {
    "packages": [
      "packages/*"                            # <-- correct
    ],
    "catalog": { ... }
  },
  "scripts": {                                # <-- preserved
    "build": "bun run --workspaces build",
    "lint": "bun run --workspaces lint"
  },
  "lint-staged": {                            # <-- preserved
    "**/package.json": [ ... ]
  }
}

Why this matters

package.json is read dozens of times per Claude session. When corrupted:

  • Claude sees wrong workspace config, missing scripts, broken lint rules
  • Makes decisions based on false metadata (wrong build commands, missing dependencies)
  • Re-reads the file in a loop trying to reconcile the corruption, wasting tokens

Also affects: tsconfig.json, docker-compose.yml, Cargo.lock, .env, schema.graphql, *.sql, etc.

Affected extensions

json, jsonc, json5, yaml, yml, toml, xml, csv, tsv, graphql, gql, sql, md, markdown, txt, env, lock

Test plan

  • test_language_detection_data_formats — all data extensions map to Language::Data
  • test_json_no_comment_stripping — reproduces the exact #464 scenario (packages/*, scripts, lint-staged)
  • test_json_aggressive_filter_preserves_structure — aggressive filter also safe for JSON
  • Full suite: 767 passed, 0 failed
  • Manual test: rtk read package.json with the exact fixture from the issue
  • validate-docs.sh passes

Generated with Claude Code

@ousamabenyounes ousamabenyounes changed the base branch from master to develop March 12, 2026 03:17
@pszymkowiak
Collaborator

LGTM — good fix for a nasty bug. Adding Language::Data with empty comment patterns is the right approach.

Build passes, 55 filter tests pass. The exact reproduction case from #464 (packages/* treated as block comment) is covered.

One nit (non-blocking): the version bumps in ARCHITECTURE.md/CLAUDE.md/README.md are unrelated noise — ideally drop them from this PR since release-please handles versions. But not a blocker.

@aeppling for merge.

@pszymkowiak pszymkowiak requested a review from aeppling March 12, 2026 20:25
@ousamabenyounes ousamabenyounes force-pushed the fix/read-json-corruption branch from cded07a to 5f4ea62 Compare March 12, 2026 21:53
@ousamabenyounes
Contributor Author

Thanks! Removed the version bumps from ARCHITECTURE.md, CLAUDE.md and README.md — rebased on latest develop first, so the PR now only touches src/filter.rs (1 file, 42+/10-).

Note: the Benchmark and Documentation Validation checks are failing due to pre-existing issues on develop (missing truncate function in cargo_cmd.rs:838, and version string mismatch in docs). Not related to this PR.

Add Language::Data variant for data formats (JSON, YAML, TOML, XML, CSV, etc.)
with empty comment patterns to prevent comment stripping. AggressiveFilter
falls back to MinimalFilter for data files.

Fixes rtk-ai#464

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ousama Ben Younes <benyounes.ousama@gmail.com>
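The fallback mentioned in the commit message might look roughly like the sketch below. `AggressiveFilter` and `MinimalFilter` are named in the commit, but the `Filter` trait, its signature, the two-variant `Language` enum, and what each filter actually removes are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Code, // stand-in for any programming language (assumption)
    Data, // JSON, YAML, TOML, and the other data formats
}

/// Hypothetical trait; the real interface in src/filter.rs may differ.
trait Filter {
    fn filter(&self, content: &str, lang: Language) -> String;
}

struct MinimalFilter;
struct AggressiveFilter;

impl Filter for MinimalFilter {
    fn filter(&self, content: &str, lang: Language) -> String {
        match lang {
            // Data files pass through untouched.
            Language::Data => content.to_string(),
            // Assumed behaviour for code: drop lines that are only `//` comments.
            Language::Code => content
                .lines()
                .filter(|l| !l.trim_start().starts_with("//"))
                .collect::<Vec<_>>()
                .join("\n"),
        }
    }
}

impl Filter for AggressiveFilter {
    fn filter(&self, content: &str, lang: Language) -> String {
        if lang == Language::Data {
            // Fallback described in the commit message.
            return MinimalFilter.filter(content, lang);
        }
        // Assumed behaviour for code: minimal filtering plus dropping blank lines.
        MinimalFilter
            .filter(content, lang)
            .lines()
            .filter(|l| !l.trim().is_empty())
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

Routing data files through the minimal path keeps the safety rule in one place: whichever filter level is requested, `Language::Data` content comes back byte-for-byte in this sketch.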
@ousamabenyounes ousamabenyounes force-pushed the fix/read-json-corruption branch from 5f4ea62 to 9533614 Compare March 13, 2026 11:36
Development

Successfully merging this pull request may close these issues.

bug: rtk read corrupts package.json when JSON strings contain /* or */
