Skip to content

feat: add durable git snapshot history#98

Merged
steipete merged 3 commits into
mainfrom
codex/crawlkit-history-migration
Jun 19, 2026
Merged

feat: add durable git snapshot history#98
steipete merged 3 commits into
mainfrom
codex/crawlkit-history-migration

Conversation

@steipete

Copy link
Copy Markdown
Collaborator

Summary

  • add immutable publish --tag snapshots and non-mutating update --ref historical restores
  • move Git history/tree/ref and FTS escaping mechanics onto CrawlKit
  • preserve Discord DM/privacy filters, incremental import planning, media, and embedding behavior
  • retry concurrent atomic branch-and-tag pushes and keep share repositories private
  • refresh Go dependencies

Proof

  • go test ./...
  • go test -race ./...
  • go vet ./...
  • staticcheck, deadcode, gofumpt, and gosec passed
  • autoreview clean

@steipete steipete requested a review from a team as a code owner June 19, 2026 11:43
@clawsweeper

clawsweeper Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codex review: needs changes before merge. Reviewed June 19, 2026, 8:00 AM ET / 12:00 UTC.

Summary
The PR adds publish --tag and update --ref, migrates share Git/history/FTS helpers to CrawlKit, updates share docs/tests, and bumps Go dependencies.

Reproducibility: yes. by source inspection: update --ref routes to ImportAt, which rewrites RepoPath to a temp directory before calling Import; Import enriches legacy manifests from opts.RepoPath and then stores that manifest. A focused test can reproduce it by stripping file_manifests from a tagged snapshot and checking stored import state before the next normal update.

Review metrics: 3 noteworthy metrics.

  • Files Touched: 12 files changed. The PR spans CLI, share internals, docs, tests, and dependency metadata, so maintainers should review the final merge result broadly.
  • New CLI Surface: 2 flags added. publish --tag and update --ref become user-facing command contracts that need compatibility validation.
  • Dependency Refresh: 1 direct CrawlKit bump plus indirect updates. The implementation relies on new CrawlKit mirror helpers, so the selected pseudo-version matters before merge.

Merge readiness
Overall: 🦐 gold shrimp
Proof: 🌊 off-meta tidepool
Patch quality: 🦐 gold shrimp
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Fix legacy historical import manifest enrichment before merge.
  • Remove or rehome the CHANGELOG.md entries unless a release owner sponsors them.
  • [P2] Validate normal publish/update plus tagged publish and historical restore on the final branch.

Risk before merge

  • [P1] Historical imports of legacy snapshots that lack file_manifests can store an unenriched manifest and lose changed-tail incremental planning on the next normal update.
  • [P1] The PR changes share Git fetch/push/write behavior and moves more of that contract onto a CrawlKit pseudo-version, so existing private remotes need upgrade proof beyond green unit tests.
  • [P1] The PR edits CHANGELOG.md, which is release-owned in the OpenClaw review contract and can create release-note drift unless a release owner intentionally keeps it.

Maintainer options:

  1. Fix Historical Import State (recommended)
    Repair update --ref so legacy manifests are enriched from the original repo and resolved commit before stored import state is written, then prove normal and tagged share update flows on the final branch.
  2. Accept The CrawlKit Contract Change
    Maintainers may intentionally accept the broader CrawlKit-backed share Git behavior after owning the upgrade risk for existing private share repositories.
  3. Pause The CLI Surface
    If durable tags and historical restores are not the desired share API, pause this PR in favor of a narrower CrawlKit hardening or dependency-only change.
Copy recommended automerge instruction
@clawsweeper automerge

Special instructions:
Fix `update --ref` so legacy manifests imported from a historical ref are enriched with Git blob fingerprints from the original share repo and resolved commit before `MarkImported`, add a regression test that strips `file_manifests`, and remove or rehome the PR's `CHANGELOG.md` feature entries unless a release owner sponsors them.

Next step before merge

  • [P2] A narrow repair can address the historical import manifest-state bug and release-owned changelog cleanup; maintainer still owns final compatibility and product acceptance.

Security
Cleared: The diff touches Git ref materialization and dependency sources, but path validation rejects absolute, parent, and NUL paths, and I found no concrete security or supply-chain blocker.

Review findings

  • [P2] Preserve git fingerprints for historical imports — internal/share/share.go:868
  • [P3] Drop the release-owned changelog entries — CHANGELOG.md:7-13
Review details

Best possible solution:

Land the feature after fixing historical manifest enrichment for update --ref, removing or rehoming release-owned changelog edits unless sponsored, and validating normal plus tagged share workflows on upgrade/private repositories.

Do we have a high-confidence way to reproduce the issue?

Yes by source inspection: update --ref routes to ImportAt, which rewrites RepoPath to a temp directory before calling Import; Import enriches legacy manifests from opts.RepoPath and then stores that manifest. A focused test can reproduce it by stripping file_manifests from a tagged snapshot and checking stored import state before the next normal update.

Is this the best way to solve the issue?

No; the implementation is close, but historical imports should enrich from the original repo and resolved commit, or store an already-enriched manifest state. The release-owned changelog entries should also move to PR metadata unless a release owner explicitly wants them landed.

Full review comments:

  • [P2] Preserve git fingerprints for historical imports — internal/share/share.go:868
    When --ref targets a legacy snapshot without file_manifests, this passes a temp directory into Import; Import then cannot enrich the manifest from Git before MarkImported stores it. The next normal update loses the documented Git-object fallback and may replace every snapshot table instead of using changed-tail import.
    Confidence: 0.9
  • [P3] Drop the release-owned changelog entries — CHANGELOG.md:7-13
    This PR adds feature-release notes directly to CHANGELOG.md, but OpenClaw treats the changelog as release-owned during normal PR review. Keep the release-note context in the PR body or commit message unless a release owner intentionally sponsors these entries.
    Confidence: 0.84

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: not found in the target repository.

Codex review notes: model internal, reasoning high; reviewed against 5857f2736f21.

Label changes

Label changes:

  • add P2: This is a normal-priority feature with a concrete historical-import regression and limited but real share workflow impact.
  • add merge-risk: 🚨 compatibility: Merging changes share Git fetch, push, ref, and dependency behavior that existing share repositories depend on.
  • add rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The contributor proof gate does not apply because GitHub reports author association COLLABORATOR; the PR body lists test/static validation but no separate live proof is required by this gate.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦐 gold shrimp, so this older rating label is no longer current.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: ⏳ waiting on author.

Label justifications:

  • P2: This is a normal-priority feature with a concrete historical-import regression and limited but real share workflow impact.
  • merge-risk: 🚨 compatibility: Merging changes share Git fetch, push, ref, and dependency behavior that existing share repositories depend on.
  • rating: 🦐 gold shrimp: Overall readiness is 🦐 gold shrimp; proof is 🌊 off-meta tidepool and patch quality is 🦐 gold shrimp.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The contributor proof gate does not apply because GitHub reports author association COLLABORATOR; the PR body lists test/static validation but no separate live proof is required by this gate.
Evidence reviewed

Acceptance criteria:

  • [P1] go test ./internal/share -run TestImportAt.
  • [P1] go test ./internal/cli -run TestShareCommandsPublishSubscribeAndUpdate.
  • [P1] go test ./...

What I checked:

  • Target policy check: No target repository AGENTS.md was present, and no .agents/maintainer-notes/ directory exists in this checkout. (5857f2736f21)
  • Current main lacks the exact feature: A current-main search found no share CLI/history symbols for publish --tag, update --ref, CreateImmutableTag, ImportAt, PushSnapshot, or ReadFileAt; only unrelated --tags release docs and media fetch hits appeared. (5857f2736f21)
  • PR adds new CLI surface: The PR head adds publish --tag and update --ref flags and routes historical restores through share.ImportAt. (internal/cli/share_commands.go:24, 16414d0d8d19)
  • Historical import path still has the fingerprint issue: ImportAt materializes the requested ref into a temp directory, rewrites historicalOpts.RepoPath to that temp directory, clears the remote/tag, and then calls the normal Import path. (internal/share/share.go:864, 16414d0d8d19)
  • Normal import enriches from opts.RepoPath before storing manifest state: Current main's Import calls enrichManifestFromGit(ctx, opts.RepoPath, "HEAD", manifest) and later stores that manifest via MarkImported, so passing a non-Git temp directory prevents legacy manifest enrichment. (internal/share/share.go:220, 5857f2736f21)
  • Existing review comment matches the current head: The Codex review thread on the latest head reports the same ImportAt legacy file_manifests issue at internal/share/share.go:868. (internal/share/share.go:868, 16414d0d8d19)

Likely related people:

  • steipete: Current-main blame attributes the share CLI/import/search surfaces to Peter Steinberger, and recent history includes the safe CrawlKit tail replacement plus CrawlKit dependency bumps in the same share area. (role: feature owner and recent area contributor; confidence: high; commits: 37a9338fdcdb, 7ba9fa595787, 831a12cc7b71; files: internal/share/share.go, internal/cli/share_commands.go, internal/store/query.go)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28261ac667

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/share/share.go
Comment on lines +865 to +868
historicalOpts.RepoPath = tempDir
historicalOpts.Remote = ""
historicalOpts.Tag = ""
return Import(ctx, s, historicalOpts)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve git fingerprints for historical imports

When --ref targets a legacy snapshot that lacks file_manifests, this passes a temporary directory that is not a Git repo into Import. Import then calls enrichManifestFromGit against that temp dir, so it cannot add the git: blob fingerprints before MarkImported stores the manifest; the next normal update sees missing fingerprints and replaces every snapshot table instead of using the changed-tail incremental plan, defeating the documented Git-object fallback for large legacy archives. Enrich from the original repo/commit before importing or store the enriched manifest state.

Useful? React with 👍 / 👎.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. feature: ✨ showcase ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 19, 2026
@steipete steipete force-pushed the codex/crawlkit-history-migration branch from 28261ac to 16414d0 Compare June 19, 2026 11:55
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 19, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16414d0d8d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/share/share.go
Comment on lines +821 to +822
if err := mirror.Fetch(ctx, mirrorOptions(opts)); err != nil {
return Manifest{}, err

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Try local historical refs before fetching

When update --ref names a tag or commit that is already present in the local share repo, this unconditional fetch makes the restore depend on origin/network availability before ReadFileAt can read the local object. In an offline or temporarily unreachable-origin environment, restoring a previously fetched durable snapshot tag fails even though the required Git objects are local and the checkout does not need to move. Consider resolving/reading the local ref first, or tolerating fetch failure when the requested ref exists locally.

Useful? React with 👍 / 👎.

@steipete steipete merged commit 758334e into main Jun 19, 2026
10 checks passed
@steipete steipete deleted the codex/crawlkit-history-migration branch June 19, 2026 12:11
@steipete

Copy link
Copy Markdown
Collaborator Author

Landed in 758334ea487a2482a46d51b63bef476042a8f45b.

Proof: full Go tests, race tests, vet, staticcheck, deadcode, gofumpt, gosec, and autoreview passed locally. GitHub test, lint, deps, Docker, release-check, CodeQL, and secret scans all passed before merge.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26a9eb17cd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +97 to +99
createdTag, err := share.CreateImmutableTag(r.ctx, opts)
if err != nil {
return err

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preflight tag collisions before committing

When publish --tag reuses a local or fetched remote tag that points to a different commit, this is the first check that rejects the collision, but Export and Commit have already created a new snapshot commit. The command then fails while leaving an untagged local archive commit behind, which a later publish/push can send to the branch without the requested immutable tag; check the existing tag target before mutating or committing the snapshot.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature: ✨ showcase ClawSweeper spotlight: unusually compelling feature idea for maintainer attention. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. P2 Normal priority bug or improvement with limited blast radius. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant