Setup caveat
I originally noticed this on a workstation where only parts of the common
git dir are mounted into the hub worktree, which produces split-brain states
in .crosslink/.hub-cache/. The mechanism that put SQLite ahead of the JSON
files in that case was almost certainly an interaction between my custom
mount setup and crosslink's hub-cache, not a defect in crosslink itself.
This report is therefore a structural observation, not triage of an
upstream-caused incident: the --repair operation's clear-then-rebuild
semantics drop any SQLite row that lacks a JSON counterpart, regardless of
how the divergence was reached. Filing because the structural shape is
worth flagging on its own; the fitness-for-purpose framing should be
crosslink's call, not mine.
Summary
crosslink integrity hydration --repair clears SQLite and rehydrates from
the on-disk JSON files. The command does not inspect whether SQLite holds
rows that the JSON does not; any such rows are dropped when SQLite is
cleared, and the rehydration cannot restore them because they are not in
the source it reads from. No warning before the operation runs, no record
of what was deleted.
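The clear-then-rebuild shape can be modelled in a few lines. This is a toy sketch, not crosslink code: the Blocker type, clear_then_rehydrate and sqlite_only are hypothetical names, and in-memory sets stand in for the SQLite blockers table and the JSON files. It shows why a row present only in SQLite cannot survive: after the clear, the result is exactly the JSON contents.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a blocker row: (blocked issue id, blocking issue id).
type Blocker = (i64, i64);

/// Models the shape of `--repair`: discard whatever SQLite held, then
/// copy in exactly what the JSON source contains.
fn clear_then_rehydrate(sqlite: &mut BTreeSet<Blocker>, json: &BTreeSet<Blocker>) {
    sqlite.clear(); // analogous to db.clear_shared_data()
    sqlite.extend(json.iter().copied()); // analogous to hydrate_to_sqlite(...)
}

/// Rows present only in SQLite: exactly the rows the repair cannot restore.
fn sqlite_only(sqlite: &BTreeSet<Blocker>, json: &BTreeSet<Blocker>) -> BTreeSet<Blocker> {
    sqlite.difference(json).copied().collect()
}

fn main() {
    // JSON never recorded the block; SQLite has "L2 blocked by L1".
    let json: BTreeSet<Blocker> = BTreeSet::new();
    let mut sqlite: BTreeSet<Blocker> = [(2, 1)].into_iter().collect();

    // Capture the doomed rows before the clear, since afterwards they are gone.
    let lost = sqlite_only(&sqlite, &json);
    clear_then_rehydrate(&mut sqlite, &json);

    println!("dropped by repair: {:?}", lost);
    println!("SQLite after repair: {:?}", sqlite);
}
```

Nothing in the rebuild step consults the pre-clear state, so any fix has to look at SQLite before the clear runs.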
Offending code
check_hydration (crosslink/crosslink/src/commands/integrity_cmd.rs, lines 178
to 247 at commit 12eb7b9). The repair branch calls db.clear_shared_data() and
then hydrate_to_sqlite(&cache_dir, db):
```rust
fn check_hydration(crosslink_dir: &Path, db: &Database, repair: bool) -> Result<CheckResult> {
    let cache_dir = crosslink_dir.join(HUB_CACHE_DIR);
    if !cache_dir.exists() {
        return Ok(CheckResult {
            name: "hydration".to_string(),
            status: CheckStatus::Skipped("sync not configured".to_string()),
        });
    }

    let issues_dir = cache_dir.join("issues");
    let json_issues = read_all_issue_files(&issues_dir)?;
    let json_issue_count = json_issues
        .iter()
        .filter(|i| i.display_id.is_some())
        .count() as i64;
    let db_issue_count = db.get_issue_count()?;

    // Count milestones: per-file first, fall back to legacy single-file
    let milestones_dir = cache_dir.join("meta").join("milestones");
    let json_milestone_entries = read_all_milestone_files(&milestones_dir)?;
    let json_milestone_count = if json_milestone_entries.is_empty() {
        let legacy_path = cache_dir.join("meta").join("milestones.json");
        let legacy = read_milestones_file(&legacy_path)?;
        legacy.milestones.len() as i64
    } else {
        json_milestone_entries.len() as i64
    };
    let db_milestone_count = db.get_milestone_count()?;

    let issues_ok = json_issue_count == db_issue_count;
    let milestones_ok = json_milestone_count == db_milestone_count;

    if issues_ok && milestones_ok {
        return Ok(CheckResult {
            name: "hydration".to_string(),
            status: CheckStatus::Pass,
        });
    }

    let mut issues = Vec::new();
    if !issues_ok {
        issues.push(format!(
            "{json_issue_count} issues in JSON, {db_issue_count} in SQLite"
        ));
    }
    if !milestones_ok {
        issues.push(format!(
            "{json_milestone_count} milestones in JSON, {db_milestone_count} in SQLite"
        ));
    }
    let details = issues.join("; ");

    if !repair {
        return Ok(CheckResult {
            name: "hydration".to_string(),
            status: CheckStatus::Fail(details),
        });
    }

    db.clear_shared_data()?;
    let stats = hydrate_to_sqlite(&cache_dir, db)?;

    Ok(CheckResult {
        name: "hydration".to_string(),
        status: CheckStatus::Repaired(format!(
            "re-hydrated {} issues, {} comments",
            stats.issues, stats.comments
        )),
    })
}
```
The specific clear-then-rehydrate pair is lines 237-238:

```rust
db.clear_shared_data()?;
let stats = hydrate_to_sqlite(&cache_dir, db)?;
```
The drift-detection logic immediately above (lines 207-215) compares only
the counts of issues and milestones. A content-level disagreement (SQLite has
issue #24 blocked by #23, JSON does not) therefore does not register as drift
as long as those counts agree: the "pass" branch fires and the repair never
runs. Conversely, when the counts do disagree, the repair runs
unconditionally, without inspecting whether SQLite or JSON is the side with
the extra rows.
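To make the count-versus-content gap concrete, here is a minimal sketch using the #23/#24 example. Snapshot, counts_match, Drift and content_drift are hypothetical names; only issue ids and blocker edges are modelled, and milestones are omitted for brevity. The count check passes while a set difference in each direction finds the SQLite-only blocker and says which side holds it.

```rust
use std::collections::BTreeSet;

/// Hypothetical snapshot of one side's state: issue ids plus blocker
/// edges (blocked, blocker). Not crosslink's real types.
struct Snapshot {
    issues: BTreeSet<i64>,
    blockers: BTreeSet<(i64, i64)>,
}

/// Count-only comparison, mirroring the shape of check_hydration's check.
fn counts_match(json: &Snapshot, sqlite: &Snapshot) -> bool {
    json.issues.len() == sqlite.issues.len()
}

/// Content-level drift, reported per direction so the caller can tell
/// which side holds the extra rows.
#[derive(Debug, PartialEq)]
struct Drift {
    sqlite_only_blockers: Vec<(i64, i64)>,
    json_only_blockers: Vec<(i64, i64)>,
}

fn content_drift(json: &Snapshot, sqlite: &Snapshot) -> Drift {
    Drift {
        sqlite_only_blockers: sqlite.blockers.difference(&json.blockers).copied().collect(),
        json_only_blockers: json.blockers.difference(&sqlite.blockers).copied().collect(),
    }
}

fn main() {
    let json = Snapshot {
        issues: [23, 24].into_iter().collect(),
        blockers: BTreeSet::new(),
    };
    let sqlite = Snapshot {
        issues: [23, 24].into_iter().collect(),
        blockers: [(24, 23)].into_iter().collect(), // #24 blocked by #23
    };

    // Same number of issues on both sides, so a count check sees no drift...
    println!("counts match: {}", counts_match(&json, &sqlite));
    // ...while the row-level diff shows SQLite holding an unrepresented edge.
    println!("{:?}", content_drift(&json, &sqlite));
}
```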
Reproducer
The destructive structure can be exercised by constructing SQLite-only state
directly, since I don't have a public-CLI path that produces it on current
main:
```sh
crosslink issue create "first" -q     # → L1
crosslink issue create "second" -q    # → L2
# Insert a block directly into the SQLite store, bypassing the JSON write:
sqlite3 .crosslink/issues.db \
  "INSERT INTO blockers (issue_id, blocker_id) VALUES (2, 1);"
crosslink issue show L2 | grep -i blocked    # → Blocked by: L1
crosslink integrity hydration --repair       # → "re-hydrated 2 issues, 0 comments"
crosslink issue show L2 | grep -i blocked    # → Blocked by: (none)
```
The repair runs without emitting a warning about the row it deleted, and the
deleted row is unrecoverable from crosslink's own state.
Observed behavior
- The repair is asymmetric: JSON wins over SQLite. The command name doesn't
communicate that asymmetry.
- The drift-detection step only compares counts, so content drift isn't
surfaced when running integrity hydration without --repair. There's no
built-in way to inspect what the repair would do before running it.
- There is no audit trail of deletions. After --repair, nothing records
what was removed.
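A preview and an audit record can share one piece of text. The sketch below is an assumption about how that could look, not an existing crosslink feature: deletion_report is a hypothetical helper, and only blocker rows are covered. It lists the SQLite-only rows a clear-then-rehydrate would drop; a dry run could print it and stop, and a real run could persist it before clearing.

```rust
use std::collections::BTreeSet;
use std::fmt::Write;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: describe the blocker rows a clear-then-rehydrate
/// would drop (present in SQLite, absent from JSON).
fn deletion_report(ts_secs: u64, sqlite: &BTreeSet<(i64, i64)>, json: &BTreeSet<(i64, i64)>) -> String {
    let dropped: Vec<&(i64, i64)> = sqlite.difference(json).collect();
    let mut out = String::new();
    writeln!(
        out,
        "hydration repair at {}: {} SQLite-only blocker row(s)",
        ts_secs,
        dropped.len()
    )
    .unwrap(); // writing to a String cannot fail
    for (blocked, blocker) in dropped {
        writeln!(out, "  would drop: issue {} blocked by {}", blocked, blocker).unwrap();
    }
    out
}

fn main() {
    let json: BTreeSet<(i64, i64)> = BTreeSet::new();
    let sqlite: BTreeSet<(i64, i64)> = [(2, 1)].into_iter().collect();
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    // Dry run: show the report and stop. A real run would also write it
    // somewhere durable before clearing anything.
    print!("{}", deletion_report(secs, &sqlite, &json));
}
```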
Suggested fixes
If the maintainers decide this is worth changing, in priority order:
- Detect drift by content, not just count. Compare JSON-derived state to
SQLite-derived state at the row level. If SQLite has rows that JSON does
not, surface that as a distinct condition ("SQLite has unrepresented
state"), separate from the symmetric count-mismatch case.
- Make --repair non-destructive by default. When SQLite contains rows that
JSON does not, refuse to run without an explicit --accept-data-loss (or
similar) flag. The current behavior deletes by default.
- Re-emit in the other direction when possible. When SQLite has rows that
JSON doesn't, write them back to the JSON files (and the git log) instead
of deleting them.
- Snapshot SQLite before clearing. Even when destructive repair is what
the user wants, writing the previous state to
.crosslink/integrity/hydration-backup-<ts>.sqlite makes deletions
recoverable.
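The second and fourth suggestions combine into a small decision step. This is only a sketch: RepairPlan, plan_repair and the accept_data_loss parameter are hypothetical (the --accept-data-loss flag does not exist today), and the SQLite-only row count is assumed to come from a row-level diff like the one the first suggestion describes. It only builds the backup path; for a live database, SQLite's VACUUM INTO or its online backup API is safer than copying the file.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical outcomes of a non-destructive `--repair`.
#[derive(Debug, PartialEq)]
enum RepairPlan {
    /// No SQLite-only rows: clear-then-rehydrate loses nothing.
    Rehydrate,
    /// SQLite holds rows JSON lacks and the user has not opted in to losing them.
    Refuse { sqlite_only_rows: usize },
    /// The user opted in: snapshot to `backup` first, then clear-then-rehydrate.
    SnapshotThenRehydrate { backup: PathBuf },
}

fn plan_repair(crosslink_dir: &Path, sqlite_only_rows: usize, accept_data_loss: bool, ts_secs: u64) -> RepairPlan {
    if sqlite_only_rows == 0 {
        return RepairPlan::Rehydrate;
    }
    if !accept_data_loss {
        return RepairPlan::Refuse { sqlite_only_rows };
    }
    // Backup location taken from the suggestion above: .crosslink/integrity/hydration-backup-<ts>.sqlite
    let backup = crosslink_dir
        .join("integrity")
        .join(format!("hydration-backup-{}.sqlite", ts_secs));
    RepairPlan::SnapshotThenRehydrate { backup }
}

fn main() {
    let dir = Path::new(".crosslink");
    // No drift, drift without opt-in, drift with opt-in.
    for (rows, accept) in [(0, false), (2, false), (2, true)] {
        println!("{:?}", plan_repair(dir, rows, accept, 1_700_000_000));
    }
}
```

Deletion becomes opt-in, and when it does happen the snapshot path is fixed before anything is cleared.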
Related
- #600 (crosslink issue block exits non-zero when the blocker is already set)
and #601 (git_commit_in_cache_with_args reports empty error messages for the
most common failure mode) are bugs in the same area of the code (the
SharedWriter git-cache write path). I have not established a path by which
either directly produces SQLite-only drift on current main.
- #604 (add_blocker flow is not transactional across JSON / git / SQLite).
Structurally orthogonal to this report (that one is about how drift is
produced; this one is about how drift is destroyed), but the two together
describe the round trip from "writer cannot guarantee consistency" to
"integrity check silently deletes the surviving side".