`crosslink integrity hydration --repair` deletes SQLite rows when the JSON files do not have them

## Setup caveat

I originally noticed this on a workstation where only parts of the common
git dir are mounted into the hub worktree, which produces split-brain states
in `.crosslink/.hub-cache/`. The mechanism that put SQLite ahead of the JSON
files in that case was almost certainly an interaction between my custom
mount setup and crosslink's hub-cache, not a defect in crosslink itself.

This report is therefore a structural observation, not triage of an
upstream-caused incident: the `--repair` operation's clear-then-rebuild
semantics drop any SQLite row that lacks a JSON counterpart, regardless of
how the divergence was reached. I am filing it because that behavior is
worth flagging on its own; whether it is fit for purpose is crosslink's
call, not mine.

## Summary

`crosslink integrity hydration --repair` clears SQLite and rehydrates from
the on-disk JSON files. The command does not inspect whether SQLite holds
rows that the JSON does not; any such rows are dropped when SQLite is
cleared, and the rehydration cannot restore them because they are not in
the source it reads from. The command emits no warning before it runs and
keeps no record of what it deleted.
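
The clear-then-rehydrate semantics can be modelled without a database. The sketch below is a minimal illustration, not crosslink's implementation: the `Row` type and the `clear_then_rehydrate` function are hypothetical, and in-memory sets stand in for the SQLite table and the JSON files. It computes the lost rows only to make the loss visible, which the real command does not do.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for one SQLite row: (issue_id, blocker_id).
type Row = (i64, i64);

/// Models clear-then-rehydrate: the store is emptied, then refilled from
/// the JSON side only. Returns the rows that were present before the
/// repair and are absent after it.
fn clear_then_rehydrate(sqlite: &mut BTreeSet<Row>, json: &BTreeSet<Row>) -> Vec<Row> {
    // Rows held only by SQLite; nothing in the JSON source can restore them.
    let lost: Vec<Row> = sqlite.difference(json).copied().collect();
    sqlite.clear(); // plays the role of db.clear_shared_data()
    sqlite.extend(json.iter().copied()); // plays the role of hydrate_to_sqlite(...)
    lost
}
```

After the call, the SQLite set equals the JSON set, and every row returned in `lost` is gone from both sides.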

## Offending code

`check_hydration` — the repair branch calls `db.clear_shared_data()` and then
`hydrate_to_sqlite(&cache_dir, db)`:

https://github.com/forecast-bio/crosslink/blob/12eb7b917e9ef726f40eb2f9b36cf87fa38efa4d/crosslink/src/commands/integrity_cmd.rs#L178-L247

The specific clear-then-rehydrate pair is lines 237-238:

```rust
db.clear_shared_data()?;
let stats = hydrate_to_sqlite(&cache_dir, db)?;
```

The drift-detection logic immediately above (lines 207-215) compares only
counts of issues and milestones; a content-level disagreement (SQLite has
issue #24 blocked by #23, JSON does not) does not register as drift, so the
"pass" branch fires and the repair never runs. Conversely, when counts do
disagree the repair runs unconditionally — without inspecting whether SQLite
or JSON is the side with the extra rows.
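
To illustrate why a count comparison cannot see this kind of drift, here is a small sketch under stated assumptions: `Snapshot`, `counts_match`, and `contents_match` are hypothetical names, milestones are left out for brevity, and the sets stand in for whatever the two stores actually hold.

```rust
use std::collections::BTreeSet;

/// Hypothetical snapshot of one store: issue ids plus blocker edges.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Snapshot {
    issues: BTreeSet<i64>,
    blockers: BTreeSet<(i64, i64)>, // (issue_id, blocker_id)
}

/// Count-only check, the same shape as the comparison in `check_hydration`
/// (issue counts only here; the real check also compares milestone counts).
fn counts_match(json: &Snapshot, sqlite: &Snapshot) -> bool {
    json.issues.len() == sqlite.issues.len()
}

/// Content-level check: true only when both stores hold the same rows.
fn contents_match(json: &Snapshot, sqlite: &Snapshot) -> bool {
    json == sqlite
}
```

With issues #23 and #24 on both sides and the `(24, 23)` blocker edge present only in the SQLite snapshot, `counts_match` returns `true` while `contents_match` returns `false`.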

## Reproducer

The destructive structure can be exercised by constructing SQLite-only state
directly, since I don't have a public-CLI path that produces it on current
`main`:

```sh
crosslink issue create "first"  -q                # → L1
crosslink issue create "second" -q                # → L2
# Insert a block directly into the SQLite store, bypassing the JSON write:
sqlite3 .crosslink/issues.db \
  "INSERT INTO blockers (issue_id, blocker_id) VALUES (2, 1);"

crosslink issue show L2 | grep -i blocked          # → Blocked by: L1
crosslink integrity hydration --repair             # → "re-hydrated 2 issues, 0 comments"
crosslink issue show L2 | grep -i blocked          # → Blocked by: (none)
```

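The repair emits no warning about the row it deleted, and the deleted row
cannot be recovered from crosslink's own state. This presumes the issue or
milestone counts already disagree when `--repair` runs; when they match,
`check_hydration` returns `Pass` before it reaches the clear-then-rehydrate
pair.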

## Observed behavior

- The repair is asymmetric: JSON wins over SQLite. The command name doesn't
  communicate that asymmetry.
- The drift-detection step only compares counts, so content drift isn't
  surfaced when running `integrity hydration` without `--repair`. There's no
  built-in way to inspect what the repair *would* do before running it.
- There is no audit trail of deletions. After `--repair`, nothing records
  what was removed.

## Suggested fixes

If the maintainers decide this is worth changing, in priority order:

1. **Detect drift by content, not just count.** Compare JSON-derived state to
   SQLite-derived state at the row level. If SQLite has rows that JSON does
   not, surface that as a distinct condition ("SQLite has unrepresented
   state"), separate from the symmetric count-mismatch case.

2. **Default `--repair` to non-destructive.** When SQLite contains rows that
   JSON does not, refuse to run without an explicit `--accept-data-loss` (or
   similar) flag. The current behavior deletes by default.

3. **Re-emit in the other direction when possible.** When SQLite has rows
   JSON doesn't, write them back to the JSON files (and the git log) instead
   of deleting them.

4. **Snapshot SQLite before clearing.** Even when destructive repair is what
   the user wants, dropping the previous state to
   `.crosslink/integrity/hydration-backup-<ts>.sqlite` makes deletions
   recoverable.
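
Fixes 1 to 3 could be sketched roughly as follows. This is one possible shape, not a proposal for crosslink's actual types: `Row`, `Drift`, `RepairPlan`, `classify`, and `plan_repair` are all hypothetical, and `accept_data_loss` stands for the suggested `--accept-data-loss` flag, which does not exist today.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for one row: (issue_id, blocker_id).
type Row = (i64, i64);

/// Row-level drift, split by which side holds the extra rows.
#[derive(Debug, PartialEq, Eq)]
enum Drift {
    InSync,
    JsonAhead(Vec<Row>),   // rows only in JSON: rehydration loses nothing
    SqliteAhead(Vec<Row>), // rows only in SQLite: rehydration would drop them
    Diverged { json_only: Vec<Row>, sqlite_only: Vec<Row> },
}

/// Fix 1: compare content, not counts.
fn classify(json: &BTreeSet<Row>, sqlite: &BTreeSet<Row>) -> Drift {
    let json_only: Vec<Row> = json.difference(sqlite).copied().collect();
    let sqlite_only: Vec<Row> = sqlite.difference(json).copied().collect();
    match (json_only.is_empty(), sqlite_only.is_empty()) {
        (true, true) => Drift::InSync,
        (false, true) => Drift::JsonAhead(json_only),
        (true, false) => Drift::SqliteAhead(sqlite_only),
        (false, false) => Drift::Diverged { json_only, sqlite_only },
    }
}

#[derive(Debug, PartialEq, Eq)]
enum RepairPlan {
    Nothing,
    Rehydrate,                   // safe: SQLite holds nothing extra
    Refuse(Vec<Row>),            // rows that would be lost; also the
                                 // candidates for write-back (fix 3)
    RehydrateDropping(Vec<Row>), // user explicitly accepted the loss
}

/// Fix 2: destructive repair only when explicitly requested.
fn plan_repair(drift: Drift, accept_data_loss: bool) -> RepairPlan {
    match drift {
        Drift::InSync => RepairPlan::Nothing,
        Drift::JsonAhead(_) => RepairPlan::Rehydrate,
        Drift::SqliteAhead(rows) | Drift::Diverged { sqlite_only: rows, .. } => {
            if accept_data_loss {
                RepairPlan::RehydrateDropping(rows)
            } else {
                RepairPlan::Refuse(rows)
            }
        }
    }
}
```

Carrying the SQLite-only rows in `Refuse` and `RehydrateDropping` gives the caller something to print as a warning, write to an audit log, or re-emit to JSON.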

## Related

- #600 and #601 are bugs in the same area of the code (the `SharedWriter`
  git-cache write path). I have not established a path by which either
  directly produces SQLite-only drift on current `main`.
- #604 — `add_blocker` flow is not transactional across JSON / git / SQLite.
  Structurally orthogonal to this report (that one is about how drift is
  produced; this one is about how drift is destroyed), but the two together
  describe the round trip from "writer cannot guarantee consistency" to
  "integrity check silently deletes the surviving side".


For reference, `check_hydration` as it appears at the permalinked commit
(lines 178-247 of `integrity_cmd.rs`):

    fn check_hydration(crosslink_dir: &Path, db: &Database, repair: bool) -> Result<CheckResult> {
        let cache_dir = crosslink_dir.join(HUB_CACHE_DIR);
        if !cache_dir.exists() {
            return Ok(CheckResult {
                name: "hydration".to_string(),
                status: CheckStatus::Skipped("sync not configured".to_string()),
            });
        }

        let issues_dir = cache_dir.join("issues");
        let json_issues = read_all_issue_files(&issues_dir)?;
        let json_issue_count = json_issues
            .iter()
            .filter(|i| i.display_id.is_some())
            .count() as i64;
        let db_issue_count = db.get_issue_count()?;

        // Count milestones: per-file first, fall back to legacy single-file
        let milestones_dir = cache_dir.join("meta").join("milestones");
        let json_milestone_entries = read_all_milestone_files(&milestones_dir)?;
        let json_milestone_count = if json_milestone_entries.is_empty() {
            let legacy_path = cache_dir.join("meta").join("milestones.json");
            let legacy = read_milestones_file(&legacy_path)?;
            legacy.milestones.len() as i64
        } else {
            json_milestone_entries.len() as i64
        };
        let db_milestone_count = db.get_milestone_count()?;

        let issues_ok = json_issue_count == db_issue_count;
        let milestones_ok = json_milestone_count == db_milestone_count;

        if issues_ok && milestones_ok {
            return Ok(CheckResult {
                name: "hydration".to_string(),
                status: CheckStatus::Pass,
            });
        }

        let mut issues = Vec::new();
        if !issues_ok {
            issues.push(format!(
                "{json_issue_count} issues in JSON, {db_issue_count} in SQLite"
            ));
        }
        if !milestones_ok {
            issues.push(format!(
                "{json_milestone_count} milestones in JSON, {db_milestone_count} in SQLite"
            ));
        }
        let details = issues.join("; ");

        if !repair {
            return Ok(CheckResult {
                name: "hydration".to_string(),
                status: CheckStatus::Fail(details),
            });
        }

        db.clear_shared_data()?;
        let stats = hydrate_to_sqlite(&cache_dir, db)?;

        Ok(CheckResult {
            name: "hydration".to_string(),
            status: CheckStatus::Repaired(format!(
                "re-hydrated {} issues, {} comments",
                stats.issues, stats.comments
            )),
        })
    }
