Write tombstones on delete so SSTable data is correctly masked #104
delete() and delete_ns() previously only removed the key from the in-memory memtable, so any value already written to an SSTable by an earlier flush stayed visible to subsequent reads.

Fix by writing a timestamped tombstone record (the 8-byte timestamp followed by the TOMBSTONE sentinel) through insert_internal. This persists the deletion to the WAL and, on the next flush, to an SSTable, where it masks older versions of the key on disk.

Update get() and scan_ns() to recognise tombstone values: get() returns None for them and scan_ns() excludes them from results. Also filter tombstones from scan() so the public API does not expose deletion markers to callers.

https://claude.ai/code/session_011eyXTLxHcSEAeV6PRG3izJ
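The masking behaviour described above can be sketched in a few lines. This is a toy model, not the code in this PR: `ToyStore`, its fields, the big-endian timestamp, and the choice to strip the timestamp in `get` are all assumptions made for illustration, and the WAL is omitted. Only `TOMBSTONE`, `is_tombstone`, and the name `insert_internal` come from the change itself.

```rust
use std::collections::BTreeMap;

/// Sentinel from the PR; a stored tombstone is an 8-byte timestamp followed by this.
const TOMBSTONE: &[u8] = b"\x00TOMBSTONE";

fn is_tombstone(val: &[u8]) -> bool {
    val.len() == 8 + TOMBSTONE.len() && &val[8..] == TOMBSTONE
}

/// Hypothetical stand-in for the store: one memtable plus flushed
/// SSTables, ordered oldest first. The WAL is left out of this sketch.
#[derive(Default)]
struct ToyStore {
    memtable: BTreeMap<Vec<u8>, Vec<u8>>,
    sstables: Vec<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl ToyStore {
    /// Every write, including a delete, goes through the same path.
    fn insert_internal(&mut self, key: &[u8], stored: Vec<u8>) {
        self.memtable.insert(key.to_vec(), stored);
    }

    fn insert(&mut self, key: &[u8], ts: u64, payload: &[u8]) {
        // Assumed layout: big-endian timestamp, then the payload.
        let mut stored = ts.to_be_bytes().to_vec();
        stored.extend_from_slice(payload);
        self.insert_internal(key, stored);
    }

    /// Delete writes a tombstone instead of removing the memtable entry.
    fn delete(&mut self, key: &[u8], ts: u64) {
        let mut stored = ts.to_be_bytes().to_vec();
        stored.extend_from_slice(TOMBSTONE);
        self.insert_internal(key, stored);
    }

    /// Move the memtable contents into a new, newest SSTable.
    fn flush(&mut self) {
        let table = std::mem::take(&mut self.memtable);
        if !table.is_empty() {
            self.sstables.push(table);
        }
    }

    /// Newest version wins; a tombstone as the newest version reads as None.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        let newest = self
            .memtable
            .get(key)
            .or_else(|| self.sstables.iter().rev().find_map(|t| t.get(key)))?;
        if is_tombstone(newest) {
            None
        } else {
            Some(newest[8..].to_vec())
        }
    }
}
```

In this model, inserting a key, flushing, and then deleting it leaves the old value in the first SSTable, but `get` returns `None` both before and after the tombstone itself is flushed, because the tombstone is always the newest version found.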
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db29c4b9ba
    const TOMBSTONE: &[u8] = b"\x00TOMBSTONE";

    fn is_tombstone(val: &[u8]) -> bool {
        val.len() == 8 + TOMBSTONE.len() && &val[8..] == TOMBSTONE
    }
Distinguish tombstones from legitimate payload bytes
is_tombstone identifies deletions solely by checking that the stored value is an 8-byte timestamp followed by "\x00TOMBSTONE". Any client that legitimately writes that exact binary payload (e.g., via insert/insert_ts with raw bytes) will therefore have its data treated as deleted by get, scan, and scan_ns. This is a silent correctness bug for binary values; tombstones need an encoding that cannot collide with user data.
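One way to get a collision-free encoding is to tag every stored record with an explicit kind byte, so a deletion is distinguished by the tag, not by the payload contents. The sketch below is only an illustration of that idea: the `Record` type, the `KIND_*` constants, the function names, and the big-endian timestamp are hypothetical and not part of this PR.

```rust
/// Hypothetical record layout: 8-byte timestamp, 1-byte kind tag, then payload.
const KIND_PUT: u8 = 0;
const KIND_DELETE: u8 = 1;

#[derive(Debug, PartialEq)]
enum Record<'a> {
    Put { ts: u64, payload: &'a [u8] },
    Delete { ts: u64 },
}

fn encode_put(ts: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(9 + payload.len());
    out.extend_from_slice(&ts.to_be_bytes());
    out.push(KIND_PUT);
    out.extend_from_slice(payload);
    out
}

fn encode_delete(ts: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    out.extend_from_slice(&ts.to_be_bytes());
    out.push(KIND_DELETE);
    out
}

/// Returns None for records that are too short or carry an unknown tag.
fn decode(stored: &[u8]) -> Option<Record<'_>> {
    if stored.len() < 9 {
        return None;
    }
    let mut ts_bytes = [0u8; 8];
    ts_bytes.copy_from_slice(&stored[..8]);
    let ts = u64::from_be_bytes(ts_bytes);
    match stored[8] {
        // Any payload bytes are valid here, including b"\x00TOMBSTONE".
        KIND_PUT => Some(Record::Put { ts, payload: &stored[9..] }),
        // A delete carries no payload, so anything longer is malformed.
        KIND_DELETE if stored.len() == 9 => Some(Record::Delete { ts }),
        _ => None,
    }
}
```

Because user bytes only ever appear after a `KIND_PUT` tag, no payload can be mistaken for a deletion. The trade-off is a change to the stored format, so records already written in the old layout would presumably need a migration or a version marker.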