Skip to content

Propagate WAL write failures instead of silently discarding them#105

Merged
mweiden merged 3 commits intomainfrom
claude/fix-wal-propagate-MvFam
Feb 28, 2026
Merged

Propagate WAL write failures instead of silently discarding them#105
mweiden merged 3 commits intomainfrom
claude/fix-wal-propagate-MvFam

Conversation

@mweiden
Copy link
Copy Markdown
Owner

@mweiden mweiden commented Feb 20, 2026

insert_internal() used let _ = self.wal.append(...) which caused the key to be written to the memtable even when the WAL failed, making recovery after a crash impossible for those writes.

Change insert_internal, insert_ts, insert, insert_ns_ts, and insert_ns to return std::io::Result<()> and return an Err immediately on WAL failure so the data is never inserted into the memtable.

Update callers:

  • exec_insert, exec_update, exec_delete (query.rs): propagate as QueryError::Other so the client receives an explicit error.
  • register_table, save_schema (query.rs): log at ERROR level since these helpers currently have no return type.
  • lwt_commit (cluster.rs): log at ERROR level since it is fire-and-forget.

https://claude.ai/code/session_011eyXTLxHcSEAeV6PRG3izJ

insert_internal() used `let _ = self.wal.append(...)` which caused
the key to be written to the memtable even when the WAL failed, making
recovery after a crash impossible for those writes.

Change insert_internal, insert_ts, insert, insert_ns_ts, and insert_ns
to return std::io::Result<()> and return an Err immediately on WAL
failure so the data is never inserted into the memtable.

Update callers:
- exec_insert, exec_update, exec_delete (query.rs): propagate as
  QueryError::Other so the client receives an explicit error.
- register_table, save_schema (query.rs): log at ERROR level since
  these helpers currently have no return type.
- lwt_commit (cluster.rs): log at ERROR level since it is fire-and-forget.

https://claude.ai/code/session_011eyXTLxHcSEAeV6PRG3izJ
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6618d9b09f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cluster.rs Outdated
Comment thread src/query.rs Outdated
The previous fix logged the error but still cleared the slot, so a
chosen LWT value was permanently dropped on this replica whenever
the WAL append failed (e.g. I/O error in sync mode). The coordinator
could still report success because it achieved quorum, leaving this
replica silently diverged with no retry path.

Fix by only removing the composite key from the Paxos slot map when
the WAL write succeeds. On failure the slot is kept in memory so that
the next prepare round from any coordinator can re-commit the chosen
value through the normal Paxos flow.

https://claude.ai/code/session_011eyXTLxHcSEAeV6PRG3izJ
Both helpers previously swallowed write errors: after the fail-fast
insert_internal change, a WAL failure would leave _tables/_schemas
unpopulated while CREATE TABLE still returned success, producing a
false-positive DDL acknowledgement and broken metadata state.

Change register_table and save_schema to return std::io::Result<()>
and propagate WAL failures to their callers. The CreateTable branch
in exec_statement now maps those errors to QueryError::Other so the
client receives an explicit failure instead of a silent lie.

https://claude.ai/code/session_011eyXTLxHcSEAeV6PRG3izJ
@mweiden mweiden merged commit 82f54e8 into main Feb 28, 2026
1 check passed
@mweiden mweiden deleted the claude/fix-wal-propagate-MvFam branch February 28, 2026 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants