Add tamper-proof hash chain to audit trail #6
Each message stores a SHA-256 hash chained to the previous message, making modifications, deletions, and reordering detectable. Includes VerifyChain() for offline integrity verification. Works with both plaintext and encrypted stores, and survives proxy restarts. Closes #2.
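The chain walk described above can be sketched with a minimal in-memory model. This is an illustration, not the PR's actual code: `record`, `hashRecord`, and `verifyChain` are hypothetical names, the field set is reduced, and the real store reads rows from SQLite and hashes additional columns.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// record is a simplified stand-in for a row in the messages table.
type record struct {
	ID, SessionID, Direction, Raw string
	PrevHash, Hash                string
}

// hashRecord chains a record to the previous hash, length-prefixing each
// field so bytes cannot shift across field boundaries.
func hashRecord(r record, prevHash string) string {
	h := sha256.New()
	var lenBuf [8]byte
	for _, f := range []string{prevHash, r.ID, r.SessionID, r.Direction, r.Raw} {
		binary.BigEndian.PutUint64(lenBuf[:], uint64(len(f)))
		h.Write(lenBuf[:])
		h.Write([]byte(f))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// verifyChain recomputes every hash from the previous one; any edit,
// deletion, or reorder breaks the linkage and is reported.
func verifyChain(records []record) error {
	prev := ""
	for i, r := range records {
		if r.PrevHash != prev {
			return fmt.Errorf("record %d: prev_hash mismatch", i)
		}
		if hashRecord(r, prev) != r.Hash {
			return fmt.Errorf("record %d: hash mismatch", i)
		}
		prev = r.Hash
	}
	return nil
}

func main() {
	msgs := []record{{ID: "1", SessionID: "s", Direction: "client_to_server", Raw: `{"jsonrpc":"2.0"}`}}
	msgs[0].Hash = hashRecord(msgs[0], "")
	fmt.Println(verifyChain(msgs)) // <nil> for an intact chain
}
```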
govulncheck flagged 16 CVEs in Go 1.22 stdlib (crypto/tls, net/url, os/exec, etc). All fixed in Go 1.24+.
Pull request overview
Adds a tamper-evident integrity mechanism to the audit messages log by introducing a SHA-256 hash chain, along with verification and tests, and bumps the Go toolchain version used in CI.
Changes:
- Extend the `messages` schema with `sequence`, `prev_hash`, and `hash`, compute chained hashes on insert, and add `VerifyChain()` to validate integrity (including decrypt-then-verify when encryption is enabled).
- Add unit tests covering valid chains, tampering, deletion, empty DB, encryption, and restart behavior.
- Update documentation and CI/toolchain Go version to 1.24.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| internal/audit/store.go | Adds message hash-chain fields/state, hash computation, and VerifyChain() implementation. |
| internal/audit/store_test.go | Adds tests for hash chain verification and restart continuity. |
| README.md | Documents the new hash-chain integrity feature and updates limitations. |
| go.mod | Bumps module Go version to 1.24. |
| .github/workflows/ci.yml | Updates CI to use Go 1.24. |
```go
// computeHash creates a SHA-256 hash of the message content chained with the previous hash.
func computeHash(id, sessionID, direction, raw, prevHash string) string {
	h := sha256.New()
	h.Write([]byte(prevHash))
	h.Write([]byte(id))
	h.Write([]byte(sessionID))
	h.Write([]byte(direction))
	h.Write([]byte(raw))
	return hex.EncodeToString(h.Sum(nil))
}
```
computeHash concatenates variable-length fields without separators/length-prefixing, so different tuples of (sessionID,direction,raw,...) can produce the same byte stream without needing a SHA-256 collision (e.g., boundary-shifting attacks). Also, the hash currently omits timestamp, jsonrpc_id, method, and sequence, so tampering with those columns would not be detected. Use an unambiguous encoding (length-prefixed/binary or canonical JSON of a struct) and include all fields whose integrity you want to protect.
Fixed — now uses length-prefixed encoding (8-byte big-endian length prefix per field) and includes all fields: jsonrpc_id, method, sequence in addition to the existing ones.
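The fixed encoding might look roughly like the sketch below. `computeHashV2` is an illustrative name only; per the reply above, the real function keeps the `computeHash` name and also feeds in `jsonrpc_id`, `method`, and `sequence`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// computeHashV2 hashes each field with an 8-byte big-endian length prefix,
// so bytes cannot shift across field boundaries without changing the digest.
func computeHashV2(fields ...string) string {
	h := sha256.New()
	var lenBuf [8]byte
	for _, f := range fields {
		binary.BigEndian.PutUint64(lenBuf[:], uint64(len(f)))
		h.Write(lenBuf[:])
		h.Write([]byte(f))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Without length prefixes, ("ab","c") and ("a","bc") would produce the
	// same byte stream and therefore the same hash.
	fmt.Println(computeHashV2("ab", "c") == computeHashV2("a", "bc")) // false
}
```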
```sql
    sequence INTEGER,
    prev_hash TEXT NOT NULL DEFAULT '',
    hash TEXT NOT NULL DEFAULT ''
);

CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id, timestamp);
```
sequence is nullable and has no uniqueness guarantee, but VerifyChain() relies on ORDER BY sequence and scans into a non-nullable int. A NULL/duplicate sequence (possible during migration or manual DB edits) will either error or make ordering ambiguous. Consider making sequence INTEGER NOT NULL with a UNIQUE constraint (or primary key) and add an index on it for efficient startup resume and verification.
Suggested change:

```diff
-    sequence INTEGER,
+    sequence INTEGER NOT NULL,
     prev_hash TEXT NOT NULL DEFAULT '',
     hash TEXT NOT NULL DEFAULT ''
 );
 CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id, timestamp);
+CREATE UNIQUE INDEX IF NOT EXISTS idx_messages_session_sequence ON messages(session_id, sequence);
```
Fixed — sequence is now NOT NULL with a UNIQUE index. Also added a schema migration (ALTER TABLE) for existing databases.
```diff
 s.mu.Lock()
 defer s.mu.Unlock()

 s.sequence++
 prevHash := s.lastHash
 hash := computeHash(id, sessionID, direction, stored, prevHash)
 s.lastHash = hash

 _, err := s.db.Exec(
-	"INSERT INTO messages (id, session_id, direction, timestamp, jsonrpc_id, method, raw) VALUES (?, ?, ?, ?, ?, ?, ?)",
+	"INSERT INTO messages (id, session_id, direction, timestamp, jsonrpc_id, method, raw, sequence, prev_hash, hash) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
 	id, sessionID, direction, time.Now().UTC(), nullIfEmpty(jsonrpcID), nullIfEmpty(method), s.encrypt(stored),
+	s.sequence, prevHash, hash,
 )
```
Hash chaining state (s.lastHash/s.sequence) is maintained only in-process. If two processes (or two Store instances) write to the same SQLite DB concurrently, they can both resume the same last hash/sequence and generate conflicting sequence/prev_hash/hash, breaking the chain. To make this robust, compute the next sequence + previous hash inside a single DB transaction with an appropriate write lock (e.g., BEGIN IMMEDIATE) and derive prev_hash from the last committed row.
Acknowledged. By design each proxy instance owns its own Store — concurrent multi-process writes to the same DB isn't a supported use case. The UNIQUE constraint on sequence will catch any accidental collision. Adding a comment to document this assumption.
```go
// decrypt decrypts a string if it looks encrypted. Returns as-is otherwise.
func (s *Store) decrypt(ciphertext string) string {
	if s.enc == nil {
		return ciphertext
	}
	decrypted, err := s.enc.Decrypt(ciphertext)
	if err != nil {
		return ciphertext // not encrypted or wrong key
	}
	return decrypted
}
```
decrypt() silently returns the ciphertext on decrypt errors (including wrong key), which makes VerifyChain() report a generic "hash mismatch" rather than a clear decryption/key problem. This can be misleading during offline verification for encrypted DBs. Consider having VerifyChain() treat decrypt failures as an explicit verification error (or include a distinct status/error message for decryption failure) when s.enc != nil.
Fixed — VerifyChain now uses a separate decryptVerify() that returns errors explicitly. Decrypt failures now produce a clear 'decryption failed (wrong key or corrupted data)' status instead of a misleading hash mismatch.
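A minimal sketch of the error-surfacing shape described in the reply. The `decrypter` interface and `badKey` type are stand-ins invented for illustration; the real `decryptVerify()` is a method on the store's encryption backend.

```go
package main

import (
	"errors"
	"fmt"
)

// decrypter is an illustrative stand-in for the store's encryption backend.
type decrypter interface {
	Decrypt(ciphertext string) (string, error)
}

// decryptVerify surfaces decryption failures instead of silently returning
// the ciphertext, so chain verification can report key problems distinctly
// from hash mismatches.
func decryptVerify(enc decrypter, ciphertext string) (string, error) {
	if enc == nil {
		return ciphertext, nil // encryption disabled: stored as plaintext
	}
	plain, err := enc.Decrypt(ciphertext)
	if err != nil {
		return "", fmt.Errorf("decryption failed (wrong key or corrupted data): %w", err)
	}
	return plain, nil
}

// badKey simulates a store opened with the wrong encryption key.
type badKey struct{}

func (badKey) Decrypt(string) (string, error) {
	return "", errors.New("cipher: message authentication failed")
}

func main() {
	if _, err := decryptVerify(badKey{}, "deadbeef"); err != nil {
		fmt.Println(err) // the key problem is reported, not a hash mismatch
	}
}
```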
- **Minimal stderr logging**: error messages do not leak paths, session IDs, or message contents
- **Encryption at rest**: sensitive columns (`raw`, `arguments`, `result`, `error`) are encrypted with AES-256-GCM when an encryption key is provided. Enable by setting the `BEACON_ENCRYPTION_KEY` environment variable. Without a key, data is stored in plaintext (redaction still applies).
- **Parameterized SQL**: all database queries use parameterized statements — no SQL injection risk
- **Tamper-proof hash chain**: each audit message includes a SHA-256 hash chained to the previous message. Any modification, deletion, or reordering of records is detectable. Verify with `VerifyChain()` — works offline against the DB file.
README calls this a "Tamper-proof hash chain", but the implementation is only tamper-evident and (as noted below) cannot prevent an attacker with DB access from rewriting the entire chain. To avoid overstating guarantees, rename this to "tamper-evident" (or similar) and ensure the description matches the actual threat model.
Suggested change:

```diff
-- **Tamper-proof hash chain**: each audit message includes a SHA-256 hash chained to the previous message. Any modification, deletion, or reordering of records is detectable. Verify with `VerifyChain()` — works offline against the DB file.
+- **Tamper-evident hash chain**: each audit message includes a SHA-256 hash chained to the previous message. Any modification, deletion, or reordering of existing records is detectable. Verify with `VerifyChain()` — works offline against the DB file.
```
Good catch — renamed to 'tamper-evident' throughout.
```go
sessionID, _ := store.CreateSession("test", "cat")
store.LogMessage(sessionID, "client_to_server", "1", "initialize", `{"jsonrpc":"2.0"}`)
store.LogMessage(sessionID, "server_to_client", "1", "", `{"jsonrpc":"2.0","result":{}}`)
store.LogMessage(sessionID, "client_to_server", "2", "tools/call", `{"jsonrpc":"2.0"}`)
```
These new tests ignore errors from CreateSession and LogMessage. If an insert fails (e.g., due to a schema/migration issue), the test failure can become harder to diagnose or may not fail at the intended assertion. Please check and assert err for these calls in the new hash-chain tests for clearer failures.
Fixed — all CreateSession and LogMessage calls now check and assert errors.
```go
// Resume hash chain from the most recent message (supports restarts)
var lastHash sql.NullString
var seq sql.NullInt64
err = db.QueryRow("SELECT hash, sequence FROM messages ORDER BY sequence DESC LIMIT 1").Scan(&lastHash, &seq)
if err == nil {
	store.lastHash = lastHash.String
	store.sequence = seq.Int64
}
// err == sql.ErrNoRows is fine — empty DB, chain starts fresh

return store, nil
```
The new resume query assumes the messages table already has sequence and hash columns, but the schema is applied via CREATE TABLE IF NOT EXISTS, which does not add columns on upgrade. On an existing DB, SELECT hash, sequence ... will fail (e.g. "no such column"), and Open() currently ignores that error—later LogMessage() will fail on insert. Add an explicit migration step (e.g., PRAGMA table_info(messages) + ALTER TABLE ... ADD COLUMN), and decide how to backfill/handle pre-chain rows so upgraded databases remain usable.
Fixed — added migrateHashChain() that checks PRAGMA table_info(messages) and runs ALTER TABLE ADD COLUMN for missing columns. Pre-chain rows will have empty hashes (sequence=0, hash='') which is acceptable — the chain starts from the first post-migration message.
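The column-detection half of that migration can be sketched as below. `missingColumnDDL` is a hypothetical helper, not the PR's `migrateHashChain()`; it takes the column names already read from `PRAGMA table_info(messages)` and returns the `ALTER TABLE` statements still needed (each would then run via `db.Exec`). Note that SQLite requires a `DEFAULT` when adding a `NOT NULL` column, which also backfills pre-chain rows with `sequence=0` and empty hashes, matching the reply above.

```go
package main

import "fmt"

// chainColumns lists the hash-chain columns in a fixed order, paired with
// the DDL that adds each one to an existing messages table.
var chainColumns = []struct{ name, ddl string }{
	{"sequence", "ALTER TABLE messages ADD COLUMN sequence INTEGER NOT NULL DEFAULT 0"},
	{"prev_hash", "ALTER TABLE messages ADD COLUMN prev_hash TEXT NOT NULL DEFAULT ''"},
	{"hash", "ALTER TABLE messages ADD COLUMN hash TEXT NOT NULL DEFAULT ''"},
}

// missingColumnDDL returns the ALTER TABLE statements for hash-chain
// columns absent from the existing schema.
func missingColumnDDL(existing []string) []string {
	have := make(map[string]bool, len(existing))
	for _, c := range existing {
		have[c] = true
	}
	var ddl []string
	for _, col := range chainColumns {
		if !have[col.name] {
			ddl = append(ddl, col.ddl)
		}
	}
	return ddl
}

func main() {
	// Pre-chain schema: no sequence/prev_hash/hash columns yet.
	old := []string{"id", "session_id", "direction", "timestamp", "jsonrpc_id", "method", "raw"}
	for _, stmt := range missingColumnDDL(old) {
		fmt.Println(stmt)
	}
}
```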
- Use length-prefixed encoding in computeHash to prevent boundary attacks
- Include all fields (jsonrpc_id, method, sequence) in hash computation
- Make sequence column NOT NULL with UNIQUE index
- Add schema migration for existing DBs (ALTER TABLE for hash columns)
- Surface decryption failures explicitly in VerifyChain
- Rename "tamper-proof" to "tamper-evident" in README
- Check all errors in tests
Summary
- Each message stores `prev_hash` and `hash`, linking it to the previous record
- `VerifyChain()` method walks the chain and detects modifications, deletions, or reordering
- Closes #2
Test plan
- `TestVerifyChain_ValidChain` — intact chain passes
- `TestVerifyChain_DetectsModifiedContent` — tampered raw content detected
- `TestVerifyChain_DetectsDeletedMessage` — deleted message breaks chain linkage
- `TestVerifyChain_EmptyDB` — empty DB is valid
- `TestVerifyChain_WithEncryption` — works with encryption enabled
- `TestHashChain_SurvivesRestart` — chain continues correctly after close/reopen