Bug: Growing-value overwrite causes spurious OCC abort in MassTrans.hh #56

@SoumojitDalui

Summary

Overwriting an existing key with a larger value causes a spurious OCC transaction abort (-ERR backend) whenever the new value no longer fits in the key's current allocation bucket. Shrinking and same-size overwrites always succeed. This affects both auto-committed SET operations and MULTI/EXEC transactions.

Reproduction

SET key "A"        → OK (creates key)
SET key "BBBBB"    → FAIL: -ERR backend (value grew from 1 to 5 bytes)
SET key "C"        → OK (shrinking back to 1 byte works)
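To replay this sequence without redis-py (for example over a raw TCP connection with nc), the three commands can be sent in RESP, the request format Redis clients use, where each command is an array of bulk strings. The C++ sketch below builds that payload. It assumes makoCon parses standard RESP2 requests, which its "Redis-compatible" description suggests but this report does not confirm. The server address is not stated here, and encode_resp_command and reproduction_payload are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Encode one command as a RESP2 array of bulk strings:
//   *<argc>\r\n, then for each argument: $<byte length>\r\n<bytes>\r\n
std::string encode_resp_command(const std::vector<std::string>& args) {
    std::string out = "*" + std::to_string(args.size()) + "\r\n";
    for (const std::string& arg : args) {
        out += "$" + std::to_string(arg.size()) + "\r\n";
        out += arg;
        out += "\r\n";
    }
    return out;
}

// The three reproduction commands, in order, as one pipelined payload.
std::string reproduction_payload() {
    return encode_resp_command({"SET", "key", "A"}) +
           encode_resp_command({"SET", "key", "BBBBB"}) +
           encode_resp_command({"SET", "key", "C"});
}
```

If the RESP assumption holds, writing reproduction_payload() to the server socket should yield the same three replies as above: OK, -ERR backend, OK.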

Characterization

Tested all 64 from→to combinations of value sizes (in bytes) {1, 2, 5, 10, 20, 50, 100, 500}:

  • Growing transitions: 24/28 fail (all cases where the encoded value crosses a size/allocation bucket boundary)
  • Shrinking transitions: 0/28 fail (always succeeds)
  • Same-size transitions: 0/8 fail (always succeeds)

Size transition matrix (rows = original size, columns = new size, both in bytes; . = ok, X = OCC abort):

  from \ to    1    2    5   10   20   50  100  500
          1    .    .    X    X    X    X    X    X
          2    .    .    X    X    X    X    X    X
          5    .    .    .    .    .    X    X    X
         10    .    .    .    .    .    X    X    X
         20    .    .    .    .    .    X    X    X
         50    .    .    .    .    .    .    X    X
        100    .    .    .    .    .    .    .    X
        500    .    .    .    .    .    .    .    .

Values within the same allocation bucket can be overwritten freely. Values that cross a bucket boundary always fail.
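The characterization can be rechecked mechanically. This C++ sketch encodes the observed matrix, recounts the growing, shrinking, and same-size tallies, and tests the hypothesis that an overwrite aborts exactly when the new size falls in a larger bucket than the old one. The bucket assignment in inferred_bucket() is an assumption read off this matrix alone, not taken from the Masstree or Sto allocator. It only says which tested sizes share a bucket, not where the real boundaries lie.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Sizes tested in the report (bytes), in matrix order.
constexpr std::array<int, 8> kSizes = {1, 2, 5, 10, 20, 50, 100, 500};

// Observed results: row = original size, column = new size,
// 'X' = OCC abort, '.' = ok.
constexpr std::array<const char*, 8> kObserved = {
    "..XXXXXX",  // from 1
    "..XXXXXX",  // from 2
    ".....XXX",  // from 5
    ".....XXX",  // from 10
    ".....XXX",  // from 20
    "......XX",  // from 50
    ".......X",  // from 100
    "........",  // from 500
};

// HYPOTHETICAL bucket per tested size, inferred only from the matrix:
// {1,2}, {5,10,20}, {50}, {100}, {500}.
int inferred_bucket(int size) {
    if (size <= 2) return 0;
    if (size <= 20) return 1;
    if (size <= 50) return 2;
    if (size <= 100) return 3;
    return 4;
}

// Hypothesis: an overwrite aborts iff the new value needs a larger bucket.
bool predicts_abort(int from, int to) {
    return inferred_bucket(to) > inferred_bucket(from);
}

struct Tally {
    int grow_total = 0, grow_fail = 0;
    int shrink_total = 0, shrink_fail = 0;
    int same_total = 0, same_fail = 0;
    int mismatches = 0;  // cells where the hypothesis disagrees with observation
};

Tally tally_matrix() {
    Tally t;
    for (std::size_t r = 0; r < kSizes.size(); ++r) {
        for (std::size_t c = 0; c < kSizes.size(); ++c) {
            const bool failed = kObserved[r][c] == 'X';
            const int from = kSizes[r], to = kSizes[c];
            if (to > from) { ++t.grow_total; t.grow_fail += failed; }
            else if (to < from) { ++t.shrink_total; t.shrink_fail += failed; }
            else { ++t.same_total; t.same_fail += failed; }
            if (predicts_abort(from, to) != failed) ++t.mismatches;
        }
    }
    return t;
}
```

With the matrix as reported, tally_matrix() gives 24/28 growing failures, 0/28 shrinking, 0/8 same-size, and zero mismatches against the bucket hypothesis.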

Root Cause

File: src/mako/benchmarks/sto/MassTrans.hh

The bug is a stale TransItem left in the OCC transaction set after a value resize. The sequence is:

Step 1 — handlePutFound() (line ~707-741):

item = t_item(e)               // TransItem keyed by OLD location 'e'
Version v = e->version()        // Read OLD location's version
item.observe(tversion_type(v))  // Record version as read observation  <-- BUG: too early
reallyHandlePutFound(item, e, key, value)

Step 2 — reallyHandlePutFound() (line ~651-702):

needsResize = e->needsResize(value)  // TRUE for growing values
e->version() |= invalid_bit          // Mark OLD location INVALID
new_location = e->resizeIfNeeded(value)  // Allocate NEW location
lp.value() = new_location             // Update Masstree leaf pointer
e->deallocate_rcu(...)                 // Schedule OLD location for freeing
item = Sto::new_item(this, new_location)  // Create NEW TransItem

Step 3 — At commit, check() is called for the OLD TransItem:

auto e = item.key<versioned_value*>()  // Gets OLD 'e' pointer
validityCheck(item, e)                   // Checks e->version()

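The OLD TransItem still carries the read observation recorded in Step 1, so commit validation calls check() on it. validityCheck() returns false because e->version() has invalid_bit set (from the resize), and this is an update (not an insert), so the transaction aborts.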

Why same-size overwrites work: needsResize() returns false, so the entire resize block is skipped. No new location is created, no invalid bit is set, and the single TransItem validates cleanly.
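A deliberately tiny C++ model of the abort path can make the three steps concrete. It is a sketch under assumptions: VersionedValue, TransItem, Txn, kInvalidBit, the capacity-based resize rule, and commit_check() are hypothetical simplifications that echo the names in this report. They are not the actual Sto or MassTrans.hh types, and the model ignores locking and write installation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical toy types; NOT the Sto / MassTrans.hh implementation.
constexpr std::uint64_t kInvalidBit = std::uint64_t{1} << 63;

struct VersionedValue {
    std::uint64_t version;
    std::size_t capacity;  // assumed: bytes this allocation can hold
};

struct TransItem {
    VersionedValue* key;  // location this item is keyed by
    bool has_read;
    std::uint64_t observed;
};

struct Txn {
    std::vector<TransItem> items;

    std::size_t new_item(VersionedValue* e) {
        items.push_back(TransItem{e, false, 0});
        return items.size() - 1;
    }

    // Commit-time validation: every item carrying a read must find its
    // location still valid and at the observed version.
    bool commit_check() const {
        for (const TransItem& it : items) {
            if (!it.has_read) continue;
            if (it.key->version & kInvalidBit) return false;  // validityCheck() fails
            if (it.key->version != it.observed) return false;
        }
        return true;
    }
};

// Buggy order: observe the OLD location (Step 1), then maybe resize (Step 2).
bool overwrite_commits_buggy(std::size_t old_capacity, std::size_t new_size) {
    std::vector<std::unique_ptr<VersionedValue>> heap;  // owns toy locations
    heap.push_back(std::make_unique<VersionedValue>(VersionedValue{1, old_capacity}));
    VersionedValue* leaf = heap.back().get();  // stands in for the leaf slot

    Txn t;
    VersionedValue* e = leaf;
    std::size_t item = t.new_item(e);
    t.items[item].has_read = true;             // Step 1: read on OLD e
    t.items[item].observed = e->version;

    if (new_size > e->capacity) {              // Step 2: needsResize() analogue
        e->version |= kInvalidBit;             // mark OLD location invalid
        heap.push_back(std::make_unique<VersionedValue>(VersionedValue{0, new_size}));
        leaf = heap.back().get();              // leaf now names the NEW location
        t.new_item(leaf);                      // NEW item; OLD item keeps its read
    }
    return t.commit_check();                   // Step 3: OLD item is checked too
}
```

Under this model, overwrite_commits_buggy(2, 5) returns false (the value outgrows its allocation and the stale read fails validation), while overwrite_commits_buggy(20, 20) and overwrite_commits_buggy(20, 5) return true, matching the matrix's split between crossing a boundary and fitting in place.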

Impact

Any application that overwrites keys with values that grow past their current allocation bucket will see spurious transaction aborts:

  • Counters (each added digit grows the value; an increment that pushes it across a bucket boundary aborts)
  • JSON documents (size varies)
  • Append-like operations (value grows over time)

Note on Origin

MassTrans.hh is part of the Sto (Software Transactional Objects) wrapper layer originally from readablesystems/sto. This bug may also exist in upstream Sto.

Suggested Fix

Move the observe() call from before reallyHandlePutFound() to after it. This ensures the observation reads the version of whichever location item points to after a potential resize.

-    // make sure this item doesn't get deleted
-    if (!item.has_read() && !has_insert(item))
-    {
-      Version v = e->version();
-      fence();
-      item.observe(tversion_type(v));
-    }
     if (SET) {
       reallyHandlePutFound(item, e, key, value);
     }
+    // Observe AFTER reallyHandlePutFound. If resize occurred, item now
+    // points to new_location. Old TransItem has no read → skipped at commit.
+    if (!item.has_read() && !has_insert(item))
+    {
+      auto current_e = item.item().template key<versioned_value*>();
+      Version v = current_e->version();
+      fence();
+      item.observe(tversion_type(v));
+    }

Why this works:

  • No resize: item still points to original e → observes e->version() (unchanged behavior)
  • Resize: item was reassigned to new location → observes new_location->version() (correct)
  • Old TransItem (keyed by invalidated e) has no has_read() → skipped by commit validation
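The suggested reordering can be checked in a similar toy model. VersionedValue, TransItem, Txn, kInvalidBit, the capacity rule, and the concurrent_writer flag are hypothetical simplifications, not Sto's types, and the insert case (has_insert) is left out. The sketch observes only after the possible resize, so the stale item keyed by the invalidated location carries no read, while the observation on the current location can still catch a genuine conflicting write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical toy types; NOT the Sto / MassTrans.hh implementation.
constexpr std::uint64_t kInvalidBit = std::uint64_t{1} << 63;

struct VersionedValue {
    std::uint64_t version;
    std::size_t capacity;  // assumed: bytes this allocation can hold
};

struct TransItem {
    VersionedValue* key;  // location this item is keyed by
    bool has_read;
    std::uint64_t observed;
};

struct Txn {
    std::vector<TransItem> items;

    std::size_t new_item(VersionedValue* e) {
        items.push_back(TransItem{e, false, 0});
        return items.size() - 1;
    }

    // Commit-time validation: only items carrying a read are checked.
    bool commit_check() const {
        for (const TransItem& it : items) {
            if (!it.has_read) continue;  // stale OLD item is skipped here
            if (it.key->version & kInvalidBit) return false;
            if (it.key->version != it.observed) return false;
        }
        return true;
    }
};

// Fixed order: run the put path (possible resize) first, then observe the
// location the current item is keyed by. concurrent_writer simulates another
// transaction bumping that location's version before this one validates.
bool overwrite_commits_fixed(std::size_t old_capacity, std::size_t new_size,
                             bool concurrent_writer) {
    std::vector<std::unique_ptr<VersionedValue>> heap;  // owns toy locations
    heap.push_back(std::make_unique<VersionedValue>(VersionedValue{1, old_capacity}));
    VersionedValue* leaf = heap.back().get();  // stands in for the leaf slot

    Txn t;
    VersionedValue* e = leaf;
    std::size_t item = t.new_item(e);          // keyed by OLD e, no read yet

    if (new_size > e->capacity) {              // needsResize() analogue
        e->version |= kInvalidBit;             // OLD location invalidated
        heap.push_back(std::make_unique<VersionedValue>(VersionedValue{0, new_size}));
        leaf = heap.back().get();
        item = t.new_item(leaf);               // item now keyed by NEW location
    }

    TransItem& cur = t.items[item];            // observe AFTER the put path
    if (!cur.has_read) {
        cur.has_read = true;
        cur.observed = cur.key->version;
    }

    if (concurrent_writer) leaf->version += 1; // a real conflict on the key
    return t.commit_check();
}
```

In this model overwrite_commits_fixed(2, 5, false) returns true, and with concurrent_writer set to true both the resized and the in-place cases return false, so moving the observation does not switch off read validation for the key.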

A reference implementation of this fix exists in commit cd4b90ee on mako-dev.

Test Environment

  • Server: build/makoCon (Redis-compatible Mako server)
  • Storage: In-memory Masstree (no RocksDB persistence)
  • Client: Python 3.10.12 with redis-py 7.1.0
  • Host: Linux 5.15.0-133-generic (x86_64)
