CNDB-15669: implement mayHaveContent metadata flag #4

Open
lesnik2u wants to merge 9 commits into blambov:CNDB-15669 from lesnik2u:CNDB-15669-mayHaveContent

lesnik2u commented Feb 18, 2026

Introduces a metadata flag in encoded cursor positions to indicate whether a node potentially carries content. This allows the trie iteration infrastructure to skip expensive content resolution calls for nodes that are guaranteed to be structural.
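To make the idea concrete, here is a minimal stand-alone sketch of a flag bit carried in a `long` position. The class name, the helper names, and the sample position value are hypothetical (only the flag value `1L` appears in the diff under review); this is not the Cassandra `Cursor` API.

```java
/** Stand-alone illustration; names and layout are assumptions, not the Cassandra tries API. */
final class PositionFlagSketch
{
    /** Lowest bit of the encoded position marks "may have content". */
    static final long MAY_HAVE_CONTENT_BIT = 1L;

    /** Sets the flag without touching any other bit. */
    static long withContentFlag(long encodedPosition)
    {
        return encodedPosition | MAY_HAVE_CONTENT_BIT;
    }

    /** Clears the flag without touching any other bit. */
    static long withoutContentFlag(long encodedPosition)
    {
        return encodedPosition & ~MAY_HAVE_CONTENT_BIT;
    }

    /** True if a consumer should bother resolving content for this position. */
    static boolean mayHaveContent(long encodedPosition)
    {
        return (encodedPosition & MAY_HAVE_CONTENT_BIT) != 0;
    }

    public static void main(String[] args)
    {
        long structural = 0x6100L; // hypothetical position bits, flag clear
        long flagged = withContentFlag(structural);
        System.out.println(mayHaveContent(structural));
        System.out.println(mayHaveContent(flagged));
        System.out.println(withoutContentFlag(flagged) == structural);
    }
}
```

A consumer would test `mayHaveContent(position)` first and only then call the content accessor, which is where the saving for purely structural nodes would come from.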

These are the results of 50 benchmarks, averaged:

Benchmark average before:

Benchmark                                (BATCH)  (count)  (deletionPattern)  (deletionSpec)  (deletionsRatio)  (flush)     (memtableClass)  (partitions)  (threadCount)  (useNet)  Mode  Cnt  Score   Error  Units
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM        TrieMemtable          1000              1     false  avgt   10  8.139 ± 0.230  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM        TrieMemtable             4              1     false  avgt   10  8.158 ± 0.399  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM  TrieMemtableStage2          1000              1     false  avgt   10  8.113 ± 0.832  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM  TrieMemtableStage2             4              1     false  avgt   10  7.804 ± 1.454  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM  TrieMemtableStage1          1000              1     false  avgt   10  7.568 ± 1.111  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM  TrieMemtableStage1             4              1     false  avgt   10  7.154 ± 0.371  ms/op

Benchmark average after:

ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM        TrieMemtable          1000              1     false  avgt   10  8.003 ± 0.313  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM        TrieMemtable             4              1     false  avgt   10  7.908 ± 0.643  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM  TrieMemtableStage2          1000              1     false  avgt   10  8.061 ± 1.140  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM  TrieMemtableStage2             4              1     false  avgt   10  7.601 ± 0.424  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM  TrieMemtableStage1          1000              1     false  avgt   10  7.458 ± 0.435  ms/op
ReadTestWidePartitions.readGreaterMatch     1000  1000000             RANDOM           EQUAL                 0    INMEM  TrieMemtableStage1             4              1     false  avgt   10  7.547 ± 0.373  ms/op

Flamegraphs for TrieMemtable seem to back up those results, e.g.:
Before:
Screenshot 2026-02-18 at 14 33 33
After:
Screenshot 2026-02-18 at 14 36 39

Author

This file had CRLF line endings for some reason. Let me know if I should change it back to CRLF.
src/java/org/apache/cassandra/db/tries/DepthAdjustedCursor.java: ASCII text, with CRLF line terminators

blambov left a comment

Owner

Nice work.

// the head alone would not reflect.
currentPosition = pos;
applyToSelectedElementsInHeap((self, cursor, index) -> {
currentPosition |= cursor.encodedPosition();

Owner

Try using self.currentPosition to avoid capturing this in the closure.

Author

done

applyToSelectedElementsInHeap((self, cursor, index) -> {
currentPosition |= cursor.encodedPosition();
}, 0);
currentPosition = Cursor.unionFlags(pos, currentPosition, Cursor.FLAGS_MASK);

Owner

This shouldn't be necessary as the position bits of all the selected cursors must match. Maybe replace with a comment?

Author

done, added.

/// Flag indicating whether this position may have content.
long MAY_HAVE_CONTENT_BIT = 1L;

/// reverse direction.

Owner

The beginning of this doc is missing.

Author

added it back, removed by mistake when rebasing

depth(encodedPosition),
incomingTransition(encodedPosition),
isOnReturnPath(encodedPosition) ? "↑" : " ",
(encodedPosition & MAY_HAVE_CONTENT_BIT) != 0 ? "*" : " ",

Owner

Make this "C" for content.

Author

Done

state = c2 != null ? State.AT_BOTH : State.C1_ONLY;
currentPosition = c1.encodedPosition();
if (c2 != null)
currentPosition = Cursor.unionFlags(currentPosition, c2.encodedPosition(), Cursor.FLAGS_MASK);

Owner

Maybe add a version of unionFlags to be used when it is known that the position bits match, using a simple or.

Actually I don't think we should call this any other way -- so perhaps do something like

if (DEBUG) assertEquals(0, compare(pos1, pos2));
return pos1 | pos2;

in the method?

If you have the time to experiment, it would also be a good idea to check how costly each version is: with if (DEBUG); with an assertion without the if and with assertions enabled (which is how Cassandra is normally run in production); without the assertion altogether; and the version that masks the flags.
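The variants being compared can be sketched side by side. This is a stand-alone illustration: the flag layout (`FLAGS_MASK` covering the two lowest bits), the `DEBUG` constant, and all method names are assumptions, and the masking variant is only a guess at what the existing `Cursor.unionFlags` does.

```java
/** Stand-alone illustration of the variants discussed; not the actual Cursor.unionFlags. */
final class UnionFlagsSketch
{
    /** Assumption: the two lowest bits are flags, everything above is position. */
    static final long FLAGS_MASK = 0b11L;
    /** Stand-in for a compile-time debug switch. */
    static final boolean DEBUG = false;

    static boolean samePosition(long pos1, long pos2)
    {
        return (pos1 & ~FLAGS_MASK) == (pos2 & ~FLAGS_MASK);
    }

    /** Masking variant: keeps pos1's position bits and adds only pos2's flag bits. */
    static long unionFlagsMasked(long pos1, long pos2)
    {
        return pos1 | (pos2 & FLAGS_MASK);
    }

    /** Simple OR guarded by DEBUG; valid only when the position bits match. */
    static long unionFlagsDebugChecked(long pos1, long pos2)
    {
        if (DEBUG && !samePosition(pos1, pos2))
            throw new AssertionError("position bits differ");
        return pos1 | pos2;
    }

    /** Simple OR with a plain Java assertion (evaluated only when run with -ea). */
    static long unionFlagsAsserted(long pos1, long pos2)
    {
        assert samePosition(pos1, pos2) : "position bits differ";
        return pos1 | pos2;
    }

    public static void main(String[] args)
    {
        long a = 0x6100L | 0b01L; // same hypothetical position, different flags
        long b = 0x6100L | 0b10L;
        System.out.println(unionFlagsMasked(a, b) == unionFlagsDebugChecked(a, b));
    }
}
```

When the position bits match, all three return the same value, so the choice comes down to the cost of the check, which is what the experiment asked for here would measure.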

Author

I wrote a simple benchmark; here are the results:

[junit-timeout] ================================================================================
[junit-timeout] UnionFlags Performance Benchmark Suite
[junit-timeout] ================================================================================
[junit-timeout] Assertions enabled: true
[junit-timeout] DEBUG flag: false
[junit-timeout] Operations per iteration: 1000000
[junit-timeout] Warmup iterations: 5
[junit-timeout] Measurement iterations: 20
[junit-timeout] ================================================================================
[junit-timeout]
[junit-timeout] Warming up...
[junit-timeout] Warmup complete.
[junit-timeout]
[junit-timeout] Running benchmarks...
[junit-timeout]
[junit-timeout]
[junit-timeout] ================================================================================
[junit-timeout] RESULTS
[junit-timeout] ================================================================================
[junit-timeout] Current Masking Approach                :       9,154,872 ns total,     0.46 ns/op,   2,184,629,124 ops/sec
[junit-timeout] Simple OR with DEBUG Assertion          :       6,673,580 ns total,     0.33 ns/op,   2,996,892,223 ops/sec
[junit-timeout] Simple OR with Assertion                :       9,689,000 ns total,     0.48 ns/op,   2,064,196,512 ops/sec
[junit-timeout] Simple OR (No Assertion)                :       6,795,041 ns total,     0.34 ns/op,   2,943,322,932 ops/sec
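The ns/op column can be cross-checked from the totals, assuming each total covers all 20 measurement iterations of 1,000,000 operations (an inference from the header of the log, which the printed figures are consistent with). A small sketch with a hypothetical helper:

```java
/** Recomputes ns/op from the totals printed in the log; the helper name is illustrative. */
final class NsPerOpSketch
{
    static double nsPerOp(long totalNs, long opsPerIteration, int iterations)
    {
        return (double) totalNs / (opsPerIteration * iterations);
    }

    public static void main(String[] args)
    {
        double masking = nsPerOp(9_154_872L, 1_000_000L, 20);
        double simpleOr = nsPerOp(6_795_041L, 1_000_000L, 20);
        System.out.printf("masking:   %.2f ns/op%n", masking);
        System.out.printf("simple OR: %.2f ns/op%n", simpleOr);
        System.out.printf("simple OR takes %.0f%% of the masking time%n", 100 * simpleOr / masking);
    }
}
```

By this arithmetic the plain OR without an assertion needs roughly three quarters of the time of the masking approach in this run.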

if (!Cursor.isExhausted(nextPosition))
prepareNextPosition(currentPosition);
return currentPosition;
return encodedPosition();

Owner

Could we apply the adjustment to currentPosition once here and return it unchanged in encodedPosition?

Author

done

Owner

Could we use the same adjustNextPosition mechanism as SingletonOrderedCursor, which will also simplify the latter?

currentPosition = Cursor.positionForDescentWithByte(pos, current);
nextPosition = Cursor.exhaustedPosition(currentPosition);
return currentPosition;
return encodedPosition();

Owner

This should always result in reaching the end state, i.e. we can directly add the flag to currentPosition above.

Author

done

case ROOT:
case NONE:
default:
if (!Cursor.isExhausted(encodedPosition))

Owner

In the NONE case negation will not change the boundary status of the state (negating a boundary still results in a boundary).
In the ROOT case we have to flip the flag (if a boundary existed we drop it, if one didn't we add it).
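The rule as stated in this comment can be sketched as follows. The enum `OverrideKind`, the class, and the method are hypothetical stand-ins for the states named in the diff; whether a plain toggle is right in every ROOT case is questioned later in this thread, so treat this as an illustration of the suggestion, not a settled implementation.

```java
/** Stand-alone illustration of the suggested flag handling under negation. */
final class NegatedFlagSketch
{
    static final long MAY_HAVE_CONTENT_BIT = 1L;

    /** Hypothetical stand-in for the override states that appear in the diff. */
    enum OverrideKind { NONE, ROOT }

    static long negatedPosition(long sourcePosition, OverrideKind overriding)
    {
        switch (overriding)
        {
            case ROOT:
                // Toggle: an existing boundary is dropped, a missing one is added.
                return sourcePosition ^ MAY_HAVE_CONTENT_BIT;
            case NONE:
            default:
                // Negating a boundary still yields a boundary, so the flag is unchanged.
                return sourcePosition;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(Long.toHexString(negatedPosition(0x10L, OverrideKind.ROOT)));
        System.out.println(Long.toHexString(negatedPosition(0x11L, OverrideKind.NONE)));
    }
}
```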

Author

Changed it

class Negated implements TrieSetCursor
{
final TrieSetCursor source;
final Direction direction;

Owner

Is this needed?

Author

no, removed

break;
default:
return checkOverride(source.advance());
checkOverride(source.advance());

Owner

checkOverride already calls encodedPosition() when needed.

Author

changed

blambov force-pushed the CNDB-15669 branch 2 times, most recently from ea33ff1 to 290c584 on February 20, 2026 10:35
lesnik2u force-pushed the CNDB-15669-mayHaveContent branch 2 times, most recently from 6f452e7 to 3df0ae5 on February 20, 2026 12:04
lesnik2u requested a review from blambov on February 20, 2026 12:06
// the head alone would not reflect.

// Optimization: if the head already has all flags set, no need to walk the heap
if ((pos & Cursor.FLAGS_MASK) == Cursor.FLAGS_MASK)

Owner

For this to be helpful we need to use a mask of only the flags that we can actually set.
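A small sketch of why the choice of mask matters for the early exit. The three-bit `FLAGS_MASK`, the `SETTABLE_FLAGS` constant, and the method are assumptions made for illustration; the real flag layout is not shown in this thread.

```java
/** Stand-alone illustration of an early-exit check with two different masks. */
final class EarlyExitMaskSketch
{
    static final long MAY_HAVE_CONTENT_BIT = 1L;
    /** Assumption: three flag bits exist in total. */
    static final long FLAGS_MASK = 0b111L;
    /** Assumption: this cursor type can only ever set the content flag. */
    static final long SETTABLE_FLAGS = MAY_HAVE_CONTENT_BIT;

    /** Returns how many other cursors are examined before the early exit fires. */
    static int cursorsExamined(long head, long[] others, long exitMask)
    {
        long pos = head;
        int examined = 0;
        for (long other : others)
        {
            if ((pos & exitMask) == exitMask)
                break; // every flag we care about is already set
            pos |= other & FLAGS_MASK;
            examined++;
        }
        return examined;
    }

    public static void main(String[] args)
    {
        long head = 0x6100L | MAY_HAVE_CONTENT_BIT;
        long[] others = { 0x6100L, 0x6100L | MAY_HAVE_CONTENT_BIT };
        System.out.println(cursorsExamined(head, others, FLAGS_MASK));
        System.out.println(cursorsExamined(head, others, SETTABLE_FLAGS));
    }
}
```

With the full mask the check never fires if some flag bit can never be set by these cursors, so the whole heap is still walked; with the narrower mask the walk stops as soon as the head already carries every flag that could be added.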


/// Collects and caches the current position by unioning flags from all cursors at the same position.
/// This is called after advancing to ensure the position is always up-to-date.
private long collectAndCachePosition()

Owner

Nit: I would call this collectAndCachePositionFlags().

{
currentPosition = nextPosition;
if (nodeContent != null || isLeaf(fullNode))
if (nodeContent != null)

Owner

AFAIU the only reason to have this now is the constructor. Let's do this there instead of checking for every advance.

(I was tempted to assert that the flag is as expected, but we already do it in DEBUG mode thus even that is not needed.)

return Cursor.exhaustedPosition(encodedPosition);
case ROOT:
// In ROOT case, set flag if negated state is a boundary
if (!Cursor.isExhausted(encodedPosition) && state().isBoundary())

Owner

Just do encodedPosition ^= MAY_HAVE_CONTENT_BIT.

The position cannot be exhausted when overriding is ROOT (run the tries package with an assertion if you want to be certain).

lesnik2u, Feb 25, 2026

Author

The !isExhausted() check was redundant, but we can't just do encodedPosition ^= MAY_HAVE_CONTENT_BIT.
That assumes the content status of the negated cursor is always the exact opposite of the original cursor's.
IIUC, this would be a counterexample:

| Scenario | Source Set | Source isBoundary()? (content bit) | Source succeedingIncluded? | Negated Set Needs Boundary? (correct bit) | Does source_bit ^ 1 work? |
|---|---|---|---|---|---|
| 1 | ("a", "b") | No (0) | No | Yes (1) | Yes (0 ^ 1 = 1) |
| 2 | (null, "b") | No (0) | Yes | No (0) | No (0 ^ 1 = 1) |

Owner

Isn't isBoundary() true for the source root in the second example?

@Override
public RangeState state()
{
Direction dir = direction();

Owner

Almost all calls of this method will go through the NONE path. I'd rather we didn't call direction() for them.

Move this to the ROOT case.

lesnik2u force-pushed the CNDB-15669-mayHaveContent branch from ce51443 to 3d92f1f on March 13, 2026 16:01
lesnik2u requested a review from blambov on March 13, 2026 16:03
// In ROOT case, set flag if negated state is a boundary
if (!Cursor.isExhausted(encodedPosition) && state().isBoundary())
encodedPosition |= MAY_HAVE_CONTENT_BIT;
if (!Cursor.isExhausted(pos) && state().isBoundary())

Owner

Is it really possible that we are in the root state and the source is exhausted?
(By contract a cursor cannot start in an exhausted state.)

Author

Both the source and negated cursors can have boundaries at the ROOT position, in which case XOR would be incorrect, or am I mistaken here? E.g.:

Negated boundaries: [null, 616263, 616667, 616761, 616a62, null]
Negated set
Forward:
 -> START
61 -> CONTAINED
  62 -> CONTAINED
    63 -> END
  66 -> NOT_CONTAINED
    67↑ -> START
  67 -> CONTAINED
    61 -> END
  6a -> NOT_CONTAINED
    62↑ -> START
↑ -> END
Reverse:
 -> END
61 -> CONTAINED
  6a -> CONTAINED
    62 -> START
  67 -> NOT_CONTAINED
    61↑ -> END
  66 -> CONTAINED
    67 -> START
  62 -> NOT_CONTAINED
    63↑ -> END
↑ -> START
Tail for  FORWARD
  tail bounds [null, 616263, 616667, 616761, 616a62]
Forward:

java.lang.AssertionError: Non-null content for position without MAY_HAVE_CONTENT_BIT: depth 0 incomingTransition 00   FORWARD
TrieSetCursor$Negated pos depth 0 incomingTransition 00   FORWARD at  state START

blambov left a comment

Owner

I think we also want to do the position check whenever we get mutationCursor.content() in all InMemoryTrie mutators.


// Always check if we are seeing new content; if we do, that's an easy state update.
S content = content();
S content = (position & MAY_HAVE_CONTENT_BIT) != 0 ? content() : null;

Owner

I don't think we need to do this as content() is just this.content -- maybe inline it here instead?

blambov force-pushed the CNDB-15669 branch 2 times, most recently from 970213d to 66c45f1 on March 23, 2026 12:21
lesnik2u force-pushed the CNDB-15669-mayHaveContent branch from af326a8 to 4df869e on March 23, 2026 13:54
lesnik2u requested a review from blambov on March 24, 2026 12:05
blambov force-pushed the CNDB-15669 branch 2 times, most recently from a4cd17f to f2e5ca8 on March 24, 2026 16:56
lesnik2u force-pushed the CNDB-15669-mayHaveContent branch 3 times, most recently from ac98486 to 0734395 on March 27, 2026 12:22
blambov force-pushed the CNDB-15669 branch 3 times, most recently from 952e794 to cfce0f0 on April 3, 2026 13:50
lesnik2u added 3 commits May 26, 2026 14:08
…lag in trie cursors. This metadata flag allows skipping content lookups for structural nodes and is correctly propagated and unioned across all cursor implementations.
}

// MAY_HAVE_CONTENT_BIT optimization: only call content() if flag indicates potential content
if ((currentPosition & MAY_HAVE_CONTENT_BIT) != 0)

Owner

I don't think it's a good idea for the deletion branch to be presented before content that covers it. Dumps, for one, look weird with this order.

assertFresh();
T content = content(); // handle content on the root node
long currentPosition = encodedPosition();
T content = (currentPosition & MAY_HAVE_CONTENT_BIT) != 0 ? content() : null;

Owner

We should be able to put this in a static Cursor.content(cursor, cursorPosition) method without changing the performance.

We can also add a line in the content() javadoc to say it's preferable to get this via the static. If you prefer, call it Cursor.checkFlagAndGetContent or something similar.
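One possible shape for such a helper, as a stand-alone sketch. `SketchCursor` is a hypothetical two-method stand-in for the real cursor interface, `CountingCursor` exists only to show that `content()` is skipped, and the helper uses the alternative name floated in this comment.

```java
/** Stand-alone illustration of a static "check the flag, then fetch content" helper. */
final class ContentHelperSketch
{
    static final long MAY_HAVE_CONTENT_BIT = 1L;

    /** Minimal hypothetical cursor; the real interface has many more methods. */
    interface SketchCursor<T>
    {
        long encodedPosition();
        T content();
    }

    /** Returns the cursor's content only if the given position carries the flag. */
    static <T> T checkFlagAndGetContent(SketchCursor<T> cursor, long position)
    {
        return (position & MAY_HAVE_CONTENT_BIT) != 0 ? cursor.content() : null;
    }

    /** Demo cursor that counts how often content() is actually called. */
    static final class CountingCursor implements SketchCursor<String>
    {
        final long position;
        int contentCalls = 0;

        CountingCursor(long position) { this.position = position; }

        public long encodedPosition() { return position; }

        public String content() { contentCalls++; return "payload"; }
    }

    public static void main(String[] args)
    {
        CountingCursor structural = new CountingCursor(0x6100L); // flag clear
        CountingCursor flagged = new CountingCursor(0x6101L);    // flag set
        System.out.println(checkFlagAndGetContent(structural, structural.encodedPosition()));
        System.out.println(checkFlagAndGetContent(flagged, flagged.encodedPosition()));
        System.out.println(structural.contentCalls + " " + flagged.contentCalls);
    }
}
```

Passing the position in, instead of re-reading it inside the helper, lets callers reuse a value they have already fetched.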

{
assertFresh();
T content = content(); // handle content on the root node
long currentPosition = encodedPosition();

Owner

How about making assertFresh return the position (which it has to fetch anyway) and rename it to something like getPositionAndAssertFresh?

{
if (!Cursor.isExhausted(position))
{
currentPosition = position;

Owner

AFAICS callers already update currentPosition, this should not be necessary.

if (rootAscentContent != null)
addBacktrack(NONE, 0, -1);
updateActiveAndReturn(encodedPosition());
setNodeState(currentPosition, rootDescentContent, currentFullNode, currentNode);

Owner

How about

assert currentFullNode == root && currentNode == root;
long rootPos = encodedPosition();
if (rootDescentContent != null)
    setNodeState(rootPos | MAY_HAVE_CONTENT_BIT, rootDescentContent, root, root);
updateActiveAndReturn(rootPos);

so that we don't need to change the visibility of currentPosition?

I have messed up this construction a bit, making it do unnecessary work and call updateActiveAndReturn twice; if you prefer, add a separate constructor that doesn't do any of the usual preparation so that we can set the values we need directly.

}

@Override
public long encodedPosition()

Owner

Doesn't the base class already deal with the content flag? What does the deletion branch have to do with it?

{
return Cursor.isRootPosition(encodedPosition()) && deletionBranch != null
long pos = encodedPosition();
return (Cursor.isRootPosition(pos) || Cursor.compare(pos, matchingPositionAtRoot) == 0) && deletionBranch != null

Owner

Why are we presenting the deletion branch again after the prefix?

{
if (Cursor.isRootPosition(encodedPosition()))
long pos = encodedPosition();
if (Cursor.isRootPosition(pos) || Cursor.compare(pos, matchingPositionAtRoot) == 0)

Owner

I don't understand this change either. If we are at the root, we should present the deletion branch. If not, we should present the live path only.

There is a problem with the existing code, though, when the input cursor could return deletion branches (e.g. for prefixedBySeparately(prefix, false)) which I'm fixing in the base PR.

if (!Cursor.isExhausted(nextPosition))
prepareNextPosition(currentPosition);
return currentPosition;
return encodedPosition();

Owner

Could we use the same adjustNextPosition mechanism as SingletonOrderedCursor, which will also simplify the latter?

}
else
return checkOverride(source.skipTo(encodedSkipPosition));
checkOverride(source.skipTo(encodedSkipPosition));

Owner

This should return because checkOverride already calls updateCurrentPosition().
