Skip to content

CASSANDRA-19433 Nodetool cleanup can drop data Accord might need#4692

Open
alanwang67 wants to merge 18 commits intoapache:trunkfrom
alanwang67:alanwang599/CASSANDRA-19433/trunk
Open

CASSANDRA-19433 Nodetool cleanup can drop data Accord might need#4692
alanwang67 wants to merge 18 commits intoapache:trunkfrom
alanwang67:alanwang599/CASSANDRA-19433/trunk

Conversation

@alanwang67
Copy link
Copy Markdown
Contributor

Checks that range for SSTables being cleaned up is not still being used by Accord CommandStores prior to cleaning it up. apache/cassandra-accord#270 needs to be merged prior.

patch by Alan Wang;

reviewed by for CASSANDRA-19433

Co-authored-by: Name1
Co-authored-by: Name2

Copy link
Copy Markdown
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is partial feedback, have to get to a meeting, but this should unblock some progress.

if (!AccordService.isSetup())
return false;

Future<List<Ranges>> futureAccordOwnedRanges = AccordService.toFuture(AccordService.instance().node().commandStores().getInUseRanges());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to fetch the in use ranges for every single sstable of which there are thousands. It should probably fetch them once at the start and then keep referring to it. Also if the usage here is synchronous then the Accord method should also be synchronous or have a synchronous version you can call so that the boilerplate of getting the async value is handled inside the helper function.

Comment thread src/java/org/apache/cassandra/db/compaction/CompactionManager.java Outdated
if (sstable.getFirst().equals(sstable.getLast()))
return accordOwnedRanges.intersects(new TokenKey(tableId, sstable.getFirst().getToken()));

Ranges sstableRanges = Ranges.of(TokenRange.create(tableId, sstable.getFirst().getToken(), sstable.getLast().getToken()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually need to wrap this in Ranges in order to test intersection?

logger.info("Finished {} for {}.{} successfully", operationType, keyspace, table);
return AllSSTableOpStatus.SUCCESSFUL;

// Some SSTables were not fully cleaned up because Accord is still using those ranges
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is parallelAllSSTablesOperation and cleanup is just one kind of thing it might do?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, though we can only have an incomplete operation with cleanup, but let me fix the comment to be more general. Do we want to extend this to the other operations as well?

boolean needsCleanupTransient = !transientRanges.isEmpty() && sstable.isRepaired() && needsCleanup(sstable, transientRanges);
totalSSTables++;

if (sstableContainsRangesNeededByAccord(cfStore.getTableId(), sstable))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entirely excludes sstables from cleanup assuming they are entirely needed by Accord when in reality it's just an intersecting section that is needed. Cleanup will actually rewrite sstables to remove the parts that can be cleaned up.

Being unable to clean up an sstable doesn't represent an incomplete operation unless the table is also no longer owned? Just because Accord needs it doesn't mean we were going to clean it up. Unless I am missing something earlier about already having filtered the sstables to be the ones intersecting non-owned ranges. I'm not sure if we are iterating all or just some.

Copy link
Copy Markdown
Contributor Author

@alanwang67 alanwang67 Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I think what I will do instead is to just add the range that Accord is using to the full range. This way when cleanup is done we will still remove ranges from the SSTables even if Accord partially owns a piece of it and we can just remove this if statement all together.

Copy link
Copy Markdown
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot going on here that I need to figure out with transient ranges. The correct behavior for transient ranges changed with new witnesses and mutation tracking.

I'm also really not clear on what doCleanupOne is doing tracking incomplete based on Accord owned ranges. I guess what it is doing is reported that a row actually intersected with an Accord owned range? I don't think we need to do that on a per row basis it can be done once in filterSSTables. If any sstable intersects an exclusively Accord owned range (not in the write placement ranges) we know cleanup will be incomplete because we expanded the set of owned ranges to include the Accord owned ranges.

ABORTED(1),
UNABLE_TO_CANCEL(2);
UNABLE_TO_CANCEL(2),
INCOMPLETE(3);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodeprobe doesn't handle this. Cleanup should signal an error if some data wasn't cleaned up since what was requested didn't fully happen.

}

@Override
public boolean incompleteOperation()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a default method to avoid having to add this everywhere

{
CleanupStrategy cleanupStrategy = CleanupStrategy.get(cfStore, allRanges, transientRanges, txn.onlyOne().isRepaired(), FBUtilities.nowInSeconds());
doCleanupOne(cfStore, txn, cleanupStrategy, allRanges, hasIndexes);
this.incompleteOperation = doCleanupOne(cfStore, txn, cleanupStrategy, allRanges, hasIndexes, rangesInUseByAccord);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overwrites whether it was complete for each sstable so the final value is just what happened to the last sstable. You also don't need this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, will remove this and perform this in filter operations.

}

final Set<Range<Token>> allRanges = Stream.concat(localWrites.ranges().stream(), rangesInUseByAccord.stream()).collect(Collectors.toSet());
final Set<Range<Token>> transientRanges = Stream.concat(localWrites.onlyTransient().ranges().stream(), rangesInUseByAccord.stream()).collect(Collectors.toSet());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accord doesn't support transient replication so it should really be a checkState erroring out if we are using Accord and there are transient ranges. Either way for transient ranges Accord may actually be using them even when there is cleanup work to do to remove them. The check would be more complicated because we would need Accord to report whether it is using the range transiently or fully and that distinction doesn't exist yet.

So that is why checkState and then don't expand the transient ranges to include Accord owned ranges. Just leave it as is.

Comment thread src/java/org/apache/cassandra/db/compaction/CompactionManager.java Outdated
}

@Override
public List<Ranges> getInUseRanges()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getInUseRanges treats the ranges Accord manages as being global across tables, but it's actually a per table property. This should be a map from TableId to a collection of ranges.

Copy link
Copy Markdown
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, I have a feeling this will be the last round :-)


final Set<Range<Token>> allRanges = new HashSet<>(localWrites.ranges());
Set<Range<Token>> noLongerOwnedRangesInUseByAccord = new HashSet<>();
if (AccordService.isSetup())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accord can be set up and it's fine. It's what is going on with this specific table that matters since whether Accord is enabled is on a per table basis.

It should also be a checkState since it's an invariant that should already be being enforced.

Comment thread src/java/org/apache/cassandra/db/compaction/CompactionManager.java Outdated
{
CleanupStrategy cleanupStrategy = CleanupStrategy.get(cfStore, allRanges, transientRanges, txn.onlyOne().isRepaired(), FBUtilities.nowInSeconds());
doCleanupOne(cfStore, txn, cleanupStrategy, allRanges, hasIndexes);
if (doCleanupOne(cfStore, txn, cleanupStrategy, allRanges, hasIndexes, noLongerOwnedRangesInUseByAccord))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "if" isn't needed technically. It could be an |=, but that is me being super pedantic about style.

Set<Range<Token>> noLongerOwnedRangesInUseByAccord = new HashSet<>();
if (AccordService.isSetup())
{
if (!localWrites.onlyTransient().ranges().isEmpty())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically this is too strict since it doesn't check if the transient ranges are on an Accord enabled table. You also technically don't need to do this check here because it should be enforced during schema change, but it's nice to have.

}
}

final Set<Range<Token>> allRanges = Stream.concat(localWrites.ranges().stream(), noLongerOwnedRangesInUseByAccord.stream()).collect(Collectors.toSet());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of set as a data structure is just odd because Ranges don't work well as sets since it still allows overlaps and doesn't normalize. This whole section would be a little more sane if we used NormalizedRanges as the final storage, but that might be finicky so for this make sure to Normalize the ranges before constructing the final set so it's at least a sane looking set of ranges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants