CASSANDRA-19433 Nodetool cleanup can drop data Accord might need#4692
CASSANDRA-19433 Nodetool cleanup can drop data Accord might need#4692alanwang67 wants to merge 18 commits intoapache:trunkfrom
Conversation
… deleted by nodetool cleanup
aweisberg
left a comment
There was a problem hiding this comment.
This is partial feedback, have to get to a meeting, but this should unblock some progress.
| if (!AccordService.isSetup()) | ||
| return false; | ||
|
|
||
| Future<List<Ranges>> futureAccordOwnedRanges = AccordService.toFuture(AccordService.instance().node().commandStores().getInUseRanges()); |
There was a problem hiding this comment.
This is going to fetch the in use ranges for every single sstable of which there are thousands. It should probably fetch them once at the start and then keep referring to it. Also if the usage here is synchronous then the Accord method should also be synchronous or have a synchronous version you can call so that the boilerplate of getting the async value is handled inside the helper function.
| if (sstable.getFirst().equals(sstable.getLast())) | ||
| return accordOwnedRanges.intersects(new TokenKey(tableId, sstable.getFirst().getToken())); | ||
|
|
||
| Ranges sstableRanges = Ranges.of(TokenRange.create(tableId, sstable.getFirst().getToken(), sstable.getLast().getToken())); |
There was a problem hiding this comment.
Do you actually need to wrap this in Ranges in order to test intersection?
| logger.info("Finished {} for {}.{} successfully", operationType, keyspace, table); | ||
| return AllSSTableOpStatus.SUCCESSFUL; | ||
|
|
||
| // Some SSTables were not fully cleaned up because Accord is still using those ranges |
There was a problem hiding this comment.
This is parallelAllSSTablesOperation and cleanup is just one kind of thing it might do?
There was a problem hiding this comment.
I see, though we can only have an incomplete operation with cleanup, but let me fix the comment to be more general. Do we want to extend this to the other operations as well?
| boolean needsCleanupTransient = !transientRanges.isEmpty() && sstable.isRepaired() && needsCleanup(sstable, transientRanges); | ||
| totalSSTables++; | ||
|
|
||
| if (sstableContainsRangesNeededByAccord(cfStore.getTableId(), sstable)) |
There was a problem hiding this comment.
This entirely excludes sstables from cleanup assuming they are entirely needed by Accord when in reality it's just an intersecting section that is needed. Cleanup will actually rewrite sstables to remove the parts that can be cleaned up.
Being unable to clean up an sstable doesn't represent an incomplete operation unless the table is also no longer owned? Just because Accord needs it doesn't mean we were going to clean it up. Unless I am missing something earlier about already having filtered the sstables to be the ones intersecting non-owned ranges. I'm not sure if we are iterating all or just some.
There was a problem hiding this comment.
I see, I think what I will do instead is to just add the range that Accord is using to the full range. This way when cleanup is done we will still remove ranges from the SSTables even if Accord partially owns a piece of it and we can just remove this if statement all together.
…o owned ranges and let existing logic handle
aweisberg
left a comment
There was a problem hiding this comment.
There is a lot going on here that I need to figure out with transient ranges. The correct behavior for transient ranges changed with new witnesses and mutation tracking.
I'm also really not clear on what doCleanupOne is doing tracking incomplete based on Accord owned ranges. I guess what it is doing is reported that a row actually intersected with an Accord owned range? I don't think we need to do that on a per row basis it can be done once in filterSSTables. If any sstable intersects an exclusively Accord owned range (not in the write placement ranges) we know cleanup will be incomplete because we expanded the set of owned ranges to include the Accord owned ranges.
| ABORTED(1), | ||
| UNABLE_TO_CANCEL(2); | ||
| UNABLE_TO_CANCEL(2), | ||
| INCOMPLETE(3); |
There was a problem hiding this comment.
Nodeprobe doesn't handle this. Cleanup should signal an error if some data wasn't cleaned up since what was requested didn't fully happen.
| } | ||
|
|
||
| @Override | ||
| public boolean incompleteOperation() |
There was a problem hiding this comment.
Use a default method to avoid having to add this everywhere
| { | ||
| CleanupStrategy cleanupStrategy = CleanupStrategy.get(cfStore, allRanges, transientRanges, txn.onlyOne().isRepaired(), FBUtilities.nowInSeconds()); | ||
| doCleanupOne(cfStore, txn, cleanupStrategy, allRanges, hasIndexes); | ||
| this.incompleteOperation = doCleanupOne(cfStore, txn, cleanupStrategy, allRanges, hasIndexes, rangesInUseByAccord); |
There was a problem hiding this comment.
This overwrites whether it was complete for each sstable so the final value is just what happened to the last sstable. You also don't need this?
There was a problem hiding this comment.
Got it, will remove this and perform this in filter operations.
| } | ||
|
|
||
| final Set<Range<Token>> allRanges = Stream.concat(localWrites.ranges().stream(), rangesInUseByAccord.stream()).collect(Collectors.toSet()); | ||
| final Set<Range<Token>> transientRanges = Stream.concat(localWrites.onlyTransient().ranges().stream(), rangesInUseByAccord.stream()).collect(Collectors.toSet()); |
There was a problem hiding this comment.
Accord doesn't support transient replication so it should really be a checkState erroring out if we are using Accord and there are transient ranges. Either way for transient ranges Accord may actually be using them even when there is cleanup work to do to remove them. The check would be more complicated because we would need Accord to report whether it is using the range transiently or fully and that distinction doesn't exist yet.
So that is why checkState and then don't expand the transient ranges to include Accord owned ranges. Just leave it as is.
| } | ||
|
|
||
| @Override | ||
| public List<Ranges> getInUseRanges() |
There was a problem hiding this comment.
getInUseRanges treats the ranges Accord manages as being global across tables, but it's actually a per table property. This should be a map from TableId to a collection of ranges.
aweisberg
left a comment
There was a problem hiding this comment.
Almost there, I have a feeling this will be the last round :-)
|
|
||
| final Set<Range<Token>> allRanges = new HashSet<>(localWrites.ranges()); | ||
| Set<Range<Token>> noLongerOwnedRangesInUseByAccord = new HashSet<>(); | ||
| if (AccordService.isSetup()) |
There was a problem hiding this comment.
Accord can be set up and it's fine. It's what is going on with this specific table that matters since whether Accord is enabled is on a per table basis.
It should also be a checkState since it's an invariant that should already be being enforced.
| { | ||
| CleanupStrategy cleanupStrategy = CleanupStrategy.get(cfStore, allRanges, transientRanges, txn.onlyOne().isRepaired(), FBUtilities.nowInSeconds()); | ||
| doCleanupOne(cfStore, txn, cleanupStrategy, allRanges, hasIndexes); | ||
| if (doCleanupOne(cfStore, txn, cleanupStrategy, allRanges, hasIndexes, noLongerOwnedRangesInUseByAccord)) |
There was a problem hiding this comment.
The "if" isn't needed technically. It could be an |=, but that is me being super pedantic about style.
| Set<Range<Token>> noLongerOwnedRangesInUseByAccord = new HashSet<>(); | ||
| if (AccordService.isSetup()) | ||
| { | ||
| if (!localWrites.onlyTransient().ranges().isEmpty()) |
There was a problem hiding this comment.
Specifically this is too strict since it doesn't check if the transient ranges are on an Accord enabled table. You also technically don't need to do this check here because it should be enforced during schema change, but it's nice to have.
| } | ||
| } | ||
|
|
||
| final Set<Range<Token>> allRanges = Stream.concat(localWrites.ranges().stream(), noLongerOwnedRangesInUseByAccord.stream()).collect(Collectors.toSet()); |
There was a problem hiding this comment.
The choice of set as a data structure is just odd because Ranges don't work well as sets since it still allows overlaps and doesn't normalize. This whole section would be a little more sane if we used NormalizedRanges as the final storage, but that might be finicky so for this make sure to Normalize the ranges before constructing the final set so it's at least a sane looking set of ranges.
Checks that range for SSTables being cleaned up is not still being used by Accord CommandStores prior to cleaning it up. apache/cassandra-accord#270 needs to be merged prior.
patch by Alan Wang;
reviewed by for CASSANDRA-19433
Co-authored-by: Name1
Co-authored-by: Name2