
Minor compaction#19016

Open
GWphua wants to merge 39 commits intoapache:masterfrom
GWphua:minor-compaction

Conversation


@GWphua GWphua commented Feb 12, 2026

Fixes #9712
Fixes #9768

Motivation

Submitting a compaction task with SpecificSegmentsSpec (segment IDs) would cause Druid to lock, read, and rewrite all segments in the umbrella interval, defeating the purpose of targeting specific segments.

This results in very long compaction tasks, as the entire interval's segments are considered for compaction. With the changes introduced here, we can select multiple small segments to compact instead of processing all segments in the interval. This reduces the time taken for compaction from ~3h to ~5min.

Description

This PR introduces minor compaction to the native engine. Users are expected to submit a MinorCompactionInputSpec targeting only the specified segments.

Segments in the same interval that are not in the spec are upgraded in-place via MarkSegmentToUpgradeAction (metadata-only version bump, no physical rewrite).
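For illustration, a minor compaction task submission might look like the sketch below. The inputSpec type and field names follow the naming settled later in this review (minor / segments); the datasource name, interval, and segment IDs are hypothetical:

```json
{
  "type": "compact",
  "dataSource": "my_datasource",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "minor",
      "interval": "2026-01-01T00:00:00Z/2026-01-01T01:00:00Z",
      "segments": [
        "my_datasource_2026-01-01T00:00:00.000Z_2026-01-01T01:00:00.000Z_v1_3",
        "my_datasource_2026-01-01T00:00:00.000Z_2026-01-01T01:00:00.000Z_v1_7"
      ]
    }
  },
  "context": {
    "useConcurrentLocks": true
  }
}
```

Only the listed segments are rewritten; the rest of the interval is upgraded in metadata only.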

The bulk of this PR is built upon the changes made in #19059

Also, removing segment locking in the future should close related issues such as #9571 and #10911 (among many others not listed here).

Release note

You can now submit native-engine minor compaction tasks using MinorCompactionInputSpec.

Submitting compaction tasks using the 'segments' inputSpec is now deprecated. Use the 'uncompacted' inputSpec instead.


Key changed/added classes in this PR
  • CompactionTask
  • NativeCompactionRunner
  • IndexTask
  • ParallelIndexSupervisorTask
  • SinglePhaseSubTask
  • MSQCompactionRunner
  • Tests

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Batch Ingestion Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Feb 12, 2026

gianm commented Feb 12, 2026

It looks like this and #18996 are aiming at similar goals but are taking different approaches. A big one is that #18996 only works with MSQ compaction and this one only works with non-MSQ compaction tasks. I am wondering if they can coexist.

re: this piece,

MSQ compaction is fundamentally incompatible with minor compaction: it forces dropExisting = true, uses REPLACE ingestion mode (which acquires TIME_CHUNK locks covering the full interval), and queries via MultipleIntervalSegmentSpec.

#18996 deals with the replace issue by using the "upgrade" system that was introduced for concurrent replace (system from #14407, #15039, #15684). The segments that are not being compacted are carried through without modification ("upgraded"). It deals with the MultipleIntervalSegmentSpec issue by using a new feature in TableInputSpec to be able to reference specific segments (#18922).

@gianm gianm mentioned this pull request Feb 12, 2026

GWphua commented Feb 13, 2026

Thanks for pointing this out @gianm. I see that #18996 happens to fix compaction on the MSQ side, and that's pretty neat! I do not have much experience with MSQ, given that we are still using Druid v27 (yes, it's old... but we are upgrading soon).

In our production servers, we use this PR via a script that selects segments and issues minor compaction specs. We also plan to incorporate segment selection into automatic compaction.

I'd like to ask: what is the direction for handling specific segments? I see there are some discussions about SpecificSegmentsSpec feeling somewhat unused... If the new feature in TableInputSpec is applicable to my use case, I would be happy to collaborate and make changes on my side 😄


cecemei commented Feb 24, 2026

Hi @GWphua, I took a glance at your PR and have a couple of questions:

  • seems like this method depends on the SEGMENT lock and is not compatible with the TIME_CHUNK lock; is this true?
  • what happens to segments in the same interval that are not included in SpecificSegmentsSpec? Since a new version automatically overwrites older versions, wouldn't they just become invisible?


GWphua commented Feb 25, 2026

Thanks for taking a look @cecemei

Minor compaction depends on the SEGMENT lock introduced in #7547 (see also #7491 for more info). The use of segment lock assumes that the number of segments in the same datasource and interval will not exceed Short.MAX_VALUE.

Let's say there are 100 segments (partition IDs 0~99) in the same interval, and we specify segments 0~4 in the SpecificSegmentsSpec to be compacted:

A new segment segmentId_32768 will be created from segments 0~4, with a higher MINOR version. The shardSpec will look something like this:

{
  "type": "numbered_overwrite",
  "partitionId": 32768,
  "startRootPartitionId": 0,
  "endRootPartitionId": 5,
  "minorVersion": 1,
  "atomicUpdateGroupSize": 1
}

The rest of the segments (5~99) will still stay queryable and available because they belong to the same MAJOR version. Segments 0~4 will then be scheduled to be killed.

Let's say we do a major compaction right after (on partitions {5, 6, ..., 99, 32768}). IIUC, all segments in the same interval will be upgraded to a new MAJOR version. If the major compaction results in N segments, we will see partitionIds {0, 1, ..., N-1}, all of them in the newer MAJOR version.


kfaraz commented Mar 3, 2026

@GWphua , as @cecemei points out, there are some concerns with this implementation:

  • It works only with SEGMENT locking.
  • Segment locking is known to have issues, has already been deprecated, and will be removed fairly soon.
  • Wouldn't work with concurrent append and replace.
  • Wouldn't work with auto-compaction, only with manually submitted compaction tasks (I see that you have already mentioned thinking about how to integrate it with auto-compaction).

I would advise revising the implementation in this PR to use REPLACE/APPEND locks (i.e. locks that support concurrency) and the upgrade mechanism, similar to what is being done in #19059 (a revision of #18996).
Once #19059 has been merged, you can update your PR to use those building blocks (I think minimal changes would be needed since everything would already be in place).

In short, the approach would be something like this:

  • Submit a compaction task (with auto-compaction or manual) with an input spec that specifies an interval as well as a set of "uncompacted" segment IDs
  • Ensure that the compaction task holds a REPLACE lock (set useConcurrentLocks: true in the task context). Do not use SEGMENT lock.
  • Identify all the segments in the interval. The segments that are already compacted should be marked to be "upgraded" once the task finishes
  • Launch compaction task for only the uncompacted segments
    • No need to update the AbstractTask.findSegmentsToLock() since we will always lock the whole interval
    • Just passing the correct segmentIds to the DruidInputSource should suffice
  • Upon completion, segments will be upgraded automatically and queries will start seeing new compacted data.
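The steps above might translate into a task payload along these lines. This is only a sketch using the uncompacted naming in use at this point in the discussion, with a hypothetical datasource, interval, and segment IDs:

```json
{
  "type": "compact",
  "dataSource": "my_datasource",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "uncompacted",
      "interval": "2026-01-01/2026-01-02",
      "uncompactedSegments": [
        "my_datasource_2026-01-01T00:00:00.000Z_2026-01-02T00:00:00.000Z_v1_0",
        "my_datasource_2026-01-01T00:00:00.000Z_2026-01-02T00:00:00.000Z_v1_1"
      ]
    }
  },
  "context": {
    "useConcurrentLocks": true
  }
}
```

The useConcurrentLocks context flag ensures the task takes a REPLACE lock over the interval, so already-compacted segments can be upgraded when the task finishes.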


GWphua commented Mar 4, 2026

@kfaraz

Thanks for the detailed rundown of the motivation behind removing segment locks, I will work on this after #19059 is merged.

I have some concerns regarding the new mechanism. I am not sure whether some of the things I enjoyed when using segment lock will be preserved; answers to these questions would greatly help me understand things faster:

  1. Suppose an interval has partitions [0, 10], and we manually submit 2 compaction tasks that compact [0, 4] and [5, 8]. Will these 2 tasks be able to execute simultaneously, or will one task be blocked by the interval-wide lock?
  2. I am assuming segmentIds will change after the upgrade here. Suppose the compaction tasks submitted in (1) are now running, but we drafted a compaction task for segments [9, 10] based on the old segmentIds and submitted it to Druid before the tasks in (1) completed. Will Druid run the task targeting the intended segments, or will it fail because the 'un-upgraded' segments [9, 10] no longer exist?


kfaraz commented Mar 9, 2026

Thanks, @GWphua !

In #19059 , compaction tasks will use REPLACE locks which are mutually exclusive with each other. So, at any time, only a single compaction task can be running for an interval. But there can be any number of concurrent APPEND jobs that may be writing to the same interval.

Supporting multiple concurrent REPLACE tasks on the same interval would have to be a future enhancement.

Meanwhile, with the support of minor compaction coupled with compaction supervisors (which are much more reactive in scheduling compaction jobs as compared to the old-style compaction duty on the Coordinator), I feel that compaction would be fast enough that we would not need to launch multiple jobs for the same interval together anyway.

Please let me know if that answers your question.


GWphua commented Mar 10, 2026

Thanks @kfaraz!

Love the clarification about locks! I know that segment lock is kind of a can of worms, and since we are shifting to this new lock mechanism, which sounds more polished, I'm happy to make changes to my current implementation.

As for compaction being fast, let me share a bit about the use case:

We have not been actively compacting our segments, and have recently been trying to apply compaction across datasources that use Kafka ingestion.

These datasources have months worth of data, and have 10k+ segments per segmentGranularity (hour). A single major compaction takes 8~10 hours for a single time chunk.

We thought of working with this strategy: compacting the newer data and letting the older data expire. This strategy does not solve the issue when compaction is slower than ingestion (and also does not work for clusters using loadForever). By implementing concurrent minor compactions, we can parallelize the workload.

Each minor compaction takes ~5min. The shorter duration, coupled with multiple minor tasks submitted simultaneously (sometimes for the same time chunk), reduces what would take an estimated 24 months of compaction time down to 18 days. Providing concurrent minor compaction will help users who need their segments compacted urgently.

Supporting multiple concurrent REPLACE tasks on the same interval would have to be a future enhancement

The main concern when coming to compaction is whether the speed of compaction is able to keep up with our ingestion load. We have compacted all of our historical data already, and do not really have a high compaction speed requirement anymore. I would love to test the MSQ implementation of minor compaction, and if results are satisfactory then this can take a back-seat. 👍


kfaraz commented Mar 12, 2026

The main concern when coming to compaction is whether the speed of compaction is able to keep up with our ingestion load. We have compacted all of our historical data already, and do not really have a high compaction speed requirement anymore. I would love to test the MSQ implementation of minor compaction, and if results are satisfactory then this can take a back-seat. 👍

Sounds good, @GWphua ! Thanks for providing the context.

FYI, #19059 has been merged. Please feel free to try it out.
I look forward to hearing about the results of running MSQ major/minor compaction from you. 🙂


GWphua commented Mar 13, 2026

Hi @kfaraz

I have tested out MSQ minor + major compaction. We are able to have multiple workers running on the same interval, and this allows us to speed up the compaction process. I am very happy with this outcome.


For a single interval, major compaction tasks timed out after running >8h, and minor compaction is able to accomplish it within 10min.


kfaraz commented Mar 13, 2026

That is great news, @GWphua !

For a single interval, major compaction tasks timed out after running >8h, and minor compaction is able to accomplish it within 10min.

Wow, those are great time savings!

We are able to have multiple workers running on the same interval, and this allows us to speed up the compaction process.

By this, do you mean launch a single minor compaction job with a large number of MSQ workers?


GWphua commented Mar 13, 2026

4 task slots were reserved for each case, each provided with the same CPU/memory resources. Both tasks worked on a single interval.

For the minor case, there is an additional requirement to specify the segment IDs to be retrieved for compaction. On my side, I tweaked my minor compaction script to first search for compactible segments, then submit those segments for compaction.

@GWphua GWphua marked this pull request as draft March 13, 2026 08:52
@GWphua GWphua requested a review from kfaraz March 18, 2026 06:10

GWphua commented Mar 19, 2026

Ran native minor compaction tasks on a test cluster:

  1. Native minor compaction tasks will be blocked by other tasks running on the same interval due to time chunk locking (no concurrent tasks can be submitted).
  2. Native minor compaction tasks now can target non-consecutive segments (Fixes Allow minor compaction for non-consecutive segments #9768)
  3. Native minor compaction now can compact selected segments instead of whole time chunk (Fixes Enable auto minor compaction #9712)


kfaraz commented Mar 19, 2026

Native minor compaction tasks will be blocked by other tasks running on a single interval due to time chunk locking. (No concurrent tasks can be submitted)

@GWphua , wouldn't native minor compaction work with useConcurrentLocks: true, same as MSQ minor compaction?
If we use that, we can have concurrent append tasks working on an interval while native minor compaction is in progress. We cannot have other compaction tasks (minor or major) for that interval though.

@kfaraz kfaraz left a comment

Overall looks good, left some minor queries/suggestions.
Once these are resolved, we should be good to go.

|Field|Description|Default|Required|
|-----|-----------|-------|--------|
|`type`|Task type. Set the value to `compact`.|none|Yes|
|`inputSpec`|Specification of the target [interval](#interval-inputspec) or [segments](#segments-inputspec).|none|Yes|
|`inputSpec`|Specification of the target [interval](#interval-inputspec) or [uncompacted](#uncompacted-inputspec).|none|Yes|
Contributor

I wonder if we should change the type name from uncompacted to minor.
Since this feature has not been released yet, I think we still have time to fix it up.
If we do this, the field name can be changed from uncompactedSegments to segmentsToCompact.

uncompacted is slightly incorrect because there may be segments that were appended (by a concurrent APPEND job) after the minor compaction task started. The minor compaction task would not compact these segments (since they are not present in uncompactedSegments passed to the spec) and would simply be upgraded instead.

Contributor Author

Agree.

I was thinking of a minor inputSpec, and changing the field name to segments for brevity, especially since we know minor compaction is for targeting specific segments. Let me know if that's OK!


### Segment `inputSpec`

The segment `inputSpec` is deprecated, instructions for usage will no longer be documented. Please use the above 2 `inputSpec` instead.
Contributor

Suggested change
The segment `inputSpec` is deprecated, instructions for usage will no longer be documented. Please use the above 2 `inputSpec` instead.
The segment `inputSpec` is deprecated, instructions for usage will no longer be documented. Please use the types `interval` or `uncompacted` instead.

// Native minor compaction uses REPLACE ingestion mode, which uses time chunk lock.
if (compactionTask.getIoConfig().getInputSpec() instanceof MinorCompactionInputSpec) {
newContext.put(Tasks.FORCE_TIME_CHUNK_LOCK_KEY, true);
newContext.put(Tasks.USE_CONCURRENT_LOCKS, true);
Contributor

Why should we force useConcurrentLocks here?

If useConcurrentLocks is a requirement for minor compaction, I think we should validate that in the CompactionTask constructor instead so that users are aware of the requirement.

Also, I wonder if there is any real benefit to forcing these context parameters here since IIUC, the locks would have already been acquired at this point.

Contributor Author

locks would have already been acquired at this point.

True. Compaction tasks without useConcurrentLocks set to true will cause this error:

org.apache.druid.java.util.common.IOE: Error with status[500 Server Error] and message[{"error":"org.apache.druid.error.DruidException: Segments to upgrade must be covered by a REPLACE lock. Only [0] out of [454] segments are covered."}]. Check overlord logs for details.

I will use a validator to return a better error message.
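A constructor-time check along these lines could be sketched as follows. The class and method names here are illustrative, not the actual Druid API; the real validation would inspect the task's ioConfig and context:

```java
// Hypothetical sketch: fail fast at task construction instead of surfacing an
// opaque 500 from the Overlord mid-task. Names are illustrative only.
class MinorCompactionValidator
{
  static void validate(boolean isMinorCompactionSpec, boolean useConcurrentLocks)
  {
    if (isMinorCompactionSpec && !useConcurrentLocks) {
      throw new IllegalArgumentException(
          "Minor compaction requires 'useConcurrentLocks: true' in the task context, "
          + "since segments that are not compacted must be upgraded under a REPLACE lock."
      );
    }
  }
}
```

Validating up front gives the user an actionable message naming the missing context flag, rather than a lock-coverage failure deep in the segment-upgrade path.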


final MinorCompactionInputSpec minorSpec = new MinorCompactionInputSpec(
testInterval,
ImmutableList.of(segment6.toDescriptor(), segment7.toDescriptor(), segment8.toDescriptor())
Contributor

Nit: Use List.of(), Map.of() and Set.of() for brevity.

Contributor Author

Sure. It would also be nice to have a standard for using these over other APIs (though I foresee a large amount of changes that may come with it 🥴).

final TestTaskActionClient taskActionClient = new TestTaskActionClient(allSegmentsInInterval);

// Verify findSegmentsToLock() returns ALL segments in interval (no filtering)
final List<DataSegment> segmentsToLock = compactionTask.findSegmentsToLock(
Contributor

IIUC, the findSegmentsToLock() method is invoked only when we go down the segment locking code flow.
Ideally, CompactionTask.isReady() would call determineLockGranularityAndTryLock() which would short-circuit to return a TIME_CHUNK lock and we would never go down the path that involves methods determineSegmentGranularity() or findSegmentsToLock().

If needed, we can perform validations in the CompactionTask constructor to ensure that minor compaction is used only with ingestionMode == REPLACE. We need not support REPLACE_LEGACY (which uses isDropExisting: false) since it is a legacy mode and has been deprecated for a while. I don't think that mode makes sense in the context of compaction either.

Let me know what you think.

Contributor Author

Since it is a legacy mode, there's no need to support it anymore. I have added validation and appropriate tests.


GWphua commented Mar 20, 2026

Hi @kfaraz

Thanks for the suggestions. The main changes here are:

  • type name changed from uncompacted to minor
  • field name changed from uncompactedSegments to segments
  • validation added to CompactionTask construction

Do take a look again. Thanks!


Labels

Area - Batch Ingestion Area - Documentation Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Design Review


Development

Successfully merging this pull request may close these issues.

  • Allow minor compaction for non-consecutive segments
  • Enable auto minor compaction

5 participants