Improve offset auto reset #18301

Open
FrankChen021 wants to merge 4 commits into apache:master from FrankChen021:auto-reset

FrankChen021 (Member) commented Jul 21, 2025

Fixes #18282

Description

Improve the auto reset logic as proposed in the above issue:

  1. Only the supervisor is responsible for auto reset. A task never triggers the auto reset flow.
  2. A task always throws OffsetOutOfRangeException when that exception is raised, but it still publishes the messages it has already ingested.
  3. The supervisor resets the offset to the earliest position if it detects that the offset of a particular partition is unavailable.
  4. The supervisor starts a task more quickly if it needs to reset offsets.
  5. Obscure error messages in the task report, such as "DataSourceMetadata is not found while reset", are updated so that it is clear when the supervisor kills a task because of an offset reset.
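
As a rough illustration of point 3, the supervisor-side check can be sketched as a comparison between the stored offset and the earliest offset still available for each partition. This is a minimal sketch, not the code in this PR: the class name, the method name, and the use of plain maps keyed by partition number are all assumptions made for the example, and only the "stored offset is older than the earliest available offset" case is covered.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a supervisor-side auto reset decision.
 * None of these names come from Druid; they exist only for illustration.
 */
public class OffsetAutoResetSketch {

    /**
     * For every partition whose stored offset is older than the earliest
     * offset still available in the stream, map the partition to that
     * earliest offset. Partitions whose stored offset is still available
     * are omitted from the result.
     */
    public static Map<Integer, Long> offsetsToReset(
            Map<Integer, Long> storedOffsets,
            Map<Integer, Long> earliestAvailableOffsets) {
        Map<Integer, Long> resets = new HashMap<>();
        for (Map.Entry<Integer, Long> entry : storedOffsets.entrySet()) {
            Long earliest = earliestAvailableOffsets.get(entry.getKey());
            // Assumption: a partition with no known earliest offset is left alone.
            if (earliest != null && entry.getValue() < earliest) {
                resets.put(entry.getKey(), earliest);
            }
        }
        return resets;
    }

    public static void main(String[] args) {
        Map<Integer, Long> stored = new HashMap<>();
        stored.put(0, 100L); // older than the earliest available offset, needs a reset
        stored.put(1, 500L); // still available, left untouched

        Map<Integer, Long> earliest = new HashMap<>();
        earliest.put(0, 250L);
        earliest.put(1, 400L);

        System.out.println(offsetsToReset(stored, earliest)); // prints {0=250}
    }
}
```

Keeping this decision in one place matches point 1: a task only reports the failure, and the supervisor decides which partitions to move.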

NOTE

This relies on #18226, which publishes segments when OffsetOutOfRangeException is raised.
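
The task-side behaviour described here (publish what has been ingested, then fail without attempting a reset) might look roughly like the sketch below. Everything in it is hypothetical: `OutOfRangeException` is a stand-in for the client's real exception, `RecordSource` is an invented interface, and "publishing" is modelled as appending to an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a task that publishes already-ingested records
 * and then rethrows an offset-out-of-range error. Not Druid's actual code.
 */
public class TaskPublishThenFailSketch {

    /** Stand-in for the stream client's offset-out-of-range error. */
    public static class OutOfRangeException extends RuntimeException {
        public OutOfRangeException(String message) {
            super(message);
        }
    }

    /** Invented record source: returns null when exhausted, may throw OutOfRangeException. */
    public interface RecordSource {
        String poll();
    }

    private final List<String> published = new ArrayList<>();

    public List<String> published() {
        return published;
    }

    public void run(RecordSource source) {
        List<String> ingested = new ArrayList<>();
        try {
            String record;
            while ((record = source.poll()) != null) {
                ingested.add(record);
            }
        } catch (OutOfRangeException e) {
            // Publish what was read so far, then fail the task.
            // No offset reset is attempted here; that is left to the supervisor.
            published.addAll(ingested);
            throw e;
        }
        published.addAll(ingested);
    }
}
```

Under these assumptions, a source that returns two records and then throws leaves both records published while the exception still propagates to the caller.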

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR improves the offset auto reset logic in seekable stream supervisors by centralizing reset responsibility to supervisors only and improving error messaging. The changes ensure only supervisors handle auto resets while tasks always throw OffsetOutOfRangeException, and provide clearer error messages when offsets are reset.

  • Enhanced supervisor-only auto reset logic with better offset availability checking
  • Improved error messages for task shutdown during offset resets
  • Removed task-level auto reset handling to centralize control in supervisors

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| SeekableStreamSupervisor.java | Modified resetInternal method to accept autoReset flag and improved offset availability checking logic |
| KafkaIndexTaskRunner.java | Removed task-level auto reset handling, now only throws OffsetOutOfRangeException |
| SeekableStreamSupervisorStateTest.java | Updated test expectations for new error message format and added recordSupplier mock |
| KinesisSupervisorTest.java | Updated test calls to resetInternal method and adjusted test expectations |
| KafkaSupervisorTest.java | Updated test calls to resetInternal method and corrected test data expectations |

…blestream/supervisor/SeekableStreamSupervisor.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@github-actions

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

github-actions bot added the stale label Sep 20, 2025
@github-actions

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot closed this Oct 18, 2025
FrankChen021 reopened this Feb 13, 2026
github-actions bot removed the stale label Feb 14, 2026
Development

Successfully merging this pull request may close these issues.

Correctly reset kafka offset if auto reset is enabled

2 participants