Skip to content

config: support log timeout action#68964

Open
bb7133 wants to merge 5 commits into
masterfrom
bb7133/log-timeout-action
Open

config: support log timeout action#68964
bb7133 wants to merge 5 commits into
masterfrom
bb7133/log-timeout-action

Conversation

@bb7133

@bb7133 bb7133 commented Jun 4, 2026

Copy link
Copy Markdown
Member

Issue Number: close #68968

Problem Summary:

TiDB currently uses [log].timeout to detect stuck log writes and panic when the timeout is reached. This keeps the existing fail-fast behavior, but some deployments may prefer to keep TiDB serving traffic and discard later log writes after the timeout is observed.

What changed:

  • Add [log].timeout-action to TiDB config, defaulting to panic.
  • Accept valid values panic and discard, with case-insensitive validation/normalization.
  • Pass the setting through Log.ToLogConfig().
  • Update the enterprise audit submodule so audit logs also respect [log].timeout and [log].timeout-action.

Valid values:

  • panic: keep the existing behavior and panic on timeout. This is the default.
  • discard: after the timeout is observed, discard later log write/sync calls instead of panicking.

Dependencies:

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Manual test:

  • go test ./pkg/config -run 'TestLogConfig|TestLogTimeoutAction|TestConfig$' -count=1
  • go test --tags=intest ./pkg/extension/enterprise/audit -run 'TestAuditLoggerUsesGlobalLogTimeoutConfig|TestAuditLogger$' -count=1
  • git diff --check

Side effects:

  • panic remains the default and preserves existing behavior.
  • discard mode may lose log records after a timeout is observed. It cannot cancel a goroutine already blocked in the underlying OS/file write; it only prevents later log write/sync calls from continuing to block or panic.

Documentation:

  • New config item should be documented in the TiDB configuration reference.

Release note:

Add the `[log].timeout-action` configuration item to choose whether TiDB panics or discards later log writes when `[log].timeout` is reached.

Summary by CodeRabbit

  • New Features
    • Added log.timeout-action configuration option to control logging timeout behavior: panic (default) or discard (case-insensitive), with validation for invalid values.
  • Tests
    • Added coverage for default, configured, and invalid timeout-action values.
  • Chores
    • Updated logging-related dependencies and pinned artifacts.
    • Updated build system settings and the enterprise extension reference.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 4, 2026
@pantheon-ai

pantheon-ai Bot commented Jun 4, 2026

Copy link
Copy Markdown

@bb7133 I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 4, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 4, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cfzjywxk, terry1purcell for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0cbff58a-46e2-4576-ada6-227999b124d4

📥 Commits

Reviewing files that changed from the base of the PR and between f219a2b and ad61f32.

📒 Files selected for processing (1)
  • pkg/config/BUILD.bazel
✅ Files skipped from review due to trivial changes (1)
  • pkg/config/BUILD.bazel

📝 Walkthrough

Walkthrough

Adds a timeout-action field to the Log config struct with values panic (default) or discard. The field is validated case-insensitively in Config.Valid() and wired into the underlying zaplog.Config. The github.com/pingcap/log dependency is bumped in go.mod and DEPS.bzl. The enterprise submodule pointer is also updated.

Changes

Log Timeout Action Configuration Feature

Layer / File(s) Summary
Config field, defaults, and dependency pins
pkg/config/config.go, go.mod, DEPS.bzl
Adds TimeoutAction string field with toml:"timeout-action" / json:"timeout-action" tags to Log struct; sets default to zaplog.LogTimeoutActionPanic; bumps github.com/pingcap/log pseudo-version in go.mod and DEPS.bzl; removes // indirect marker from lumberjack.v2.
Validation and zaplog wiring
pkg/config/config.go
Lowercases log.timeout-action in Config.Valid(), maps "" and "panic" to panic, maps "discard" to discard, returns error for any other value; wires Log.TimeoutAction into zaplog.Config.TimeoutAction inside ToLogConfig().
Tests
pkg/config/config_test.go, pkg/config/BUILD.bazel
Reformats existing ToLogConfig() assertions to include TimeoutAction; adds TestLogTimeoutAction covering default value, normalization of enum and string input ("DISCARD"), and validation failure for "invalid"; increments test shard count from 37 to 38.

Enterprise Submodule Update

Layer / File(s) Summary
Enterprise submodule pointer update
pkg/extension/enterprise
Submodule reference updated to a new commit hash.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A rabbit hops where logs may stall,
With timeout-action set for all—
Say panic to crash, or discard to glide,
The lumberjack now walks by our side.
Modules bumped and submodules, too,
Every log write knows just what to do! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'config: support log timeout action' directly and concisely summarizes the main change—adding a new configuration option for controlling timeout action behavior.
Description check ✅ Passed The PR description comprehensively addresses all template requirements: issue number, problem summary, detailed explanation of changes, checklist items, side effects, documentation needs, and release notes.
Linked Issues check ✅ Passed All coding requirements from issue #68968 are met: new config field with validation, default value of 'panic', case-insensitive value handling, integration with Log.ToLogConfig(), and enterprise audit updates.
Out of Scope Changes check ✅ Passed All changes directly support the feature objective: config field, validation logic, test coverage, dependency updates (go.mod, DEPS.bzl), Bazel test shard count adjustment, and submodule pointer update.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bb7133/log-timeout-action

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 4, 2026
@codecov

codecov Bot commented Jun 5, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.9516%. Comparing base (b1d4142) to head (ad61f32).

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68964        +/-   ##
================================================
- Coverage   76.3269%   74.9516%   -1.3754%     
================================================
  Files          2041       2031        -10     
  Lines        562433     570432      +7999     
================================================
- Hits         429288     427548      -1740     
- Misses       132238     142756     +10518     
+ Partials        907        128       -779     
Flag Coverage Δ
integration 41.1641% <50.0000%> (+1.5062%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4610% <ø> (ø)
parser ∅ <ø> (∅)
br 49.4128% <ø> (-13.3567%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lance6716 lance6716 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there's no test to verify the behaviour. Maybe inject some blocking to IO and check panic / discard really work

@tiprow

tiprow Bot commented Jun 14, 2026

Copy link
Copy Markdown

@bb7133: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
fast_test_tiprow ad61f32 link true /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot

ti-chi-bot Bot commented Jun 14, 2026

Copy link
Copy Markdown

@bb7133: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-build-next-gen ad61f32 link true /test pull-build-next-gen
idc-jenkins-ci-tidb/unit-test ad61f32 link true /test unit-test
idc-jenkins-ci-tidb/build ad61f32 link true /test build
pull-unit-test-next-gen ad61f32 link true /test pull-unit-test-next-gen
idc-jenkins-ci-tidb/check_dev_2 ad61f32 link true /test check-dev2
pull-integration-realcluster-test-next-gen ad61f32 link true /test pull-integration-realcluster-test-next-gen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support configurable action when log writes time out

3 participants