Skip to content

[fix][broker] Fix partitioned topic auto deletion http request timeout#24912

Open
oneby-wang wants to merge 5 commits intoapache:masterfrom
oneby-wang:partitioned_topic_auto_deletion_http_timeout_fix
Open

[fix][broker] Fix partitioned topic auto deletion http request timeout#24912
oneby-wang wants to merge 5 commits intoapache:masterfrom
oneby-wang:partitioned_topic_auto_deletion_http_timeout_fix

Conversation

@oneby-wang
Copy link
Copy Markdown
Contributor

@oneby-wang oneby-wang commented Oct 29, 2025

Fixes #24879

Motivation

Partitioned-topic auto deletion call chain:

  1. gc process(invoked on broker0)
  2. delete partitioned-topic admin api(invoked on broker0)
  3. delete topic-partition-0 admin api(invoked on broker0)

If brokerClient's connectionsPerBroker is 1, then broker0 wait itself to release the connection, the result is timeout. So we check topic-partition-0 existence first to avoid connection pool deadlock.

See also: #24879 (comment)

Modifications

When calling delete partitioned-topic admin api, first check topic partitions existence to avoid calling admin api by broker itself again, which would help partitioned-topic auto deletion in gc process.

Verifying this change

  • Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as (please describe tests).

InactiveTopicDeleteTest#testWhenSubPartitionNotDelete() test method.

@Test
public void testWhenSubPartitionNotDelete() throws Exception {
conf.setBrokerDeleteInactiveTopicsMode(InactiveTopicDeleteMode.delete_when_no_subscriptions);
conf.setBrokerDeleteInactivePartitionedTopicMetadataEnabled(true);
conf.setBrokerDeleteInactiveTopicsFrequencySeconds(1);
super.baseSetup();
final String topic = "persistent://prop/ns-abc/testDeleteWhenNoSubscriptions";
final TopicName topicName = TopicName.get(topic);
admin.topics().createPartitionedTopic(topic, 5);
pulsarClient.newProducer().topic(topic).create().close();
pulsarClient.newConsumer().topic(topic).subscriptionName("sub").subscribe().close();
Thread.sleep(2000);
// Topic should not be deleted
Assert.assertTrue(admin.topics().getPartitionedTopicList("prop/ns-abc").contains(topic));
admin.topics().deleteSubscription(topic, "sub");
Awaitility.await().untilAsserted(() -> {
// Now the topic should be deleted
Assert.assertFalse(admin.topics().getPartitionedTopicList("prop/ns-abc").contains(topic));
});
}

But the default connectionsPerBroker is 16, which can not reproduce the problem.

public PulsarAdminBuilderImpl() {
this.conf = new ClientConfigurationData();
this.conf.setConnectionsPerBroker(16);
}

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: oneby-wang#8

@github-actions github-actions Bot added the doc-not-needed Your PR changes do not impact docs label Oct 29, 2025
@oneby-wang oneby-wang changed the title [fix][admin] Fix partitioned topic auto deletion http request timeout [fix][broker] Fix partitioned topic auto deletion http request timeout Oct 31, 2025
@Technoboy-
Copy link
Copy Markdown
Contributor

Could you please add a test for this pr ?

@oneby-wang
Copy link
Copy Markdown
Contributor Author

oneby-wang commented Dec 3, 2025

Hi, @Technoboy-, thanks for reminding me. This PR is covered by existing test, see InactiveTopicDeleteTest#testWhenSubPartitionNotDelete() test method.

But I configured the admin connectionsPerBroker param of PulsarService incorrectly before, now it is correct, see commit: edaff51.

Without this PR's fix, InactiveTopicDeleteTest#testWhenSubPartitionNotDelete() test method will fail due to assert timeout, because admin client connection pool(just 1 connection) is exhausted due to connection pool deadlock, and test failure messages like this:

org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a org.apache.pulsar.broker.service.InactiveTopicDeleteTest expected [false] but found [true] within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:769)

This PR fixes the issue, so it can pass now.

@oneby-wang
Copy link
Copy Markdown
Contributor Author

oneby-wang commented Dec 3, 2025

Beyond this PR's issue , one more issue exists in #24879.

Race condition when deleting partitioned topic, see #24879 (comment). I haven't figured out the side effects of the race condition yet, and haven't found a good solution to solve the race condition. It seems like a distributed lock or cas operation is needed, because multi brokers may fire the GC process at the same time, but this solution needs more discussion.

@lhotari lhotari modified the milestones: 4.2.0, 4.3.0 Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Auto topic delete of partitioned topics has a MetadataCacheImpl race condition

3 participants