HttpContactPointBootstrap always dead after probing timeout

# Explain

During cluster bootstrapping, a child actor is created to handle HTTP probing of a contact point. This actor uses the `probingFailureTimeout` setting as its deadline:

https://github.com/apache/pekko-management/blob/7ed2b5b3be0d9ad6fb664e1f5a31c5ac9c21a2c5/management-cluster-bootstrap/src/main/scala/org/apache/pekko/management/cluster/bootstrap/internal/HttpContactPointBootstrap.scala#L99-L103

At the same time, the same `probingFailureTimeout` setting is also used as the timeout of each probe future:

https://github.com/apache/pekko-management/blob/7ed2b5b3be0d9ad6fb664e1f5a31c5ac9c21a2c5/management-cluster-bootstrap/src/main/scala/org/apache/pekko/management/cluster/bootstrap/internal/HttpContactPointBootstrap.scala#L113-L116
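To make the per-probe timeout concrete, here is a minimal sketch of the same race using only the Scala standard library. It assumes a `ScheduledExecutorService` as a stand-in for Pekko's `after` and scheduler. `ProbeTimeoutRace`, `failAfter`, `probeWithTimeout` and `demo` are hypothetical names, not part of pekko-management. The probe that never completes models a contact point that does not answer in time.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit, TimeoutException}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object ProbeTimeoutRace {
  // Stand-in for Pekko's `after(timeout, scheduler)(replyTimeout)`:
  // a future that fails with a TimeoutException once `timeout` has elapsed.
  def failAfter[T](timeout: FiniteDuration, scheduler: ScheduledExecutorService): Future[T] = {
    val promise = Promise[T]()
    scheduler.schedule(
      new Runnable {
        def run(): Unit = { promise.tryFailure(new TimeoutException(s"probe timed out after $timeout")); () }
      },
      timeout.toMillis,
      TimeUnit.MILLISECONDS)
    promise.future
  }

  // Mirrors `Future.firstCompletedOf(List(reply, afterTimeout))`: whichever completes first wins.
  def probeWithTimeout[T](probe: Future[T], timeout: FiniteDuration, scheduler: ScheduledExecutorService)(
      implicit ec: ExecutionContext): Future[T] =
    Future.firstCompletedOf(List(probe, failAfter[T](timeout, scheduler)))

  // Runs one probe against a contact point that never answers.
  def demo(): Try[String] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    try {
      val unansweredProbe = Promise[String]().future // never completed
      val result = probeWithTimeout(unansweredProbe, 50.millis, scheduler)
      Await.ready(result, 1.second)
      result.value.get // completed by now, so `value` is defined
    } finally scheduler.shutdown()
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

In the real actor the failed future is piped to `self` and ends up in the `Status.Failure` handler, which is where it meets the deadline check.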

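Both the per-probe timeout and the deadline end up in the same `Status.Failure` handler. Because the deadline is started before the probe is sent, and the probe only times out `probingFailureTimeout` after it was sent, the deadline is always overdue by the time a timed-out probe is handled. As a result, the `else` branch is never executed when a probe fails because of the timeout: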

https://github.com/apache/pekko-management/blob/7ed2b5b3be0d9ad6fb664e1f5a31c5ac9c21a2c5/management-cluster-bootstrap/src/main/scala/org/apache/pekko/management/cluster/bootstrap/internal/HttpContactPointBootstrap.scala#L118-L127
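The timing argument can be checked with a small model. This is a sketch under stated assumptions, not pekko-management code: `DeadlineVsTimeout` and `retriesAfterTimeout` are hypothetical names, times are offsets from when `probingKeepFailingDeadline` was started, and a timed-out probe is assumed to be handled no earlier than `probeStart + probeTimeout`.

```scala
import scala.concurrent.duration._

object DeadlineVsTimeout {
  // Times are offsets from the moment `probingKeepFailingDeadline` was started.
  //   probeStart     : when the probe request was sent (never before the deadline started)
  //   probeTimeout   : delay after which `afterTimeout` fails that probe
  //   failureTimeout : the `probingFailureTimeout` used for the deadline
  // Returns true if the `else` branch (schedule another probe) would run when the
  // probe fails because its timeout fired.
  def retriesAfterTimeout(
      probeStart: FiniteDuration,
      probeTimeout: FiniteDuration,
      failureTimeout: FiniteDuration): Boolean = {
    require(probeStart >= Duration.Zero, "a probe cannot start before the deadline")
    // The Status.Failure message is handled no earlier than this instant.
    val failureHandledAt = probeStart + probeTimeout
    // Mirrors `probingKeepFailingDeadline.isOverdue()`; the boundary counts as overdue
    // because scheduling and mailbox delivery always add some latency.
    val overdue = failureHandledAt >= failureTimeout
    !overdue
  }

  def main(args: Array[String]): Unit = {
    val shared = 3.seconds
    for (start <- List(0.seconds, 500.millis, 2.seconds))
      println(s"probe sent at $start, shared timeout: retries = ${retriesAfterTimeout(start, shared, shared)}")
    println(s"probe timeout 1s, deadline 3s: retries = ${retriesAfterTimeout(0.seconds, 1.second, 3.seconds)}")
  }
}
```

With one shared value `t`, `retriesAfterTimeout(probeStart, t, t)` is false for every `probeStart >= 0`; only a per-probe timeout shorter than the deadline reaches the `else` branch.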

# Discuss

I think we may need two separate configurations: one for the deadline and one for the per-probe timeout. That way, when the contact point node has high network latency, the `HttpContactPointBootstrap` actor does not have to be stopped and recreated so frequently. At least we would have some buffer time.
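A rough sketch of what splitting the two settings could buy. `probeTimeout` is a hypothetical new setting that does not exist today; the delay between probes is assumed fixed with no jitter, and the contact point is assumed never to answer. `SplitTimeoutSketch`, `ProbeSettings` and `probesBeforeGivingUp` are illustrative names only.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

object SplitTimeoutSketch {
  // Hypothetical settings: `probeTimeout` stands for the proposed per-request timeout,
  // kept separate from the deadline (`probingFailureTimeout`).
  final case class ProbeSettings(
      probingFailureTimeout: FiniteDuration, // deadline for the whole actor
      probeTimeout: FiniteDuration, // per-probe future timeout (proposed)
      probeInterval: FiniteDuration) // delay before the next probe (assumed fixed)

  // Simulates a contact point that never answers and returns how many probes
  // are sent before the deadline is overdue and ProbingFailed would be reported.
  def probesBeforeGivingUp(s: ProbeSettings): Int = {
    require(s.probeTimeout > Duration.Zero, "probeTimeout must be positive")
    require(s.probeInterval >= Duration.Zero, "probeInterval must not be negative")

    @tailrec
    def loop(probeStart: FiniteDuration, sent: Int): Int = {
      val failedAt = probeStart + s.probeTimeout
      // Boundary counts as overdue: the timeout is handled at or after this instant.
      if (failedAt >= s.probingFailureTimeout) sent
      else loop(failedAt + s.probeInterval, sent + 1)
    }

    loop(Duration.Zero, 1)
  }

  def main(args: Array[String]): Unit = {
    println(probesBeforeGivingUp(ProbeSettings(10.seconds, 10.seconds, 1.second)))
    println(probesBeforeGivingUp(ProbeSettings(10.seconds, 3.seconds, 1.second)))
  }
}
```

With `probingFailureTimeout = 10s`, `probeTimeout = 3s` and a 1s interval, the model sends 3 probes before giving up, versus 1 when both values are 10s. The extra attempts are the buffer time mentioned above.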

wdyt @pjfanning @He-Pin @mdedetrich @samueleresca 

The relevant code at those permalinks:

```scala
// HttpContactPointBootstrap.scala#L99-L103
/**
 * If probing keeps failing until the deadline triggers, we notify the parent,
 * such that it rediscover again.
 */
private var probingKeepFailingDeadline: Deadline = settings.contactPoint.probingFailureTimeout.fromNow

// HttpContactPointBootstrap.scala#L113-L116
log.debug("Probing [{}] for seed nodes...", probeRequest.uri)
val reply = http.singleRequest(probeRequest, settings = connectionPoolWithoutRetries).flatMap(handleResponse)
val afterTimeout = after(settings.contactPoint.probingFailureTimeout, context.system.scheduler)(replyTimeout)
Future.firstCompletedOf(List(reply, afterTimeout)).pipeTo(self)

// HttpContactPointBootstrap.scala#L118-L127
case Status.Failure(cause) =>
  log.warning("Probing [{}] failed due to: {}", probeRequest.uri, cause.getMessage)
  if (probingKeepFailingDeadline.isOverdue()) {
    log.error("Overdue of probing-failure-timeout, stop probing, signaling that it's failed")
    context.parent ! BootstrapCoordinator.Protocol.ProbingFailed(contactPoint, cause)
    context.stop(self)
  } else {
    // keep probing, hoping the request will eventually succeed
    scheduleNextContactPointProbing()
  }
```

HttpContactPointBootstrap always dead after probing timeout #209

Explain

Discuss

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

HttpContactPointBootstrap always dead after probing timeout #209

Description

Explain

Discuss

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions