
Conversation

@sylwiaszunejko (Collaborator) commented Dec 18, 2025

This PR fixes inefficiencies in the host initialization mechanism when bootstrapping a cluster.

Previously, the driver created Host instances with connections from the contact points provided in the cluster configuration using random host IDs. After establishing the control connection and reading from system.peers, these initial Host instances were discarded and replaced with new ones created using the correct host metadata. This approach resulted in unnecessary creation and teardown of multiple connections.

Changes

  • The control connection is now initialized only using the endpoints specified in the cluster configuration.
  • After a successful control connection is established, the driver reads from system.local and system.peers.
  • Based on this metadata, Host instances are created with the correct host_id values.
  • Connections are then initialized directly on these properly constructed Host instances.

Refs: #619
Fixes: #622
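
For illustration, here is a minimal, self-contained sketch of the new bootstrap order (the class and the callables passed in are hypothetical placeholders, not the driver's actual internals):

from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class Host:
    # Hypothetical stand-in for cassandra.pool.Host
    endpoint: str
    host_id: UUID


def bootstrap(contact_points, open_control_connection, query_local_and_peers, open_pool):
    """Sketch of the new order: control connection first, real Hosts second, pools last."""
    # 1. Open the control connection using only the configured contact points.
    control_connection = open_control_connection(contact_points)

    # 2. Read system.local and system.peers through the control connection.
    rows = query_local_and_peers(control_connection)

    # 3. Build Host instances with the correct host_id values from that metadata.
    hosts = [Host(endpoint=row["rpc_address"], host_id=row["host_id"]) for row in rows]

    # 4. Only now open connection pools on the properly constructed Hosts;
    #    no throw-away Hosts with random IDs are created and discarded along the way.
    return {host: open_pool(host) for host in hosts}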

@sylwiaszunejko (Collaborator, Author)

Some tests are still failing, but I wanted to ask if the direction is good @dkropachev

@sylwiaszunejko (Collaborator, Author)

@Lorak-mmk maybe you know why this test assumes that new_host should be different?

def test_get_control_connection_host(self):
    """
    Test to validate Cluster.get_control_connection_host() metadata

    @since 3.5.0
    @jira_ticket PYTHON-583
    @expected_result the control connection metadata should accurately reflect cluster state.

    @test_category metadata
    """

    host = self.cluster.get_control_connection_host()
    assert host == None

    self.session = self.cluster.connect()
    cc_host = self.cluster.control_connection._connection.host

    host = self.cluster.get_control_connection_host()
    assert host.address == cc_host
    assert host.is_up == True

    # reconnect and make sure that the new host is reflected correctly
    self.cluster.control_connection._reconnect()
    new_host = self.cluster.get_control_connection_host()
    assert host != new_host

@Lorak-mmk commented Dec 18, 2025

I have no idea.
In Rust Driver we have logic that if CC breaks then we try to connect it to all other hosts (because the one it was connected to is presumed non-working for now).
I see no such logic in the Python Driver. This part was added in commit 2796ee5.

Was this test passing until now and non-flaky? If so, then perhaps there is such logic somewhere.

@Lorak-mmk

Now that I think of it: I see that the driver uses the LBP to decide the order of hosts to connect to. See _connect_host_in_lbp and _reconnect_internal.
The default LBP is Round Robin, so on reconnect it will start from a different host than at the beginning, right? That would explain why each CC reconnect should land on a different host in a healthy cluster.
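
A toy model of that reasoning (this is not the driver's RoundRobinPolicy, just an illustration of why two consecutive query plans in a healthy cluster start at different hosts):

class ToyRoundRobin:
    """Simplified round-robin ordering: each new plan starts one host later."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.position = 0

    def make_query_plan(self):
        n = len(self.hosts)
        start = self.position
        self.position = (self.position + 1) % n
        return [self.hosts[(start + i) % n] for i in range(n)]


policy = ToyRoundRobin(["127.0.0.1", "127.0.0.2", "127.0.0.3"])
first_cc_host = policy.make_query_plan()[0]      # e.g. 127.0.0.1
reconnect_cc_host = policy.make_query_plan()[0]  # e.g. 127.0.0.2, a different host
assert first_cc_host != reconnect_cc_host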

@sylwiaszunejko (Collaborator, Author)


Makes sense. Second question: in this test:

def test_profile_lb_swap(self):
    """
    Tests that profile load balancing policies are not shared

    Creates two LBPs, runs a few queries, and validates that each LBP is exercised
    separately between EPs

    @since 3.5
    @jira_ticket PYTHON-569
    @expected_result LBP should not be shared.

    @test_category config_profiles
    """
    query = "select release_version from system.local where key='local'"
    rr1 = ExecutionProfile(load_balancing_policy=RoundRobinPolicy())
    rr2 = ExecutionProfile(load_balancing_policy=RoundRobinPolicy())
    exec_profiles = {'rr1': rr1, 'rr2': rr2}
    with TestCluster(execution_profiles=exec_profiles) as cluster:
        session = cluster.connect(wait_for_all_pools=True)

        # default is DCA RR for all hosts
        expected_hosts = set(cluster.metadata.all_hosts())
        rr1_queried_hosts = set()
        rr2_queried_hosts = set()

        rs = session.execute(query, execution_profile='rr1')
        rr1_queried_hosts.add(rs.response_future._current_host)
        rs = session.execute(query, execution_profile='rr2')
        rr2_queried_hosts.add(rs.response_future._current_host)

        assert rr2_queried_hosts == rr1_queried_hosts

In this test it is assumed that both queries should use the same host, since the two separate RoundRobinPolicy instances start from the same host? But how can this be true if the starting position is randomized here: https://github.com/scylladb/python-driver/blob/master/cassandra/policies.py#L182

@Lorak-mmk

No idea. Perhaps populate is not called for those policies for some reason, and they are populated using on_up/on_down etc.?
Try to print a log or a stack trace in populate and run this test.
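
For example, a quick way to do that (a throwaway debugging aid, not part of the driver; the class name here is made up) is to dump the call stack whenever populate runs:

import traceback


class DebugPolicy:
    """Stand-in showing the debugging idea: log who calls populate and with what."""

    def populate(self, cluster, hosts):
        print(f"populate called with {len(hosts)} host(s)")
        traceback.print_stack()


# The same print/print_stack lines can be dropped temporarily into
# RoundRobinPolicy.populate in cassandra/policies.py while running the test.
DebugPolicy().populate(cluster=None, hosts=["127.0.0.1"])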

Comment on lines +267 to +269
if not self.local_dc:
    self.local_dc = dc
    return HostDistance.LOCAL
Collaborator:

Should not be in this PR

Collaborator:

@sylwiaszunejko, what is the reason for having it here?


+1, it is not obvious, nor explained anywhere.

Collaborator:

@sylwiaszunejko, it looks like you reintroduced it in a recent push.

Collaborator (Author):

This is actually needed for any test to pass: distance is now called before on_add/on_up in add_or_renew_pool, and we need local_dc to have a non-null value there; if it is None, all Hosts are marked as ignored. I agree it wasn't explained enough.
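
To make that failure mode concrete, here is a simplified sketch of a DC-aware distance check (illustration only, not the DCAwareRoundRobinPolicy source): with local_dc still None when distance() is first called, no host's DC matches and every host is ignored, which is what the fallback in this hunk avoids.

from enum import Enum


class HostDistance(Enum):
    LOCAL = 0
    REMOTE = 1
    IGNORED = 2


def distance(host_dc, local_dc, used_hosts_per_remote_dc=0):
    """Simplified DC-aware distance check."""
    if host_dc == local_dc:
        return HostDistance.LOCAL
    return HostDistance.REMOTE if used_hosts_per_remote_dc else HostDistance.IGNORED


# distance() called before on_add/on_up ever set local_dc: nothing matches None,
# so every host is IGNORED and no pools are opened.
assert distance("dc1", local_dc=None) is HostDistance.IGNORED
# With the fallback, the first host's DC is adopted as the local DC instead.
assert distance("dc1", local_dc="dc1") is HostDistance.LOCAL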

@sylwiaszunejko (Collaborator, Author)


The test_profile_lb_swap test was working because populate was called before the CC was created, so we only knew about the contact points provided in the cluster config (so only one host). I believe the current approach (calling populate on the LBP after creating the CC, so we can update the LBP with all known hosts) is much better, so we should remove this test. @Lorak-mmk WDYT?

@Lorak-mmk

In the previous approach (calling populate with one host), were the on_add calls correct (so one call for each host besides the CC host)?
If so, then both versions are correct. I think we could then switch to the proposed version.

@Lorak-mmk

You could then adjust the test, not remove it.

@sylwiaszunejko (Collaborator, Author) commented Dec 19, 2025


on_add is called properly, but if there is only one host during populate, the starting position for RoundRobinPolicy is always the same, even if some hosts are added later:

if len(hosts) > 1:
    self._position = randint(0, len(hosts) - 1)
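
A toy version of that behaviour (illustration only, not the actual policy code): when populate sees a single host the starting position stays 0, and hosts added later via on_add never re-randomize it.

from random import randint


class ToyPolicy:
    """Shows that the starting position is chosen only once, at populate time."""

    def __init__(self):
        self._position = 0
        self._hosts = []

    def populate(self, hosts):
        self._hosts = list(hosts)
        # Mirrors the snippet above: randomized only when more than one host is known.
        if len(self._hosts) > 1:
            self._position = randint(0, len(self._hosts) - 1)

    def on_add(self, host):
        # Later additions grow the host list but never touch _position.
        self._hosts.append(host)


policy = ToyPolicy()
policy.populate(["contact-point-1"])  # old flow: only the contact point is known here
policy.on_add("peer-2")
policy.on_add("peer-3")
assert policy._position == 0          # deterministic starting point, never randomized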

@sylwiaszunejko force-pushed the remove_random_ids branch 2 times, most recently from adddec1 to 3e864fc on December 20, 2025 at 12:58
@sylwiaszunejko self-assigned this on Dec 22, 2025
@sylwiaszunejko marked this pull request as ready for review on December 22, 2025 at 13:21
@Lorak-mmk

Please let me review before merging.

Comment on lines 462 to 469
 try:
-    host = [live_hosts[self.host_index_to_use]]
+    if len(live_hosts) > self.host_index_to_use:
+        host = [live_hosts[self.host_index_to_use]]
 except IndexError as e:
     raise IndexError(
         'You specified an index larger than the number of hosts. Total hosts: {}. Index specified: {}'.format(
             len(live_hosts), self.host_index_to_use
         )) from e
 return host


Previously, an IndexError (happening if len(live_hosts) <= host_index_to_use) was caught, the error was printed, and then the exception was rethrown (presumably failing the test).
Now you introduced an if which prevents the IndexError from happening at all.

  • If this change really is desirable, the code handling IndexError should be removed - it is dead.
  • Please explain the reason for this change. Why should this condition now return an empty plan instead of raising an exception?

Comment on lines 221 to 234
        with pytest.raises((WriteTimeout, Unavailable)):
            self.session.execute(query, timeout=None)
    finally:
        get_node(1).resume()

    # Change the scales stats_name of the cluster2
    cluster2.metrics.set_stats_name('cluster2-metrics')

    stats_cluster1 = self.cluster.metrics.get_stats()
    stats_cluster2 = cluster2.metrics.get_stats()

    # Test direct access to stats
    assert 1 == self.cluster.metrics.stats.write_timeouts
    assert (1 == self.cluster.metrics.stats.write_timeouts or 1 == self.cluster.metrics.stats.unavailables)
    assert 0 == cluster2.metrics.stats.write_timeouts


Why did the thrown exception change?


Comment on lines +142 to +145
if not self.get_host(connection.original_endpoint):
    return


Please add a comment explaining what is going on here.

Collaborator:

This is the wrong fix; we need to address it in a separate PR. The correct fix would be to pull the version from system.local when this information is absent.
Here is the issue for it.

Collaborator (Author):

I added a comment; let me know if this can stay for now or whether I should change it.

Comment on lines 3862 to 3865
else:
    log.info("Consider local host new found host")
    peers_result.append(local_row)
# Check metadata.partitioner to see if we haven't built anything yet. If


Wow, I'm not even able to tell which `if` this `else` belongs to.
The message looks quite unhelpful. I would have no idea how to interpret it if I saw it in the logs.

Collaborator:

It belongs to a multiline comment that makes total sense:

Check metadata.partitioner to see if we haven't built anything yet. If
every node in the cluster was in the contact points, we won't discover
any new nodes, so we need this additional check.  (See PYTHON-90)

Collaborator (Author):

I changed the if to have this check at the beginning and made the log message clearer, I believe; please check if it works now.

Let control connection use resolved contact points from
cluster config if lbp is not yet initialized.

… starting point

The `test_profile_lb_swap` test logic assumed that `populate`
was called before control connection (cc) was created, meaning
only the contact points from the cluster configuration were
known (a single host). Due to that the starting point was not random.

This commit updates the test to reflect the new behavior, where `populate`
is called on the load-balancing policy after the control connection is
created. This allows the policy to be updated with all known hosts and
ensures the starting point is properly randomized.

Previously, the driver relied on the load-balancing policy (LBP) to determine
the order of hosts to connect to. Since the default LBP is Round Robin, each
reconnection would start from a different host.

After removing fake hosts with random IDs at startup, this behavior changed.
When the LBP is not yet initialized, the driver now uses the endpoints provided
by the control connection (CC), so there is no guarantee that different hosts
will be selected on reconnection.

This change updates the test logic to first establish a connection and
initialize the LBP, and only then verify that two subsequent reconnections
land on different hosts in a healthy cluster.

Only compare host endpoints, not whole Host instances, as we
don't know host ids.

If `distance` is called before `on_add`/`on_up` then we would end up
with a null `local_dc` value if the value wasn't specified in the
constructor.


Development

Successfully merging this pull request may close these issues.

Control connection reconnect can mark nodes down

3 participants