New failover implementation by pbrezina · Pull Request #8566 · SSSD/sssd

pbrezina · 2026-03-31T10:33:31Z

This pull request is intended to be the start of a "failover" feature branch where other developers will be able to contribute.

The main failover logic works, compiles, and can be tested using a "minimal" provider that is included as an example. The purpose of the "minimal" provider is only to test the failover without the need to port full provider code, and it will be removed before the contents are pushed to the master branch. See how to set it up in minimal-provider-notes.txt, and see the switch to the new failover in the commit "minimal: switch to new failover for service lookup and user authentication". This is the minimal set of changes to get it working; the real port should get, and will require, more refactoring.

The work is not finished yet and some functionality is missing. This functionality, however, can be implemented in small areas of code and should not require larger changes or glue code across the whole code base, so this is ready for review. Remaining work is tracked at [1]. Feel free to take any of these tickets and open new tickets when you find something missing.

When reviewing, you can start with src/providers/failover/readme.md that provides high level documentation of the code. And of course do not forget the design page [2].

Thanks, Pavel

This crafts and implements the new failover interface; it does not provide a complete implementation of the failover mechanism yet. It brings the code to a state where the public and private interfaces are stable, working and testable, so the following tasks can be split and worked on in parallel.

What is missing at this stage:
- server configuration and discovery (failover_server_group/batch/vtable_op)
- server selection mechanism (sss_failover_vtable_op_server_next)
- kerberos authentication
- sharing servers between IPA/AD LDAP and KDC
- online/offline callbacks (resolve callback should not be needed)

Importantly, it is already possible to start refactoring SSSD code to use the new failover implementation.


gemini-code-assist

Code Review

This pull request implements a new failover mechanism for SSSD, introducing prioritized server groups, parallelized candidate server discovery, and a transaction-based API for automated retries. It also provides a minimal provider implementation to demonstrate the new architecture. Critical logic bugs were identified in the server group resolution logic, where duplicate detection causes premature loop exit, and in the address change detection function, which currently returns inverted results.

gemini-code-assist · 2026-03-31T10:38:17Z

src/providers/failover/failover_group.c

+    for (j = 0; state->group->discovered_servers[j] != NULL; j++, i++) {
+        found = false;
+        current = state->group->discovered_servers[j];
+        for (k = 0; out[k] != NULL; k++) {
+            if (sss_failover_server_equal(out[k], current)) {
+                found = true;
+                break;
+            }
+        }
+
+        if (found) {
+            break;
+        }
+
+        out[i] = talloc_reference(out, current);
+        if (out[i] == NULL) {
+            DEBUG(SSSDBG_CRIT_FAILURE, "Out of memory!\n");
+            talloc_free(out);
+            return ENOMEM;
+        }
+    }


There's a logic error in the loop that merges discovered servers and removes duplicates. The break statement on line 349 will cause the loop to terminate prematurely if a duplicate server is found, preventing any subsequent discovered servers from being added to the list.

This should be a continue, but simply changing it would also be incorrect due to the i++ in the loop's post-increment step, which would create a hole in the out array.

The loop should be refactored to correctly handle duplicates without terminating early.

    for (j = 0; state->group->discovered_servers[j] != NULL; j++) {
        found = false;
        current = state->group->discovered_servers[j];
        for (k = 0; out[k] != NULL; k++) {
            if (sss_failover_server_equal(out[k], current)) {
                found = true;
                break;
            }
        }

        if (found) {
            continue;
        }

        out[i] = talloc_reference(out, current);
        if (out[i] == NULL) {
            DEBUG(SSSDBG_CRIT_FAILURE, "Out of memory!\n");
            talloc_free(out);
            return ENOMEM;
        }
        i++;
    }
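The same pattern can be tried in isolation. The following is a minimal, standalone sketch and not SSSD code: `merge_unique` is a hypothetical helper, plain strings stand in for server objects, and `strcmp` stands in for `sss_failover_server_equal`. It shows the two points the review raises: a duplicate is skipped with `continue` rather than ending the loop, and the output index advances only when a slot is actually filled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Append each entry of the NULL-terminated array `src` to `out` unless an
 * equal entry is already present. `count` is the number of entries already
 * in `out`; `cap` is the total number of slots, including the one reserved
 * for the terminating NULL. Returns the new count. */
static size_t merge_unique(const char **out, size_t count, size_t cap,
                           const char *const *src)
{
    assert(out != NULL && src != NULL && count < cap);

    for (size_t j = 0; src[j] != NULL; j++) {
        int found = 0;

        for (size_t k = 0; k < count; k++) {
            if (strcmp(out[k], src[j]) == 0) {
                found = 1;
                break;
            }
        }

        if (found) {
            continue;   /* skip the duplicate, keep scanning the rest */
        }

        if (count + 1 >= cap) {
            break;      /* no room left for another entry plus the NULL */
        }

        out[count++] = src[j];  /* advance only when a slot is filled */
    }

    out[count] = NULL;
    return count;
}
```

Because the output index is incremented in the loop body and not in the `for` header, skipping a duplicate cannot leave a NULL hole in the middle of the array.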

gemini-code-assist · 2026-03-31T10:38:17Z

src/providers/failover/failover_server_resolve.c

+    return memcmp(server->addr->binary, hostent->addr_list[0]->ipaddr,
+                  server->addr->binary_len) == 0;


The logic in sss_failover_server_resolve_address_changed appears to be inverted. The function name suggests it should return true if the address has changed, but the implementation memcmp(...) == 0 returns true if the addresses are the same. This will cause incorrect behavior where address changes are not detected.

return memcmp(server->addr->binary, hostent->addr_list[0]->ipaddr, server->addr->binary_len) != 0;
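To make the intended semantics concrete, here is a small standalone sketch, not the SSSD implementation. `struct addr_bytes` and `address_changed` are hypothetical names introduced for illustration, and the length check is an added assumption that the review does not mention. The point is only that a "changed" predicate must return true when `memcmp` reports a difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a resolved address: raw bytes plus length
 * (4 for IPv4, 16 for IPv6). */
struct addr_bytes {
    unsigned char data[16];
    size_t len;
};

/* Returns true when `fresh` differs from `cached`. */
static bool address_changed(const struct addr_bytes *cached,
                            const struct addr_bytes *fresh)
{
    assert(cached != NULL && fresh != NULL);

    if (cached->len != fresh->len) {
        return true;    /* addresses of different length cannot match */
    }

    /* memcmp() returns 0 for identical buffers, so "changed" is != 0. */
    return memcmp(cached->data, fresh->data, cached->len) != 0;
}
```

Comparing the lengths first also avoids reading past the shorter buffer's meaningful bytes when the address family differs between lookups.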

src/providers/failover/failover_group.c

+
+    count = talloc_array_length(fctx->groups);
+
+    for (slot = 0; fctx->groups[slot] != NULL && slot < count; slot++) {


src/providers/minimal/minimal_id.c

+    return req;
+
+done:
+    if (ret == EOK) {


src/providers/minimal/minimal_ldap_auth.c

+#include "providers/failover/ldap/failover_ldap.h"
+
+static errno_t
+find_password_expiration_attributes(TALLOC_CTX *mem_ctx,


src/providers/minimal/minimal_id.c

+    switch (ar->entry_type & BE_REQ_TYPE_MASK) {
+    case BE_REQ_SERVICES:
+        DEBUG(SSSDBG_TRACE_FUNC, "Executing BE_REQ_SERVICES request\n");
+
+        subreq = minimal_services_get_send(state, be_ctx->ev, fctx, id_ctx,
+                                           sdom, ar->filter_value,
+                                           ar->extra_value, ar->filter_type,
+                                           noexist_delete);
+        break;
+    default: /*fail*/
+        ret = EINVAL;
+        state->err = "Invalid request type";
+        DEBUG(SSSDBG_OP_FAILURE,
+              "Unexpected request type: 0x%X [%s:%s] in %s\n",
+              ar->entry_type, ar->filter_value,
+              ar->extra_value?ar->extra_value:"-",
+              ar->domain);
+        goto done;
+    }


src/providers/minimal/minimal_id.c

+    switch (state->ar->entry_type & BE_REQ_TYPE_MASK) {
+    case BE_REQ_SERVICES:
+        err = "Service lookup failed";
+        ret = minimal_services_get_recv(subreq);
+        break;
+    default: /* fail */
+        ret = EINVAL;
+        break;
+    }


src/providers/minimal/minimal_id_services.c

+    // TODO handle how to yield ERR_SERVER_FAILED
+    // ret = sdap_id_op_done(state->op, ret, &dp_error);
+    // if (dp_error == DP_ERR_OK && ret != EOK) {
+    //     /* retry */
+    //     ret = minimal_services_get_retry(req);
+    //     if (ret != EOK) {
+    //         tevent_req_error(req, ret);
+    //         return;
+    //     }


src/providers/minimal/minimal_id_services.c

+    //     /* Return to the mainloop to retry */
+    //     return;
+    // }
+    // state->sdap_ret = ret;


src/providers/minimal/minimal_id_services.c

+    // /* An error occurred. */
+    // if (ret && ret != ENOENT) {
+    //     state->dp_error = dp_error;
+    //     tevent_req_error(req, ret);
+    //     return;
+    //}


pbrezina and others added 10 commits March 31, 2026 11:54

utils: add TEVENT_REQ_ERROR_TO_ERRNO

e3cf3fa

minimal: Initial implementation of minimal provider

93d8b76

(cherry picked from commit 0f5f3b6)

minimal: copy paste services code

ebc5f4a

so it can be directly modified

minimal: push init_ctx instead of id_ctx so we can easily pass new fctx

6d451e8

minimal: add authentication provider

6420ac4

minimal: copy paste authentication code

547807a

So it can be modified later.

minimal: switch to new failover for service lookup and user

52bbb3c

authentication

minimal: update how to use it notes

078895d

DO NOT PUSH TO MASTER ci: enable failover branch

c56d871

gemini-code-assist bot reviewed Mar 31, 2026


github-advanced-security bot found potential problems Mar 31, 2026


alexey-tikhonov self-assigned this Apr 1, 2026

alexey-tikhonov requested review from alexey-tikhonov April 1, 2026 09:05

alexey-tikhonov added the "Waiting for review" and "no-backport" labels Apr 1, 2026

pbrezina wants to merge 10 commits into SSSD:failover from pbrezina:failover
