[BUGFIX] Sync resources to all pods instead of service#412
Open
rickardsjp wants to merge 3 commits into
Open
Conversation
For file-based storage with multiple replicas, the operator must push resources to each pod individually rather than through the Service. This adds the factory method that lists ready pods and creates a client per pod IP. Signed-off-by: Jeremy Rickards <jeremy.rickards@sap.com>
Refactor dashboard, datasource, and globaldatasource controllers to use CreateClientsForAllPods, pushing resources to each ready pod individually rather than through the round-robin Service. Signed-off-by: Jeremy Rickards <jeremy.rickards@sap.com>
Signed-off-by: Jeremy Rickards <jeremy.rickards@sap.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes multi-replica file-based Perses deployments by syncing dashboards/datasources/globaldatasources to each ready Perses pod directly, rather than sending requests through the ClusterIP Service (which load-balances to a single replica).
Changes:
- Add a
CreateClientsForAllPodsAPI in the Perses client factory to list ready pods and create one REST client per pod endpoint. - Update Dashboard/Datasource/GlobalDatasource controllers to sync/delete resources across all returned clients with best-effort error aggregation.
- Expand controller RBAC markers to allow listing pods.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/perses/common/perses_client_factory.go | Adds multi-pod client creation by listing ready pods and building per-pod REST clients. |
| controllers/dashboards/persesdashboard_controller.go | Adds pod list/watch RBAC marker for dashboard controller. |
| controllers/dashboards/dashboard_controller.go | Syncs/deletes dashboards across all pod clients instead of a single service client. |
| controllers/datasources/persesdatasource_controller.go | Adds pod list/watch RBAC marker for datasource controller. |
| controllers/datasources/datasource_controller.go | Syncs/deletes datasources across all pod clients instead of a single service client. |
| controllers/globaldatasources/persesglobaldatasource_controller.go | Adds pod list/watch RBAC marker for global datasource controller. |
| controllers/globaldatasources/globaldatasource_controller.go | Syncs/deletes global datasources across all pod clients instead of a single service client. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if !isPodReady(pod) { | ||
| continue | ||
| } | ||
| urlStr := fmt.Sprintf("%s://%s:%d%s", httpProtocol, pod.Status.PodIP, containerPort, perses.Spec.Config.APIPrefix) |
Comment on lines
16
to
23
| import ( | ||
| "context" | ||
| "flag" | ||
| "fmt" | ||
| "os" | ||
|
|
||
| corev1 "k8s.io/api/core/v1" | ||
| "sigs.k8s.io/controller-runtime/pkg/client" |
Comment on lines
+105
to
+107
| if len(errs) > 0 { | ||
| return subreconciler.RequeueWithErrorAndReason(errors.Join(errs...), persescommon.ReasonBackendError) | ||
| } |
Comment on lines
142
to
147
| if persescommon.HasSecretConfig(datasource.Spec.Client) { | ||
| _, reason, err := r.syncPersesSecret(ctx, persesClient, datasource) | ||
| _, _, err := r.syncPersesSecret(ctx, persesClient, datasource) | ||
| if err != nil { | ||
| dlog.WithError(err).Errorf("Failed to create datasource secret: %s", datasource.Name) | ||
| return subreconciler.RequeueWithErrorAndReason(err, reason) | ||
| return err | ||
| } |
Comment on lines
+103
to
105
| if len(errs) > 0 { | ||
| return subreconciler.RequeueWithErrorAndReason(errors.Join(errs...), persescommon.ReasonBackendError) | ||
| } |
Comment on lines
112
to
117
| if persescommon.HasSecretConfig(globaldatasource.Spec.Client) { | ||
| _, reason, err := r.syncPersesGlobalSecret(ctx, persesClient, globaldatasource) | ||
| _, _, err := r.syncPersesGlobalSecret(ctx, persesClient, globaldatasource) | ||
| if err != nil { | ||
| gdlog.WithError(err).Errorf("Failed to create globaldatasource secret: %s", globaldatasource.Name) | ||
| return subreconciler.RequeueWithErrorAndReason(err, reason) | ||
| return err | ||
| } |
Comment on lines
+160
to
+164
| func (f *PersesClientFactoryWithConfig) CreateClientsForAllPods(ctx context.Context, k8sClient client.Reader, perses persesv1alpha2.Perses) ([]v1.ClientInterface, error) { | ||
| if perses.Spec.Config.Database.SQL != nil { | ||
| c, err := f.CreateClient(ctx, k8sClient, perses) | ||
| if err != nil { | ||
| return nil, err |
Member
|
How would this work with TLS enabled? If Perses TLS certs are issued for the Service DNS, direct pod ip calls might fail validation IIUC. Also if we scale the replicas till a reconciliation is triggered new replica won't get the updates right? |
13 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
For file-based storage with multiple replicas, push resources to each pod individually rather than through the ClusterIP Service. The operator now lists ready pods by label selector, creates an HTTP client per pod IP, and syncs to all of them with best-effort error aggregation.
Closes: #409
Type of change
FEATURE(non-breaking change which adds functionality)ENHANCEMENT(non-breaking change which improves existing functionality)BUGFIX(non-breaking change which fixes an issue)BREAKINGCHANGE(fix or feature that would cause existing functionality to not work as expected)DOC(documentation only)IGNORE(tooling, build system, CI, etc.)Verification
Checklist