Resources not synced to all pods when using file-based storage with multiple replicas #409

@rickardsjp

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Description

When a Perses instance is configured with file-based storage (spec.config.database.file) and replicas > 1, dashboards, datasources, and global datasources are pushed to only one pod instead of to every replica. The operator syncs resources through the Kubernetes ClusterIP Service, which routes each request to a single, arbitrarily chosen pod. Because each pod has its own independent file-based storage (separate PVCs via the StatefulSet, or separate emptyDir volumes), the other pods never receive the resources.

This means users hitting the Perses UI through the Service will get inconsistent results depending on which pod they land on. Some requests return the dashboard, others return 404.

The same issue affects the delete path: deleting a CR only removes the resource from whichever pod the Service routes to.
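The behaviour described above can be reproduced in a small in-memory model. This is only an illustrative sketch, not the operator's code: `Pod`, `Service`, `sync_via_service`, `delete_via_service`, and `sync_to_all` are hypothetical names, each pod's file storage is modelled as a plain dict, and the Service uses round-robin routing purely to keep the output deterministic (real routing is arbitrary).

```python
import itertools


class Pod:
    """One replica with its own private store (stands in for its PVC/emptyDir)."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class Service:
    """Stand-in for a ClusterIP Service: each request reaches exactly one pod."""

    def __init__(self, pods):
        # Round-robin is an assumption made for determinism only.
        self._next = itertools.cycle(pods)

    def route(self):
        return next(self._next)


def sync_via_service(service, key, doc):
    """Behaviour described in this issue: a single push through the Service."""
    service.route().store[key] = doc


def delete_via_service(service, key):
    """Delete path: only the pod that receives the request drops the resource."""
    service.route().store.pop(key, None)


def sync_to_all(pods, key, doc):
    """Hypothetical alternative: push to every replica directly."""
    for pod in pods:
        pod.store[key] = doc


def pods_with(pods, key):
    return [pod.name for pod in pods if key in pod.store]


pods = [Pod("perses-0"), Pod("perses-1")]
svc = Service(pods)

sync_via_service(svc, "my-dashboard", {"display": {"name": "My Dashboard"}})
print(pods_with(pods, "my-dashboard"))  # -> ['perses-0']

sync_to_all(pods, "my-dashboard", {"display": {"name": "My Dashboard"}})
print(pods_with(pods, "my-dashboard"))  # -> ['perses-0', 'perses-1']

delete_via_service(svc, "my-dashboard")
print(pods_with(pods, "my-dashboard"))  # -> ['perses-0']
```

In this model the last line shows the delete-path variant of the bug: the resource survives on the replica that did not receive the delete request.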

Steps to Reproduce

  1. Create a Perses instance with file-based storage and 2+ replicas:

    apiVersion: perses.dev/v1alpha2
    kind: Perses
    metadata:
      name: perses
      labels:
        app.kubernetes.io/instance: perses
    spec:
      replicas: 2
      config:
        database:
          file:
            extension: yaml
            folder: /perses
  2. Wait for both pods to be ready.

  3. Create a PersesDashboard targeting that instance:

    apiVersion: perses.dev/v1alpha2
    kind: PersesDashboard
    metadata:
      name: my-dashboard
    spec:
      instanceSelector:
        matchLabels:
          app.kubernetes.io/instance: perses
      config:
        display:
          name: My Dashboard
        duration: 5m
        panels: {}
        layouts: []
  4. Port-forward to each pod individually and query the dashboards API:

    # kubectl port-forward blocks, so run each one in its own terminal (or background it with &).
    kubectl port-forward pod/perses-0 18080:8080
    curl http://localhost:18080/api/v1/projects/default/dashboards

    kubectl port-forward pod/perses-1 18081:8080
    curl http://localhost:18081/api/v1/projects/default/dashboards

Expected Result

Both pods return the dashboard.

Actual Result

Only one pod has the dashboard; the other returns an empty list (or errors because the project doesn't exist on that pod either).
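To make the divergence easier to spot than by comparing two curl outputs by eye, the per-pod responses can be diffed with a short script. This is a rough sketch under assumptions: it assumes the list endpoint returns a JSON array whose items carry their name under `metadata.name`, it treats an HTTP error (for example, the project not existing on that pod) as an empty list, and `check_replicas` and the other helper names are made up for illustration.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url):
    """GET a dashboards list from one port-forwarded pod.

    An HTTP error status is treated as "no dashboards on this pod".
    """
    try:
        with urlopen(url, timeout=5) as resp:
            return json.load(resp)
    except HTTPError:
        return []


def dashboard_names(payload):
    """Collect names from a parsed list response (assumes metadata.name)."""
    return {item["metadata"]["name"] for item in payload}


def missing_by_pod(names_by_pod):
    """Map each pod to the dashboards that exist on some replica but not on it."""
    everything = set().union(*names_by_pod.values())
    result = {}
    for pod, names in names_by_pod.items():
        missing = sorted(everything - names)
        if missing:
            result[pod] = missing
    return result


def check_replicas(targets, fetch=fetch):
    """targets maps a pod name to its dashboards URL; returns missing_by_pod()."""
    names = {pod: dashboard_names(fetch(url)) for pod, url in targets.items()}
    return missing_by_pod(names)
```

With both port-forwards from step 4 running, calling `check_replicas({"perses-0": "http://localhost:18080/api/v1/projects/default/dashboards", "perses-1": "http://localhost:18081/api/v1/projects/default/dashboards"})` would return an empty dict if the replicas agree, and otherwise a mapping from each lagging pod to the dashboards it lacks.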

Perses Operator Version

main branch (commit 43fdeb7)

Kubernetes Version

kubectl version -o yaml
clientVersion:
  buildDate: "2026-05-12T09:51:33Z"
  compiler: gc
  gitCommit: 756939600b9a7180fc2df6550a4585b638875e67
  gitTreeState: clean
  gitVersion: v1.36.1
  goVersion: go1.26.3
  major: "1"
  minor: "36"
  platform: darwin/arm64
kustomizeVersion: v5.8.1
serverVersion:
  buildDate: "2025-12-17T12:32:07Z"
  compiler: gc
  emulationMajor: "1"
  emulationMinor: "35"
  gitCommit: 66452049f3d692768c39c797b21b793dce80314e
  gitTreeState: clean
  gitVersion: v1.35.0
  goVersion: go1.25.5
  major: "1"
  minCompatibilityMajor: "1"
  minCompatibilityMinor: "34"
  minor: "35"
  platform: linux/arm64

Kubernetes Cluster Type

kind

How did you deploy Perses-Operator?

yaml manifests

Manifests

see above

Perses-operator log output

n/a

Anything else?

  • I believe SQL-backed instances are not affected since all pods share state through the external database, but I haven't tested it.
  • This affects PersesDashboard, PersesDatasource, and PersesGlobalDatasource equally.
  • The operator has the correct RBAC to list pods but does not currently use it in the sync path.
  • This is distinct from #286 ("New Perses instances do not receive existing dashboards/datasources"), which addressed multiple separate Perses instances (separate CRs) not receiving existing resources. That was a missing watch/trigger problem, fixed by the PersesAvailabilityPredicate. This issue is about multiple replicas within a single Perses instance: the reconciliation triggers correctly, but the HTTP push only reaches one pod because it goes through the Service.
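
Since the operator can already list pods, one possible direction is to derive a sync target per pod rather than using the Service address. The sketch below is not a proposal for the actual implementation: `ready_pod_urls` is a hypothetical helper, the input is assumed to be Pod objects in the standard Kubernetes JSON shape (`status.podIP`, `status.conditions`), and port 8080 is taken from the port-forward commands above.

```python
def ready_pod_urls(pods, port=8080, scheme="http"):
    """Build one base URL per Ready pod from Kubernetes Pod objects (as dicts).

    Pods without an IP or without a Ready=True condition are skipped, since
    a push to them would fail or be lost.
    """
    urls = []
    for pod in pods:
        status = pod.get("status", {})
        ip = status.get("podIP")
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        if ip and ready:
            urls.append(f"{scheme}://{ip}:{port}")
    return urls


# Example input shaped like the "items" of a pod list response.
example_pods = [
    {"status": {"podIP": "10.0.0.5",
                "conditions": [{"type": "Ready", "status": "True"}]}},
    {"status": {"podIP": "10.0.0.6",
                "conditions": [{"type": "Ready", "status": "False"}]}},
]
print(ready_pod_urls(example_pods))  # -> ['http://10.0.0.5:8080']
```

Skipping pods that are not Ready means a replica that starts later would still need a later reconciliation to catch up, so a real fix would have to decide how to handle that case.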

Metadata

Assignees

No one assigned

    Labels

    kind/bug (Something isn't working), needs-triage (Needs triaging issue from maintainers)

    Projects

    Status
    In Progress
