debezium/dbz#1897 Add support for exposing Pipelines metrics to Prometheus via OTel Collector #409

Open: mfvitale wants to merge 6 commits into debezium:main from mfvitale:dbz#1897

@mfvitale
Member

Fixes debezium/dbz#1897

Description

  • Add built-in monitoring support to the Debezium Platform using OpenTelemetry and Prometheus

    • When enabled (monitoring.otel.enabled: true), the Helm chart deploys an OpenTelemetryCollector CR and optionally a ServiceMonitor for Prometheus integration
    • The conductor is updated to enable OpenTelemetry metrics export on the DebeziumServer custom resources it creates, allowing pipeline metrics to flow from Debezium Server → OTel
      Collector → Prometheus

    Helm chart changes

    • New OpenTelemetryCollector CR template with configurable receivers (OTLP gRPC/HTTP), batch processor, and Prometheus exporter
    • New ServiceMonitor template for automatic Prometheus scraping
    • Monitoring env vars (PIPELINE_MONITORING_OTEL_ENABLED, PIPELINE_MONITORING_OTEL_ENDPOINT) injected into the conductor deployment when monitoring is enabled
    • Helper templates for collector naming and endpoint resolution
    • Updated NOTES.txt, values.schema.json, and documentation

    Conductor changes

    • New MonitoringConfigGroup config interface for OTel settings
    • PipelineMapper.createRuntime() conditionally enables OpenTelemetry on the DebeziumServer custom resource with a configurable collector endpoint (defaults to http://localhost:4318)
    • Unit tests for OTel enabled/disabled/no-endpoint scenarios
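
As a rough illustration of the enabled/disabled/no-endpoint behaviour described above, here is a minimal sketch. The `MonitoringConfig` and `OtelSettings` records and the `resolve` method are hypothetical stand-ins invented for this example; they are not the actual `MonitoringConfigGroup` interface or the DebeziumServer model from the PR. Only the default endpoint `http://localhost:4318` comes from the description.

```java
import java.util.Optional;

// Illustrative only: these types are assumptions, not the PR's real classes.
final class OtelRuntimeSketch {

    /** Assumed shape of the monitoring settings: a flag plus an optional endpoint. */
    record MonitoringConfig(boolean otelEnabled, Optional<String> otelEndpoint) {}

    /** Assumed minimal view of the OTel part of a runtime spec. */
    record OtelSettings(boolean enabled, String endpoint) {}

    /** Default collector endpoint mentioned in the PR description. */
    static final String DEFAULT_ENDPOINT = "http://localhost:4318";

    /** Returns OTel settings only when monitoring is enabled, falling back to the default endpoint. */
    static Optional<OtelSettings> resolve(MonitoringConfig config) {
        if (!config.otelEnabled()) {
            return Optional.empty();
        }
        String endpoint = config.otelEndpoint()
                .filter(e -> !e.isBlank())
                .orElse(DEFAULT_ENDPOINT);
        return Optional.of(new OtelSettings(true, endpoint));
    }

    public static void main(String[] args) {
        // Disabled: no OTel settings are produced.
        System.out.println(resolve(new MonitoringConfig(false, Optional.empty())));
        // Enabled without an endpoint: the default is used.
        System.out.println(resolve(new MonitoringConfig(true, Optional.empty())));
        // Enabled with an explicit (made-up) endpoint.
        System.out.println(resolve(new MonitoringConfig(true, Optional.of("http://collector:4318"))));
    }
}
```

The three calls in `main` mirror the three unit-test scenarios listed in the description.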

Why the OTel Operator and Prometheus Operator are not included as subcharts

The OTel Operator was initially included as a subchart dependency but was later removed for the following reasons:

  • Templated CRD limitation: Helm validates all rendered templates against the Kubernetes API before applying anything (helm/helm#11120:
    "unable to build kubernetes objects from current release manifest: resource mapping not found for name"). The OTel Operator chart uses templated CRDs (in templates/ rather than crds/), so on helm install the OpenTelemetryCollector CRD is not yet registered when Helm tries to validate the collector CR, and the resource mapping fails. Unlike the Debezium Operator, which places its CRDs in crds/
    (processed first by Helm), there is no workaround that avoids multiple install steps.
  • Conflicting installations: both operators are cluster-scoped and manage CRDs and admission webhooks across all namespaces. Users who already have these operators in their cluster would face conflicts with duplicate CRDs and webhook registrations.
  • Lifecycle mismatch: operators are typically managed independently with their own upgrade cadence, not tied to the application chart's lifecycle.

Instead, the chart creates only the CRs (OpenTelemetryCollector, ServiceMonitor) and documents the operators as prerequisites.

PR Checklist

  • I have read the contribution guidelines and the governance document on PR expectations.
  • Minimal changes to code not directly related to your change (e.g. no unnecessary formatting changes or refactoring to existing code)
  • One feature/change per PR unless tightly coupled
  • Do a rebase on upstream main

@mfvitale mfvitale marked this pull request as draft May 27, 2026 13:03
@mfvitale mfvitale marked this pull request as ready for review May 28, 2026 07:10
…theus via OTel Collector

Signed-off-by: Fiore Mario Vitale <mvitale@redhat.com>
Member

@vjuranek vjuranek left a comment

LGTM

Signed-off-by: Fiore Mario Vitale <mvitale@redhat.com>
}

private Runtime createRuntime() {
var metricsBuilder = new MetricsBuilder()
Member

Could you refactor this portion of the code using a strategy?
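
For readers unfamiliar with the pattern, a strategy-based version of this kind of builder code might look roughly like the sketch below. Only the name `MetricsExporterStrategy` appears later in this thread; the method names `isApplicable` and `apply`, the two example strategies, and the map-based config are assumptions made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: the method names and concrete strategies are assumptions.
final class MetricsStrategySketch {

    /** Each strategy decides whether it applies, then contributes its settings. */
    interface MetricsExporterStrategy {
        boolean isApplicable(Map<String, String> config);

        void apply(Map<String, String> config, Map<String, String> metrics);
    }

    /** Hypothetical strategy that always applies. */
    static final class BaselineStrategy implements MetricsExporterStrategy {
        @Override
        public boolean isApplicable(Map<String, String> config) {
            return true;
        }

        @Override
        public void apply(Map<String, String> config, Map<String, String> metrics) {
            metrics.put("baseline.enabled", "true");
        }
    }

    /** Hypothetical strategy that adds OTel settings only when the flag is set. */
    static final class OtelStrategy implements MetricsExporterStrategy {
        @Override
        public boolean isApplicable(Map<String, String> config) {
            return Boolean.parseBoolean(config.getOrDefault("otel.enabled", "false"));
        }

        @Override
        public void apply(Map<String, String> config, Map<String, String> metrics) {
            metrics.put("otel.endpoint", config.getOrDefault("otel.endpoint", "http://localhost:4318"));
        }
    }

    /** Applies every applicable strategy in order, replacing inline if/else branches. */
    static Map<String, String> buildMetrics(List<MetricsExporterStrategy> strategies, Map<String, String> config) {
        Map<String, String> metrics = new LinkedHashMap<>();
        for (MetricsExporterStrategy strategy : strategies) {
            if (strategy.isApplicable(config)) {
                strategy.apply(config, metrics);
            }
        }
        return metrics;
    }

    public static void main(String[] args) {
        List<MetricsExporterStrategy> strategies = List.of(new BaselineStrategy(), new OtelStrategy());
        System.out.println(buildMetrics(strategies, Map.of("otel.enabled", "true")));
    }
}
```

The point of the pattern is that adding or removing an exporter touches only its strategy class, not the method that assembles the runtime.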

Member

I think we should keep it as it's generally a good idea to show what the UI looks like in the readme.

Member Author

Good catch. I forgot to update the "updated" image to also include "Connections".

Comment thread helm/README.md Outdated
4. [Optional] PostgreSQL database used by conductor to store its data.
5. [Optional] Strimzi operator: operator for creating a Kafka cluster, in case you want to use a Kafka destination in your
pipeline.
6. OpenTelemetry Operator: manages the OpenTelemetry Collector used for pipeline monitoring.
Contributor

Suggestion (non-blocking): Just to make it super clear:

When monitoring is enabled, the chart also creates:
1. An `OpenTelemetryCollector` custom resource, which requires the OpenTelemetry Operator to already be installed.
2. [Optional] A `ServiceMonitor`, which requires the Prometheus Operator to already be installed.

Member Author

@pxcamus Isn't this covered in the Monitoring section below?

Contributor

Yes, it was just to reinforce, since people (me included) tend to skim the README.

mfvitale added 3 commits June 1, 2026 14:09
Signed-off-by: Fiore Mario Vitale <mvitale@redhat.com>
Signed-off-by: Fiore Mario Vitale <mvitale@redhat.com>
Signed-off-by: Fiore Mario Vitale <mvitale@redhat.com>
* via CDI and applies them based on their applicability.
*/
@ApplicationScoped
public class MetricsExporterStrategyManager {
Member

Instead of using this class, you could use a Jakarta producer that generates the MetricsExporterStrategy based on configuration/input parameters. You would then have only one bean, which you can use directly in PipelineMapper.

I also think you could use an implicit strategy that generates the Runtime directly.
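
A dependency-free sketch of the producer idea follows. In a real CDI application the factory method would sit in a bean and carry the `jakarta.enterprise.inject.Produces` annotation; it is left out here so the example compiles on its own. The single-method `MetricsExporterStrategy` interface, the exporter names, and the boolean flag are assumptions for illustration, not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plain factory method standing in for a CDI producer method.
final class MetricsProducerSketch {

    /** Hypothetical single-method strategy: contributes exporter names to a target list. */
    interface MetricsExporterStrategy {
        void apply(List<String> exporters);
    }

    /**
     * Would be annotated with @Produces in CDI (assumption). Returns one strategy object,
     * composing both exporters when OTel is enabled so that callers inject a single bean.
     */
    static MetricsExporterStrategy produceStrategy(boolean otelEnabled) {
        MetricsExporterStrategy baseline = exporters -> exporters.add("baseline");
        if (!otelEnabled) {
            return baseline;
        }
        MetricsExporterStrategy otel = exporters -> exporters.add("otel");
        // Composite: both strategies stay active behind one injected instance.
        return exporters -> {
            baseline.apply(exporters);
            otel.apply(exporters);
        };
    }

    public static void main(String[] args) {
        List<String> exporters = new ArrayList<>();
        produceStrategy(true).apply(exporters);
        System.out.println(exporters);
    }
}
```

With this shape, the consumer depends on a single strategy instance, and later removing an exporter would change only the producer.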

Member Author

@mfvitale mfvitale Jun 1, 2026

Just one point here. In the end, once the full implementation is complete (FE part too), only OTel will be enabled on the Platform and nothing else. Do you think improving this still makes sense?

In any case I'll investigate the producer way.

Member Author

> Instead of using this class, you could use a Jakarta producer that generates the MetricsExporterStrategy based on configuration/input parameters. You would then have only one bean, which you can use directly in PipelineMapper.

Currently both strategies are active at the same time. Is this doable with the producer?

Member

> Just one point here. In the end, once the full implementation is complete (FE part too), only OTel will be enabled on the Platform and nothing else. Do you think improving this still makes sense?
>
> In any case I'll investigate the producer way.

With this in mind, I think it's worth investigating whether a producer could create the Runtime, or a class that handles the Runtime. That way, when the code has to be removed, you only need to touch the Jakarta producer and not the pipeline.

Member Author

I used the producer to create the Metrics object.

Signed-off-by: Fiore Mario Vitale <mvitale@redhat.com>
Development

Successfully merging this pull request may close these issues.

Deploy OpenTelemetry Collector in Platform Helm Chart

6 participants