Skip to content

[controller] OTel instrumentation#3201

Merged
knative-prow[bot] merged 8 commits into
knative:mainfrom
dprotaso:observability-controller
Jul 4, 2025
Merged

[controller] OTel instrumentation#3201
knative-prow[bot] merged 8 commits into
knative:mainfrom
dprotaso:observability-controller

Conversation

@dprotaso
Copy link
Copy Markdown
Member

@dprotaso dprotaso commented Jul 2, 2025

This changes the controller package to drop the use of the OpenCensus stats reporter.

As highlighted in the design document we had an overlap in our controller custom metrics and the ones provided by client-go workqueue package. For now we are removing the custom metrics and just going to rely on the workqueue metrics.

Breakdown of changes

  • add workqueue metrics interface
  • incorporate k8s client-go tool metrics - request latency & result
  • drop controller custom stats reporter

@knative-prow knative-prow Bot requested review from Cali0707 and skonto July 2, 2025 14:46
@knative-prow knative-prow Bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 2, 2025
@dprotaso
Copy link
Copy Markdown
Member Author

dprotaso commented Jul 2, 2025

/assign @skonto @evankanderson @Cali0707

@codecov
Copy link
Copy Markdown

codecov Bot commented Jul 2, 2025

Codecov Report

Attention: Patch coverage is 52.43446% with 127 lines in your changes missing coverage. Please review.

Project coverage is 75.30%. Comparing base (7a5377f) to head (6b157c4).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
observability/metrics/metricstest/assert.go 0.00% 84 Missing ⚠️
observability/metrics/k8s/tools.go 71.42% 18 Missing and 4 partials ⚠️
observability/metrics/k8s/workqueue.go 76.92% 14 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3201      +/-   ##
==========================================
- Coverage   76.05%   75.30%   -0.75%     
==========================================
  Files         205      206       +1     
  Lines       11751    11867     +116     
==========================================
  Hits         8937     8937              
- Misses       2541     2650     +109     
- Partials      273      280       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread observability/metrics/k8s/tools.go Fixed
Comment thread observability/metrics/k8s/tools.go Fixed
Copy link
Copy Markdown
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

One concern in the client-go metrics, but this overall makes sense, and I like re-using the existing k8s metrics infrastructure, even if the workqueue implementation feels verbose (I like the tools version better).

Comment thread observability/metrics/k8s/tools.go Outdated
Copy link
Copy Markdown
Member

@Cali0707 Cali0707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Overall, this looks great - just one small nit about how we are getting the response status code for the attributes

Comment on lines +126 to +134
if code == "200" {
attrs = attribute.NewSet(
semconv.ServerAddressKey.String(host),
semconv.HTTPRequestMethodKey.String(strings.ToUpper(method)),
semconv.HTTPResponseStatusCodeKey.Int(200),
)
} else if c, err := strconv.Atoi(code); err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dprotaso is there any value to having a specific code path just for the code == "200" case?

As far as I can tell, the only difference between this block and the next is that we are manually setting the int value of the response status code in the attribute set, instead of using the parsed value. IMO, this would be a bit simpler to read if we just drop the special case for 200.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any value to having a specific code path just for the code == "200" case?

I was trying to avoid the string parsing on the 'happy' path - if you feel it's too much we can revert an revisit if there are problems in the future

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, let's keep it as is then. Maybe add a comment for why?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dprotaso dprotaso force-pushed the observability-controller branch from 32c1ebe to 6b157c4 Compare July 4, 2025 14:48
Copy link
Copy Markdown
Member

@Cali0707 Cali0707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@knative-prow knative-prow Bot added the lgtm Indicates that a PR is ready to be merged. label Jul 4, 2025
@knative-prow
Copy link
Copy Markdown

knative-prow Bot commented Jul 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cali0707, dprotaso, evankanderson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dprotaso
Copy link
Copy Markdown
Member Author

dprotaso commented Jul 4, 2025

/retest

Soon this will be gone - metrics: TestMetricsExport/OpenCensus

@knative-prow knative-prow Bot merged commit e959f44 into knative:main Jul 4, 2025
35 of 38 checks passed
@dprotaso dprotaso deleted the observability-controller branch July 4, 2025 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants