feat(openfeature): emit server-side EVP flagevaluation by leoromanovsky · Pull Request #11639 · DataDog/dd-trace-java

leoromanovsky · 2026-06-12T21:44:59Z

🟢 NOTE TO REVIEWERS

I've chosen to keep this PR on the larger side in terms of "lines of code" as a test. The commits are deliberately layered in a narrative style. Each one is equivalent to what we would normally do with a "stacked" PR, but this preserves the overall view of the feature. Please review one commit at a time or all together.

If this review mechanism is not satisfactory, please let me know!

Motivation

Customers need consistent server-side feature-flag evaluation visibility across supported runtimes so rollout behavior can be correlated with application behavior in APM and Event Platform. This Java contribution adds that server-side flagevaluation signal for Java OpenFeature evaluations while preserving the existing OTel feature_flag.evaluations path and the existing exposure telemetry path.

High Priority Changes and Decisions

These are the design points I would want reviewed most closely.

EVP routing uses the Agent-advertised proxy prefix, not a hard-coded v2/v3/v4 path. The SDK keeps the track route as /api/v2/flagevaluation, but builds it under the proxy prefix discovered from the Agent. In current staging dogfooding that resolves to /evp_proxy/v4/api/v2/flagevaluation.
Flagevaluation reuses the existing Event Platform publisher path. This keeps delivery aligned with existing Agent discovery, headers, lifecycle, and compression controls instead of adding a Java-specific HTTP writer. Response compression is disabled for this track to match the merged Go behavior.
The OpenFeature hook stays on the hot path, so it only captures and enqueues. It records evaluation metadata, snapshots the OpenFeature context at enqueue time, and defers materialization, aggregation, and posting to the worker thread. This is the main correctness/performance boundary.
The worker emits the existing batched flagevaluation contract. One flush produces a FlagEvaluationsRequest with top-level context and a flagEvaluations array; this does not use a separate /batchedflagevaluations route.
Aggregation keys are limited to schema-visible fields. The aggregate dimensions are flag key, variant key, allocation key, runtime-default state, error message, targeting key, and pruned context. OpenFeature reason is intentionally not a hidden aggregate key because it is not serialized to the worker contract.
targeting_key is the single identity field for the event. The hook removes duplicate targetingKey from context.evaluation so the same identity is not encoded twice.
Cardinality/backpressure behavior is intentionally lossy but counted. The writer uses full-fidelity buckets first, degraded buckets without targeting key/context after cap pressure, then counted drops if both tiers or payload limits are exhausted.
Event time and send time are separate. Each aggregate row preserves first_evaluation and last_evaluation bounds, while the payload timestamp is the flush time.
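
To make the aggregation decisions above easier to review, here is a minimal sketch of a two-tier aggregator, assuming a single worker thread owns it. Every name in it (`TwoTierAggregatorSketch`, `Key`, `Row`, the cap parameters) is hypothetical and not taken from the PR; the runtime-default state and error message dimensions are omitted for brevity. It only illustrates the described behavior: full-fidelity rows first, degraded rows without targeting key/context once the cap is hit, counted drops after that, and per-row first/last evaluation bounds.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; class names, fields, and caps are illustrative, not the PR's code. */
final class TwoTierAggregatorSketch {

  /** Key built only from fields that also appear in the serialized row. */
  static final class Key {
    final String flagKey;
    final String variantKey;
    final String allocationKey;
    final String targetingKey; // null in the degraded tier
    final String prunedContext; // canonical string form; null in the degraded tier

    Key(String flagKey, String variantKey, String allocationKey,
        String targetingKey, String prunedContext) {
      this.flagKey = flagKey;
      this.variantKey = variantKey;
      this.allocationKey = allocationKey;
      this.targetingKey = targetingKey;
      this.prunedContext = prunedContext;
    }

    /** Same key without the high-cardinality identity fields. */
    Key degraded() {
      return new Key(flagKey, variantKey, allocationKey, null, null);
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return Objects.equals(flagKey, k.flagKey)
          && Objects.equals(variantKey, k.variantKey)
          && Objects.equals(allocationKey, k.allocationKey)
          && Objects.equals(targetingKey, k.targetingKey)
          && Objects.equals(prunedContext, k.prunedContext);
    }

    @Override
    public int hashCode() {
      return Objects.hash(flagKey, variantKey, allocationKey, targetingKey, prunedContext);
    }
  }

  /** One aggregate row: a count plus the event-time bounds it covers. */
  static final class Row {
    long count;
    long firstEvaluationMs = Long.MAX_VALUE;
    long lastEvaluationMs = Long.MIN_VALUE;

    void record(long evalTimestampMs) {
      count++;
      firstEvaluationMs = Math.min(firstEvaluationMs, evalTimestampMs);
      lastEvaluationMs = Math.max(lastEvaluationMs, evalTimestampMs);
    }
  }

  final Map<Key, Row> full = new LinkedHashMap<>();
  final Map<Key, Row> degraded = new LinkedHashMap<>();
  long dropped;
  private final int fullCap;
  private final int degradedCap;

  TwoTierAggregatorSketch(int fullCap, int degradedCap) {
    this.fullCap = fullCap;
    this.degradedCap = degradedCap;
  }

  /** Not thread-safe: assumes only the worker thread calls this. */
  void add(Key key, long evalTimestampMs) {
    Row row = lookupOrCreate(full, key, fullCap);
    if (row == null) {
      row = lookupOrCreate(degraded, key.degraded(), degradedCap);
    }
    if (row == null) {
      dropped++; // both tiers exhausted: lossy, but counted
      return;
    }
    row.record(evalTimestampMs);
  }

  private static Row lookupOrCreate(Map<Key, Row> tier, Key key, int cap) {
    Row row = tier.get(key);
    if (row == null && tier.size() < cap) {
      row = new Row();
      tier.put(key, row);
    }
    return row;
  }
}
```

In this sketch the flush time would be stamped on the payload separately, so `firstEvaluationMs` and `lastEvaluationMs` stay pure event-time bounds.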

Other Changes

Adds the Java EVP flagevaluation path behind DD_FLAGGING_EVALUATION_COUNTS_ENABLED while leaving the existing OTel feature_flag.evaluations hook in place.
Renames the existing OpenFeature metrics hook so the metrics and EVP logging hooks are easy to distinguish in review.
Adds tagged core metric counts for flagevaluation dropped/degraded/split telemetry.
Wires the writer into the feature-flagging system lifecycle with bounded queueing, periodic flush, shutdown drain, and best-effort clearing after payload encoding.
Adds focused unit coverage for routing, hook capture, context snapshotting, aggregation, payload encoding, writer lifecycle/posting, and system lifecycle wiring.
Adds a JMH benchmark for the flagevaluation hot path.
Applies the repository Spotless formatter as a small follow-up commit after the narrative stack was published.
Adds follow-up Jacoco coverage for the feature-flagging-lib class-level coverage gate surfaced by CI.
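
The bounded queueing, periodic flush, and shutdown drain mentioned above can be illustrated with a small sketch. It assumes an `ArrayBlockingQueue` with a non-blocking `offer` on the hot path; the class and method names are hypothetical, and the actual writer in this PR may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a bounded capture queue; not the PR's writer implementation. */
final class BoundedCaptureQueueSketch<E> {
  private final BlockingQueue<E> queue;
  private final AtomicLong dropped = new AtomicLong();

  BoundedCaptureQueueSketch(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  /** Hot path: never blocks. Returns false and counts a drop when the queue is full. */
  boolean tryEnqueue(E event) {
    boolean accepted = queue.offer(event);
    if (!accepted) {
      dropped.incrementAndGet();
    }
    return accepted;
  }

  /** Worker side: takes everything currently queued (periodic flush or shutdown drain). */
  List<E> drain() {
    List<E> batch = new ArrayList<>();
    queue.drainTo(batch);
    return batch;
  }

  long droppedCount() {
    return dropped.get();
  }
}
```

The point of `offer` over `put` is that a slow or stalled worker can only cost dropped events, never latency on the evaluation path.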

Commit Guide

LOC is rename-aware git diff-tree -M --numstat for each commit against its parent.

| SHA | Changes / purpose | LOC (+/-) |
|---|---|---|
| `571938b6c7` | Centralize EVP proxy endpoint construction and support the Agent-advertised proxy prefix generically. | +270 / -24 |
| `4949ed7ee4` | Share feature-flagging EVP publishing primitives so flagevaluation can reuse the existing transport path. | +179 / -27 |
| `4aa06476b7` | Add the bootstrap flagevaluation event/writer contract between OpenFeature and the agent writer. | +180 / -0 |
| `ac126fb44a` | Rename the existing OpenFeature metrics hook so OTel metrics and EVP logging are distinct in review. | +17 / -17 |
| `7a43658da6` | Add the OpenFeature flagevaluation logging hook and non-blocking event capture path. | +130 / -1 |
| `4ec77cae03` | Cover hook capture, skip/error behavior, metadata extraction, and targeting-key de-duplication. | +384 / -0 |
| `6fa7ade3e4` | Snapshot OpenFeature context values at enqueue time, including nested structures/lists and duplicate scalars. | +161 / -19 |
| `986a9af6be` | Register the flagevaluation logging hook with the Java OpenFeature provider behind the config gate. | +97 / -6 |
| `f1c37aba16` | Canonicalize pruned context values for deterministic aggregation keys. | +194 / -0 |
| `115a407b76` | Add the two-tier aggregation model for full-fidelity rows, degraded rows, and counted drops. | +234 / -1 |
| `cc3b21c0c2` | Cover aggregation merge keys, caps, degradation, context pruning, and constants. | +183 / -0 |
| `31d7cb5116` | Encode `FlagEvaluationsRequest` payloads, split oversized bodies, degrade oversized rows, and count drops. | +270 / -0 |
| `e43f93acd0` | Cover payload wire shape, split behavior, degraded rows, and error serialization. | +251 / -0 |
| `17d7785420` | Allow tagged core metric counts for flagevaluation drop/degradation metrics. | +28 / -0 |
| `5f88cadeb2` | Add writer lifecycle, bounded queue, worker thread, flush cadence, and shutdown drain. | +296 / -1 |
| `597e97d10b` | Post encoded flagevaluation payloads through EVP and clear best-effort aggregates after encoding. | +296 / -17 |
| `58df936165` | Add shared test support for writer and payload tests. | +212 / -0 |
| `acc6e21940` | Cover writer queueing, flush, backpressure, drop metrics, shutdown, and payload posting. | +304 / -0 |
| `96df04f1be` | Wire the flagevaluation writer into the feature-flagging system lifecycle. | +22 / -0 |
| `edb264e7d8` | Cover system lifecycle registration, start, and close behavior for the flagevaluation writer. | +28 / -0 |
| `90ed6cb556` | Add a JMH benchmark for the flagevaluation hot path. | +155 / -0 |
| `c73cafdb46` | Apply repository Spotless formatting after publishing the stack. | +4 / -5 |
| `5b1456b3fb` | Add focused branch and instruction coverage for the feature-flagging-lib Jacoco gate. | +465 / -2 |

Validation Evidence

Local Test Gates

Focused Gradle gate passed after rebasing onto current origin/master (ac29db2316):
- :communication:test
- :products:feature-flagging:feature-flagging-api:test
- :products:feature-flagging:feature-flagging-agent:test
- :products:feature-flagging:feature-flagging-lib:test
- :products:feature-flagging:feature-flagging-lib:jmhClasses
Covered test classes included:
- BackendApiFactoryTest, DDAgentFeaturesDiscoveryTest
- DDEvaluatorTest, ProviderTest, FlagEvalLoggingHookTest
- FeatureFlaggingSystemTest
- FeatureFlagEvpPublisherTest, FlagEvaluationAggregatorTest, FlagEvaluationPayloadsTest, FlagEvaluationWriterImplTest
Feature-flagging-lib coverage gate passed after the Jacoco follow-up commit:
- ./gradlew :products:feature-flagging:feature-flagging-lib:test :products:feature-flagging:feature-flagging-lib:jacocoTestReport :products:feature-flagging:feature-flagging-lib:jacocoTestCoverageVerification
Formatting/lint checks after the Spotless and coverage follow-up commits:
- ./gradlew spotlessApply
- ./gradlew spotlessCheck
- ./gradlew :products:feature-flagging:feature-flagging-lib:spotlessCheck
- ./gradlew :communication:forbiddenApis :dd-trace-core:forbiddenApis :internal-api:forbiddenApis :telemetry:forbiddenApis :products:feature-flagging:feature-flagging-api:forbiddenApis :products:feature-flagging:feature-flagging-agent:forbiddenApis :products:feature-flagging:feature-flagging-lib:forbiddenApis
Stack hygiene:
- git diff --check passed.
- All published commits verified with good git signatures.

Dogfooding App

Rebuilt ffe-dogfooding Java artifacts from this local dd-trace-java stack with scripts/prepare-local-java.sh.
Restarted dogfooding with local dd-openfeature and dd-java-agent artifacts plus the real backend EVP path.
Java app health reached PROVIDER_READY.
Evaluated ffe-dogfooding-string-flag through the Java dogfooding app 15 times total: 5 evaluations for each targeting key:
- java-restack4-20260702T042247Z-alpha
- java-restack4-20260702T042247Z-bravo
- java-restack4-20260702T042247Z-charlie
App-side result: all 15 evaluations returned variant_1, allocation allocation-override-392dd7c149f8, service java, and evaluation reason STATIC.
App logs showed two successful EVP posts to the Agent-advertised route http://datadog-agent:8126/evp_proxy/v4/api/v2/flagevaluation, both returning 202.
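
For readers checking the route in the logs, the following sketch shows how that URL can be composed from the Agent base URL, the advertised proxy prefix, and the fixed track route. It assumes the Agent advertises a prefix such as `/evp_proxy/v4/`; the class and method names are hypothetical and do not mirror the PR's routing code.

```java
import java.net.URI;

/** Hypothetical sketch of EVP route composition; not the PR's routing implementation. */
final class EvpRouteSketch {
  /** The track route stays fixed; only the proxy prefix comes from Agent discovery. */
  static final String TRACK_ROUTE = "api/v2/flagevaluation";

  static URI flagEvaluationUrl(String agentBaseUrl, String advertisedProxyPrefix) {
    // Normalize slashes so "/evp_proxy/v4/", "evp_proxy/v4", and similar all compose cleanly.
    String base = agentBaseUrl.replaceAll("/+$", "");
    String prefix = advertisedProxyPrefix.replaceAll("^/+|/+$", "");
    return URI.create(base + "/" + prefix + "/" + TRACK_ROUTE);
  }
}
```

With `http://datadog-agent:8126` and `/evp_proxy/v4/` as inputs, this yields the same URL that appears in the app logs; a different advertised prefix changes only the middle segment.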

Staging End-To-End

Dogfooding ran without the local mock-intake EVP tee/proxy, so the Agent sent EVP traffic through the normal backend path.
Retriever staging query against eventplatform.system.track(TRACK => 'flagevaluation') returned 3 aggregated rows for the exact targeting keys above.
Each row had:
- flag.key=ffe-dogfooding-string-flag
- variant.key=variant_1
- allocation.key=allocation-override-392dd7c149f8
- evaluation_count=5
This proves SDK aggregation/batching for the final local tree: 15 app evaluations became 3 backend flagevaluation rows.
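
The 15-to-3 arithmetic can be reproduced with a tiny worked example. It assumes, as in the dogfooding run, that flag key, variant, allocation, and context are identical across evaluations, so the targeting key is the only dimension that differs; the class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical worked example: count evaluations per targeting key. */
final class AggregationCountSketch {
  static Map<String, Integer> countByTargetingKey(List<String> targetingKeys) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String key : targetingKeys) {
      counts.merge(key, 1, Integer::sum); // one row per distinct key, count incremented per hit
    }
    return counts;
  }
}
```

Feeding it five evaluations for each of the three targeting keys listed above gives three entries, each with a count of 5, which matches the `evaluation_count=5` rows returned by the staging query.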

System Tests

Companion draft PR: Enable EVP flagevaluation system tests for Java system-tests#7185
Local manifest-enabled Java EVP flagevaluation system tests passed against PR head 6b7aa4273d:
- TEST_LIBRARY=java ./run.sh +v FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_evp.py
- Result: 8 passed in 80.08s (Library: java@1.64.0-SNAPSHOT+6b7aa4273d, Weblog variant: spring-boot).

datadog-datadog-prod-us1-2 · 2026-06-12T21:47:23Z

🎯 Code Coverage (details)
• Patch Coverage: 90.79%
• Overall Coverage: 57.19% (+0.23%)

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 6b7aa42

dd-octo-sts · 2026-06-12T22:06:16Z

🟢 Java Benchmark SLOs — All performance SLOs passed

| Suite | Status |
|---|---|
| Startup | 🟢 pass |

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

| Scenario | Candidate | master | Δ (95% CI of mean) |
|---|---|---|---|
| startup:insecure-bank:iast:Agent | 13.98 s | 14.02 s | [-0.9%; +0.3%] (no difference) |
| startup:insecure-bank:tracing:Agent | 12.91 s | 13.02 s | [-1.7%; -0.0%] (maybe better) |
| startup:petclinic:appsec:Agent | 16.38 s | 16.90 s | [-7.2%; +1.0%] (no difference) |
| startup:petclinic:iast:Agent | 16.95 s | 16.94 s | [-0.9%; +1.0%] (no difference) |
| startup:petclinic:profiling:Agent | 16.91 s | 16.99 s | [-1.6%; +0.6%] (no difference) |
| startup:petclinic:sca:Agent | 17.01 s | 16.76 s | [+0.6%; +2.4%] (maybe worse) |
| startup:petclinic:tracing:Agent | 15.68 s | 16.15 s | [-7.3%; +1.5%] (no difference) |

Commit: 6b7aa427 · CI Pipeline · Benchmarking Platform UI

Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

dd-octo-sts · 2026-06-23T00:23:25Z

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d4244f8ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

leoromanovsky · 2026-07-02T14:27:27Z

  private int spansWritten;

  public LLMObsSpanMapper() {
-    this(5 << 20);


This wasn't precise enough; the limit is measured in even megabytes. I can revert the change, though, and leave it as-is for the LLMObs product. I noticed this same constant (even 5M) in the Go tracer, so maybe it means something special here.

leoromanovsky · 2026-07-02T14:28:08Z

+  /**
+   * Default SDK-side target for uncompressed EVP request bodies. Writers may split batches at or
+   * below this size to keep Agent proxy requests comfortably bounded.
+   */
+  public static final int PAYLOAD_SIZE_LIMIT_BYTES = 5 * 1024 * 1024;


The declared limit in the agent is 10MB, but I'm leaving it at 5 here to not change tracer behavior too much.
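
As an illustration of what "split batches at or below this size" could look like, here is a minimal sketch that greedily packs already-encoded rows into batches under a byte limit. It is an assumption-laden simplification: it ignores the JSON envelope (top-level context, array brackets, commas), it simply skips rows that alone exceed the limit where a real writer would degrade them or count a drop, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of size-based batch splitting; not the PR's payload encoder. */
final class PayloadSplitSketch {
  /** Same value as the constant in the diff above; note that 5 << 20 is the same number. */
  static final int PAYLOAD_SIZE_LIMIT_BYTES = 5 * 1024 * 1024;

  static List<List<String>> split(List<String> encodedRows, int limitBytes) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    int currentBytes = 0;
    for (String row : encodedRows) {
      int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
      if (rowBytes > limitBytes) {
        continue; // assumption: oversized rows are handled elsewhere (degrade or counted drop)
      }
      if (!current.isEmpty() && currentBytes + rowBytes > limitBytes) {
        batches.add(current); // close the batch before it would exceed the limit
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += rowBytes;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

Measuring the uncompressed UTF-8 size matches the Javadoc's wording ("uncompressed EVP request bodies"); a production encoder would also need to budget for the envelope bytes that this sketch leaves out.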

leoromanovsky · 2026-07-02T14:31:15Z

      new AtomicReference<>(InitializationState.NOT_STARTED);
  private final FlagEvalMetrics flagEvalMetrics;
-  private final FlagEvalHook flagEvalHook;
+  private final FlagEvalMetricsHook flagEvalMetricsHook;


While it's a bit outside the scope of this PR, we decided to rename these hooks globally to clarify the distinction between OTel metrics and EVP-based ones.

leoromanovsky · 2026-07-02T14:54:14Z

            .addString("variationType", flag.variationType.name())
-            .addString("allocationKey", allocation.key);
+            .addString("allocationKey", allocation.key)
+            .addLong("dd.eval.timestamp_ms", evalTimestampMs);


Wondering if I should adjust this to be consistent.

leoromanovsky · 2026-07-02T15:21:56Z

+    final boolean evalCountsEnabled =
+        config
+            .configProvider()
+            .getBoolean(FeatureFlaggingConfig.FLAGGING_EVALUATION_COUNTS_ENABLED, true);
+    if (evalCountsEnabled) {
+      final FlagEvaluationWriterImpl evalWriter = new FlagEvaluationWriterImpl(sco, config);
+      evalWriter.start();
+      FLAG_EVAL_WRITER = evalWriter;
+      LOGGER.debug("Flag evaluation EVP writer started");
+    } else {


We only start the writer on provider launch. Note to self: check earlier in the PR whether we still enqueue when FLAGGING_EVALUATION_COUNTS_ENABLED is disabled. Those events would just take up memory without being processed.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b7aa4273d


chatgpt-codex-connector · 2026-07-03T04:05:46Z

+    } else if (value.isInstant()) {
+      return value.asInstant();


Normalize instant context values before serializing

When an evaluation context contains an OpenFeature instant attribute, this preserves a java.time.Instant in the attrs map. The flagevaluation payload is later encoded with a plain Moshi instance in FlagEvaluationPayloads, which has no Instant adapter, so toJson throws during flush; because FlagEvaluationWriterImpl only clears aggregates after successful payload encoding, that poisoned bucket remains and every later flush aborts, blocking all flagevaluation telemetry until restart. Convert instants to a JSON scalar (or drop them) before aggregation/serialization.
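
One possible shape for the suggested fix is sketched below: walk the snapshotted context and replace any `java.time.Instant` with a JSON-friendly scalar before aggregation. Choosing the ISO-8601 string form (rather than epoch milliseconds, or dropping the value) is an assumption made for illustration, and `ContextNormalizerSketch` is a hypothetical name, not code from this PR.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: make context values safe for a JSON encoder without an Instant adapter. */
final class ContextNormalizerSketch {
  static Object normalize(Object value) {
    if (value instanceof Instant) {
      return value.toString(); // ISO-8601 text, e.g. 2026-07-03T04:05:46Z
    }
    if (value instanceof Map) {
      Map<String, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        out.put(String.valueOf(entry.getKey()), normalize(entry.getValue()));
      }
      return out;
    }
    if (value instanceof List) {
      List<Object> out = new ArrayList<>();
      for (Object item : (List<?>) value) {
        out.add(normalize(item));
      }
      return out;
    }
    return value; // strings, numbers, booleans, and null pass through unchanged
  }
}
```

Doing this at snapshot time, rather than at encode time, would also keep the aggregation key deterministic, since the canonicalized context would only ever contain plain scalars, maps, and lists.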


leoromanovsky changed the title ~~[FFL-2446] dd-trace-java: emit EVP flagevaluation (Phase 2 fan-out)~~ feat(openfeature): emit server-side EVP flagevaluation Jun 14, 2026

This was referenced Jun 17, 2026

Enable EVP flagevaluation system tests for Java DataDog/system-tests#7158

Closed

Enable EVP flagevaluation system tests for Java DataDog/system-tests#7185

Draft

leoromanovsky marked this pull request as ready for review June 23, 2026 00:23

leoromanovsky requested review from a team as code owners June 23, 2026 00:23

leoromanovsky requested review from PerfectSlayer, bric3, dd-oleksii and typotter and removed request for a team June 23, 2026 00:23

leoromanovsky requested a review from manuel-alvarez-alvarez June 23, 2026 00:25

leoromanovsky added type: enhancement Enhancements and improvements comp: openfeature OpenFeature labels Jun 23, 2026

chatgpt-codex-connector Bot reviewed Jun 23, 2026


Comment thread ...ng/feature-flagging-api/src/main/java/datadog/trace/api/openfeature/FlagEvalLoggingHook.java Outdated

Comment thread communication/src/main/java/datadog/communication/BackendApiFactory.java Outdated

leoromanovsky requested a review from a team as a code owner June 23, 2026 19:41

leoromanovsky force-pushed the leo.romanovsky/ffl-2446-evp-flagevaluation-java branch from 83ac4c4 to 81ed6f1 Compare July 2, 2026 00:02

leoromanovsky marked this pull request as draft July 2, 2026 03:15

leoromanovsky added 7 commits July 2, 2026 10:19

refactor(evp): centralize EVP proxy routing

571938b

refactor(feature-flagging): share EVP publishing

4949ed7

feat(openfeature): add flagevaluation contract

4aa0647

refactor(openfeature): name metrics hook explicitly

ac126fb

feat(openfeature): add flagevaluation logging hook

7a43658

test(openfeature): cover flagevaluation logging hook

4ec77ca

fix(openfeature): snapshot flagevaluation context

6fa7ade

leoromanovsky added 14 commits July 2, 2026 10:19

feat(openfeature): register flagevaluation hook

986a9af

feat(feature-flagging): canonicalize flagevaluation context

f1c37ab

feat(feature-flagging): aggregate flagevaluation rows

115a407

test(feature-flagging): cover flagevaluation aggregation

cc3b21c

feat(feature-flagging): encode flagevaluation payloads

31d7cb5

test(feature-flagging): cover flagevaluation payload encoding

e43f93a

feat(telemetry): support tagged core metric counts

17d7785

feat(feature-flagging): run flagevaluation writer lifecycle

5f88cad

feat(feature-flagging): post flagevaluation payloads

597e97d

test(feature-flagging): add flagevaluation test support

58df936

test(feature-flagging): cover flagevaluation writer

acc6e21

feat(feature-flagging): wire flagevaluation writer lifecycle

96df04f

test(feature-flagging): cover flagevaluation writer lifecycle

edb264e

perf(feature-flagging): benchmark flagevaluation hot path

90ed6cb

leoromanovsky force-pushed the leo.romanovsky/ffl-2446-evp-flagevaluation-java branch from 2a3f817 to 90ed6cb Compare July 2, 2026 14:21

chore: apply spotless formatting

c73cafd

leoromanovsky commented Jul 2, 2026


leoromanovsky added 5 commits July 2, 2026 11:36

test(feature-flagging): cover flag evaluation jacoco paths

5b1456b

fix(feature-flagging): gate flag evaluation enqueue during shutdown

6e1f81e

test(feature-flagging): cover flag eval event

78d8297

fix(feature-flagging): make aggregator count atomic

6acbb5e

Merge branch 'master' into leo.romanovsky/ffl-2446-evp-flagevaluation…

6b7aa42

…-java

leoromanovsky marked this pull request as ready for review July 3, 2026 03:59

chatgpt-codex-connector Bot reviewed Jul 3, 2026

