
feat(openfeature): emit server-side EVP flagevaluation #11639

Open
leoromanovsky wants to merge 27 commits into master from leo.romanovsky/ffl-2446-evp-flagevaluation-java

Conversation

@leoromanovsky

@leoromanovsky leoromanovsky commented Jun 12, 2026

Contributor

🟢 NOTE TO REVIEWERS

I've chosen to keep this PR on the larger side in terms of "lines of code" as a test. The commits are deliberately layered in a narrative style. Each one is equivalent to what we would normally do with a "stacked" PR, but this preserves the overall view of the feature. Please review one commit at a time or all together.

If this review mechanism is not satisfactory, please let me know!


Motivation

Customers need consistent server-side feature-flag evaluation visibility across supported runtimes so rollout behavior can be correlated with application behavior in APM and Event Platform. This Java contribution adds that server-side flagevaluation signal for Java OpenFeature evaluations while preserving the existing OTel feature_flag.evaluations path and the existing exposure telemetry path.

High Priority Changes and Decisions

These are the design points I would want reviewed most closely.

  • EVP routing uses the Agent-advertised proxy prefix, not a hard-coded v2/v3/v4 path. The SDK keeps the track route as /api/v2/flagevaluation, but builds it under the proxy prefix discovered from the Agent. In current staging dogfooding that resolves to /evp_proxy/v4/api/v2/flagevaluation.
  • Flagevaluation reuses the existing Event Platform publisher path. This keeps delivery aligned with existing Agent discovery, headers, lifecycle, and compression controls instead of adding a Java-specific HTTP writer. Response compression is disabled for this track to match the merged Go behavior.
  • The OpenFeature hook stays on the hot path, so it only captures and enqueues. It records evaluation metadata, snapshots the OpenFeature context at enqueue time, and defers materialization, aggregation, and posting to the worker thread. This is the main correctness/performance boundary.
  • The worker emits the existing batched flagevaluation contract. One flush produces a FlagEvaluationsRequest with top-level context and a flagEvaluations array; this does not use a separate /batchedflagevaluations route.
  • Aggregation keys are limited to schema-visible fields. The aggregate dimensions are flag key, variant key, allocation key, runtime-default state, error message, targeting key, and pruned context. OpenFeature reason is intentionally not a hidden aggregate key because it is not serialized to the worker contract.
  • targeting_key is the single identity field for the event. The hook removes duplicate targetingKey from context.evaluation so the same identity is not encoded twice.
  • Cardinality/backpressure behavior is intentionally lossy but counted. The writer uses full-fidelity buckets first, degraded buckets without targeting key/context after cap pressure, then counted drops if both tiers or payload limits are exhausted.
  • Event time and send time are separate. Each aggregate row preserves first_evaluation and last_evaluation bounds, while the payload timestamp is the flush time.
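
The aggregation, degradation, and time-bound points above can be sketched roughly as follows. This is a minimal illustration, not the PR's aggregator: the class name `TwoTierAggregatorSketch`, the `Key` and `Row` types, and the cap values are hypothetical, and the pruned context and error message dimensions are left out of the key for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical two-tier aggregation sketch; names and caps are illustrative only. */
final class TwoTierAggregatorSketch {

  /** Aggregate key built only from fields that are visible in the payload schema. */
  static final class Key {
    final String flagKey;
    final String variantKey;
    final String allocationKey;
    final String targetingKey; // null for degraded rows

    Key(String flagKey, String variantKey, String allocationKey, String targetingKey) {
      this.flagKey = flagKey;
      this.variantKey = variantKey;
      this.allocationKey = allocationKey;
      this.targetingKey = targetingKey;
    }

    /** Same key without the high-cardinality identity field. */
    Key degraded() {
      return new Key(flagKey, variantKey, allocationKey, null);
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return Objects.equals(flagKey, k.flagKey)
          && Objects.equals(variantKey, k.variantKey)
          && Objects.equals(allocationKey, k.allocationKey)
          && Objects.equals(targetingKey, k.targetingKey);
    }

    @Override
    public int hashCode() {
      return Objects.hash(flagKey, variantKey, allocationKey, targetingKey);
    }
  }

  /** One aggregate row: a count plus first/last evaluation bounds. */
  static final class Row {
    long count;
    long firstEvaluationMs = Long.MAX_VALUE;
    long lastEvaluationMs = Long.MIN_VALUE;

    void record(long evaluationMs) {
      count++;
      firstEvaluationMs = Math.min(firstEvaluationMs, evaluationMs);
      lastEvaluationMs = Math.max(lastEvaluationMs, evaluationMs);
    }
  }

  final Map<Key, Row> full = new LinkedHashMap<>();
  final Map<Key, Row> degraded = new LinkedHashMap<>();
  long dropped;
  private final int fullCap;
  private final int degradedCap;

  TwoTierAggregatorSketch(int fullCap, int degradedCap) {
    this.fullCap = fullCap;
    this.degradedCap = degradedCap;
  }

  void record(Key key, long evaluationMs) {
    // Tier 1: full-fidelity row, created only while the cap allows it.
    Row row = full.get(key);
    if (row == null && full.size() < fullCap) {
      row = new Row();
      full.put(key, row);
    }
    // Tier 2: degraded row without the targeting key.
    if (row == null) {
      Key degradedKey = key.degraded();
      row = degraded.get(degradedKey);
      if (row == null && degraded.size() < degradedCap) {
        row = new Row();
        degraded.put(degradedKey, row);
      }
    }
    // Both tiers exhausted: count the drop instead of growing further.
    if (row == null) {
      dropped++;
      return;
    }
    row.record(evaluationMs);
  }
}
```

With a full-fidelity cap of 3, fifteen evaluations spread over three targeting keys collapse into three rows of count 5, and a fourth targeting key falls through to a degraded row.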

Other Changes

  • Adds the Java EVP flagevaluation path behind DD_FLAGGING_EVALUATION_COUNTS_ENABLED while leaving the existing OTel feature_flag.evaluations hook in place.
  • Renames the existing OpenFeature metrics hook so the metrics and EVP logging hooks are easy to distinguish in review.
  • Adds tagged core metric counts for flagevaluation dropped/degraded/split telemetry.
  • Wires the writer into the feature-flagging system lifecycle with bounded queueing, periodic flush, shutdown drain, and best-effort clearing after payload encoding.
  • Adds focused unit coverage for routing, hook capture, context snapshotting, aggregation, payload encoding, writer lifecycle/posting, and system lifecycle wiring.
  • Adds a JMH benchmark for the flagevaluation hot path.
  • Applies the repository Spotless formatter as a small follow-up commit after the narrative stack was published.
  • Adds follow-up Jacoco coverage for the feature-flagging-lib class-level coverage gate surfaced by CI.
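
As a rough illustration of the bounded queueing and shutdown drain mentioned above, the sketch below shows a non-blocking enqueue with a drop counter. It is an assumption-laden example rather than the PR's writer: `BoundedEventQueueSketch` and its methods are hypothetical names, and events are plain strings to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded queue between the evaluation hook and a worker thread. */
final class BoundedEventQueueSketch {
  private final BlockingQueue<String> queue;
  private final AtomicLong dropped = new AtomicLong();

  BoundedEventQueueSketch(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  /** Hot path: never blocks; a full queue results in a counted drop. */
  boolean enqueue(String event) {
    if (queue.offer(event)) {
      return true;
    }
    dropped.incrementAndGet();
    return false;
  }

  /** Worker side: takes everything currently queued (periodic flush or shutdown drain). */
  List<String> drain() {
    List<String> batch = new ArrayList<>();
    queue.drainTo(batch);
    return batch;
  }

  long droppedCount() {
    return dropped.get();
  }
}
```

Using `offer` instead of `put` is what keeps the caller from blocking; the trade-off is that bursts beyond the capacity are lost, which is why the drop is counted.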

Commit Guide

LOC counts are rename-aware: they come from git diff-tree -M --numstat for each commit against its parent.

| SHA | Changes / purpose | LOC (+/-) |
| --- | --- | --- |
| 571938b6c7 | Centralize EVP proxy endpoint construction and support the Agent-advertised proxy prefix generically. | +270 / -24 |
| 4949ed7ee4 | Share feature-flagging EVP publishing primitives so flagevaluation can reuse the existing transport path. | +179 / -27 |
| 4aa06476b7 | Add the bootstrap flagevaluation event/writer contract between OpenFeature and the agent writer. | +180 / -0 |
| ac126fb44a | Rename the existing OpenFeature metrics hook so OTel metrics and EVP logging are distinct in review. | +17 / -17 |
| 7a43658da6 | Add the OpenFeature flagevaluation logging hook and non-blocking event capture path. | +130 / -1 |
| 4ec77cae03 | Cover hook capture, skip/error behavior, metadata extraction, and targeting-key de-duplication. | +384 / -0 |
| 6fa7ade3e4 | Snapshot OpenFeature context values at enqueue time, including nested structures/lists and duplicate scalars. | +161 / -19 |
| 986a9af6be | Register the flagevaluation logging hook with the Java OpenFeature provider behind the config gate. | +97 / -6 |
| f1c37aba16 | Canonicalize pruned context values for deterministic aggregation keys. | +194 / -0 |
| 115a407b76 | Add the two-tier aggregation model for full-fidelity rows, degraded rows, and counted drops. | +234 / -1 |
| cc3b21c0c2 | Cover aggregation merge keys, caps, degradation, context pruning, and constants. | +183 / -0 |
| 31d7cb5116 | Encode FlagEvaluationsRequest payloads, split oversized bodies, degrade oversized rows, and count drops. | +270 / -0 |
| e43f93acd0 | Cover payload wire shape, split behavior, degraded rows, and error serialization. | +251 / -0 |
| 17d7785420 | Allow tagged core metric counts for flagevaluation drop/degradation metrics. | +28 / -0 |
| 5f88cadeb2 | Add writer lifecycle, bounded queue, worker thread, flush cadence, and shutdown drain. | +296 / -1 |
| 597e97d10b | Post encoded flagevaluation payloads through EVP and clear best-effort aggregates after encoding. | +296 / -17 |
| 58df936165 | Add shared test support for writer and payload tests. | +212 / -0 |
| acc6e21940 | Cover writer queueing, flush, backpressure, drop metrics, shutdown, and payload posting. | +304 / -0 |
| 96df04f1be | Wire the flagevaluation writer into the feature-flagging system lifecycle. | +22 / -0 |
| edb264e7d8 | Cover system lifecycle registration, start, and close behavior for the flagevaluation writer. | +28 / -0 |
| 90ed6cb556 | Add a JMH benchmark for the flagevaluation hot path. | +155 / -0 |
| c73cafdb46 | Apply repository Spotless formatting after publishing the stack. | +4 / -5 |
| 5b1456b3fb | Add focused branch and instruction coverage for the feature-flagging-lib Jacoco gate. | +465 / -2 |

Validation Evidence

Local Test Gates

  • Focused Gradle gate passed after rebasing onto current origin/master (ac29db2316):
    • :communication:test
    • :products:feature-flagging:feature-flagging-api:test
    • :products:feature-flagging:feature-flagging-agent:test
    • :products:feature-flagging:feature-flagging-lib:test
    • :products:feature-flagging:feature-flagging-lib:jmhClasses
  • Covered test classes included:
    • BackendApiFactoryTest, DDAgentFeaturesDiscoveryTest
    • DDEvaluatorTest, ProviderTest, FlagEvalLoggingHookTest
    • FeatureFlaggingSystemTest
    • FeatureFlagEvpPublisherTest, FlagEvaluationAggregatorTest, FlagEvaluationPayloadsTest, FlagEvaluationWriterImplTest
  • Feature-flagging-lib coverage gate passed after the Jacoco follow-up commit:
    • ./gradlew :products:feature-flagging:feature-flagging-lib:test :products:feature-flagging:feature-flagging-lib:jacocoTestReport :products:feature-flagging:feature-flagging-lib:jacocoTestCoverageVerification
  • Formatting/lint checks after the Spotless and coverage follow-up commits:
    • ./gradlew spotlessApply
    • ./gradlew spotlessCheck
    • ./gradlew :products:feature-flagging:feature-flagging-lib:spotlessCheck
    • ./gradlew :communication:forbiddenApis :dd-trace-core:forbiddenApis :internal-api:forbiddenApis :telemetry:forbiddenApis :products:feature-flagging:feature-flagging-api:forbiddenApis :products:feature-flagging:feature-flagging-agent:forbiddenApis :products:feature-flagging:feature-flagging-lib:forbiddenApis
  • Stack hygiene:
    • git diff --check passed.
    • All published commits verified with good git signatures.

Dogfooding App

  • Rebuilt ffe-dogfooding Java artifacts from this local dd-trace-java stack with scripts/prepare-local-java.sh.
  • Restarted dogfooding with local dd-openfeature and dd-java-agent artifacts plus the real backend EVP path.
  • Java app health reached PROVIDER_READY.
  • Evaluated ffe-dogfooding-string-flag through the Java dogfooding app 15 times total: 5 evaluations for each targeting key:
    • java-restack4-20260702T042247Z-alpha
    • java-restack4-20260702T042247Z-bravo
    • java-restack4-20260702T042247Z-charlie
  • App-side result: all 15 evaluations returned variant_1, allocation allocation-override-392dd7c149f8, service java, and evaluation reason STATIC.
  • App logs showed two successful EVP posts to the Agent-advertised route http://datadog-agent:8126/evp_proxy/v4/api/v2/flagevaluation, both returning 202.
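
For readers who want to see how the route in the log line is assembled, here is a minimal sketch of joining the Agent base URL, the advertised proxy prefix, and the fixed track route. The class `EvpEndpointSketch` and its `build` method are hypothetical, and the exact form in which the Agent advertises the prefix (with or without surrounding slashes) is an assumption, so the helper tolerates both.

```java
/** Hypothetical helper showing how the flagevaluation URL could be assembled. */
final class EvpEndpointSketch {
  /** Track route kept by the SDK regardless of the proxy version. */
  static final String TRACK_ROUTE = "/api/v2/flagevaluation";

  /** Joins base URL, advertised proxy prefix, and track route with single slashes. */
  static String build(String agentBaseUrl, String advertisedProxyPrefix) {
    String base = stripTrailingSlashes(agentBaseUrl);
    String prefix = stripTrailingSlashes(stripLeadingSlashes(advertisedProxyPrefix));
    return base + "/" + prefix + TRACK_ROUTE;
  }

  private static String stripTrailingSlashes(String s) {
    while (s.endsWith("/")) {
      s = s.substring(0, s.length() - 1);
    }
    return s;
  }

  private static String stripLeadingSlashes(String s) {
    while (s.startsWith("/")) {
      s = s.substring(1);
    }
    return s;
  }
}
```

Calling `EvpEndpointSketch.build("http://datadog-agent:8126", "/evp_proxy/v4/")` yields the route seen in the dogfooding logs, without the SDK hard-coding the proxy version.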

Staging End-To-End

  • Dogfooding ran without the local mock-intake EVP tee/proxy, so the Agent sent EVP traffic through the normal backend path.
  • Retriever staging query against eventplatform.system.track(TRACK => 'flagevaluation') returned 3 aggregated rows for the exact targeting keys above.
  • Each row had:
    • flag.key=ffe-dogfooding-string-flag
    • variant.key=variant_1
    • allocation.key=allocation-override-392dd7c149f8
    • evaluation_count=5
  • This proves SDK aggregation/batching for the final local tree: 15 app evaluations became 3 backend flagevaluation rows.

System Tests

  • Companion draft PR: Enable EVP flagevaluation system tests for Java system-tests#7185
  • Local manifest-enabled Java EVP flagevaluation system tests passed against PR head 6b7aa4273d:
    • TEST_LIBRARY=java ./run.sh +v FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_evp.py
    • Result: 8 passed in 80.08s (Library: java@1.64.0-SNAPSHOT+6b7aa4273d, Weblog variant: spring-boot).

@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 Bot commented Jun 12, 2026

Contributor

🎯 Code Coverage (details)
Patch Coverage: 90.79%
Overall Coverage: 57.19% (+0.23%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 6b7aa42

@dd-octo-sts

dd-octo-sts Bot commented Jun 12, 2026

Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

| Suite | Status |
| --- | --- |
| Startup | 🟢 pass |

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
| Scenario | Candidate | master | Δ (95% CI of mean) |
| --- | --- | --- | --- |
| startup:insecure-bank:iast:Agent | 13.98 s | 14.02 s | [-0.9%; +0.3%] (no difference) |
| startup:insecure-bank:tracing:Agent | 12.91 s | 13.02 s | [-1.7%; -0.0%] (maybe better) |
| startup:petclinic:appsec:Agent | 16.38 s | 16.90 s | [-7.2%; +1.0%] (no difference) |
| startup:petclinic:iast:Agent | 16.95 s | 16.94 s | [-0.9%; +1.0%] (no difference) |
| startup:petclinic:profiling:Agent | 16.91 s | 16.99 s | [-1.6%; +0.6%] (no difference) |
| startup:petclinic:sca:Agent | 17.01 s | 16.76 s | [+0.6%; +2.4%] (maybe worse) |
| startup:petclinic:tracing:Agent | 15.68 s | 16.15 s | [-7.3%; +1.5%] (no difference) |

Commit: 6b7aa427 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@leoromanovsky leoromanovsky changed the title from "[FFL-2446] dd-trace-java: emit EVP flagevaluation (Phase 2 fan-out)" to "feat(openfeature): emit server-side EVP flagevaluation" Jun 14, 2026
@leoromanovsky leoromanovsky marked this pull request as ready for review June 23, 2026 00:23
@leoromanovsky leoromanovsky requested review from a team as code owners June 23, 2026 00:23
@leoromanovsky leoromanovsky requested review from PerfectSlayer, bric3, dd-oleksii and typotter and removed request for a team June 23, 2026 00:23
@dd-octo-sts

dd-octo-sts Bot commented Jun 23, 2026

Contributor

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

  • Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

@leoromanovsky leoromanovsky added the type: enhancement and comp: openfeature labels Jun 23, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d4244f8ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread communication/src/main/java/datadog/communication/BackendApiFactory.java Outdated
@leoromanovsky leoromanovsky requested a review from a team as a code owner June 23, 2026 19:41
@leoromanovsky leoromanovsky force-pushed the leo.romanovsky/ffl-2446-evp-flagevaluation-java branch from 83ac4c4 to 81ed6f1 Compare July 2, 2026 00:02
@leoromanovsky leoromanovsky marked this pull request as draft July 2, 2026 03:15
@leoromanovsky leoromanovsky force-pushed the leo.romanovsky/ffl-2446-evp-flagevaluation-java branch from 2a3f817 to 90ed6cb Compare July 2, 2026 14:21
private int spansWritten;

public LLMObsSpanMapper() {
this(5 << 20);

Contributor Author

This wasn't precise enough; the limit is measured in even megabytes. I can revert the change, though, and leave it as-is for the LLMObs product. I noticed this same constant (an even 5M) in the Go tracer, so maybe it means something special here.

Comment on lines +8 to +12
/**
* Default SDK-side target for uncompressed EVP request bodies. Writers may split batches at or
* below this size to keep Agent proxy requests comfortably bounded.
*/
public static final int PAYLOAD_SIZE_LIMIT_BYTES = 5 * 1024 * 1024;

Contributor Author

The declared limit in the agent is 10MB, but I'm leaving it at 5 here to not change tracer behavior too much.
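
To make the "split batches at or below this size" idea concrete, here is a minimal greedy-splitting sketch. It is not the PR's payload encoder: `PayloadSplitSketch` and `split` are hypothetical, rows are assumed to be already-encoded JSON strings, and the envelope overhead (top-level context, array brackets, commas) is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy splitter for already-encoded rows. */
final class PayloadSplitSketch {
  /** Same 5 MiB target as the constant under review. */
  static final int PAYLOAD_SIZE_LIMIT_BYTES = 5 * 1024 * 1024;

  /** Packs rows into batches whose summed UTF-8 size stays at or below limitBytes. */
  static List<List<String>> split(List<String> encodedRows, int limitBytes) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    int currentBytes = 0;
    for (String row : encodedRows) {
      int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
      if (rowBytes > limitBytes) {
        // Assumption: a real writer would degrade or count this row as dropped.
        continue;
      }
      if (currentBytes + rowBytes > limitBytes && !current.isEmpty()) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(row);
      currentBytes += rowBytes;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

Keeping the SDK-side target below the Agent's declared limit leaves headroom for the envelope and for any size estimation error.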

      new AtomicReference<>(InitializationState.NOT_STARTED);
  private final FlagEvalMetrics flagEvalMetrics;
- private final FlagEvalHook flagEvalHook;
+ private final FlagEvalMetricsHook flagEvalMetricsHook;

Contributor Author

While it's a bit outside the scope of this PR, we decided to rename these hooks globally to clarify the distinction between OTel metrics and EVP-based ones.

  .addString("variationType", flag.variationType.name())
- .addString("allocationKey", allocation.key);
+ .addString("allocationKey", allocation.key)
+ .addLong("dd.eval.timestamp_ms", evalTimestampMs);

Contributor Author

Wondering if I should adjust this to be consistent.

Comment on lines +33 to +42
final boolean evalCountsEnabled =
    config
        .configProvider()
        .getBoolean(FeatureFlaggingConfig.FLAGGING_EVALUATION_COUNTS_ENABLED, true);
if (evalCountsEnabled) {
  final FlagEvaluationWriterImpl evalWriter = new FlagEvaluationWriterImpl(sco, config);
  evalWriter.start();
  FLAG_EVAL_WRITER = evalWriter;
  LOGGER.debug("Flag evaluation EVP writer started");
} else {

Contributor Author

We only start the writer on provider launch. Note to self: check earlier in the PR whether we still enqueue events when FLAGGING_EVALUATION_COUNTS_ENABLED is disabled. Those events would just take up memory without ever being processed.
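
One way to rule out the concern above is for the hook to check for a started writer before capturing anything. The sketch below only illustrates that pattern; `GatedHookSketch`, its `Writer` interface, and the method names are hypothetical and do not reflect how the PR's hook is actually wired.

```java
/** Hypothetical hook that captures events only when a writer was started. */
final class GatedHookSketch {
  /** Stand-in for the writer contract; the real interface is not shown here. */
  interface Writer {
    void enqueue(String flagKey);
  }

  // Stays null when the config gate is off, so the hot path has nothing to enqueue into.
  private static volatile Writer writer;

  static void start(boolean evalCountsEnabled, Writer candidate) {
    writer = evalCountsEnabled ? candidate : null;
  }

  /** Called on the evaluation hot path; returns false when the event was not captured. */
  static boolean onEvaluation(String flagKey) {
    Writer w = writer; // single volatile read
    if (w == null) {
      return false;
    }
    w.enqueue(flagKey);
    return true;
  }
}
```

With this shape, a disabled gate costs one volatile read per evaluation and no queue memory.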

@leoromanovsky leoromanovsky marked this pull request as ready for review July 3, 2026 03:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b7aa4273d

Comment on lines +609 to +610
} else if (value.isInstant()) {
return value.asInstant();


P2: Normalize instant context values before serializing

When an evaluation context contains an OpenFeature instant attribute, this preserves a java.time.Instant in the attrs map. The flagevaluation payload is later encoded with a plain Moshi instance in FlagEvaluationPayloads, which has no Instant adapter, so toJson throws during flush; because FlagEvaluationWriterImpl only clears aggregates after successful payload encoding, that poisoned bucket remains and every later flush aborts, blocking all flagevaluation telemetry until restart. Convert instants to a JSON scalar (or drop them) before aggregation/serialization.
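
A minimal sketch of the suggested normalization, assuming context values have already been unwrapped into plain Java objects (maps, lists, scalars). `ContextNormalizerSketch` is a hypothetical helper, not code from this PR; it converts `java.time.Instant` values to ISO-8601 strings so that a JSON encoder without an Instant adapter never sees them.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical normalizer that makes context values safe for a plain JSON encoder. */
final class ContextNormalizerSketch {

  /** Recursively replaces Instant values with their ISO-8601 string form. */
  static Object normalize(Object value) {
    if (value instanceof Instant) {
      return value.toString(); // Instant.toString() emits ISO-8601
    }
    if (value instanceof Map) {
      Map<String, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        out.put(String.valueOf(entry.getKey()), normalize(entry.getValue()));
      }
      return out;
    }
    if (value instanceof List) {
      List<Object> out = new ArrayList<>();
      for (Object item : (List<?>) value) {
        out.add(normalize(item));
      }
      return out;
    }
    return value; // strings, numbers, booleans, and null pass through unchanged
  }
}
```

Doing the conversion at snapshot time, before aggregation, also keeps the aggregation key deterministic, since two equal instants map to the same string.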


Labels

comp: openfeature, type: enhancement


1 participant