Skip to content

[Cosmos] Make JsonSerializable.propertyBag final to fix sporadic NPE in DatabaseAccount lazy getters#49258

Open
lloydmeta wants to merge 2 commits into
Azure:mainfrom
lloydmeta:serdes/final-propertyBag-JsonSerializable
Open

[Cosmos] Make JsonSerializable.propertyBag final to fix sporadic NPE in DatabaseAccount lazy getters#49258
lloydmeta wants to merge 2 commits into
Azure:mainfrom
lloydmeta:serdes/final-propertyBag-JsonSerializable

Conversation

@lloydmeta
Copy link
Copy Markdown
Contributor

Fixes #49256.

JsonSerializable.propertyBag is non-volatile and non-final, assigned exactly once in each of the six JsonSerializable constructors and never written to outside them in the SDK. Combined with the lazy-initialised, non-volatile field assignments in DatabaseAccount.getConsistencyPolicy() (and its three siblings), this enables an unsafe publication: a concurrent reader can observe consistencyPolicy != null while observing the new ConsistencyPolicy's propertyBag field at its default value null, producing the NullPointerException reported in #49256.

Making propertyBag final invokes final-field publication semantics (JLS §17.5, JSR-133 FAQ): a thread that sees the reference to the constructed object after its constructor completes is guaranteed to see the correctly-initialised final field without requiring a synchronizes-with edge from the publishing thread. This closes the race window for ConsistencyPolicy and for every other JsonSerializable subclass that could be lazy-published in the same way.

The change is one keyword. Every existing constructor already assigns propertyBag, so the only other mechanical change is dropping the redundant = null default initialiser.

Why this fix vs. the alternatives

  • volatile on consistencyPolicy (and siblings) in DatabaseAccount: also closes the window, but only for those four fields, and leaves the latent risk for every other JsonSerializable subclass that could be lazy-published anywhere in the SDK.
  • Eager initialisation in DatabaseAccount(ObjectNode): works for DatabaseAccount specifically but is structurally larger, doesn't generalise to other subclasses, and changes when work happens (eagerly per DatabaseAccount construction, including on every GlobalEndpointManager refresh).

final propertyBag is the smallest patch that closes the bug class globally.

Safety / compatibility

  • Source compatibility: propertyBag is package-private to com.azure.cosmos.implementation. A grep across sdk/cosmos for propertyBag\s*= finds zero writes outside the six JsonSerializable constructors, zero reflective writes, and no setter. The only out-of-class mention is @JsonIgnoreProperties("propertyBag") on Range, which doesn't touch the field.
  • populatePropertyBag() overrides: Several subclasses (e.g. DatabaseAccount, DocumentCollection, the FeedRange*Impl and ChangeFeedStartFrom*Impl families) override populatePropertyBag() and mutate the bag via this.set(...) or setProperties(this, false). These mutate the ObjectNode contents through the reference, not the reference itself, so final doesn't affect them.
  • Binary compatibility: Technically binary-incompatible for any out-of-tree code in the same internal package that assigned to propertyBag. Nothing in this SDK does, but flagging in case there's a downstream/private consumer to check.

Testing

Existing unit tests cover the JsonSerializable constructor paths and the getString/getObject/getWithMapping reads, with no expected behaviour change for correctly-published instances. The actual bug is a non-deterministic JMM unsafe-publication race that doesn't reproduce reliably in a standard JUnit test, a JCStress harness would be the right tool. Happy to add one in a follow-up if the team finds it useful.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes] (see "Safety / compatibility" above)
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

See "Testing" section above for why a deterministic functional regression test isn't included with this PR.

@lloydmeta lloydmeta changed the title [Serdes] Make propertyBag final in JsonSeralizable [Cosmos] Make JsonSerializable.propertyBag final to fix sporadic NPE in DatabaseAccount lazy getters May 26, 2026
@github-actions github-actions Bot added the Community Contribution Community members are working on the issue label May 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Thank you for your contribution @lloydmeta! We will review the pull request and get back to you soon.

@github-actions github-actions Bot added Cosmos customer-reported Issues that are reported by GitHub users external to the Azure organization. labels May 26, 2026
lloydmeta added 2 commits May 26, 2026 17:43
Signed-off-by: lloydmeta <lloydmeta@gmail.com>
Signed-off-by: lloydmeta <lloydmeta@gmail.com>
@lloydmeta lloydmeta force-pushed the serdes/final-propertyBag-JsonSerializable branch from c47f3ad to d59c585 Compare May 26, 2026 08:43
@lloydmeta
Copy link
Copy Markdown
Contributor Author

Build fails in an interesting way...

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.6.1:enforce (default-cli) on project azure-cosmos-spark_3-3_2-12: 
[ERROR] Rule 0: org.apache.maven.enforcer.rules.dependency.BannedDependencies failed with message:
[ERROR] 
[ERROR] com.fasterxml.jackson.module:jackson-module-scala_2.12:jar:2.18.7 <--- banned via the exclude/include list
[ERROR] -> [Help 1]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.6.1:enforce (default-cli) on project azure-cosmos-spark_3-4_2-12: 
[ERROR] Rule 0: org.apache.maven.enforcer.rules.dependency.BannedDependencies failed with message:
[ERROR] 
[ERROR] com.fasterxml.jackson.module:jackson-module-scala_2.12:jar:2.18.7 <--- banned via the exclude/include list
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :azure-cosmos-spark_3-3_2-12

which I'm not sure is related to this PR.

@lloydmeta lloydmeta marked this pull request as ready for review May 26, 2026 09:30
Copilot AI review requested due to automatic review settings May 26, 2026 09:30
@lloydmeta lloydmeta requested review from a team and kirankumarkolli as code owners May 26, 2026 09:30
@jeet1995
Copy link
Copy Markdown
Member

jeet1995 commented May 26, 2026

Thanks @lloydmeta for your contribution. I'll do a review and have a potential fix for the CI failure (this isn't related to your PR) - #49263

Update the CI failure fix has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Community Contribution Community members are working on the issue Cosmos customer-reported Issues that are reported by GitHub users external to the Azure organization.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] DatabaseAccount lazy-initialised fields are unsafely published, sporadic NPE in JsonSerializable.getWithMapping

2 participants