[RF] Disable redundant dirty-flag propagation during minimization#21343
Merged
guitargeek merged 2 commits intoroot-project:masterfrom Apr 23, 2026
Merged
[RF] Disable redundant dirty-flag propagation during minimization#21343guitargeek merged 2 commits intoroot-project:masterfrom
guitargeek merged 2 commits intoroot-project:masterfrom
Conversation
OperMode::ADirty for all RooAbsArgs in RooFit::EvaluaturOperMode::ADirty for all RooAbsArgs in RooFit::Evaluator
Test Results 22 files 22 suites 3d 11h 22m 14s ⏱️ For more details on these failures, see this check. Results for commit 53aec08. ♻️ This comment has been updated with latest results. |
OperMode::ADirty for all RooAbsArgs in RooFit::Evaluator
vgvassilev
reviewed
Feb 21, 2026
e386697 to
111908f
Compare
vgvassilev
reviewed
Feb 21, 2026
vgvassilev
reviewed
Feb 21, 2026
vgvassilev
reviewed
Feb 21, 2026
guitargeek
commented
Feb 22, 2026
6550887 to
a2192f4
Compare
This was referenced Mar 16, 2026
4858a3e to
45b179f
Compare
611fce2 to
187e8e3
Compare
When a likelihood is evaluated with the new `"cpu"` backend, the `RooFit::Evaluator` fully manages dependency tracking and re-evaluation of the computation graph. In this case, RooFit’s built-in dirty flag propagation in RooAbsArg becomes redundant and introduces significant overhead for large models. This patch disables regular dirty state propagation for all non-fundamental nodes in the Evaluator's computation graph by setting their OperMode to `RooAbsArg::ADirty`. Fundamental nodes (e.g. RooRealVar, RooCategory) are excluded because they are often shared with other computation graphs outside the Evaluator (usually the original pdf in the RooWorkspace). To set the OperMode of *all* RooAbsArgs to `ADirty` during minimization, while avoiding side effects outside the minimization scope, the dirty flag propagation for the fundamental nodes is only disabled temporarily in the RooMinimizer. This commit drastically speeds up fits with AD in particular (up to 2 x for large models), because with fast gradients, the dirty flag propagation that determines which part of the compute graph needs to be recomputed becomes the bottleneck. It was also redundant with a faster "dirty state" bookkeeping mechanism in the `RooFit::Evaluator` class itself. At this point, there is no performance regression anymore when disabling recursive dirty flag propagation for all evaluated nodes, so the old comment in the code about test 14 in stressRooFit being slow doesn't apply anymore.
Several places needed to record a set of operation-mode changes and restore them later as a group, so it's better to have the ChangeOperModeRAII act on groups of RooAbsArg to not have to create one RAII object per arg.
lmoneta
approved these changes
Apr 23, 2026
Member
lmoneta
left a comment
There was a problem hiding this comment.
Thank you Jonas for implementing this significant improvement, speeding up performances!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When a likelihood is evaluated with the new
"cpu"backend, theRooFit::Evaluatorfully manages dependency tracking and re-evaluation of the computation graph. In this case, RooFit’s built-in dirty flag propagation in RooAbsArg becomes redundant and introduces significant overhead for large models.This patch disables regular dirty state propagation for all non-fundamental nodes in the Evaluator's computation graph by setting their OperMode to
RooAbsArg::ADirty. Fundamental nodes (e.g. RooRealVar, RooCategory) are excluded because they are often shared with other computation graphs outside the Evaluator (usually the original pdf in the RooWorkspace).To set the OperMode of all RooAbsArgs to
ADirtyduring minimization, while avoiding side effects outside the minimization scope, the dirty flag propagation for the fundamental nodes is only disabled temporarily in the RooMinimizer.This commit drastically speeds up fits with AD in particular (up to 2 x for large models), because with fast gradients, the dirty flag propagation that determines which part of the compute graph needs to be recomputed becomes the bottleneck. It was also redundant with a faster "dirty state" bookkeeping mechanism in the
RooFit::Evaluatorclass itself.At this point, there is no performance regression anymore when disabling recursive dirty flag propagation for all evaluated nodes, so the old comment in the code about test 14 in stressRooFit being slow doesn't apply anymore.
See also slide 12 and 13 on my RooFit AD ROOT users workshop talk for the flamegraphs that show how significant the RooFit bookkeeping was for minimizations with AD gradients.