[RF] Disable redundant dirty-flag propagation during minimization#21343

Merged
guitargeek merged 2 commits into root-project:master from guitargeek:ADirty
Apr 23, 2026


@guitargeek guitargeek commented Feb 20, 2026

When a likelihood is evaluated with the new "cpu" backend, the RooFit::Evaluator fully manages dependency tracking and re-evaluation of the computation graph. In this case, RooFit’s built-in dirty flag propagation in RooAbsArg becomes redundant and introduces significant overhead for large models.

This patch disables regular dirty state propagation for all non-fundamental nodes in the Evaluator's computation graph by setting their OperMode to RooAbsArg::ADirty. Fundamental nodes (e.g. RooRealVar, RooCategory) are excluded because they are often shared with other computation graphs outside the Evaluator (usually the original pdf in the RooWorkspace).

To set the OperMode of all RooAbsArgs to ADirty during minimization, while avoiding side effects outside the minimization scope, the dirty flag propagation for the fundamental nodes is only disabled temporarily in the RooMinimizer.
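To illustrate why switching nodes to an "always dirty" mode removes the recursive propagation cost, here is a minimal toy sketch. `ToyNode` and `countPropagationCalls` are hypothetical names made up for this example; the real `RooAbsArg::setValueDirty()` has more logic (for instance cycle interception), so this only mirrors the basic idea that propagation stops at nodes that are not in `Auto` mode.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Toy graph node; a stand-in for illustration, NOT the real RooAbsArg.
struct ToyNode {
    enum class OperMode { Auto, ADirty };

    OperMode operMode = OperMode::Auto;
    bool valueDirty = false;
    std::vector<ToyNode *> clients; // nodes whose value depends on this one
    std::size_t visits = 0;         // how often propagation reached this node

    void addClient(ToyNode &client) { clients.push_back(&client); }

    // Recursive dirty-flag propagation. Only nodes in Auto mode take part:
    // an always-dirty node has no cached state to invalidate, so the
    // recursion stops there.
    void setValueDirty()
    {
        ++visits;
        if (operMode != OperMode::Auto) return;
        valueDirty = true;
        for (ToyNode *client : clients) client->setValueDirty();
    }
};

// Build the chain x -> f -> g, change x once, and return how many
// setValueDirty() calls were made in total.
std::size_t countPropagationCalls(ToyNode::OperMode modeForAll)
{
    ToyNode x, f, g;
    x.addClient(f);
    f.addClient(g);
    for (ToyNode *n : {&x, &f, &g}) n->operMode = modeForAll;

    x.setValueDirty(); // what a parameter update would trigger
    return x.visits + f.visits + g.visits;
}
```

In this toy, `countPropagationCalls(ToyNode::OperMode::Auto)` walks the whole chain, while the `ADirty` variant stops at the first node. The difference grows with the number of clients per parameter, which is presumably why the effect is largest for big models.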

This commit drastically speeds up fits, in particular those using AD (up to 2x for large models): once gradients are fast, the dirty flag propagation that determines which part of the compute graph needs to be recomputed becomes the bottleneck. It was also redundant with a faster "dirty state" bookkeeping mechanism in the RooFit::Evaluator class itself.
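As a rough picture of what evaluator-side "dirty state" bookkeeping can look like, the following toy sketch keeps nodes in topological order and recomputes a node only if one of its inputs changed in the current pass. This is an assumption-laden illustration, not the actual `RooFit::Evaluator` implementation; `ToyEvaluator` and `demoRecomputations` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy evaluator; NOT the real RooFit::Evaluator. Nodes are stored in
// topological order, so every node comes after the nodes it reads from.
class ToyEvaluator {
public:
    using Compute = std::function<double(const std::vector<double> &)>;

    std::size_t addParameter(double value)
    {
        nodes_.push_back(NodeInfo{{}, Compute{}, true});
        values_.push_back(value);
        return nodes_.size() - 1;
    }

    std::size_t addFunction(std::vector<std::size_t> servers, Compute compute)
    {
        nodes_.push_back(NodeInfo{std::move(servers), std::move(compute), true});
        values_.push_back(0.0);
        return nodes_.size() - 1;
    }

    // Flag a parameter as dirty only if its value really changed.
    void setParameter(std::size_t idx, double value)
    {
        if (values_[idx] != value) {
            values_[idx] = value;
            nodes_[idx].dirty = true;
        }
    }

    // One forward pass; assumes at least one node and returns the last one.
    double evaluate()
    {
        std::vector<bool> changed(nodes_.size(), false);
        for (std::size_t i = 0; i < nodes_.size(); ++i) {
            NodeInfo &node = nodes_[i];
            bool needsUpdate = node.dirty;
            for (std::size_t s : node.servers) needsUpdate = needsUpdate || changed[s];
            if (needsUpdate && node.compute) {
                values_[i] = node.compute(values_);
                ++recomputations_;
            }
            changed[i] = needsUpdate;
            node.dirty = false;
        }
        return values_.back();
    }

    std::size_t recomputations() const { return recomputations_; }

private:
    struct NodeInfo {
        std::vector<std::size_t> servers; // indices of the inputs
        Compute compute;                  // empty for parameters
        bool dirty;
    };
    std::vector<NodeInfo> nodes_;
    std::vector<double> values_;
    std::size_t recomputations_ = 0;
};

// Cumulative recomputation counts after three evaluate() calls on
// h = (a + b) + 2 * c, where only a is changed before the second call.
std::vector<std::size_t> demoRecomputations()
{
    ToyEvaluator ev;
    const std::size_t a = ev.addParameter(1.0);
    const std::size_t b = ev.addParameter(2.0);
    const std::size_t c = ev.addParameter(3.0);
    const std::size_t f = ev.addFunction({a, b}, [a, b](const std::vector<double> &v) { return v[a] + v[b]; });
    const std::size_t g = ev.addFunction({c}, [c](const std::vector<double> &v) { return 2.0 * v[c]; });
    ev.addFunction({f, g}, [f, g](const std::vector<double> &v) { return v[f] + v[g]; });

    std::vector<std::size_t> counts;
    ev.evaluate(); // first pass: every function node is computed
    counts.push_back(ev.recomputations());
    ev.setParameter(a, 5.0);
    ev.evaluate(); // only the nodes downstream of a are recomputed
    counts.push_back(ev.recomputations());
    ev.evaluate(); // nothing changed, nothing recomputed
    counts.push_back(ev.recomputations());
    return counts;
}
```

The point of the sketch is that a single linear pass already knows which nodes are stale, so a second, recursive notification mechanism on the nodes themselves adds cost without adding information.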

At this point, there is no longer a performance regression when disabling recursive dirty flag propagation for all evaluated nodes, so the old code comment about test 14 in stressRooFit being slow no longer applies.

See also slides 12 and 13 of my RooFit AD talk at the ROOT users workshop for the flamegraphs that show how significant the RooFit bookkeeping was for minimizations with AD gradients.

@guitargeek guitargeek self-assigned this Feb 20, 2026
@guitargeek guitargeek changed the title [RF] Set OperMode::ADirty for all RooAbsArgs in RooFit::Evaluatur [RF] Set OperMode::ADirty for all RooAbsArgs in RooFit::Evaluator Feb 20, 2026

github-actions Bot commented Feb 21, 2026

Test Results

22 files, 22 suites, 3d 11h 22m 14s ⏱️

| | Total | ✅ | 💤 | ❌ |
|---|---|---|---|---|
| Tests | 3 845 | 3 840 | 1 | 4 |
| Runs | 76 827 | 76 688 | 135 | 4 |

For more details on these failures, see this check.

Results for commit 53aec08.

♻️ This comment has been updated with latest results.

@guitargeek guitargeek changed the title [RF] Set OperMode::ADirty for all RooAbsArgs in RooFit::Evaluator [RF] Disable redundant dirty-flag propagation during minimization Feb 21, 2026
Comment thread roofit/roofitcore/inc/RooAbsPdf.h Outdated
@guitargeek guitargeek force-pushed the ADirty branch 2 times, most recently from e386697 to 111908f Compare February 21, 2026 16:43
Comment thread roofit/batchcompute/res/RooNaNPacker.h Outdated
Comment thread roofit/roofitcore/res/RooFitImplHelpers.h Outdated
Comment thread roofit/roofitcore/src/RooFit/Evaluator.cxx Outdated
Comment thread tutorials/roofit/roofit/rf617_simulation_based_inference_multidimensional.py Outdated
When a likelihood is evaluated with the new `"cpu"` backend, the
`RooFit::Evaluator` fully manages dependency tracking and re-evaluation
of the computation graph. In this case, RooFit’s built-in dirty flag
propagation in RooAbsArg becomes redundant and introduces significant
overhead for large models.

This patch disables regular dirty state propagation for all
non-fundamental nodes in the Evaluator's computation graph by setting
their OperMode to `RooAbsArg::ADirty`. Fundamental nodes (e.g.
RooRealVar, RooCategory) are excluded because they are often shared with
other computation graphs outside the Evaluator (usually the original pdf
in the RooWorkspace).

To set the OperMode of *all* RooAbsArgs to `ADirty` during minimization,
while avoiding side effects outside the minimization scope, the dirty
flag propagation for the fundamental nodes is only disabled temporarily
in the RooMinimizer.

This commit drastically speeds up fits, in particular those using AD (up
to 2x for large models): once gradients are fast, the dirty flag
propagation that determines which part of the compute graph needs to be
recomputed becomes the bottleneck. It was also redundant with a faster
"dirty state" bookkeeping mechanism in the `RooFit::Evaluator` class
itself.

At this point, there is no longer a performance regression when
disabling recursive dirty flag propagation for all evaluated nodes, so
the old code comment about test 14 in stressRooFit being slow no longer
applies.

Several places needed to record a set of operation-mode changes and
restore them later as a group, so it is better for ChangeOperModeRAII to
act on groups of RooAbsArg than to create one RAII object per arg.
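
The group-based RAII idea from this commit message can be sketched as follows. This is a simplified illustration with made-up types (`ToyArg`, `ToyOperModeGuard`, `demoGroupRestore`); it is not the actual `ChangeOperModeRAII` code, whose interface may differ.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy stand-in for RooAbsArg, reduced to its operation-mode state.
struct ToyArg {
    enum class OperMode { Auto, AClean, ADirty };
    OperMode operMode = OperMode::Auto;
};

// Hypothetical group guard: records the previous mode of every arg it
// changes and restores all of them when it goes out of scope.
class ToyOperModeGuard {
public:
    ToyOperModeGuard() = default;
    ToyOperModeGuard(const ToyOperModeGuard &) = delete;
    ToyOperModeGuard &operator=(const ToyOperModeGuard &) = delete;

    ~ToyOperModeGuard()
    {
        // Restore in reverse order, so that if an arg was changed twice
        // the mode recorded first (the original one) is applied last.
        for (auto it = saved_.rbegin(); it != saved_.rend(); ++it) {
            it->first->operMode = it->second;
        }
    }

    void change(ToyArg &arg, ToyArg::OperMode newMode)
    {
        if (arg.operMode == newMode) return; // nothing to record
        saved_.emplace_back(&arg, arg.operMode);
        arg.operMode = newMode;
    }

private:
    std::vector<std::pair<ToyArg *, ToyArg::OperMode>> saved_;
};

// Returns true if both args are ADirty inside the guarded scope and back
// to their individual original modes afterwards.
bool demoGroupRestore()
{
    ToyArg a;                              // starts in Auto
    ToyArg b;
    b.operMode = ToyArg::OperMode::AClean; // starts in a different mode

    bool insideOk = false;
    {
        ToyOperModeGuard guard;
        guard.change(a, ToyArg::OperMode::ADirty);
        guard.change(b, ToyArg::OperMode::ADirty);
        insideOk = a.operMode == ToyArg::OperMode::ADirty && b.operMode == ToyArg::OperMode::ADirty;
    } // guard destroyed here: both modes are restored together

    return insideOk && a.operMode == ToyArg::OperMode::Auto && b.operMode == ToyArg::OperMode::AClean;
}
```

With one guard per group, a scope such as a minimization only needs a single object to undo all temporary mode changes, including on early returns or exceptions.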

@lmoneta lmoneta left a comment

Thank you, Jonas, for implementing this significant performance improvement!

@guitargeek guitargeek merged commit d42a27c into root-project:master Apr 23, 2026
51 of 53 checks passed
@guitargeek guitargeek deleted the ADirty branch April 23, 2026 14:27