[PERF] backport vectorization improvements #8774

Open: rrahir wants to merge 6 commits into 18.0 from 18.0-perf-vectorization-rar

Conversation

@rrahir
Collaborator

@rrahir rrahir commented May 22, 2026

Description:

description of this task, what is implemented and why it is implemented that way.

Task: TASK_ID

review checklist

  • feature is organized in plugin, or UI components
  • support of duplicate sheet (deep copy)
  • in model/core: ranges are Range object, and can be adapted (adaptRanges)
  • in model/UI: ranges are strings (to show the user)
  • undo-able commands (uses this.history.update)
  • multiuser-able commands (has inverse commands and transformations where needed)
  • new/updated/removed commands are documented
  • exportable in excel
  • translations (_t("qmsdf %s", abc))
  • unit tested
  • clean commented code
  • track breaking changes
  • doc is rebuilt (npm run doc)
  • status is correct in Odoo

@robodoo
Collaborator

robodoo commented May 22, 2026

Pull request status dashboard

@rrahir rrahir force-pushed the 18.0-perf-vectorization-rar branch 2 times, most recently from f8fb448 to 1ea02b3, May 22, 2026 05:43
@rrahir rrahir marked this pull request as ready for review May 22, 2026 05:44

Each vectorized cell allocated a fresh `args` array via `args.map(...)`.
For large vectorized formulas this produces one array allocation per
output cell, hammering the GC.

Allocate a single `argsBuffer` once and fill it in-place per iteration.
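
A minimal sketch of the difference, not the actual `applyVectorization` code: the `Arg` type, the one-dimensional layout, and both function names are simplified assumptions made for illustration.

```typescript
// Hypothetical, simplified model: an argument is either a scalar or a 1-D vector.
type Arg = number | number[];
type Formula = (...args: number[]) => number;

// Before: one fresh array per output cell via args.map(...).
function evaluateWithMap(formula: Formula, args: Arg[], size: number): number[] {
  const out: number[] = new Array(size);
  for (let i = 0; i < size; i++) {
    const cellArgs = args.map((arg) => (Array.isArray(arg) ? arg[i] : arg));
    out[i] = formula(...cellArgs);
  }
  return out;
}

// After: a single buffer allocated once and overwritten in place for each cell.
function evaluateWithBuffer(formula: Formula, args: Arg[], size: number): number[] {
  const out: number[] = new Array(size);
  const argsBuffer: number[] = new Array(args.length);
  for (let i = 0; i < size; i++) {
    for (let k = 0; k < args.length; k++) {
      const arg = args[k];
      argsBuffer[k] = Array.isArray(arg) ? arg[i] : arg;
    }
    out[i] = formula(...argsBuffer);
  }
  return out;
}

const add: Formula = (a, b) => a + b;
const viaMap = evaluateWithMap(add, [[1, 2, 3], 10], 3);
const viaBuffer = evaluateWithBuffer(add, [[1, 2, 3], 10], 3);
```

Both variants compute the same values; the second only avoids the per-cell array created by `map`.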

Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column.

evaluate all cells
  before:   Mean: 2171.79 ms, StdErr: 12.84 ms, n=30
  after:    Mean: 1983.47 ms, StdErr: 12.82 ms, n=30 (vs prev: -9%)

Task: 6222157

The inner loop re-evaluated vectorArgsType?.[k] via a switch for every
arg of every vectorized cell. The arg's access pattern is fixed for the
whole call, so resolve it once into a closure per arg and just invoke
the closure inside the loop.
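
As a rough illustration of resolving the access pattern once, the sketch below builds one getter per argument before the loops run. The `VectorKind` names, the `matrix[col][row]` layout, and `buildArgGetter` are assumptions, not the identifiers used in the PR.

```typescript
// Hypothetical model: the kind names and the matrix layout are assumptions.
type Matrix = number[][]; // indexed as matrix[col][row]
type Arg = number | Matrix;
type VectorKind = "horizontal" | "vertical" | "matrix";
type ArgGetter = (col: number, row: number) => number;

// Decide how an argument is read once, instead of switching for every cell.
function buildArgGetter(arg: Arg, kind: VectorKind | undefined): ArgGetter {
  if (!Array.isArray(arg)) {
    const value = arg;
    return () => value; // scalar: same value for every cell
  }
  const matrix = arg;
  switch (kind) {
    case "horizontal":
      return (col) => matrix[col][0]; // one row, varies by column
    case "vertical":
      return (_col, row) => matrix[0][row]; // one column, varies by row
    default:
      return (col, row) => matrix[col][row];
  }
}

const getters: ArgGetter[] = [
  buildArgGetter([[1, 2, 3]], "vertical"),
  buildArgGetter([[10], [20]], "horizontal"),
];

// The hot loop only invokes the pre-built closures.
const grid: number[][] = [];
for (let col = 0; col < 2; col++) {
  grid.push([]);
  for (let row = 0; row < 3; row++) {
    grid[col].push(getters[0](col, row) + getters[1](col, row));
  }
}
```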

Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column.

evaluate all cells
  baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30
  before:   Mean: 1983.47 ms, StdErr: 12.82 ms, n=30
  after:    Mean: 1919.10 ms, StdErr: 12.66 ms, n=30 (vs prev: -3%, vs baseline: -12%)

Task: 6222157

errorHandlingCompute called descr.getArgToFocus() on every arg of every
vectorized cell. The arg→definition mapping is fixed for the whole call,
so compute argDefinitions once in vectorizedCompute and share it with
errorHandlingCompute via a closure variable.
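
The hoisting described above could look roughly like the sketch below. The `descr` object, the assumed `getArgToFocus(argPosition)` signature, and the bodies of both functions are hypothetical stand-ins; only the idea of resolving the definitions once per call is taken from the commit message.

```typescript
interface ArgDefinition {
  name: string;
}

// Hypothetical description object; the real getArgToFocus signature may differ.
const definitions: ArgDefinition[] = [{ name: "range" }, { name: "criterion" }];
let lookups = 0; // counts how often the mapping is computed
const descr = {
  getArgToFocus(argPosition: number): number {
    lookups++;
    return Math.min(argPosition, definitions.length) - 1;
  },
};

function vectorizedCompute(cells: unknown[][]): string[][] {
  const nbArgs = cells.length > 0 ? cells[0].length : 0;
  // Resolve the arg -> definition mapping once for the whole call.
  const argDefinitions: ArgDefinition[] = [];
  for (let k = 0; k < nbArgs; k++) {
    argDefinitions.push(definitions[descr.getArgToFocus(k + 1)]);
  }
  // The per-cell function closes over argDefinitions instead of recomputing it.
  const errorHandlingCompute = (args: unknown[]): string[] =>
    args.map((_value, k) => argDefinitions[k].name);
  return cells.map(errorHandlingCompute);
}

const labels = vectorizedCompute([
  [1, "a"],
  [2, "b"],
  [3, "c"],
]);
```

In this sketch `lookups` ends at 2 (one per argument) rather than 6 (one per argument per cell).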

Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column.

evaluate all cells
  baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30
  before:   Mean: 1919.10 ms, StdErr: 12.66 ms, n=30
  after:    Mean: 1677.95 ms, StdErr: 21.29 ms, n=30 (vs prev: -13%, vs baseline: -23%)

Task: 6222157

Non-vectorized args have the same value for every cell, so they don't
need to be written into argsBuffer on each iteration. Write them once
during setup and only iterate over the truly vectorized slots in the
hot loop.
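
A simplified sketch of that split, assuming a one-dimensional layout and a hypothetical `evaluate` function. The names `argsBuffer`, `vectorizedIndices` and `argGetters` mirror those quoted in the review, but the code around them is illustrative only.

```typescript
type Arg = number | number[];
type ArgGetter = (i: number) => number;

function evaluate(
  formula: (...args: number[]) => number,
  args: Arg[],
  size: number
): number[] {
  const argsBuffer: number[] = new Array(args.length);
  const vectorizedIndices: number[] = []; // slots that change per cell
  const argGetters: ArgGetter[] = [];

  // Setup: constant slots are written once, vectorized slots are only recorded.
  for (let k = 0; k < args.length; k++) {
    const arg = args[k];
    if (Array.isArray(arg)) {
      vectorizedIndices.push(k);
      argGetters.push((i) => arg[i]);
    } else {
      argsBuffer[k] = arg;
    }
  }

  // Hot loop: only the vectorized slots are refreshed.
  const out: number[] = new Array(size);
  for (let i = 0; i < size; i++) {
    for (let k = 0; k < vectorizedIndices.length; k++) {
      argsBuffer[vectorizedIndices[k]] = argGetters[k](i);
    }
    out[i] = formula(...argsBuffer);
  }
  return out;
}

const scaled = evaluate((a, b, c) => a * b + c, [[1, 2, 3], 2, [10, 20, 30]], 3);
```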

Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column.

evaluate all cells
  baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30
  before:   Mean: 1677.95 ms, StdErr: 21.29 ms, n=30
  after:    Mean: 1528.36 ms, StdErr: 11.75 ms, n=30 (vs prev: -9%, vs baseline: -30%)

Task: 6222157

Inlining the generateMatrix loop body directly into applyVectorization
removes one callback invocation per cell and lets the engine specialize
the loop.
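
To make the transformation concrete, here is a sketch with a stand-in `generateMatrix` helper. Its signature is an assumption, and the cell computation is a placeholder rather than the real formula call.

```typescript
type Matrix<T> = T[][];

// Stand-in helper (assumed signature): builds a matrix by calling back for each cell.
function generateMatrix<T>(
  cols: number,
  rows: number,
  callback: (col: number, row: number) => T
): Matrix<T> {
  const result: Matrix<T> = new Array(cols);
  for (let col = 0; col < cols; col++) {
    result[col] = new Array(rows);
    for (let row = 0; row < rows; row++) {
      result[col][row] = callback(col, row);
    }
  }
  return result;
}

// Callback version: one extra function invocation per cell.
function viaCallback(cols: number, rows: number): Matrix<number> {
  return generateMatrix(cols, rows, (col, row) => col * 10 + row);
}

// Inlined version: same loops, with the cell computation written in the body.
function inlined(cols: number, rows: number): Matrix<number> {
  const result: Matrix<number> = new Array(cols);
  for (let col = 0; col < cols; col++) {
    result[col] = new Array(rows);
    for (let row = 0; row < rows; row++) {
      result[col][row] = col * 10 + row;
    }
  }
  return result;
}

const fromCallback = viaCallback(2, 3);
const fromInlined = inlined(2, 3);
```

The two functions return identical matrices; whether the inlined form is measurably faster depends on the engine, which is what the benchmark above is meant to check.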

Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column.

evaluate all cells
  baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30
  before:   Mean: 1528.36 ms, StdErr: 11.75 ms, n=30
  after:    Mean: 1458.47 ms, StdErr: 10.62 ms, n=30 (vs prev: -5%, vs baseline: -33%)

Task: 6222157

Spreading an array is slower than a direct call with explicit arguments.
argsBuffer.length is constant for the whole applyVectorization call, so
dispatch once before the loops and use direct calls for 1–3 args,
falling back to spread for higher arities.
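
A self-contained sketch of the dispatch, using a hypothetical `makeCaller` helper and numeric arguments in place of the real `Arg` and result types:

```typescript
type Formula = (...args: number[]) => number;

// Pick the call shape once; argsBuffer.length does not change afterwards.
function makeCaller(formula: Formula, argsBuffer: number[]): () => number {
  switch (argsBuffer.length) {
    case 1:
      return () => formula(argsBuffer[0]);
    case 2:
      return () => formula(argsBuffer[0], argsBuffer[1]);
    case 3:
      return () => formula(argsBuffer[0], argsBuffer[1], argsBuffer[2]);
    default:
      return () => formula(...argsBuffer); // generic fallback for other lengths
  }
}

const sum: Formula = (...values) => values.reduce((acc, v) => acc + v, 0);

// Two-argument case: the caller reads the buffer at call time, so refills are seen.
const buffer2: number[] = [0, 100];
const call2 = makeCaller(sum, buffer2);
const results: number[] = [];
for (let i = 0; i < 3; i++) {
  buffer2[0] = i; // refill only the slot that changes
  results.push(call2());
}

// Five-argument case goes through the spread fallback.
const viaSpread = makeCaller(sum, [1, 2, 3, 4, 5])();
```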

Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column.

evaluate all cells
  baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30
  before:   Mean: 1458.47 ms, StdErr: 10.62 ms, n=30
  after:    Mean: 1378.92 ms, StdErr: 10.24 ms, n=30 (vs prev: -5%, vs baseline: -37%)

Task: 6222157

@rrahir rrahir force-pushed the 18.0-perf-vectorization-rar branch from 1ea02b3 to c90a043, May 22, 2026 05:46
@rrahir
Collaborator Author

rrahir commented May 22, 2026

robodoo rebase-ff

@robodoo
Collaborator

robodoo commented May 22, 2026

Merge method set to rebase and fast-forward.

Contributor

@hokolomopo hokolomopo left a comment

👋

My comments aren't about whether the code works or not, just about readability. This part of the code is already complex, and it becomes even more so. Performance is fine and all, but we should at least make every effort to keep the code somewhat readable.

Additionally, these optimizations are very low-level and look very dependent on the JS engine. Sure, V8 is good and all, but should our benchmark really be only "what happens on the current version of node"? What about older or more recent V8 versions? Firefox? Safari? Or even what happens on a more realistic test dataset?

Comment thread src/functions/helpers.ts Outdated
const argsBuffer: Arg[] = new Array(args.length);

const fillArgsBuffer = (i: number, j: number): void => {
  for (let k = 0; k < args.length; k++) {
Contributor

Can you use real names for the variables? It's very hard to understand what this does. k is the argIndex; is i the col? Or the colOffset? It wasn't very clear before, but it's even worse now 😅

Talking about a buffer also sounds very esoteric IMO. It's just functionArgs/computeArgsAtPosition, no?

Comment thread src/functions/index.ts
Comment on lines +116 to +117
// Shared across vectorizedCompute→errorHandlingCompute within a single (synchronous) call.
let currentArgDefinitions: ArgDefinition[] = [];
Contributor

Seems a bit wonky to have currentArgDefinitions in this scope and to rely on the methods being called in the correct order so that it is correctly defined in errorHandlingCompute. Can't you scope the array to vectorizedCompute and pass it as an argument to errorHandlingCompute?

The variable could then also be named argDefinitions, without the "current", which I'm not sure what it refers to (current evaluation cycle? current function call?)
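
One way this suggestion could look, as a sketch only: the two functions below are hypothetical stand-ins that borrow the names from the comment, and their bodies are invented for illustration.

```typescript
interface ArgDefinition {
  name: string;
}

// Stand-in: receives the definitions explicitly instead of reading an outer variable.
function errorHandlingCompute(argDefinitions: ArgDefinition[], args: unknown[]): string[] {
  const problems: string[] = [];
  for (let k = 0; k < args.length; k++) {
    if (args[k] === undefined) {
      problems.push("missing value for " + argDefinitions[k].name);
    }
  }
  return problems;
}

// Stand-in: owns argDefinitions for the duration of one call and hands it down.
function vectorizedCompute(definitions: ArgDefinition[], cells: unknown[][]): string[][] {
  const argDefinitions = definitions.slice(); // resolved once per call
  return cells.map((args) => errorHandlingCompute(argDefinitions, args));
}

const report = vectorizedCompute(
  [{ name: "range" }, { name: "criterion" }],
  [
    [1, "a"],
    [2, undefined],
  ]
);
```

With the array passed as a parameter, `errorHandlingCompute` no longer depends on the order in which the two functions are called.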

Comment thread src/functions/helpers.ts Outdated
Comment on lines 595 to 597
for (let k = 0; k < nbVectorized; k++) {
  argsBuffer[vectorizedIndices[k]] = argGetters[k](col, row);
}
Contributor

nitpick: I'd inline vectorizedIndices.length instead of creating another variable, which adds even more confusion to this complex code

Comment thread src/functions/helpers.ts
Comment on lines +562 to +563
const argGetters: ArgGetter[] = [];
const vectorizedIndices: number[] = []; // tracks which slots need updating each iteration.
Contributor

Can you do a single array {argGetter; argIndex}[] instead? Maybe it'd be a bit easier to understand
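
A possible shape for that single array, sketched with a hypothetical `VectorizedSlot` type and a one-dimensional layout (neither is taken from the PR):

```typescript
type Arg = number | number[];

// Hypothetical record pairing each getter with the slot it fills.
interface VectorizedSlot {
  argIndex: number; // position of the argument in the call
  argGetter: (i: number) => number; // reads the value for cell i
}

function collectVectorizedSlots(args: Arg[]): VectorizedSlot[] {
  const slots: VectorizedSlot[] = [];
  args.forEach((arg, argIndex) => {
    if (Array.isArray(arg)) {
      const values = arg;
      slots.push({ argIndex, argGetter: (i) => values[i] });
    }
  });
  return slots;
}

const slots = collectVectorizedSlots([[1, 2, 3], 10, [4, 5, 6]]);

// Filling the arguments for cell 1: the constant slot (10) was written up front.
const computeArgs: number[] = [0, 10, 0];
for (const { argIndex, argGetter } of slots) {
  computeArgs[argIndex] = argGetter(1);
}
```

Keeping the index and the getter in one record removes the need to keep two parallel arrays in sync, at the cost of one small object per vectorized argument.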

Comment thread src/functions/helpers.ts
}
for (let k = 0; k < nbVectorized; k++) {
argsBuffer[vectorizedIndices[k]] = argGetters[k](col, row);
const result: Matrix<FunctionResultObject> = new Array(countVectorizedCol);
Contributor

If inlining a loop really has such an impact on perf, we should investigate other places too ... Maybe computeFunctionToObject for example

Comment thread src/functions/helpers.ts
}
const nbVectorized = vectorizedIndices.length;

// Specialize the call for common arities — argsBuffer.length is constant for the whole call,
Contributor

Not sure if "arities" is a real word (it's not recognized by my spell checker), and if it is, I'm not sure a comment that requires a Google search to understand is really useful... (or maybe it's common knowledge, and I'm just bad at computer English)

Collaborator

It's a real word: https://en.wikipedia.org/wiki/Arity. Even though I agree it's not super common knowledge, it's exactly what it's used for here.

Comment thread src/functions/helpers.ts
Comment on lines +591 to +605
type FormulaCall = () => Matrix<FunctionResultObject> | FunctionResultObject;
let callFormula: FormulaCall;
switch (argsBuffer.length) {
  case 1:
    callFormula = () => formula(argsBuffer[0]);
    break;
  case 2:
    callFormula = () => formula(argsBuffer[0], argsBuffer[1]);
    break;
  case 3:
    callFormula = () => formula(argsBuffer[0], argsBuffer[1], argsBuffer[2]);
    break;
  default:
    callFormula = () => formula(...argsBuffer);
}
Contributor

I mean, if we stop caring that much about readability, and it's faster, why stop at 3? Vlookup takes 4 arguments, Xlookup takes 6. Create a helper that returns a callback for 1 to 10 arguments, bind it, and call it a day.

@hokolomopo
Contributor

hokolomopo commented May 26, 2026

Oh, it's a backport of master commits... well, my comments still stand. @LucasLefevre

@LucasLefevre
Collaborator

@hokolomopo it can be improved, but I'd merge it as it is in 18.0, then refactor directly in 18.0 and forward-port; that would probably be easier.
