[PERF] backport vectorization improvements by rrahir · Pull Request #8774 · odoo/o-spreadsheet

rrahir · 2026-05-22T05:33:42Z

Description:

description of this task, what is implemented and why it is implemented that way.

review checklist

robodoo · 2026-05-22T05:33:45Z

Each vectorized cell allocated a fresh `args` array via `args.map(...)`. For large vectorized formulas this produces one array allocation per output cell, hammering the GC. Allocate a single `argsBuffer` once and fill it in-place per iteration. Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column. evaluate all cells before: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30 after: Mean: 1983.47 ms, StdErr: 12.82 ms, n=30 (vs prev: -9%) Task: 6222157

The inner loop re-evaluated vectorArgsType?.[k] via a switch for every arg of every vectorized cell. The arg's access pattern is fixed for the whole call, so resolve it once into a closure per arg and just invoke the closure inside the loop. Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column. evaluate all cells baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30 before: Mean: 1983.47 ms, StdErr: 12.82 ms, n=30 after: Mean: 1919.10 ms, StdErr: 12.66 ms, n=30 (vs prev: -3%, vs baseline: -12%) Task: 6222157

errorHandlingCompute called descr.getArgToFocus() on every arg of every vectorized cell. The arg→definition mapping is fixed for the whole call, so compute argDefinitions once in vectorizedCompute and share it with errorHandlingCompute via a closure variable. Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column. evaluate all cells baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30 before: Mean: 1919.10 ms, StdErr: 12.66 ms, n=30 after: Mean: 1677.95 ms, StdErr: 21.29 ms, n=30 (vs prev: -13%, vs baseline: -23%) Task: 6222157

Non-vectorized args have the same value for every cell, so they don't need to be written into argsBuffer on each iteration. Write them once during setup and only iterate over the truly vectorized slots in the hot loop. Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column. evaluate all cells baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30 before: Mean: 1677.95 ms, StdErr: 21.29 ms, n=30 after: Mean: 1528.36 ms, StdErr: 11.75 ms, n=30 (vs prev: -9%, vs baseline: -30%) Task: 6222157

Inlining the generateMatrix loop body directly into applyVectorization removes one callback invocation per cell and lets the engine specialize the loop. Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column. evaluate all cells baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30 before: Mean: 1528.36 ms, StdErr: 11.75 ms, n=30 after: Mean: 1458.47 ms, StdErr: 10.62 ms, n=30 (vs prev: -5%, vs baseline: -33%) Task: 6222157

Spreading an array is slower than a direct call with explicit arguments. argsBuffer.length is constant for the whole applyVectorization call, so dispatch once before the loops and use direct calls for 1–3 args, falling back to spread for higher arities. Benchmark scenario: 1500 cells of `=SUM($C:$C="132")` against a 10k-row `C:C` column. evaluate all cells baseline: Mean: 2171.79 ms, StdErr: 12.84 ms, n=30 before: Mean: 1458.47 ms, StdErr: 10.62 ms, n=30 after: Mean: 1378.92 ms, StdErr: 10.24 ms, n=30 (vs prev: -5%, vs baseline: -37%) Task: 6222157

rrahir · 2026-05-22T06:28:34Z

robodoo rebase-ff

robodoo · 2026-05-22T06:28:36Z

Merge method set to rebase and fast-forward.

hokolomopo

👋

My comments aren't about whether the code works or not, just readability. This part of the code is already complex, and becomes even more so. Performance is fine and all, but we should at least do the maximal effort to keep the code somewhat readable.

Additionally, these optimization are very low-level, and look very dependent of the JS engine. Sure V8 is good and all, but should our benchmark really be only "what happens on the current version of node" ? What about older/more recent V8 versions ? Firefox ? Safari ? Or even what happens on a more realistic test dataset.

hokolomopo · 2026-05-26T11:48:21Z

+  const argsBuffer: Arg[] = new Array(args.length);
+
+  const fillArgsBuffer = (i: number, j: number): void => {
+    for (let k = 0; k < args.length; k++) {


can you use real names for the variables? very hard to understand what this does. k is the argIndex, is i the col ? Or the colOffset ? It wasn't very clear before, but even worse now 😅

Talking about a buffer also sounds very esoteric IMO. It's just functionArgs/computeArgsAtPosition no ?

hokolomopo · 2026-05-26T11:59:47Z

+  // Shared across vectorizedCompute→errorHandlingCompute within a single (synchronous) call.
+  let currentArgDefinitions: ArgDefinition[] = [];


Seems a bit wonky to have currentArgDefinitions in this scope, and rely on the fact that we call the methods in the correct order so it is correctly defined in errorHandlingCompute. Can't you scope the array to vectorizedCompute, and pass it as argument of errorHandlingCompute ?

The variable could then also be named argDefinitions, w/o the current which I'm not sure what it refers to (current evaluation cycle ? current function call ?)

hokolomopo · 2026-05-26T12:03:04Z

+    for (let k = 0; k < nbVectorized; k++) {
+      argsBuffer[vectorizedIndices[k]] = argGetters[k](col, row);
    }


nitpick: I'd inline vectorizedIndices.length instead of creating another variables which adds even more confusion to this complex code

hokolomopo · 2026-05-26T12:08:12Z

+  const argGetters: ArgGetter[] = [];
+  const vectorizedIndices: number[] = []; // tracks which slots need updating each iteration.


Can you do a single array {argGetter; argIndex}[] instead ? Maybe it'd be a bit easier to undersand

hokolomopo · 2026-05-26T12:15:37Z

-    }
-    for (let k = 0; k < nbVectorized; k++) {
-      argsBuffer[vectorizedIndices[k]] = argGetters[k](col, row);
+  const result: Matrix<FunctionResultObject> = new Array(countVectorizedCol);


If inlining a loop really has such an impact on perf, we should investigate other places too ... Maybe computeFunctionToObject for example

hokolomopo · 2026-05-26T12:20:03Z

  }
  const nbVectorized = vectorizedIndices.length;

+  // Specialize the call for common arities — argsBuffer.length is constant for the whole call,


Not sure if "arities" is a real word (it's not recognized by my spell checker), and if it is I'm not sure if a comment that require a google search to understand is really useful ... (or maybe it's common knowledge, and I'm just bad at computer english)

it's a real word: https://en.wikipedia.org/wiki/Arity even though I agree it's not super common knowledge, it's exactly what it's used for here.

hokolomopo · 2026-05-26T12:22:37Z

+  type FormulaCall = () => Matrix<FunctionResultObject> | FunctionResultObject;
+  let callFormula: FormulaCall;
+  switch (argsBuffer.length) {
+    case 1:
+      callFormula = () => formula(argsBuffer[0]);
+      break;
+    case 2:
+      callFormula = () => formula(argsBuffer[0], argsBuffer[1]);
+      break;
+    case 3:
+      callFormula = () => formula(argsBuffer[0], argsBuffer[1], argsBuffer[2]);
+      break;
+    default:
+      callFormula = () => formula(...argsBuffer);
+  }


I mean if we stop caring that much about readability, and it's faster, why stop there at 3 ? Vlookup takes 4 arguments, Xlookup takes 6. Create a helper that returns a callback from 1 to 10 arguments, bind it, and call it a day.

hokolomopo · 2026-05-26T12:40:00Z

Oh it's a backport of master commits .... well my comments still stand. @LucasLefevre

LucasLefevre · 2026-05-26T13:30:27Z

@hokolomopo can be improved, but then I'd merge it as it is in 18.0, then directly refactor in 18.0 and forward-port, that would be probably easier

rrahir force-pushed the 18.0-perf-vectorization-rar branch 2 times, most recently from f8fb448 to 1ea02b3 Compare May 22, 2026 05:43

rrahir marked this pull request as ready for review May 22, 2026 05:44

LucasLefevre added 6 commits May 22, 2026 07:45

rrahir force-pushed the 18.0-perf-vectorization-rar branch from 1ea02b3 to c90a043 Compare May 22, 2026 05:46

hokolomopo reviewed May 26, 2026

View reviewed changes

		// Shared across vectorizedCompute→errorHandlingCompute within a single (synchronous) call.
		let currentArgDefinitions: ArgDefinition[] = [];

		const argGetters: ArgGetter[] = [];
		const vectorizedIndices: number[] = []; // tracks which slots need updating each iteration.

Conversation

rrahir commented May 22, 2026

Description:

review checklist

Uh oh!

robodoo commented May 22, 2026

Uh oh!

rrahir commented May 22, 2026

Uh oh!

robodoo commented May 22, 2026

Uh oh!

hokolomopo left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hokolomopo commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LucasLefevre commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hokolomopo commented May 26, 2026 •

edited

Loading