
[Perf] Add qd.field_array(N, dtype) for indexed @qd.dataclass fields #712

Draft: hughperkins wants to merge 1 commit into `main` from `hp/qd-field-array`


@hughperkins (Collaborator)

Adds a new `qd.field_array(N, dtype)` annotation for `@qd.dataclass` that exposes a logical N-element array as `obj.r[i]` while storing N individually named synthetic scalar fields (`_r0..._r{N-1}`) under the hood. For Python-int indices (including `qd.static(range(N))`-unrolled loop variables), the AST transformer rewrites `obj.r[i]` directly to `obj._r{i}`, so the generated LLVM IR / PTX is byte-identical to that of a hand-rolled named-field struct.
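A minimal plain-Python model of the storage layout, assuming the expansion works roughly as described: each `field_array` annotation is flattened into `count` scalar fields named `_<name>0.._<name>{N-1}`, and the logical name is recorded in a group map. `FieldArray`, `field_array` and `expand_annotations` below are illustrative stand-ins, not the quadrants implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldArray:
    """Hypothetical stand-in for the PR's FieldArray annotation wrapper."""
    count: int
    dtype: type


def field_array(count, dtype):
    # Mirrors the shape of qd.field_array(N, dtype); standalone model only.
    if count <= 0:
        raise ValueError("field_array count must be positive")
    return FieldArray(count, dtype)


def expand_annotations(annotations):
    """Flatten FieldArray entries into synthetic scalar fields.

    Returns (flat_fields, field_groups): flat_fields maps every storage field
    to its dtype; field_groups maps each logical array name to the ordered
    list of synthetic field names that back it.
    """
    flat, groups = {}, {}
    for name, ann in annotations.items():
        if isinstance(ann, FieldArray):
            names = [f"_{name}{i}" for i in range(ann.count)]
            for synth in names:
                flat[synth] = ann.dtype
            groups[name] = names
        else:
            flat[name] = ann
    return flat, groups


flat, groups = expand_annotations({"r": field_array(4, float), "n": int})
print(list(flat))   # ['_r0', '_r1', '_r2', '_r3', 'n']
print(groups["r"])  # ['_r0', '_r1', '_r2', '_r3']
```

Keeping the group map alongside the flat field list is what lets a later pass translate `obj.r[i]` back into a single named field.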

Motivation: today's idiomatic `r: qd.types.vector(N, dtype)` group field leaves an alloca that LLVM SROA cannot decompose once register pressure crosses a threshold (e.g. two concurrent tiles in a Cholesky+TRSM kernel), causing runtime regressions through local-memory spills. The named-field cascade pattern avoids the spills but balloons source size: 32-way `if k == N: self.rN = val` write cascades are duplicated at every callsite. `field_array` collapses each cascade to one AST node per callsite while preserving the named-field IR.
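To make the source-size cost concrete, here is a hypothetical generator for the hand-rolled write cascade, in plain Python rather than quadrants. It emits one `if k == i: self.r{i} = val` branch per element (following the `self.rN` notation above), so N = 32 yields 32 branch lines for a single write helper.

```python
def make_write_cascade(n: int, field: str = "r") -> str:
    """Emit an N-way write cascade over named fields r0..r{n-1}.

    Hypothetical helper that models the source pattern described above;
    every element needs its own explicitly spelled-out branch.
    """
    lines = [f"def set_{field}(self, k, val):"]
    for i in range(n):
        lines.append(f"    if k == {i}: self.{field}{i} = val")
    return "\n".join(lines)


src = make_write_cascade(4)
print(src)

# The generated source is valid Python: exec it and attach it to a plain class.
namespace = {}
exec(src, namespace)


class Tile:
    def __init__(self):
        for i in range(4):
            setattr(self, f"r{i}", 0.0)


Tile.set_r = namespace["set_r"]
t = Tile()
t.set_r(2, 5.0)
print(t.r2)  # 5.0
```

With `field_array` and a Python-int index, the same write is a single subscript such as `self.r[k] = val`, which the transformer turns into one named-field store.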

Changes:

- `lang/struct.py`: `FieldArray` type wrapper, `field_array(count, dtype)` constructor, expansion in `StructType.__init__` (synthetic field names plus `_field_groups` metadata), propagation in `StructType.__call__`, and the `_FieldArrayRef` transient proxy.
- `lang/impl.py`: preserve `_qd_field_groups` across the `Struct` rewrap in `expr_init`.
- `lang/ast/ast_transformer.py`: `build_Attribute` returns a `_FieldArrayRef` for group access; `build_Subscript` resolves it to a direct field reference for Python-int indices.
- `tests/python/test_field_array.py`: 5 tests covering construction, a static Python-int index, a `qd.static` loop-variable index, runtime-index rejection (with a clear error), and out-of-bounds rejection for a static index.
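The rewrite step can be modeled with Python's standard `ast` module (3.9+). This is a sketch of the idea, not the quadrants `build_Attribute` / `build_Subscript` code (which, per the list above, routes through a `_FieldArrayRef` proxy). `FieldArrayRewriter`, `groups` and `static_env` are hypothetical names; `static_env` stands in for loop variables that a `qd.static` unroll has already bound to Python ints. Subscripts it cannot resolve are left untouched here, whereas the PR reports an error for those cases.

```python
import ast


class FieldArrayRewriter(ast.NodeTransformer):
    """Rewrite obj.<group>[i] to obj._<group>{i} when i is a known Python int.

    groups maps each logical array name to its length; static_env maps
    names already bound to Python ints (e.g. unrolled loop variables).
    """

    def __init__(self, groups, static_env=None):
        self.groups = groups
        self.static_env = static_env or {}

    def _static_index(self, node):
        # Python 3.9+: the subscript slice is the index expression itself.
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name) and node.id in self.static_env:
            return self.static_env[node.id]
        return None

    def visit_Subscript(self, node):
        self.generic_visit(node)
        target = node.value
        if isinstance(target, ast.Attribute) and target.attr in self.groups:
            idx = self._static_index(node.slice)
            if idx is not None and 0 <= idx < self.groups[target.attr]:
                # Keep the Load/Store context so assignments stay assignments.
                new = ast.Attribute(value=target.value,
                                    attr=f"_{target.attr}{idx}", ctx=node.ctx)
                return ast.copy_location(new, node)
        return node


tree = ast.parse("self.r[2] = self.r[k] + x[0]")
tree = FieldArrayRewriter({"r": 4}, static_env={"k": 1}).visit(tree)
print(ast.unparse(ast.fix_missing_locations(tree)))
# self._r2 = self._r1 + x[0]
```

Because the output is an ordinary attribute access, later compilation stages see exactly what a hand-written `self._r2` would produce, which is the basis for the byte-identical IR claim.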

Runtime-int indexing is intentionally rejected with a friendly error pointing at `qd.static`; existing cascade helpers continue to handle the runtime case by spelling out the `_rN` fields directly. Adding runtime-int support is a small follow-up.
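A plain-Python sketch of these index rules, assuming a proxy object sits between `obj.r` and `[i]`. The PR names its proxy `_FieldArrayRef`; `FieldArrayRef` here is an illustrative stand-in, and `RuntimeIndex` is a hypothetical placeholder for a value only known when the kernel runs. The exception types and messages are assumptions, not the ones quadrants emits.

```python
class RuntimeIndex:
    """Hypothetical placeholder for an index only known at kernel run time."""


class FieldArrayRef:
    """Transient proxy for obj.<name>, loosely modeled on _FieldArrayRef.

    Only Python-int indices resolve to a synthetic field; anything else
    raises with a hint, and out-of-range ints raise IndexError.
    """

    def __init__(self, obj, name, count):
        self._obj, self._name, self._count = obj, name, count

    def _resolve(self, i):
        if type(i) is not int:
            raise TypeError(
                f"{self._name}[...] needs a Python-int index; use "
                f"qd.static(range({self._count})) so the index is known at "
                f"compile time (got {type(i).__name__})")
        if not 0 <= i < self._count:
            raise IndexError(
                f"{self._name}[{i}] is out of range for a field_array of size {self._count}")
        return f"_{self._name}{i}"

    def __getitem__(self, i):
        return getattr(self._obj, self._resolve(i))

    def __setitem__(self, i, val):
        setattr(self._obj, self._resolve(i), val)


class Row:
    def __init__(self):
        self._r0 = self._r1 = self._r2 = 0.0


row = Row()
r = FieldArrayRef(row, "r", 3)
for i in range(3):  # stands in for a qd.static-unrolled loop
    r[i] = float(i * i)
print(row._r2)  # 4.0
```

Calling `r[RuntimeIndex()]` raises the `TypeError` mentioning `qd.static`, and `r[3]` raises `IndexError`, mirroring the two rejection cases the new tests cover.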

Verified on a `field_array` port of genesis `_tile32.py`: the PTX is byte-identical to the named-field S1 baseline (modulo the per-session nonce comment) on both `chol_kernel` and `chol_trsm_kernel`; there are zero local-memory spills (S1: 0/0, FA: 0/0, F4-A vector-field variant: 42/97); and compile time drops by 25% on the single-tile harness (5.60 s -> 4.19 s, 3-run mean). Source dropped from 1068 to 515 lines (-52%). Full writeup in `perso_hugh/doc/qd_field_array_2026may23.md`.
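The two headline percentages follow directly from the raw figures:

```math
\frac{5.60 - 4.19}{5.60} = \frac{1.41}{5.60} \approx 0.252 \ (\text{compile time}), \qquad \frac{1068 - 515}{1068} = \frac{553}{1068} \approx 0.518 \ (\text{source lines})
```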

All 201 tests in `test_py_dataclass.py`, `test_complex_struct.py` and `test_struct.py` continue to pass; the 5 new tests pass in 1.76 s total.

Issue: #


@hughperkins (Collaborator, Author)

Need some user-facing doc.

@hughperkins (Collaborator, Author)

(Also, might want to revisit the name)
