Summary
SparseColumnPivotedQR.jl takes SparseMatrixCSR at its API boundary, but the entire numeric and symbolic machinery is CSC internally — the CSR requirement is a naming/convention artifact, not an algorithmic one. This forces consumers (e.g. LinearSolve.jl) to do a redundant, allocating CSC → transpose → CSR → CSC round-trip on every solve, which defeats the package's otherwise zero-allocation refactor design.
Evidence (the kernel is CSC end-to-end)
All refs in src/SparseColumnPivotedQR.jl:
_factor_kernel immediately converts the incoming CSR back to CSC via _csr_to_csc_with_norms! (call at line ~1085, defn ~1295) before any work.
- Symbolic phase is CSC:
_build_symbolic → _csr_pattern_to_csc_permuted (~667), column elimination tree _coletree_ata (~571), _cs_counts (~537) all operate on CSC colptr/rowval (etree of AᵀA from CSC columns).
- Numeric kernel
_csc_qr_numeric! (~1562) is literally CSC-named: reads S = PAQ from colptr_S/rowval_S/nzval_S, emits V and R as CSC; walks columns of S.
- The factorization stores V and R as CSC;
_usolve! (~2271) comments "Davis-style usolve on a CSC upper triangular"; _apply_QH!/_apply_Q! gather/scatter over CSC columns.
- Only
_capture_pattern/_pattern_matches (~476/489) touch CSR rowptr/colval; trivially adaptable to CSC.
Davis CSparse (cs) is compressed-column; SuiteSparse qr(::SparseMatrixCSC) is CSC-native. So CSC is the faithful boundary here; CSR is the deviation. (Note SparseMatrixCSR(A) is bit-for-bit the CSC of Aᵀ.)
Cost (Julia 1.10, sprandn(n,n,dens)+I)
SparseMatricesCSR's SparseMatrixCSR(::SparseMatrixCSC) does a full materializing transpose-of-transpose; the kernel then transposes back to CSC. Boundary conversion vs. a bare CSC copy:
| n |
nnz |
CSC→CSR convert |
plain copy(CSC) |
| 100 |
589 |
4.95 µs / 21,376 B |
1.24 µs |
| 500 |
5.5k |
36.5 µs / 185,696 B |
8.5 µs |
| 2000 |
22k |
159 µs / 739,936 B |
31 µs |
≈4–5× slower and ~20 KB–740 KB allocated per solve. Negligible vs. factor time, but it reintroduces a large per-solve heap allocation on the csr_refactor! steady-state path that the workspace-pool design otherwise makes allocation-free.
Recommendation
Add a CSC method at the boundary — csr_qr(::SparseMatrixCSC) / csr_factor / csr_refactor! accepting SparseMatrixCSC directly, with _csr_to_csc_with_norms! specialized to read CSC arrays in place (no transpose, no allocation). Keep the existing SparseMatrixCSR methods for back-compat. The numeric kernel needs no change (it is already CSC). Downstream, LinearSolve.jl's _scpqr_csr would then pass the CSC straight through, eliminating the per-solve allocation and aligning the boundary with the algorithm's true format and with SuiteSparse.
Investigation done in the course of integrating this solver as the default sparse QR in LinearSolve.jl (SciML/LinearSolve.jl#1007).
Summary
SparseColumnPivotedQR.jltakesSparseMatrixCSRat its API boundary, but the entire numeric and symbolic machinery is CSC internally — the CSR requirement is a naming/convention artifact, not an algorithmic one. This forces consumers (e.g. LinearSolve.jl) to do a redundant, allocatingCSC → transpose → CSR → CSCround-trip on every solve, which defeats the package's otherwise zero-allocation refactor design.Evidence (the kernel is CSC end-to-end)
All refs in
src/SparseColumnPivotedQR.jl:_factor_kernelimmediately converts the incoming CSR back to CSC via_csr_to_csc_with_norms!(call at line ~1085, defn ~1295) before any work._build_symbolic→_csr_pattern_to_csc_permuted(~667), column elimination tree_coletree_ata(~571),_cs_counts(~537) all operate on CSCcolptr/rowval(etree of AᵀA from CSC columns)._csc_qr_numeric!(~1562) is literally CSC-named: readsS = PAQfromcolptr_S/rowval_S/nzval_S, emits V and R as CSC; walks columns of S._usolve!(~2271) comments "Davis-style usolve on a CSC upper triangular";_apply_QH!/_apply_Q!gather/scatter over CSC columns._capture_pattern/_pattern_matches(~476/489) touch CSRrowptr/colval; trivially adaptable to CSC.Davis CSparse (
cs) is compressed-column; SuiteSparseqr(::SparseMatrixCSC)is CSC-native. So CSC is the faithful boundary here; CSR is the deviation. (NoteSparseMatrixCSR(A)is bit-for-bit the CSC ofAᵀ.)Cost (Julia 1.10,
sprandn(n,n,dens)+I)SparseMatricesCSR'sSparseMatrixCSR(::SparseMatrixCSC)does a full materializing transpose-of-transpose; the kernel then transposes back to CSC. Boundary conversion vs. a bare CSC copy:≈4–5× slower and ~20 KB–740 KB allocated per solve. Negligible vs. factor time, but it reintroduces a large per-solve heap allocation on the
csr_refactor!steady-state path that the workspace-pool design otherwise makes allocation-free.Recommendation
Add a CSC method at the boundary —
csr_qr(::SparseMatrixCSC)/csr_factor/csr_refactor!acceptingSparseMatrixCSCdirectly, with_csr_to_csc_with_norms!specialized to read CSC arrays in place (no transpose, no allocation). Keep the existingSparseMatrixCSRmethods for back-compat. The numeric kernel needs no change (it is already CSC). Downstream, LinearSolve.jl's_scpqr_csrwould then pass the CSC straight through, eliminating the per-solve allocation and aligning the boundary with the algorithm's true format and with SuiteSparse.Investigation done in the course of integrating this solver as the default sparse QR in LinearSolve.jl (SciML/LinearSolve.jl#1007).