Skip to content

Accept SparseMatrixCSC natively — internal kernel is already CSC; CSR boundary forces a redundant transpose-of-transpose #33

Description

@ChrisRackauckas-Claude

Summary

SparseColumnPivotedQR.jl takes SparseMatrixCSR at its API boundary, but the entire numeric and symbolic machinery is CSC internally — the CSR requirement is a naming/convention artifact, not an algorithmic one. This forces consumers (e.g. LinearSolve.jl) to do a redundant, allocating CSC → transpose → CSR → CSC round-trip on every solve, which defeats the package's otherwise zero-allocation refactor design.

Evidence (the kernel is CSC end-to-end)

All refs in src/SparseColumnPivotedQR.jl:

  • _factor_kernel immediately converts the incoming CSR back to CSC via _csr_to_csc_with_norms! (call at line ~1085, defn ~1295) before any work.
  • Symbolic phase is CSC: _build_symbolic_csr_pattern_to_csc_permuted (~667), column elimination tree _coletree_ata (~571), _cs_counts (~537) all operate on CSC colptr/rowval (etree of AᵀA from CSC columns).
  • Numeric kernel _csc_qr_numeric! (~1562) is literally CSC-named: reads S = PAQ from colptr_S/rowval_S/nzval_S, emits V and R as CSC; walks columns of S.
  • The factorization stores V and R as CSC; _usolve! (~2271) comments "Davis-style usolve on a CSC upper triangular"; _apply_QH!/_apply_Q! gather/scatter over CSC columns.
  • Only _capture_pattern/_pattern_matches (~476/489) touch CSR rowptr/colval; trivially adaptable to CSC.

Davis CSparse (cs) is compressed-column; SuiteSparse qr(::SparseMatrixCSC) is CSC-native. So CSC is the faithful boundary here; CSR is the deviation. (Note SparseMatrixCSR(A) is bit-for-bit the CSC of Aᵀ.)

Cost (Julia 1.10, sprandn(n,n,dens)+I)

SparseMatricesCSR's SparseMatrixCSR(::SparseMatrixCSC) does a full materializing transpose-of-transpose; the kernel then transposes back to CSC. Boundary conversion vs. a bare CSC copy:

n nnz CSC→CSR convert plain copy(CSC)
100 589 4.95 µs / 21,376 B 1.24 µs
500 5.5k 36.5 µs / 185,696 B 8.5 µs
2000 22k 159 µs / 739,936 B 31 µs

≈4–5× slower and ~20 KB–740 KB allocated per solve. Negligible vs. factor time, but it reintroduces a large per-solve heap allocation on the csr_refactor! steady-state path that the workspace-pool design otherwise makes allocation-free.

Recommendation

Add a CSC method at the boundary — csr_qr(::SparseMatrixCSC) / csr_factor / csr_refactor! accepting SparseMatrixCSC directly, with _csr_to_csc_with_norms! specialized to read CSC arrays in place (no transpose, no allocation). Keep the existing SparseMatrixCSR methods for back-compat. The numeric kernel needs no change (it is already CSC). Downstream, LinearSolve.jl's _scpqr_csr would then pass the CSC straight through, eliminating the per-solve allocation and aligning the boundary with the algorithm's true format and with SuiteSparse.

Investigation done in the course of integrating this solver as the default sparse QR in LinearSolve.jl (SciML/LinearSolve.jl#1007).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions