Accept SparseMatrixCSC natively — internal kernel is already CSC; CSR boundary forces a redundant transpose-of-transpose

## Summary

`SparseColumnPivotedQR.jl` takes `SparseMatrixCSR` at its API boundary, but the entire numeric and symbolic machinery is **CSC** internally — the CSR requirement is a naming/convention artifact, not an algorithmic one. This forces consumers (e.g. LinearSolve.jl) to do a redundant, allocating `CSC → transpose → CSR → CSC` round-trip on every solve, which defeats the package's otherwise zero-allocation refactor design.

## Evidence (the kernel is CSC end-to-end)

All refs in `src/SparseColumnPivotedQR.jl`:

- `_factor_kernel` immediately converts the incoming CSR back to CSC via `_csr_to_csc_with_norms!` (call at line ~1085, defn ~1295) before any work.
- Symbolic phase is CSC: `_build_symbolic` → `_csr_pattern_to_csc_permuted` (~667), column elimination tree `_coletree_ata` (~571), `_cs_counts` (~537) all operate on CSC `colptr`/`rowval` (etree of AᵀA from CSC columns).
- Numeric kernel `_csc_qr_numeric!` (~1562) is literally CSC-named: reads `S = PAQ` from `colptr_S`/`rowval_S`/`nzval_S`, emits V and R as CSC; walks columns of S.
- The factorization stores V and R as CSC; `_usolve!` (~2271) comments "Davis-style usolve on a CSC upper triangular"; `_apply_QH!`/`_apply_Q!` gather/scatter over CSC columns.
- Only `_capture_pattern`/`_pattern_matches` (~476/489) touch CSR `rowptr`/`colval`; trivially adaptable to CSC.

Davis CSparse (`cs`) is compressed-**column**; SuiteSparse `qr(::SparseMatrixCSC)` is CSC-native. So CSC is the *faithful* boundary here; CSR is the deviation. (Note `SparseMatrixCSR(A)` is bit-for-bit the CSC of `Aᵀ`.)

## Cost (Julia 1.10, `sprandn(n,n,dens)+I`)

`SparseMatricesCSR`'s `SparseMatrixCSR(::SparseMatrixCSC)` does a full materializing transpose-of-transpose; the kernel then transposes back to CSC. Boundary conversion vs. a bare CSC copy:

| n | nnz | CSC→CSR convert | plain copy(CSC) |
|---|-----|-----------------|-----------------|
| 100 | 589 | 4.95 µs / 21,376 B | 1.24 µs |
| 500 | 5.5k | 36.5 µs / 185,696 B | 8.5 µs |
| 2000 | 22k | 159 µs / 739,936 B | 31 µs |

≈4–5× slower and ~20 KB–740 KB allocated per solve. Negligible vs. factor time, but it reintroduces a large per-solve heap allocation on the `csr_refactor!` steady-state path that the workspace-pool design otherwise makes allocation-free.

## Recommendation

Add a CSC method at the boundary — `csr_qr(::SparseMatrixCSC)` / `csr_factor` / `csr_refactor!` accepting `SparseMatrixCSC` directly, with `_csr_to_csc_with_norms!` specialized to read CSC arrays in place (no transpose, no allocation). Keep the existing `SparseMatrixCSR` methods for back-compat. The numeric kernel needs no change (it is already CSC). Downstream, LinearSolve.jl's `_scpqr_csr` would then pass the CSC straight through, eliminating the per-solve allocation and aligning the boundary with the algorithm's true format and with SuiteSparse.

_Investigation done in the course of integrating this solver as the default sparse QR in LinearSolve.jl (SciML/LinearSolve.jl#1007)._


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accept SparseMatrixCSC natively — internal kernel is already CSC; CSR boundary forces a redundant transpose-of-transpose #33

Summary

Evidence (the kernel is CSC end-to-end)

Cost (Julia 1.10, `sprandn(n,n,dens)+I`)

Recommendation

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

n	nnz	CSC→CSR convert	plain copy(CSC)
100	589	4.95 µs / 21,376 B	1.24 µs
500	5.5k	36.5 µs / 185,696 B	8.5 µs
2000	22k	159 µs / 739,936 B	31 µs

Accept SparseMatrixCSC natively — internal kernel is already CSC; CSR boundary forces a redundant transpose-of-transpose #33

Description

Summary

Evidence (the kernel is CSC end-to-end)

Cost (Julia 1.10, sprandn(n,n,dens)+I)

Recommendation

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Cost (Julia 1.10, `sprandn(n,n,dens)+I`)