Skip to content

Add incremental sparse RREF C-API (libsparse_rref_inc)#1

Open
thibautbar wants to merge 9 commits into
masterfrom
incremental-api
Open

Add incremental sparse RREF C-API (libsparse_rref_inc)#1
thibautbar wants to merge 9 commits into
masterfrom
incremental-api

Conversation

@thibautbar

@thibautbar thibautbar commented May 4, 2026

Copy link
Copy Markdown
Owner

Summary

Adds a stateful, row-by-row sparse RREF API alongside the existing one-shot
sparse_mat_rref path, exposed as a separate translation unit
sprref_incremental.{cpp,h} that builds into a shared library
libsparse_rref_inc.{so,dylib}.

The motivation comes from project-numina/project-feynman:
SparseRREFBatchSolver there used to drive the sparserref CLI binary for
end-of-run RREF. That works great when is_solved is queried only once at
the end (check_at_end=True), but for flows that need per-step is_solved
(early termination, FIRE6-style dynamic seed expansion, intermediate logs,
pivot tracking — check_at_end=False) re-running the batch binary on every
step is O(N²). This branch lets project-feynman keep a persistent in-memory
basis that's updated row-by-row.

The legacy CLI (main.cpp) is unchanged. This is purely additive.

Public C API

sprref_inc_t* sprref_inc_init(uint64_t field_order, int n_threads);
void          sprref_inc_free(sprref_inc_t*);

void sprref_inc_resize_nvars(sprref_inc_t*, uint32_t);
void sprref_inc_set_masters(sprref_inc_t*, const uint32_t* cols, size_t n);
void sprref_inc_set_pivot_order(sprref_inc_t*, const uint32_t* cols, const uint64_t* keys, size_t n);
void sprref_inc_add_pivot_keys(sprref_inc_t*, const uint32_t* cols, const uint64_t* keys, size_t n);
void sprref_inc_add_pivot_keys2(sprref_inc_t*, const uint32_t* cols, const uint64_t* k0, const uint64_t* k1, size_t n);

/* Single-call insert (uses the C-side pivot key map). */
int  sprref_inc_insert(sprref_inc_t*, const uint32_t* cols, const uint64_t* vals,
                       size_t nnz, uint64_t rhs, uint32_t preferred_pivot);

/* Two-phase insert: forward elim in C, pivot choice in caller, then commit.
   Designed for callers whose natural pivot ordering is too rich to encode
   in fixed-width C-side keys (e.g. Python tuple Laporta keys). */
int  sprref_inc_insert_forward(sprref_inc_t*, const uint32_t* cols, const uint64_t* vals,
                               size_t nnz, uint64_t rhs,
                               uint32_t** out_candidate_cols, size_t* out_n);
void sprref_inc_commit_pivot(sprref_inc_t*, uint32_t pivot_col);
void sprref_inc_abort_pending(sprref_inc_t*);
void sprref_inc_buffer_free_u32(uint32_t* ptr);

/* Query: returns NOT_PIVOT / HAS_FREE / SOLVED + the stored row's expression. */
int  sprref_inc_is_solved(sprref_inc_t*, uint32_t var_idx,
                          const uint32_t* acceptable_free, size_t n_free,
                          uint32_t** out_cols, uint64_t** out_vals,
                          size_t* out_nnz, uint64_t* out_rhs);
void sprref_inc_buffer_free(uint32_t* cols, uint64_t* vals);

size_t   sprref_inc_rank(const sprref_inc_t*);
uint32_t sprref_inc_nvars(const sprref_inc_t*);
const char* sprref_inc_version(void);

State is held entirely in the opaque sprref_inc_t* handle (no globals), so
multiple handles can coexist for multiprocessing.Pool workers.

How it works

  • Forward elimination loops over pivot columns present in the incoming
    row, saxpy-ing against each basis row until none remain. Modular arithmetic
    uses 128-bit intermediate to support primes up to 2^63; modular inverse via
    FLINT's n_invmod.
  • Pivot choice:
    • The single-call insert uses the C-side pivot key map (smallest
      lex (key0, key1) wins, masters excluded, column index breaks ties).
    • The two-phase insert_forward returns the candidate columns and lets
      the caller (e.g. Python) pick the pivot using its native ordering.
  • Eager backsub: after a new pivot is chosen and normalised, every
    existing basis row touching that column is saxpy-ed to zero out the entry.
    Result: the basis is in true RREF after every commit, making is_solved
    a near-O(1) lookup.
  • Reverse index col_to_pivots: col → set<pivot_col> lets eager backsub
    find affected basis rows in O(degree) instead of scanning the full basis.

Build

build_incremental.sh produces the shared library:

bash build_incremental.sh
# Output: libsparse_rref_inc.{so,dylib} in the current directory.

Honors CXX, FLINT_PREFIX, GMP_PREFIX env vars. Defaults to:

  • g++-15 on macOS (Apple Clang lacks <execution> parallel STL — same
    caveat as main.cpp)
  • g++ on Linux
  • FLINT_PREFIX=/opt/homebrew on macOS, auto-detect on Linux
    (/usr/include/flint → empty / system search paths; else /usr/local)

Tested on macOS arm64 (Homebrew) and intended for Ubuntu (apt
libflint-dev libgmp-dev libtbb-dev); the Linux side is exercised by
project-feynman's CI.

Performance (project-feynman play_tri2lx 7D, 5696 steps, mod 2017)

backend wall time step count notes
sparse_rref_batch (CLI binary, batch) 13.75s 5696 reference; check_at_end=True
sparse_rref_batch + mode=incremental 15.55s 5696 this PR; check_at_end=False
spotlight_solver_sparse (pure Python) 20.73s 5696 per-step Python sparse RREF

→ Within ~13% of the highly-optimised batch path while supporting per-step
queries; ~25% faster than the pure-Python equivalent.

Pivot decisions match SpotlightSolverSparseGF bit-for-bit (verified by
project-feynman's BL2EM end-to-end test, which compares the resolved
target's coefficients on each master).

Commits in this branch

  1. 84ec9a3 — Scaffolding (M1): API surface, stub implementation, build script
  2. 5028015 — Forward elimination + insert + eager backsub (M2)
  3. 119adccis_solved + expression extraction (M3)
  4. fb100a7 — 3-way is_solved return (NOT_PIVOT / HAS_FREE / SOLVED)
  5. 077f23cadd_pivot_keys for delta upserts (avoid full-replace traffic)
  6. fb8160dbuild_incremental.sh Linux/Ubuntu auto-detect
  7. 5f501f2 — Two-phase insert_forward + commit_pivot (caller-side pivot choice)

Notes for future work

  • Numerator-powers tuple in C-side ordering: the optional 2-uint64 key
    pair (add_pivot_keys2) packs scalars + denom_powers but drops
    numerator_powers. Workloads that need that field for tie-breaking should
    prefer the two-phase API. A byte-buffer key API could be added if needed.
  • Multi-threading: n_threads is accepted but unused (M2 is
    single-threaded); FLINT's parallel kernels could be wired into saxpy /
    backsub later.
  • Upstream PR: this is a self-PR on the fork to track the work for
    project-feynman; opening an upstream PR against munuxi/sparse_rref is
    a separate follow-up if you want this contributed back.

thibautbar and others added 7 commits May 4, 2026 14:42
Introduces a new translation unit `sprref_incremental.cpp` (with public
header `sprref_incremental.h`) exposing a C-callable API for stateful,
row-by-row sparse RREF over GF(p):

  sprref_inc_init / free
  sprref_inc_resize_nvars / set_masters / set_pivot_order
  sprref_inc_insert
  sprref_inc_is_solved / buffer_free
  sprref_inc_rank / nvars / version

State lives in an opaque sprref_inc_t* handle (no globals; multiple
handles can coexist for mp.Pool workers). Designed to be loaded from
Python via ctypes by project-feynman's SparseRREFBatchSolver in
"incremental" mode, complementing the existing batch CLI flow.

This commit is the M1 scaffolding only: the API surface compiles into
a shared library and round-trips through ctypes, but the algorithmic
core (forward elimination, lazy backward substitution, expression
extraction) is stubbed and will land in M2/M3.

Build: bash build_incremental.sh (uses g++-15 + flint + gmp + tbb on
macOS arm64; honors CXX/FLINT_PREFIX/GMP_PREFIX env vars).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fills in the algorithmic core left stubbed by M1:

  forward_eliminate  loops over pivot columns present in the incoming
                     row, saxpy'ing against each basis row until none
                     remain. Robust to any RREF state (true RREF or
                     row-echelon).

  choose_pivot       honors the new preferred_pivot parameter (matches
                     SpotlightSolverSparseGF semantics — used for sector
                     operator relations); otherwise picks the lowest
                     pivot-key non-master column (column index breaks
                     ties for determinism).

  normalize_around_pivot
                     scales the row by inverse(row[pivot]) mod p
                     using FLINT's n_invmod, then drops the pivot
                     entry (its coefficient is implicitly 1).

  eager backsub      for each existing basis row touching the new
                     pivot column (found via col_to_pivots reverse
                     index), saxpy to zero out that entry. Result:
                     basis is left in true RREF after every insert,
                     making is_solved a near-O(1) lookup in M3.

API change: sprref_inc_insert gains a `preferred_pivot` parameter
(SPRREF_INC_NO_PREF sentinel = no preference). Bumps version to v0.1.0.

Modular arithmetic uses 128-bit intermediate for primes up to 2^63.
Verified end-to-end via ctypes smoke tests:
  - independent / dependent / inconsistent classification
  - rank monotonicity
  - master columns excluded from pivot selection
  - eager backsub correctness (manual trace confirmed for x0+x1=1,
    x0-x1=3 reducing redundant 2x0=4 to dependent and x0=5 to
    inconsistent against the resolved basis x0=2, x1=16 mod 17).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that the basis is maintained in true RREF after every insert (M2's
eager backsub), is_solved is a near-O(1) lookup:

  - var_idx not in basis        -> NOT_SOLVED, no buffers populated
  - var_idx is a pivot:
      * count off-pivot entries that are neither masters nor in
        acceptable_free; if any, return NOT_SOLVED (but still emit the
        buffers, which is useful for intermediate logs / pretty-printing
        in fire6 mode)
      * otherwise return SOLVED with full expression

Output buffers are heap-allocated via std::malloc and must be released
through sprref_inc_buffer_free. nnz==0 returns NULL pointers (no
allocation). Coefficients are returned as STORED (positive mod p);
the Python wrapper applies (-coeff) % p in M4 to match the existing
solver-result convention.

Smoke-tested via ctypes:
  - 2x2 closed-form system (x0+x1=1, x0-x1=3 -> x0=2, x1=16 mod 17)
  - master columns acceptable as free
  - acceptable_free overrides
  - eager backsub correctness on a 3-var system with chained inserts
  - preferred_pivot honored (forces pivot=1 when normally 0 would win)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous 2-state return (SOLVED vs NOT_SOLVED) couldn't distinguish
"var_idx is not a pivot at all" from "var_idx is a pivot but has free
columns left". Both look the same to a Python caller that needs to mimic
SpotlightSolverSparseGF.is_solved (which returns a string sentinel for
non-pivots and a dict for pivots-with-free).

Replace with three values:
  SPRREF_INC_NOT_PIVOT  (0)  var has no basis row
  SPRREF_INC_HAS_FREE   (1)  is a pivot, has unaccounted free columns
  SPRREF_INC_SOLVED     (2)  is a pivot, fully reduced

Output buffers are still populated whenever var_idx is a pivot
(HAS_FREE or SOLVED), enabling intermediate-log use cases. NOT_PIVOT
leaves them untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to sprref_inc_set_pivot_order, this upserts (col -> key) entries
without clearing the existing map. Designed for monotonically growing
pivot orderings (Laporta key inserted as each new integral is added),
where the previous full-replace API forced O(n) traffic per step.

The Python wrapper drives this via add_pivot_keys + a packed uint64
encoding of the Laporta key — together this fixes the dominant hot path
on tri2lx 7D (set_pivot_order was 76% of total runtime, profiled with
cProfile; eliminating the full-replace + Python sort gives a 2.6x
speedup end-to-end).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On Ubuntu/Debian where libflint-dev is installed via apt, headers live
in /usr/include/flint and libs in a multiarch path that g++ already
searches by default. Forcing -I/usr/local/include / -L/usr/local/lib
breaks those installs.

The script now:
  - macOS: keep the existing /opt/homebrew default
  - Linux with /usr/include/flint present: leave include/lib flags empty
    and rely on the compiler's default search paths
  - Otherwise: fall back to /usr/local for source-built FLINT installs

This unblocks CI on ubuntu-latest where libflint-dev is the simplest
install path. macOS arm64 Homebrew users see no change (verified
locally — same .dylib output).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the C-side `choose_pivot` (which compared opaque uint64 keys
that couldn't fully encode a Python tuple-keyed Laporta ordering) with
a two-phase split:

  sprref_inc_insert_forward
      Phase 1: forward eliminate the row against existing pivots,
      stash the reduced row in the handle's pending slot, and return
      DEPENDENT / INCONSISTENT / NEEDS_PIVOT. On NEEDS_PIVOT the
      caller receives a heap-allocated array of the reduced row's
      candidate column indices.

  sprref_inc_commit_pivot
      Phase 2 (success): given the caller's pivot choice, normalise
      around it, do eager backward substitution against existing
      basis rows, and insert into the basis.

  sprref_inc_abort_pending
      Phase 2 (cancel): discard the pending row (used when all
      candidates are masters and so dependent is the right answer).

  sprref_inc_buffer_free_u32
      Free the candidate array allocated by insert_forward.

This keeps the actual pivot ordering decision in the calling language
(Python on our side), where a full Laporta tuple key (denom_count,
binary_sector, total_power, numerator_sum, denom_powers, numerator_powers)
can be compared natively without trying to pack it into a fixed-width
C-side type. End result: bit-for-bit pivot decisions matching
SpotlightSolverSparseGF on real workloads.

The legacy single-call sprref_inc_insert (which uses the C-side pivot
key map) is kept for direct C-API users and the existing equivalence
tests; it is not used by the project-feynman wrapper anymore.

Internal refactor: hoisted parse_input_row + commit_pivot_internal as
shared helpers between the legacy single-call insert and the new
two-phase path; no behavior change there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thibautbar thibautbar changed the title Incremental api Add incremental sparse RREF C-API (libsparse_rref_inc) May 4, 2026
Exposes the col_to_pivots reverse index for the RL observation feature
solver_freq, which is called per-step over many columns and was the
missing API on the incremental backend (vs SpotlightSolverSparseGF's
get_variable_freq).

The C-side basis stores a pivot row with its own pivot column erased
(implicit coefficient 1), so col_to_pivots never contains the self-edge;
we add it back when the queried column is itself a pivot to match
Spotlight semantics bit-for-bit.

Batched form (n columns at a time) amortises the FFI call overhead —
single-column queries are rare; the hot path is the per-step graph view.
Returns the set of pivot columns whose stored basis row contains a
given column as a non-zero entry — the C-side col_to_pivots map already
maintained for eager backsub, exposed for callers that need it
(notably project-feynman's master_pruner, which walks basis rows
containing a candidate variable to enumerate its Laporta blockers).

Matches SpotlightSolverSparseGF.col_rows[col] semantics: when col is
itself a pivot, the result includes col (the C-side col_to_pivots drops
the self-edge because the pivot coefficient is implicit; this accessor
restores it).

Bumps version to v0.7.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant