Skip to content

Wayy-Research/vortex-sieve

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Vortex Sieve

A high-performance prime sieve in Rust, inspired by Vortex-Based Mathematics (VBM). Uses wheel-30 factorization — a generalization of the VBM "doubling circuit" filter — combined with segmented sieving, persistent per-prime streams, pre-sieve patterns, and multi-threaded parallelism.

Performance

Benchmarked on a single machine (results will vary by hardware):

Limit Single-thread Multi-thread
1,000,000 0.12 ms
10,000,000 2.0 ms 1.8 ms
100,000,000 26 ms 6.9 ms
1,000,000,000 287 ms 59 ms
10,000,000,000 3.1 s 577 ms

Comparison with Popular Crates

Benchmarked against primal (the most popular Rust prime sieve), primes, and a naive boolean sieve:

Counting primes (π(n)):

N Vortex (single) Vortex (parallel) primal naive primes crate
1,000,000 117 µs 114 µs 96 µs 977 µs 46.6 ms
10,000,000 2.02 ms 1.80 ms 638 µs 26.0 ms 637 ms
100,000,000 25.8 ms 6.89 ms 8.44 ms
1,000,000,000 288 ms 59.0 ms 127 ms

Enumerating primes:

N Vortex primal primes crate
1,000,000 689 µs 281 µs 49.7 ms
10,000,000 7.34 ms 2.31 ms 684 ms
100,000,000 94.5 ms 38.5 ms

Key takeaways:

  • At 100M+, Vortex parallel counting beats primal: 1.2x faster at 1e8, 2.2x faster at 1e9
  • Single-threaded, primal's tighter inner loop wins at smaller limits (~3x at 1e7)
  • Both crush simpler implementations: 400-500x faster than the primes crate
  • Enumeration has room to improve — primal's extraction loop is ~2.5x faster

Run the benchmarks yourself: RUSTFLAGS="-C target-cpu=native" cargo bench

Quick Start

# Build with native CPU optimizations (recommended)
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Run benchmarks + VBM digital root analysis
RUSTFLAGS="-C target-cpu=native" cargo run --release

# Run tests
cargo test

As a Library

use vortex_sieve::{count_primes, count_primes_parallel, sieve_primes};

// Count primes up to 1 billion (single-threaded)
let count = count_primes(1_000_000_000);
assert_eq!(count, 50_847_534);

// Count primes up to 1 billion (multi-threaded)
let count = count_primes_parallel(1_000_000_000);
assert_eq!(count, 50_847_534);

// Enumerate all primes up to 100
let primes = sieve_primes(100);
assert_eq!(primes, vec![2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
                         47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]);

The VBM Connection

Vortex-Based Mathematics (Marko Rodin) observes that the digits 1–9 form distinct groups under mod-9 arithmetic:

  • Doubling circuit: {1, 2, 4, 8, 7, 5} — double any number, take its digital root, and you cycle through these six digits forever.
  • Trinity axis: {3, 6, 9} — these numbers form a closed group. 3 and 6 oscillate; 9 is the axis.
  • Polar pairs: (1,8), (2,7), (4,5) — mirror symmetries where each pair sums to 9.

The key insight for prime sieving: primes greater than 3 only ever have digital roots on the doubling circuit {1, 2, 4, 5, 7, 8}. The 3-6-9 axis is completely empty (except for the prime 3 itself), because any number with digital root 3, 6, or 9 is divisible by 3.

Running the built-in analysis confirms this — primes distribute evenly across all six doubling circuit positions:

Doubling circuit (primes live here):
  DR=1: 13,063 primes
  DR=2: 13,099 primes
  DR=4: 13,070 primes
  DR=5: 13,068 primes
  DR=7: 13,098 primes
  DR=8: 13,099 primes

Trinity axis (primes excluded):
  DR=3: 1 prime  (only the prime 3 itself)
  DR=6: 0 primes
  DR=9: 0 primes

From VBM to Wheel Factorization

This "doubling circuit filter" is a special case of a well-established number theory technique called wheel factorization. The idea generalizes:

Wheel Primorial Candidate Residues Density VBM Analogy
Wheel-6 2 × 3 = 6 {1, 5} → 2 of 6 33% Doubling circuit (mod 9)
Wheel-30 2 × 3 × 5 = 30 {1,7,11,13,17,19,23,29} → 8 of 30 27% Generalized vortex filter
Wheel-210 2 × 3 × 5 × 7 = 210 48 of 210 23% Extended vortex filter

By choosing wheel-30, we eliminate all multiples of 2, 3, and 5 upfront, leaving only 8 candidate residues per 30 numbers — which pack neatly into 1 byte per 30 numbers (1 bit per candidate).

Architecture

1. Wheel-30 Bit Packing (wheel.rs)

Each byte in the sieve represents 30 consecutive integers. The 8 bits correspond to the 8 residues coprime to 30: {1, 7, 11, 13, 17, 19, 23, 29}. A set bit means "possibly prime"; a cleared bit means "composite."

Precomputed lookup tables handle the modular arithmetic at compile time (Crandall & Pomerance §3.2):

  • RESIDUES: the 8 coprime residues
  • RESIDUE_TO_BIT: maps a residue to its bit position (or 0xFF for non-wheel residues)
  • BIT_MASK: byte masks for clearing individual bits
  • PRODUCT_RESIDUE[i][j]: for each pair of wheel residue classes, which bit their product lands on — eliminates all modular arithmetic from the sieve hot loop
  • COFACTOR_INDEX[pi][ci]: for a prime in class pi, which cofactor class j produces a composite in class ci — used by marking_offsets() for zero-cost stream initialization

2. Segmented Sieve with Persistent Streams (sieve.rs)

The sieve processes the number line in 256 KB segments (~7.8M numbers each), sized to fit in L2 cache. This prevents cache thrashing that would occur with a monolithic sieve array.

Persistent per-prime streams (Riesel Ch.1 optimization): each sieving prime's 8 marking positions are computed once at initialization, then advanced across segments by simple addition. This avoids recomputing modular offsets per segment — a key improvement over naive segmented sieves.

// Per-prime state: 8 (absolute_byte, bit_mask) pairs
// Computed once, advanced across segments
struct PrimeStreams {
    p_step: usize,
    streams: [(u64, u8); 8],
}

Three public functions:

Function Purpose Best For
count_primes(limit) Returns π(n), single-threaded Quick counts, small limits
count_primes_parallel(limit) Returns π(n), multi-threaded via rayon Large limits (> 10M)
sieve_primes(limit) Returns Vec<u64> of all primes When you need the actual primes

3. Pre-Sieve Pattern

Before the main sieve loop, a repeating pattern for the smallest wheel primes (7, 11, 13) is precomputed. The pattern has period 7 × 11 × 13 = 1,001 bytes (fits comfortably in L1 cache) and is stamped into each segment via memcpy, eliminating ~58% of remaining composites without any per-element work.

4. Inner Loop Optimization

The hot loop that marks composites operates entirely in byte-index space (stepping by p bytes) rather than number space (stepping by 30p numbers). This eliminates a division per iteration. The loop uses unsafe get_unchecked_mut to skip bounds checks after the index is validated upfront.

// No division in the hot loop — step by p bytes
while idx < sieve_len {
    unsafe { *sieve.get_unchecked_mut(idx) &= mask; }
    idx += p_step;  // just an add
}

5. Counting with Hardware POPCNT

When only counting primes (not enumerating), set bits are tallied using count_ones() on u64-aligned chunks, which compiles to a single POPCNT instruction on modern x86 CPUs. This counts 64 candidates per instruction.

6. Integer Square Root

Uses Newton's method for exact integer square root computation, avoiding floating-point rounding errors that can occur with (n as f64).sqrt() for large values near 2^64.

7. Multi-Threading

count_primes_parallel divides the number line into independent segments and processes them in parallel using rayon. Each thread gets its own sieve buffer — no synchronization needed during sieving.

Mathematical Foundations

The algorithms draw from several key texts in computational number theory:

  • Crandall & Pomerance, Prime Numbers: A Computational Perspective (2nd ed., 2005) — Section 3.2 covers sieving to recognize primes with wheel factorization. The PRODUCT_RESIDUE table implements their approach of precomputing all residue-class interactions at compile time to eliminate modular arithmetic from the inner loop.

  • Riesel, Prime Numbers and Computer Methods for Factorization (2nd ed., 1994) — Chapter 1 covers the Sieve of Eratosthenes with compact prime tables. The wheel-30 bit-packing (1 byte = 30 numbers) follows Riesel's observation that eliminating multiples of 2, 3, and 5 gives 8 candidates per 30 numbers, fitting naturally into a byte. The persistent stream approach avoids recomputing per-segment offsets.

  • Programming Praxis — Blog posts on segmented sieves and Pritchard's wheel sieve informed the segment-size tuning and pre-sieve pattern strategy.

Building

# Debug build (for development)
cargo build

# Release build with full optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Run criterion benchmarks
RUSTFLAGS="-C target-cpu=native" cargo bench

The Cargo.toml release profile enables:

  • opt-level = 3 — maximum optimization
  • lto = "fat" — full link-time optimization across all crates
  • codegen-units = 1 — single codegen unit for better inlining

The RUSTFLAGS="-C target-cpu=native" flag enables CPU-specific instructions (AVX2, POPCNT, etc.) for your machine.

Tests

# Run all tests
cargo test

# Run tests in release mode (faster, catches different class of bugs)
cargo test --release

The test suite includes:

  • Reference comparison: results checked against a simple trial-division sieve
  • Known values: π(10) = 4, π(100) = 25, π(1,000) = 168, π(10,000) = 1,229, π(100,000) = 9,592, π(1,000,000) = 78,498, π(10,000,000) = 664,579
  • Cross-validation: count_primes vs sieve_primes produce identical results
  • Parallel correctness: count_primes_parallel matches single-threaded count
  • Sortedness: enumerated primes are in strictly increasing order
  • Edge cases: limits 0 through 7
  • Integer square root: exact for all inputs including u64::MAX
  • Wheel tables: all residues are coprime to 30, product table entries are valid

References

  • Marko Rodin, Vortex-Based Mathematics — the original observation about digital root patterns
  • Richard Crandall & Carl Pomerance, Prime Numbers: A Computational Perspective (2nd ed., Springer, 2005) — sieving algorithms and wheel factorization theory
  • Hans Riesel, Prime Numbers and Computer Methods for Factorization (2nd ed., Birkhäuser, 1994) — compact sieve implementations and prime table compression
  • J.S. Milne, Algebraic Number Theory (v3.08, 2020) — foundational number theory
  • Sieve of Eratosthenes (Wikipedia)
  • Wheel factorization (Wikipedia)
  • primesieve by Kim Walisch — the state-of-the-art reference implementation
  • Programming Praxis — segmented sieve, Sieve of Atkin, Pritchard's wheel sieve exercises
  • Paul Pritchard, "Explaining the Wheel Sieve" (1982)

License

MIT

About

Blazing-fast prime sieve using VBM-inspired wheel factorization

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages