Vortex Sieve

A high-performance prime sieve in Rust, inspired by Vortex-Based Mathematics (VBM). Uses wheel-30 factorization — a generalization of the VBM "doubling circuit" filter — combined with segmented sieving, persistent per-prime streams, pre-sieve patterns, and multi-threaded parallelism.

Performance

Benchmarked on a single machine (results will vary by hardware):

Limit	Single-thread	Multi-thread
1,000,000	0.12 ms	—
10,000,000	2.0 ms	1.8 ms
100,000,000	26 ms	6.9 ms
1,000,000,000	287 ms	59 ms
10,000,000,000	3.1 s	577 ms

Comparison with Popular Crates

Benchmarked against primal (the most popular Rust prime sieve), primes, and a naive boolean sieve:

Counting primes (π(n)):

N	Vortex (single)	Vortex (parallel)	primal	naive	primes crate
1,000,000	117 µs	114 µs	96 µs	977 µs	46.6 ms
10,000,000	2.02 ms	1.80 ms	638 µs	26.0 ms	637 ms
100,000,000	25.8 ms	6.89 ms	8.44 ms	—	—
1,000,000,000	288 ms	59.0 ms	127 ms	—	—

Enumerating primes:

N	Vortex	primal	primes crate
1,000,000	689 µs	281 µs	49.7 ms
10,000,000	7.34 ms	2.31 ms	684 ms
100,000,000	94.5 ms	38.5 ms	—

Key takeaways:

At 100M+, Vortex parallel counting beats primal: 1.2x faster at 1e8, 2.2x faster at 1e9
Single-threaded, primal's tighter inner loop wins at smaller limits (~3x at 1e7)
Both crush simpler implementations: 400-500x faster than the primes crate
Enumeration has room to improve — primal's extraction loop is ~2.5x faster

Run the benchmarks yourself: RUSTFLAGS="-C target-cpu=native" cargo bench

Quick Start

# Build with native CPU optimizations (recommended)
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Run benchmarks + VBM digital root analysis
RUSTFLAGS="-C target-cpu=native" cargo run --release

# Run tests
cargo test

As a Library

use vortex_sieve::{count_primes, count_primes_parallel, sieve_primes};

// Count primes up to 1 billion (single-threaded)
let count = count_primes(1_000_000_000);
assert_eq!(count, 50_847_534);

// Count primes up to 1 billion (multi-threaded)
let count = count_primes_parallel(1_000_000_000);
assert_eq!(count, 50_847_534);

// Enumerate all primes up to 100
let primes = sieve_primes(100);
assert_eq!(primes, vec![2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
                         47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]);

The VBM Connection

Vortex-Based Mathematics (Marko Rodin) observes that the digits 1–9 form distinct groups under mod-9 arithmetic:

Doubling circuit: {1, 2, 4, 8, 7, 5} — double any number, take its digital root, and you cycle through these six digits forever.
Trinity axis: {3, 6, 9} — these numbers form a closed group. 3 and 6 oscillate; 9 is the axis.
Polar pairs: (1,8), (2,7), (4,5) — mirror symmetries where each pair sums to 9.

The key insight for prime sieving: primes greater than 3 only ever have digital roots on the doubling circuit {1, 2, 4, 5, 7, 8}. The 3-6-9 axis is completely empty (except for the prime 3 itself), because any number with digital root 3, 6, or 9 is divisible by 3.

Running the built-in analysis confirms this — primes distribute evenly across all six doubling circuit positions:

Doubling circuit (primes live here):
  DR=1: 13,063 primes
  DR=2: 13,099 primes
  DR=4: 13,070 primes
  DR=5: 13,068 primes
  DR=7: 13,098 primes
  DR=8: 13,099 primes

Trinity axis (primes excluded):
  DR=3: 1 prime  (only the prime 3 itself)
  DR=6: 0 primes
  DR=9: 0 primes

From VBM to Wheel Factorization

This "doubling circuit filter" is a special case of a well-established number theory technique called wheel factorization. The idea generalizes:

Wheel	Primorial	Candidate Residues	Density	VBM Analogy
Wheel-6	2 × 3 = 6	{1, 5} → 2 of 6	33%	Doubling circuit (mod 9)
Wheel-30	2 × 3 × 5 = 30	{1,7,11,13,17,19,23,29} → 8 of 30	27%	Generalized vortex filter
Wheel-210	2 × 3 × 5 × 7 = 210	48 of 210	23%	Extended vortex filter

By choosing wheel-30, we eliminate all multiples of 2, 3, and 5 upfront, leaving only 8 candidate residues per 30 numbers — which pack neatly into 1 byte per 30 numbers (1 bit per candidate).

Architecture

1. Wheel-30 Bit Packing (`wheel.rs`)

Each byte in the sieve represents 30 consecutive integers. The 8 bits correspond to the 8 residues coprime to 30: {1, 7, 11, 13, 17, 19, 23, 29}. A set bit means "possibly prime"; a cleared bit means "composite."

Precomputed lookup tables handle the modular arithmetic at compile time (Crandall & Pomerance §3.2):

RESIDUES: the 8 coprime residues
RESIDUE_TO_BIT: maps a residue to its bit position (or 0xFF for non-wheel residues)
BIT_MASK: byte masks for clearing individual bits
PRODUCT_RESIDUE[i][j]: for each pair of wheel residue classes, which bit their product lands on — eliminates all modular arithmetic from the sieve hot loop
COFACTOR_INDEX[pi][ci]: for a prime in class pi, which cofactor class j produces a composite in class ci — used by marking_offsets() for zero-cost stream initialization

2. Segmented Sieve with Persistent Streams (`sieve.rs`)

The sieve processes the number line in 256 KB segments (~7.8M numbers each), sized to fit in L2 cache. This prevents cache thrashing that would occur with a monolithic sieve array.

Persistent per-prime streams (Riesel Ch.1 optimization): each sieving prime's 8 marking positions are computed once at initialization, then advanced across segments by simple addition. This avoids recomputing modular offsets per segment — a key improvement over naive segmented sieves.

// Per-prime state: 8 (absolute_byte, bit_mask) pairs
// Computed once, advanced across segments
struct PrimeStreams {
    p_step: usize,
    streams: [(u64, u8); 8],
}

Three public functions:

Function	Purpose	Best For
`count_primes(limit)`	Returns π(n), single-threaded	Quick counts, small limits
`count_primes_parallel(limit)`	Returns π(n), multi-threaded via rayon	Large limits (> 10M)
`sieve_primes(limit)`	Returns `Vec<u64>` of all primes	When you need the actual primes

3. Pre-Sieve Pattern

Before the main sieve loop, a repeating pattern for the smallest wheel primes (7, 11, 13) is precomputed. The pattern has period 7 × 11 × 13 = 1,001 bytes (fits comfortably in L1 cache) and is stamped into each segment via memcpy, eliminating ~58% of remaining composites without any per-element work.

4. Inner Loop Optimization

The hot loop that marks composites operates entirely in byte-index space (stepping by p bytes) rather than number space (stepping by 30p numbers). This eliminates a division per iteration. The loop uses unsafe get_unchecked_mut to skip bounds checks after the index is validated upfront.

// No division in the hot loop — step by p bytes
while idx < sieve_len {
    unsafe { *sieve.get_unchecked_mut(idx) &= mask; }
    idx += p_step;  // just an add
}

5. Counting with Hardware POPCNT

When only counting primes (not enumerating), set bits are tallied using count_ones() on u64-aligned chunks, which compiles to a single POPCNT instruction on modern x86 CPUs. This counts 64 candidates per instruction.

6. Integer Square Root

Uses Newton's method for exact integer square root computation, avoiding floating-point rounding errors that can occur with (n as f64).sqrt() for large values near 2^64.

7. Multi-Threading

count_primes_parallel divides the number line into independent segments and processes them in parallel using rayon. Each thread gets its own sieve buffer — no synchronization needed during sieving.

Mathematical Foundations

The algorithms draw from several key texts in computational number theory:

Crandall & Pomerance, Prime Numbers: A Computational Perspective (2nd ed., 2005) — Section 3.2 covers sieving to recognize primes with wheel factorization. The PRODUCT_RESIDUE table implements their approach of precomputing all residue-class interactions at compile time to eliminate modular arithmetic from the inner loop.
Riesel, Prime Numbers and Computer Methods for Factorization (2nd ed., 1994) — Chapter 1 covers the Sieve of Eratosthenes with compact prime tables. The wheel-30 bit-packing (1 byte = 30 numbers) follows Riesel's observation that eliminating multiples of 2, 3, and 5 gives 8 candidates per 30 numbers, fitting naturally into a byte. The persistent stream approach avoids recomputing per-segment offsets.
Programming Praxis — Blog posts on segmented sieves and Pritchard's wheel sieve informed the segment-size tuning and pre-sieve pattern strategy.

Building

# Debug build (for development)
cargo build

# Release build with full optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Run criterion benchmarks
RUSTFLAGS="-C target-cpu=native" cargo bench

The Cargo.toml release profile enables:

opt-level = 3 — maximum optimization
lto = "fat" — full link-time optimization across all crates
codegen-units = 1 — single codegen unit for better inlining

The RUSTFLAGS="-C target-cpu=native" flag enables CPU-specific instructions (AVX2, POPCNT, etc.) for your machine.

Tests

# Run all tests
cargo test

# Run tests in release mode (faster, catches different class of bugs)
cargo test --release

The test suite includes:

Reference comparison: results checked against a simple trial-division sieve
Known values: π(10) = 4, π(100) = 25, π(1,000) = 168, π(10,000) = 1,229, π(100,000) = 9,592, π(1,000,000) = 78,498, π(10,000,000) = 664,579
Cross-validation: count_primes vs sieve_primes produce identical results
Parallel correctness: count_primes_parallel matches single-threaded count
Sortedness: enumerated primes are in strictly increasing order
Edge cases: limits 0 through 7
Integer square root: exact for all inputs including u64::MAX
Wheel tables: all residues are coprime to 30, product table entries are valid

References

Marko Rodin, Vortex-Based Mathematics — the original observation about digital root patterns
Richard Crandall & Carl Pomerance, Prime Numbers: A Computational Perspective (2nd ed., Springer, 2005) — sieving algorithms and wheel factorization theory
Hans Riesel, Prime Numbers and Computer Methods for Factorization (2nd ed., Birkhäuser, 1994) — compact sieve implementations and prime table compression
J.S. Milne, Algebraic Number Theory (v3.08, 2020) — foundational number theory
Sieve of Eratosthenes (Wikipedia)
Wheel factorization (Wikipedia)
primesieve by Kim Walisch — the state-of-the-art reference implementation
Programming Praxis — segmented sieve, Sieve of Atkin, Pritchard's wheel sieve exercises
Paul Pritchard, "Explaining the Wheel Sieve" (1982)

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
benches		benches
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vortex Sieve

Performance

Comparison with Popular Crates

Quick Start

As a Library

The VBM Connection

From VBM to Wheel Factorization

Architecture

1. Wheel-30 Bit Packing (`wheel.rs`)

2. Segmented Sieve with Persistent Streams (`sieve.rs`)

3. Pre-Sieve Pattern

4. Inner Loop Optimization

5. Counting with Hardware POPCNT

6. Integer Square Root

7. Multi-Threading

Mathematical Foundations

Building

Tests

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Vortex Sieve

Performance

Comparison with Popular Crates

Quick Start

As a Library

The VBM Connection

From VBM to Wheel Factorization

Architecture

1. Wheel-30 Bit Packing (wheel.rs)

2. Segmented Sieve with Persistent Streams (sieve.rs)

3. Pre-Sieve Pattern

4. Inner Loop Optimization

5. Counting with Hardware POPCNT

6. Integer Square Root

7. Multi-Threading

Mathematical Foundations

Building

Tests

References

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

1. Wheel-30 Bit Packing (`wheel.rs`)

2. Segmented Sieve with Persistent Streams (`sieve.rs`)

Packages