This document describes Iris JIT behavior, supported language surface, intrinsics, and runtime controls.
Iris JIT acceleration now has four layers that can stack:
- Cranelift-native expression JIT compilation.
- SIMD capability planning (aarch64/arm, x86/x86_64, wasm32, scalar fallback).
- SIMD-aware loop unrolling in hot runtime vector-buffer execution paths.
- Direct lowered-kernel execution for common elementwise math kernels (including trig), bypassing per-element JIT dispatch overhead.
JIT is now a build feature.
- Default build includes JIT: `pyo3 + jit`.
- Actors-only Python build (exclude JIT): `cargo build --no-default-features --features pyo3`. This mode keeps runtime/actor APIs available while omitting Cranelift/JIT internals.
```python
@iris.offload(strategy="jit", return_type="float")
def kernel(...):
    ...
```

- `strategy="jit"`: compile eligible code paths to native code (Cranelift).
- `strategy="actor"`: execute through Rust actor offload workers.
- `return_type`: `"float"` (default), `"int"`, or `"bool"`.
- Arithmetic: `+ - * / % **`.
- Unary ops and parentheses.
- Comparisons, including chained comparisons.
- Boolean logic (`and`, `or`, `not`).
- Ternary expressions (`a if cond else b`).
- Builtin-style casts/aliases: `float(x)`, `int(x)`, `round(x)`.
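To make the surface concrete, here is a hypothetical kernel written entirely inside it (the function and its name are illustrative, not part of Iris). Because fallback executes the same Python source, the result is identical with or without JIT compilation:

```python
# Hypothetical kernel using only the supported expression surface:
# arithmetic (including **), chained comparison, boolean logic,
# a ternary expression, and builtin-style casts.
def clamp_score(x: float, lo: float, hi: float) -> float:
    in_range = lo <= x <= hi           # chained comparison
    scaled = x ** 2 / (hi - lo + 1.0)  # arithmetic with **
    return float(scaled) if in_range and not (x < 0) else float(lo)
```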
Supported math calls include:
`sin`, `cos`, `tan`, `sinh`, `cosh`, `tanh`, `exp`, `log`, `sqrt`, `abs`, `min`, `max`, `pow`
When expressions match simple elementwise forms, Iris may run a lowered vector loop directly:
- Unary kernels: identity (`x`), negation (`-x`), `abs(x)`, `sin(x)`, `cos(x)`, `tan(x)`, `exp(x)`, `log(x)`/`ln(x)`, `sqrt(x)`
- Binary kernels: `a + b`, `a - b`, `a * b`, `a / b`
These lowered paths are used on vector-buffer call paths (profiled and generic), and support both map-style outputs and reductions (sum, any, all).
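A scalar Python model of the lowered-kernel idea: once an expression matches a recognized elementwise shape, execution goes straight to a kernel loop instead of per-element JIT dispatch. The table keys and function names below are invented for illustration; they are not the real internal API:

```python
import math

# Illustrative shape-keyed kernel tables mirroring the documented
# unary/binary lowered kernels.
UNARY_KERNELS = {
    "identity": lambda x: x,
    "neg": lambda x: -x,
    "abs": abs,
    "sin": math.sin, "cos": math.cos, "tan": math.tan,
    "exp": math.exp, "log": math.log, "sqrt": math.sqrt,
}
BINARY_KERNELS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}

def run_unary(shape, buf):
    fn = UNARY_KERNELS[shape]  # one lookup, then a tight loop:
    return [fn(x) for x in buf]  # no per-element dispatch overhead
```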
`math.<fn>`-style usage is normalized by the frontend when extractable to JIT expressions.
- `sum(...)`, `any(...)`, `all(...)` over `range(...)` and container generators.
- Predicated reductions in generator bodies.
- Positive and negative `range` step handling (including dynamic step expressions).
- While-form reductions: `sum_while(...)`, `any_while(...)`, `all_while(...)`.
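A predicated reduction over `range(...)` with a dynamic step is an ordinary Python generator expression; in fallback mode it runs as-is, so results match the JIT path (the function below is an illustrative example, not Iris API):

```python
# Reduction of the documented shape: sum over range with a dynamic
# step and a predicate in the generator body.
def count_big(n: int, step: int, threshold: float) -> int:
    return sum(1 for i in range(0, n, step) if i * i > threshold)
```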
Supported forms (aliases included):
- Break family: `break_if`, `break_when`, `loop_break_if`, `loop_break_when`, `break_unless`, `loop_break_unless`
- Continue family: `continue_if`, `continue_when`, `loop_continue_if`, `loop_continue_when`, `continue_unless`, `loop_continue_unless`
- NaN control: `break_on_nan`, `loop_break_on_nan`, `continue_on_nan`, `loop_continue_on_nan`
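As an equivalence sketch, each alias corresponds to an ordinary conditional in the loop body. The real helpers are recognized by the JIT frontend; this plain-Python model only pins down the intended semantics (the function itself is hypothetical):

```python
import math

# Plain-Python equivalents of the loop-control aliases:
# break_on_nan(v)      ≈ if math.isnan(v): break
# continue_if(v < 0)   ≈ if v < 0: continue
def sum_until_nan(values):
    total = 0.0
    for v in values:
        if math.isnan(v):   # break_on_nan(v)
            break
        if v < 0:           # continue_if(v < 0)
            continue
        total += v
    return total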
- Inputs: Python `float`, `int`, `bool`.
- Internal ABI: lowered through native `f64` argument path.
- Output conversion follows `return_type` (`float`/`int`/`bool`).
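A minimal sketch of the documented output conversion: the native path yields an `f64`, which the wrapper then converts per `return_type`. Whether the `int` conversion truncates (as modeled here with Python's `int()`) or rounds is an assumption:

```python
# Hypothetical model of return_type output conversion from a native f64.
def convert_result(raw: float, return_type: str = "float"):
    if return_type == "float":
        return float(raw)
    if return_type == "int":
        return int(raw)        # assumption: truncation toward zero
    if return_type == "bool":
        return raw != 0.0
    raise ValueError(f"unsupported return_type: {return_type}")
```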
Supported buffer element types include:
- `f64`, `f32`
- signed integers (`i64`, `i32`, `i16`, `i8`)
- unsigned integers (`u64`, `u32`, `u16`, `u8`)
- `bool`
Iris selects a SIMD backend from host capabilities at runtime:
- ARM/aarch64: prefers `SVE`/`SVE2`, falls back to `NEON`, then scalar.
- x86/x86_64: prefers `AVX2`, then `AVX`, then `SSE2`, then scalar.
- wasm32: uses `simd128` when available, else scalar.
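The preference order above can be modeled as a simple fall-through. In Iris the feature flags come from runtime CPU detection; here they are passed in explicitly, and the backend names are lowercase labels for illustration:

```python
# Preference-order model of runtime SIMD backend planning.
def plan_backend(arch, features):
    if arch in ("aarch64", "arm"):
        for cand in ("sve2", "sve", "neon"):
            if cand in features:
                return cand
    elif arch in ("x86", "x86_64"):
        for cand in ("avx2", "avx", "sse2"):
            if cand in features:
                return cand
    elif arch == "wasm32" and "simd128" in features:
        return "simd128"
    return "scalar"  # no usable vector capability detected
```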
Unrolling is applied to hot loops in:
- profiled vector-buffer execution,
- generic all-buffer vector execution,
- trailing-count vectorized mode,
- indexable sequence fallback loops.
Unroll factor is derived from SIMD lane width (with scalar-safe tail handling), improving throughput even before explicit vector intrinsics are emitted for every operation.
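A scalar sketch of lane-width-derived unrolling with a safe tail: the main loop advances `lanes` elements per trip, and the remainder is processed one element at a time. The function is illustrative, not Iris internals:

```python
# Unroll-by-lane-width model with a scalar-safe tail.
def unrolled_scale(buf, factor, lanes=4):
    out = [0.0] * len(buf)
    n = len(buf)
    main = n - n % lanes          # largest multiple of the unroll factor
    i = 0
    while i < main:               # unrolled body: `lanes` elements per trip
        for j in range(lanes):
            out[i + j] = buf[i + j] * factor
        i += lanes
    for k in range(main, n):      # scalar tail for the leftover elements
        out[k] = buf[k] * factor
    return out
```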
For lowered reductions, Iris uses lane-wise partial accumulation (horizontal-style combine) for:
- `sum`: lane accumulators reduced at the end.
- `any`: lane OR-style early success.
- `all`: lane AND-style early failure.
This is primarily beneficial for reduction workloads; pure map kernels do not need horizontal combine.
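A scalar model of the `sum` case: one partial accumulator per lane, combined horizontally once at the end. (`any`/`all` would instead OR/AND lane predicates and exit early.) The round-robin lane assignment here is an illustration, not the actual vector layout:

```python
# Lane-accumulator model for the lowered `sum` reduction.
def lanewise_sum(buf, lanes=4):
    acc = [0.0] * lanes
    for i, x in enumerate(buf):
        acc[i % lanes] += x  # per-lane partial accumulation
    return sum(acc)          # horizontal combine, done once at the end
```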
When enabled, Iris may compile multiple variants and select adaptively using runtime telemetry.
- Selection is runtime-driven from per-variant execution stats.
- Warm-seeded startup may intentionally compile a single variant first, then rearm to multi-variant when observed latency degrades.
- If speculation is gated by runtime controls, Iris falls back to single-variant compile.
- Failures or mismatches fall back safely to Python execution path.
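One way stats-driven selection can work, sketched under assumptions (the sampling threshold and mean-latency criterion are invented for illustration; the real telemetry and policy are not specified here):

```python
from __future__ import annotations

# Hypothetical telemetry-driven variant choice: once every variant has
# enough latency samples, pick the one with the lowest observed mean;
# until then, keep exploring (return None).
def pick_variant(stats: dict[str, list[int]], min_samples: int = 3):
    ready = {k: v for k, v in stats.items() if len(v) >= min_samples}
    if len(ready) < len(stats):
        return None  # some variant is still under-sampled
    return min(ready, key=lambda k: sum(ready[k]) / len(ready[k]))
```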
- Enable/disable SIMD planning:
  - Env: `IRIS_JIT_SIMD`
  - Values: truthy (`1`/`true`/`yes`/`on`/...) to enable, falsy to force scalar planning.
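A sketch of the truthy/falsy parsing, assuming unset means "keep the default planning behavior" (that default, and any accepted spellings beyond the four listed above, are assumptions):

```python
import os

# Accepted truthy spellings per the documented list; anything else
# (when the variable is set) is treated as falsy.
TRUTHY = {"1", "true", "yes", "on"}

def simd_enabled(env=os.environ, default=True):
    raw = env.get("IRIS_JIT_SIMD")
    if raw is None:
        return default  # assumption: unset keeps default behavior
    return raw.strip().lower() in TRUTHY
```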
- SIMD math mode for lowered unary trig kernels:
  - Env: `IRIS_JIT_SIMD_MATH`
  - Values: `accurate` (default) for standard libm-backed behavior; `fast`/`approx`/`poly` for fast approximation mode with lower latency and possible precision tradeoffs.
- Enable: `IRIS_JIT_LOG=1` or `iris.jit.set_jit_logging(...)`
- Read status: `iris.jit.get_jit_logging()`
When logging is enabled, SIMD planner details are emitted, for example:
```
[Iris][jit][simd] backend=ArmNeon lane_bytes=16 auto_vectorize=true
```
- Quantum speculation toggle:
  - Env: `IRIS_JIT_QUANTUM=1`
  - API: `iris.jit.set_quantum_speculation(...)`, `iris.jit.get_quantum_speculation()`
- Speculation threshold (ns):
  - Env: `IRIS_JIT_QUANTUM_SPECULATION_NS`
  - API: `iris.jit.set_quantum_speculation_threshold(...)`, `iris.jit.get_quantum_speculation_threshold()`
- Quantum log threshold (ns):
  - Env: `IRIS_JIT_QUANTUM_LOG_NS`
  - API: `iris.jit.set_quantum_log_threshold(...)`, `iris.jit.get_quantum_log_threshold()`
- Compile budget/window (ns):
  - Env: `IRIS_JIT_QUANTUM_COMPILE_BUDGET_NS`, `IRIS_JIT_QUANTUM_COMPILE_WINDOW_NS`
  - API: `iris.jit.set_quantum_compile_budget(...)`, `iris.jit.get_quantum_compile_budget()`
- Cooldown backoff bounds (ns):
  - Env: `IRIS_JIT_QUANTUM_COOLDOWN_BASE_NS`, `IRIS_JIT_QUANTUM_COOLDOWN_MAX_NS`
  - API: `iris.jit.set_quantum_cooldown(...)`, `iris.jit.get_quantum_cooldown()`
- Rearm cadence/trigger (ns):
  - Env: `IRIS_JIT_QUANTUM_REARM_INTERVAL_NS`, `IRIS_JIT_QUANTUM_REARM_MIN_OBSERVED_NS`
  - Behavior: controls how often a single-variant warm state may attempt multi-variant rearm, and the minimum observed latency required to trigger rearm.
- Rearm sensitivity controls:
  - Env: `IRIS_JIT_QUANTUM_REARM_MIN_SAMPLES`, `IRIS_JIT_QUANTUM_REARM_MAX_VOLATILITY`
  - Behavior: requires enough observations before rearm and suppresses rearm when per-run latency is too volatile.
Quantum telemetry is persisted for restart-time warm-up.
- Path: `__pycache__/.iris.meta.bin`
- Format: binary framed payload (magic + flags + msgpack-bytes), optional compression.
- Persistence controls: `IRIS_JIT_META_TTL_NS`, `IRIS_JIT_META_MAX_ENTRIES`, `IRIS_JIT_META_FLUSH_MIN`, `IRIS_JIT_META_FLUSH_MAX`, `IRIS_JIT_META_COMPRESS_MIN_BYTES`, `IRIS_JIT_META_REFRESH_NS`
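A roundtrip sketch of one possible "magic + flags + payload" frame. The concrete layout below (4-byte magic, 1-byte flags, 4-byte little-endian length prefix) and the magic value are guesses for illustration; the real on-disk format, compression flag bits, and the msgpack encoding of the payload are not specified here:

```python
import struct

MAGIC = b"IRIS"  # assumption: placeholder magic, not the real value

def write_frame(payload: bytes, flags: int = 0) -> bytes:
    # magic | flags (u8) | payload length (u32 LE) | payload bytes
    return MAGIC + struct.pack("<BI", flags, len(payload)) + payload

def read_frame(frame: bytes):
    assert frame[:4] == MAGIC, "bad magic"
    flags, length = struct.unpack_from("<BI", frame, 4)
    payload = frame[9:9 + length]
    assert len(payload) == length, "truncated frame"
    return flags, payload
```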
Metadata lifecycle notes:
- Warm seeds are loaded during registration and may be staged before full quantum state is initialized.
- Aggressive-source registration paths use effective extracted source for seed/register/persist flow.
- Writes are adaptive/deferred during execution and force-flushed at process exit for short-lived runs.
- Unchanged profile shape may skip rewrite within the refresh window to reduce churn/noise.
Iris prioritizes correctness over acceleration:
- Unsupported syntax or compile misses: Python fallback.
- Runtime panic/mismatch in JIT path: guarded fallback.
- Quantum variant errors: fallback variant or Python path.
- Lowered kernel mismatch or unsupported expression shape: falls back to normal JIT execution path.
- Scalar fast path remains preferred for stable scalar workloads.
- Single-arg kernels now include generic sequence fallback when typed-buffer fast paths do not apply.
- Loop-step wrappers compile lowered runtime expressions (including `let_bind`) and evaluate safely in Python fallback mode when needed.
- One-shot / bursty workloads: increase `IRIS_JIT_QUANTUM_REARM_MIN_SAMPLES` and/or lower `IRIS_JIT_QUANTUM_REARM_MAX_VOLATILITY` to avoid speculative churn.
- Long-running stable workloads: keep `IRIS_JIT_QUANTUM_REARM_MIN_SAMPLES` low (or default) and raise `IRIS_JIT_QUANTUM_REARM_MAX_VOLATILITY` moderately if rearm feels too conservative.
- If runs differ heavily call-to-call, prefer stricter volatility gating before reducing rearm interval.
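A possible starting point for a bursty workload, following the guidance above. The numeric values (and the scale of the volatility bound, whose units are not documented here) are illustrative only:

```shell
# Bursty workload: demand more samples and stricter volatility gating
# before multi-variant rearm. Values are illustrative, not defaults.
export IRIS_JIT_QUANTUM_REARM_MIN_SAMPLES=16
export IRIS_JIT_QUANTUM_REARM_MAX_VOLATILITY=0.25
# Leave IRIS_JIT_QUANTUM_REARM_INTERVAL_NS at its default first;
# tighten it only if the gating above is not enough.
```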
```python
import iris

iris.set_quantum_speculation(True)
iris.set_quantum_speculation_threshold(0)
iris.set_quantum_compile_budget(10_000_000, 1_000_000_000)
iris.set_quantum_cooldown(0, 0)

@iris.offload(strategy="jit", return_type="float")
def heavy(a: float, b: float, c: float) -> float:
    return (a * a + b * b + c * c) / (a + b + c + 1.0)
```

Note
On aarch64 targets, Iris adjusts JIT module flags to avoid unsupported relocation paths.
Tip
For Android ARM devices, SIMD planning is supported when built for ARM targets (aarch64/arm). Backend selection still depends on runtime CPU feature availability.