A fast, minimal columnar data library for Rust with Arrow compatibility.
Minarrow gives you typed columnar arrays that compile in 1.5 seconds, run with SIMD alignment, and convert to Arrow when you need interop. No trait object downcasting, no 200+ transitive dependencies, no waiting for builds.
The problem: Arrow-rs is powerful but heavy. Build times stretch to minutes. Working with arrays means downcasting dyn Array and hoping you got the type right. For real-time systems, embedded devices, or rapid iteration, this friction adds up.
The solution: Minarrow keeps concrete types throughout. An IntegerArray<i64> stays an IntegerArray<i64>. You get direct access, IDE autocomplete, and fast compilation. When you need to talk to Arrow, Polars, or PyArrow, zero-copy conversion is one method call away.
```rust
use minarrow::{arr_i32, arr_f64, arr_str32, arr_bool};

// Create arrays with macros
let ids = arr_i32![1, 2, 3, 4];
let prices = arr_f64![10.5, 20.0, 15.75];
let names = arr_str32!["alice", "bob", "charlie"];
let flags = arr_bool![true, false, true];

// Direct typed access - no downcasting
assert_eq!(ids.len(), 4);
assert_eq!(prices.get(0), Some(10.5));
```

```rust
use minarrow::{FieldArray, Table, arr_i32, arr_str32, Print};

// Build tables from columns
let table = Table::new(
    "users".into(),
    vec![
        FieldArray::from_arr("id", arr_i32![1, 2, 3]),
        FieldArray::from_arr("name", arr_str32!["alice", "bob", "charlie"]),
    ].into(),
);
table.print();
```

Six array types cover standard workloads:
| Type | Description |
|---|---|
| `IntegerArray<T>` | `i8` through `u64` |
| `FloatArray<T>` | `f32`, `f64` |
| `StringArray<T>` | UTF-8 with `u32` or `u64` offsets |
| `BooleanArray` | Bit-packed with validity mask |
| `CategoricalArray<T>` | Dictionary-encoded |
| `DatetimeArray<T>` | Timestamps, dates, durations |
Semantic groupings (NumericArray, TextArray, TemporalArray) let you write generic functions while keeping static dispatch.
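As a rough illustration, a helper over any numeric column might look like the sketch below. It leans only on the `len()`/`get()` accessors and the enum variants shown elsewhere in this README; `column_sum` is an illustrative name, not a library API, and only two variants are spelled out.

```rust
use minarrow::NumericArray;

// Hedged sketch: sums any numeric column as f64 via static enum dispatch.
// Only the Int64/Float64 arms are shown; the remaining widths follow the
// same pattern.
fn column_sum(col: &NumericArray) -> f64 {
    match col {
        NumericArray::Int64(arr) => (0..arr.len())
            .map(|i| arr.get(i).unwrap_or(0) as f64)
            .sum(),
        NumericArray::Float64(arr) => (0..arr.len())
            .map(|i| arr.get(i).unwrap_or(0.0))
            .sum(),
        // ... other integer and float widths
        _ => unimplemented!("remaining widths omitted in this sketch"),
    }
}
```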
| Metric | Time |
|---|---|
| Clean build | < 1.5s |
| Incremental rebuild | < 0.15s |
Achieved through minimal dependencies: primarily num-traits, with optional rayon for parallelism.
All buffers use 64-byte alignment via Vec64. No reallocation step to fix alignment—data is ready for vectorised operations from the moment it's created.
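As a rough way to see this in practice, the snippet below checks the start address of a freshly built buffer. The `as_slice()` accessor is an assumption used for illustration here, not a documented API; the point is simply that no realignment copy is needed before vectorised use.

```rust
use minarrow::arr_f64;

// Hedged sketch: verify the buffer starts on a 64-byte boundary.
// `as_slice()` is an assumed accessor name, for illustration only.
let prices = arr_f64![10.5, 20.0, 15.75];
let addr = prices.as_slice().as_ptr() as usize;
assert_eq!(addr % 64, 0, "Vec64-backed buffer should be 64-byte aligned");
```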
Select columns and rows without copying data:
```rust
use minarrow::*;

let table = create_table();

// Pandas-style selection
let view = table.c(&["name", "value"]);     // columns
let view = table.r(10..20);                 // rows
let view = table.c(&["A", "B"]).r(0..100);  // both

// Materialise only when needed
let owned = view.to_table();
```

For streaming workloads, SuperArray and SuperTable hold multiple chunks with consistent schema:
```rust
// Append chunks as they arrive
let mut super_table = SuperTable::new();
super_table.push_table(batch1);
super_table.push_table(batch2);

// Consolidate to single table when ready
let table = super_table.consolidate();
```

Convert at the boundary, stay native internally:
```rust
// To Arrow (feature: cast_arrow)
let arrow_array = minarrow_array.to_apache_arrow();

// To Polars (feature: cast_polars)
let series = minarrow_array.to_polars();

// FFI via Arrow C Data Interface
let (array_ptr, schema_ptr) = minarrow_array.export_to_c();
```

Minarrow uses enums for type dispatch instead of trait objects:
```rust
// Static dispatch, full inlining
match array {
    Array::NumericArray(num) => match num {
        NumericArray::Int64(arr) => process(arr),
        NumericArray::Float64(arr) => process(arr),
        // ...
    },
    // ...
}
```

This gives you:
- Performance — Compiler inlines through the dispatch
- Type safety — No `Any`, no runtime downcasts
- Ergonomics — Direct accessors like `array.num().i64()`
Sum of 1,000 integers, averaged over 1,000 runs (Intel Ultra 7 155H):
| Implementation | Time |
|---|---|
| Raw `Vec<i64>` | 85 ns |
| Minarrow `IntegerArray` (direct) | 88 ns |
| Minarrow `IntegerArray` (via enum) | 124 ns |
| Arrow-rs `Int64Array` (struct) | 147 ns |
| Arrow-rs `Int64Array` (dyn) | 181 ns |
Minarrow's direct access matches raw Vec performance. Even through enum dispatch, it outperforms arrow-rs.
With SIMD + Rayon, summing 1 billion integers takes ~114ms.
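For context on what that measurement involves, the sketch below shows the shape of such a reduction using Rayon directly (not Minarrow's own kernels, and without the SIMD portion), scaled down so it runs comfortably anywhere. It assumes only the `rayon` crate.

```rust
use rayon::prelude::*;

// Hedged sketch of the benchmark's shape: a data-parallel sum over a
// contiguous i64 buffer, scaled down to 10 million elements.
let data: Vec<i64> = (0..10_000_000).collect();
let total: i64 = data.par_iter().sum();
assert_eq!(total, (10_000_000i64 - 1) * 10_000_000 / 2);
```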
Enable what you need:
| Feature | Description |
|---|---|
| `views` | Zero-copy windowed access (default) |
| `chunked` | SuperArray/SuperTable for streaming (default) |
| `datetime` | Temporal types |
| `cast_arrow` | Arrow-rs conversion |
| `cast_polars` | Polars conversion |
| `parallel_proc` | Rayon parallel iterators |
| `select` | Pandas-style `.c()` / `.r()` selection |
| `broadcast` | Arithmetic broadcasting |
| Crate | Purpose |
|---|---|
| `minarrow-pyo3` | Zero-copy Python interop via PyArrow. See `pyo3/README.md` |
| `lightstream` | Tokio async IPC with maintained SIMD alignment |
| `simd-kernels` | 60+ SIMD kernels including statistical distributions |
| `vec64` | 64-byte aligned `Vec` for optimal SIMD |
Minarrow focuses on flat columnar data, covering the common 80% of workloads. Nested types (List, Struct) are not currently supported. If you need deeply nested schemas, arrow-rs is the better choice.
Contributions welcome:
- Connectors — Data source/sink integrations
- Optimisations — Performance improvements
- Nested types — List and Struct support (PRs welcome)
- Bug fixes
See CONTRIBUTING.md for guidelines.
MIT. See LICENSE for details.
Minarrow is a from-scratch implementation of the Apache Arrow memory layout, inspired by the standards pioneered by Apache Arrow, Arrow2, and Polars.
Minarrow is not affiliated with Apache Arrow.