Janus

A C++20 header-only library for hedged execution on multi-core CPUs. Run the same task on multiple physical cores simultaneously, take the first result, and cooperatively cancel the rest.

Trades extra CPU work for lower tail latency on high-priority compute workloads. Based on the hedged-request technique described in Google's "The Tail at Scale" (Jeff Dean & Luiz André Barroso, Communications of the ACM, 2013).

Quick start

#include <janus/janus.hpp>

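// Race the same function on 2 cores (default)
// long long: the full sum (499999500000) overflows a 32-bit int
auto result = janus::race([](std::stop_token token) -> long long {
    long long sum = 0;
    for (int i = 0; i < 1000000; ++i) {
        if (token.stop_requested()) return sum;  // another runner finished first
        sum += i;
    }
    return sum;
});

// Race with explicit configuration
// (my_compute_fn: any callable taking std::stop_token as its first parameter)
janus::RaceConfig cfg{.num_runners = 4, .prefer_cross_numa = true};
auto result4 = janus::race(cfg, my_compute_fn);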

Callables must accept std::stop_token as their first parameter and check stop_requested() periodically, so that losing runners exit promptly once another runner has won.

Requirements

C++20 (GCC 11+ or Clang 14+)
Linux (uses sched_setaffinity, sysfs topology)
pthreads

Building

cmake -B build -S .
cmake --build build -j$(nproc)
ctest --test-dir build

API

Function	Description
`janus::race(fn)`	Race `fn` on 2 cores, return first result
`janus::race(cfg, fn)`	Race with explicit `RaceConfig`

RaceConfig

struct RaceConfig {
    unsigned int num_runners = 2;          // redundant executions
    bool pin_to_cores = true;              // CPU affinity
    bool prefer_cross_numa = true;         // spread across NUMA nodes
    std::optional<std::vector<int>> cores; // explicit core list
};

Benchmark results

Environment: Intel Core Ultra 5 226V (8C/8T, 4.5 GHz), 8 MB L3, Linux 6.17, GCC 15.2, Release build (-O3)

Workload: ~1.3ms baseline compute with 5% probability of a 5-20ms stall (simulating interrupt storms, cache evictions, thermal throttling). 1000 iterations per benchmark.

Tail latency collapse

Percentile	Single	Hedged (2 runners)	Hedged (4 runners)
p50	1.3ms	1.5ms	1.5ms
p95	2.6ms	1.8ms	2.3ms
p99	20.4ms	2.8ms (7.3x)	2.6ms (7.8x)
p99.9	23.9ms	3.2ms (7.5x)	2.8ms (8.5x)

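p50 rises only slightly (1.3ms to 1.5ms) while p99 and p99.9 collapse; the multipliers in parentheses are the reduction relative to single execution. Assuming stalls hit each runner independently, a hedged call is slow only if every runner stalls: with 2 runners that probability is 0.05^2 = 0.25%, and with 4 runners 0.05^4 = 0.000625%.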

Race overhead

Runners	Spawn + sync cost
2	78 μs
4	120 μs
8	159 μs

How it works

Discovers the physical CPU topology via Linux sysfs
Selects N distinct physical cores (NUMA-aware round-robin)
Spawns N std::jthread instances, each pinned to one of those cores
The first runner to finish wins an atomic compare-and-swap (CAS) and stores its result
The winner signals a shared std::stop_source; losers observe stop_requested() and exit
All threads are joined via RAII, and the winning result is returned to the caller
