Agentic-NIC-Dataplane-Lab is a Linux-first reference repo for designing, building, and benchmarking a split NIC dataplane for agentic AI systems, with an explicit path toward bounded autonomous NIC behavior.
The repo is opinionated about one thing: agentic AI is not only a model problem. It is a queueing, copy-avoidance, packet-steering, east-west transport, and local-control problem. The goal here is to make those tradeoffs concrete with architecture notes, Linux tuning guidance, compatibility tables, starter code, and a stronger systems model for how an autonomous NIC dataplane could operate safely.
The core thesis of this repo is that agentic inference is not shaped like traditional batched model serving.
Traditional inference is usually:
- large batch
- GPU-bound
- throughput-oriented
- dominated by matrix math and model execution efficiency
Agentic inference is usually:
- many tiny RPCs
- metadata fetches
- retrieval fan-out
- scheduler chatter
- checkpoint or state coordination
- latency amplification across multi-step plans
That difference matters because the bottleneck moves. In a classic throughput-oriented serving path, the GPU often dominates. In an agentic path, the orchestration fabric can dominate instead:
- socket and packet overhead
- queue contention
- kernel/userspace copy cost
- scheduler wakeups
- cross-service retries
- east-west networking
This repo is compelling only if that thesis holds up under measurement: at scale, networking overhead can become a first-order limiter for agent orchestration even when the model itself is fast enough.
The repo now includes a small v0.2 local workflow so a new reader can compile the lab, run one real localhost baseline, run one honest AF_XDP mock path, and generate charts from the resulting JSON.
```
make all
make demo
python3 tools/plot_baseline.py results/latest.json
```

What this demo is and is not:
- **Path A kernel UDP** is a real localhost kernel networking measurement using a simple echo server and client.
- **Path B AF_XDP mock** is a workflow-validation path that simulates a starter `AF_XDP` processing loop shape and output format (the real loop shape is sketched below).
- The local demo exists to validate build, run, JSON, and plotting workflow without special hardware.
- Real `AF_XDP` still requires supported NIC, driver, queue, and privilege setup. The mock path does not claim real zero-copy dataplane performance.
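For readers new to `AF_XDP`, the receive-loop shape the mock imitates looks roughly like the sketch below. It assumes a UMEM and `xsk_socket` already created with the libxdp/libbpf `xsk` helpers; `handle_frame` is a placeholder, and this is not the repo's `src/af_xdp` code.

```c
/* Rough shape of a real AF_XDP receive iteration (illustrative, not repo code).
 * Assumes an existing UMEM and xsk_socket created via <xdp/xsk.h>
 * (or <bpf/xsk.h> on older libbpf). */
#include <stdint.h>
#include <xdp/xsk.h>

static void rx_loop_once(struct xsk_ring_cons *rx, struct xsk_ring_prod *fill,
                         void *umem_area)
{
    uint32_t idx_rx = 0, idx_fill = 0;

    /* 1. See how many descriptors the kernel has placed on the RX ring. */
    unsigned int rcvd = xsk_ring_cons__peek(rx, 64, &idx_rx);
    if (!rcvd)
        return;

    /* 2. Reserve the same number of fill-ring slots so the NIC always
     *    has buffers to DMA into. */
    while (xsk_ring_prod__reserve(fill, rcvd, &idx_fill) != rcvd)
        ;   /* real code would back off or kick the kernel instead of spinning */

    for (unsigned int i = 0; i < rcvd; i++) {
        const struct xdp_desc *desc = xsk_ring_cons__rx_desc(rx, idx_rx + i);
        void *frame = xsk_umem__get_data(umem_area, desc->addr);

        /* handle_frame(frame, desc->len);   application-specific work */
        (void)frame;

        /* 3. Recycle the frame address back to the fill ring. */
        *xsk_ring_prod__fill_addr(fill, idx_fill + i) = desc->addr;
    }

    /* 4. Publish both ring updates. */
    xsk_ring_prod__submit(fill, rcvd);
    xsk_ring_cons__release(rx, rcvd);
}
```

The mock path imitates this loop shape and its output format without needing a capable NIC.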
| Path | Status | Requires root | Requires special NIC |
|---|---|---|---|
| Path A kernel UDP | runnable locally | no | no |
| Path B AF_XDP mock | runnable locally | no | no |
| Path B real AF_XDP | scaffold/in progress | yes | yes |
| Path C RDMA | scaffold/in progress | likely | yes |
This project explores a tri-path host networking model for agentic AI clusters:
- **Path A:** kernel TCP for the majority of agent RPC, improved with `busy_poll` (sketched below), queue affinity, and `io_uring` zero-copy receive where supported
- **Path B:** `XDP` + `AF_XDP` for selected hot queues that need lower packet overhead and stronger queue-to-core control
- **Path C:** `RDMA` for bulk east-west movement such as state sync, checkpoint transfer, shard-to-shard movement, and GPU-adjacent feeds
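As a concrete example of the Path A tuning, here is a minimal sketch of opting a single agent-RPC socket into busy polling. It assumes a kernel that exposes `SO_BUSY_POLL` (and optionally `SO_PREFER_BUSY_POLL`); the 50 microsecond budget is illustrative, not a tuned recommendation, and this is not code from `src/`.

```c
/* Minimal sketch: opt one agent-RPC socket into busy polling (Path A).
 * Assumes a kernel with SO_BUSY_POLL; the 50 us budget is illustrative. */
#include <stdio.h>
#include <sys/socket.h>

int enable_busy_poll(int fd)
{
    int busy_poll_usec = 50;   /* how long receive paths may spin before sleeping */

    if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                   &busy_poll_usec, sizeof(busy_poll_usec)) < 0) {
        perror("setsockopt(SO_BUSY_POLL)");
        return -1;
    }

#ifdef SO_PREFER_BUSY_POLL
    /* On newer kernels, prefer busy polling over interrupt-driven NAPI
     * for this socket's queue. */
    int prefer = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
                   &prefer, sizeof(prefer)) < 0)
        perror("setsockopt(SO_PREFER_BUSY_POLL)");
#endif
    return 0;
}
```

The tradeoff, explored in the benchmark questions below, is that spin time is CPU the host cannot spend elsewhere.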
It also starts to define a layered agentic NIC architecture:
- an **Intent Layer** where the host expresses goals rather than register-level tweaks
- a bounded **Agent Layer** that proposes local dataplane adjustments
- a deterministic **Guardian Layer** that enforces safety and connectivity invariants (sketched after this list)
- an **Audit Layer** with hardware-isolated reasoning logs for dataplane mutations
- a **Tenant Quota Model** so local optimization does not destroy fairness
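The Agent/Guardian split is easiest to see as code. Below is an illustrative sketch only; the types and names (`nic_action`, `guardian_check`) are hypothetical and not part of the repo. The Agent Layer proposes a bounded adjustment, and the Guardian Layer deterministically accepts or rejects it before it reaches hardware.

```c
/* Illustrative only: one way the Agent/Guardian split could look in C.
 * nic_action and guardian_check are hypothetical, not repo APIs. */
#include <stdbool.h>
#include <stdint.h>

/* A bounded adjustment the Agent Layer may propose. */
struct nic_action {
    uint16_t queue_id;       /* which hardware queue to touch */
    uint16_t target_core;    /* proposed IRQ/processing core */
    uint32_t rate_limit_pps; /* 0 = leave unchanged */
};

/* Deterministic Guardian Layer check: reject anything that could break
 * connectivity or starve another tenant, before it reaches hardware. */
static bool guardian_check(const struct nic_action *a,
                           uint16_t num_queues,
                           uint16_t num_cores,
                           uint32_t tenant_min_pps)
{
    if (a->queue_id >= num_queues || a->target_core >= num_cores)
        return false;                 /* out-of-range mutation */
    if (a->rate_limit_pps != 0 && a->rate_limit_pps < tenant_min_pps)
        return false;                 /* would violate a tenant quota floor */
    return true;                      /* safe to apply and log to the Audit Layer */
}
```

The point of the sketch is the ordering: nothing the Agent Layer proposes reaches the dataplane without passing a deterministic check whose verdict can be audited.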
The intended audience is:
- systems engineers building agent infrastructure
- Linux kernel and NIC performance engineers
- inference platform teams comparing sockets vs `AF_XDP` vs `RDMA`
- researchers who want a reproducible testbed instead of architecture slides
Topics this repo covers:

- agentic AI networking architecture
- Linux NIC queueing and IRQ affinity
- `AF_XDP`, UMEM, and queue ownership
- `io_uring` receive paths and zero-copy Rx
- `RDMA`, queue pairs, memory registration, and bulk transport
- Intel `ice`/`irdma` and Broadcom `bnxt_en`/`bnxt_re`
- benchmark design for agent-shaped workloads
- bounded autonomous NIC control
- deterministic guardrails and fail-safe policy
- hardware-isolated reasoning logs
- multi-tenant agent quotas and fairness
The repo now embeds the architecture diagram directly in the README so it renders on GitHub, and the source Mermaid file is still kept in ./diagrams/tri-path-agentic-dataplane.mmd.
```mermaid
flowchart TD
    U["Users / upstream agents<br/>RPCs, tool calls, streaming"] --> I["Ingress NIC queues<br/>Classifier: RSS / XDP / flow rules"]
    I --> A["Path A: kernel TCP<br/>io_uring ZC Rx · busy_poll"]
    I --> B["Path B: AF_XDP<br/>UMEM · zero-copy · per-core"]
    I --> C["Path C: RDMA<br/>RC QP · MR · bulk east-west"]
    A --> O["Orchestrators / tools<br/>retrieval, memory, metadata"]
    B --> G["Gateways / routers<br/>schedulers, token gateways"]
    C --> S["State sync / checkpoints<br/>GPU feed, vector index sync"]
```
Agentic AI is a coordination workload:
- many small RPCs
- retrieval and metadata fetches
- tool execution
- policy checks
- fan-out and fan-in
- streaming and retries
That shifts bottlenecks toward:
- CPU time spent in the networking and storage path
- queue placement and RSS policy
- copy overhead between kernel, userspace, and devices
- memory registration and pinned-page cost (see the sketch after this list)
- east-west service traffic
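To make the memory-registration point concrete, here is a minimal sketch of the `ibv_reg_mr` call whose page-pinning and translation-setup cost the list refers to. It assumes stock `libibverbs` and the first available RDMA device, with error handling abbreviated; it is illustrative only, not the repo's `src/rdma` code.

```c
/* Minimal sketch of RDMA memory registration: ibv_reg_mr pins the buffer's
 * pages and sets up NIC address translation. Illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) { fprintf(stderr, "verbs setup failed\n"); return 1; }

    size_t len = 1 << 20;                  /* 1 MiB buffer */
    void *buf = aligned_alloc(4096, len);

    /* The expensive part: pages are pinned and an lkey/rkey is produced. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr"); return 1; }
    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

This setup cost is one reason Path C favors pre-registered, reused buffers for bulk movement rather than per-RPC transfers.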
Questions this lab aims to answer:

- At what packet size and concurrency does `AF_XDP` outperform kernel sockets for agent RPC traffic?
- Can queue affinity reduce `p99` scheduler jitter for agent orchestration paths?
- Does `busy_poll` help or hurt mixed inference workloads that combine model serving with retrieval chatter?
- When does userspace polling become CPU-inefficient relative to kernel sockets?
- Can bounded autonomous NIC scheduling reduce tail collapse without violating safety constraints?
This repo is intentionally in early-lab form:
- the build system is now present and compile-oriented
- the benchmark harness accepts real arguments and emits JSON metadata
- the code under `src/` is still starter code, not production dataplane code
- the docs are detailed enough to orient a contributor and define the next work
- the conceptual architecture now covers not just transport choice, but how local autonomy, safety, and auditability could fit into a SmartNIC-class system
- `./BUILDING.md`
- `./ROADMAP.md`
- `./docs/networking-for-agentic-ai-blog.md`
- `./docs/reference-architecture.md`
- `./docs/agentic-nic-architecture.md`
- `./docs/safety-and-guardrails.md`
- `./docs/reasoning-log-design.md`
- `./docs/audit-layer-threat-model.md`
- `./docs/multi-tenant-agent-quotas.md`
- `./docs/kernel-driver-tuning.md`
- `./docs/benchmark-plan.md`
- `./docs/compatibility-matrix.md`
- `./docs/perf-flamegraph-workflow.md`
- `./diagrams/tri-path-agentic-dataplane.mmd`
- `./src/io_uring/recv_zc.c`
- `./src/kernel_udp/udp_echo_server.c`
- `./src/kernel_udp/udp_client.c`
- `./src/af_xdp/main.c`
- `./src/af_xdp/xdp_pass.c`
- `./src/rdma/verbs_ping.c`
- `./scripts/run_local_baseline.sh`
- `./scripts/benchmark-matrix.sh`
- `./tools/af_xdp_load.sh`
- `./tools/bpftrace/guardian_preemption.bt`
- `./tools/bpftrace/guardian_tail_latency_guard.bt`
- `./tools/perf_capture.sh`
- `./tools/plot_baseline.py`
- `./tools/plot_e810_baseline.py`
- `./results/e810-baseline-2026-05-08.json`
- `./results/sample-local-baseline.json`
If you want one practical Linux-first starting point:
Intel E810/E830:

- `ice` for Ethernet, queueing, and `AF_XDP`
- `irdma` for RDMA
If you are standardized on Broadcom:
- `bnxt_en` for Ethernet
- `bnxt_re` for RoCE
The repo does not assume one vendor forever. It is structured so contributors can compare stacks, kernels, firmware bundles, and feature maturity.
The repo now has a root ./Makefile and build notes in ./BUILDING.md.
Common commands:
```
make all
make kernel_udp
make af_xdp
make io_uring
make rdma
make xdp_prog
make demo
```

`make all` now builds the runnable local `kernel_udp` demo path, builds the `AF_XDP` starter in real or mock-only mode depending on available headers and libraries, and skips optional `io_uring`, RDMA, or BPF object targets gracefully when the local environment does not provide those dependencies.
The benchmark harness is at ./scripts/benchmark-matrix.sh. It now accepts path, workload, and interface arguments and emits a JSON result envelope with host, kernel, NIC counter, and softirq metadata.
Example:
```
./scripts/benchmark-matrix.sh --path tcp --workload a --iface eth0 --out results/tcp-a.json
```

It is still intentionally conservative: if the required generator tool is missing, it fails loudly instead of pretending a benchmark ran.
The repo also now includes an illustrative baseline artifact at ./results/e810-baseline-2026-05-08.json plus a plotting helper at ./tools/plot_e810_baseline.py so the benchmark story is grounded in a reusable result format.
For the quick local workflow, use:
```
./scripts/run_local_baseline.sh
python3 tools/plot_baseline.py results/latest.json
```

That local path:
- runs a real localhost UDP echo benchmark for **Path A**
- runs an `AF_XDP` mock/scaffold benchmark for **Path B**
- writes a combined JSON file at `./results/local-baseline-YYYYMMDD-HHMMSS.json`
- refreshes `./results/latest.json`
- generates `./diagrams/local-baseline-throughput.png`
- generates `./diagrams/local-baseline-latency.png` when latency fields are present
A checked-in reference artifact is available at ./results/sample-local-baseline.json.
The next release blockers are now called out more explicitly in the docs:
- prove Path B is not worse than Path A for sub-`512 B` RPCs
- show guardian intervention does not violate tail-latency SLOs
- define exactly who can read the reasoning logs and under what trust model
The repo now includes a small profiling helper at ./tools/perf_capture.sh plus a workflow note in ./docs/perf-flamegraph-workflow.md.
Example:
```
sudo ./tools/perf_capture.sh --output-dir perf/local-demo -- ./build/udp_client --host 127.0.0.1 --port 9000 --packet-size 256 --count 2000
```

This is intentionally lightweight:
- it captures `perf record` data for a target command
- emits `perf report` text
- emits `perf script` output
- optionally emits folded stacks and a flamegraph SVG when Brendan Gregg's `FlameGraph` scripts are available locally
Even a simple softirq vs userspace polling profile is valuable here, because it turns the repo from architecture opinion into an instrumentable lab.
- Expand the `AF_XDP` sample into a real UMEM-backed receive loop with fill/completion management.
- Add an `XDP` loader and socket redirection plumbing around `xdp_pass.c`.
- Extend the `io_uring` sample from setup-only into a complete `RECV_ZC` notification flow.
- Flesh out the RDMA path with full `RESET -> INIT -> RTR -> RTS` state transitions and peer exchange helpers (see the sketch after this list).
- Add workload generators and result aggregation to the benchmark harness.
- Formalize the `Intent -> Agent -> Guardian -> Dataplane -> Audit` model into a sharper design and possible patent memo.
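For the RDMA item above, a minimal sketch of the `RESET -> INIT -> RTR -> RTS` walk using stock `libibverbs` follows. It assumes an RC QP, IB-style LID addressing, and out-of-band exchange of the peer's QPN, LID, and PSN; MTU, timeout, and retry values are illustrative rather than tuned, and RoCE targets such as `irdma` or `bnxt_re` would set `ah_attr.is_global = 1` and fill a GID instead of a LID.

```c
/* Minimal sketch of the RC QP state walk with stock libibverbs (illustrative). */
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

static int qp_to_rts(struct ibv_qp *qp, uint32_t remote_qpn,
                     uint16_t remote_lid, uint32_t remote_psn,
                     uint32_t local_psn, uint8_t port)
{
    struct ibv_qp_attr attr;

    /* RESET -> INIT: bind the QP to a port and grant remote access. */
    memset(&attr, 0, sizeof(attr));
    attr.qp_state        = IBV_QPS_INIT;
    attr.pkey_index      = 0;
    attr.port_num        = port;
    attr.qp_access_flags = IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_WRITE;
    if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX |
                                 IBV_QP_PORT | IBV_QP_ACCESS_FLAGS))
        return -1;

    /* INIT -> RTR: point the QP at the peer so it can receive. */
    memset(&attr, 0, sizeof(attr));
    attr.qp_state           = IBV_QPS_RTR;
    attr.path_mtu           = IBV_MTU_1024;
    attr.dest_qp_num        = remote_qpn;
    attr.rq_psn             = remote_psn;
    attr.max_dest_rd_atomic = 1;
    attr.min_rnr_timer      = 12;
    attr.ah_attr.dlid       = remote_lid;   /* IB-style addressing */
    attr.ah_attr.port_num   = port;
    if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
                                 IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
                                 IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER))
        return -1;

    /* RTR -> RTS: enable sending with retry and timeout policy. */
    memset(&attr, 0, sizeof(attr));
    attr.qp_state      = IBV_QPS_RTS;
    attr.timeout       = 14;
    attr.retry_cnt     = 7;
    attr.rnr_retry     = 7;
    attr.sq_psn        = local_psn;
    attr.max_rd_atomic = 1;
    return ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_TIMEOUT |
                                    IBV_QP_RETRY_CNT | IBV_QP_RNR_RETRY |
                                    IBV_QP_SQ_PSN | IBV_QP_MAX_QP_RD_ATOMIC);
}
```

The peer-exchange helpers named in the same list item would supply the remote values this sketch assumes.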