51 changes: 51 additions & 0 deletions README.md
@@ -1,11 +1,11 @@
<div align="center">

# KERNO

### The production incident diagnosis engine for Kubernetes

**Your cluster broke. Your dashboards are green. Users are paging.**
**Run `kerno doctor`. 30 seconds. Root cause. Plain English.**

<sub>Same single binary runs on bare metal, VMs, EC2, GCE - wherever Linux lives.</sub>

@@ -18,30 +18,30 @@

[**Quick Start**](#quick-start) · [**How It Works**](#how-it-works) · [**Features**](#features) · [**Kubernetes**](#kubernetes-deployment) · [**Docs**](docs/architecture.md)

<img src="demo.gif" alt="kerno doctor demo" width="900" />

</div>

---

## What is Kerno?

Kerno is a **Kubernetes-native incident diagnosis engine** built on eBPF.
It runs as a DaemonSet on every node, watches the kernel - not your app - and answers a single question on demand:

> *Why is production broken right now?*

```bash
kubectl -n kerno-system exec ds/kerno -- kerno doctor
```

30 seconds later you get a ranked diagnostic report with **plain-English causes, evidence, ETAs, and copy-paste fix steps** - no dashboards to wire, no query language to learn, no agents in your app.

The kernel knows minutes before your APM. Hours before your users. Kerno makes that visible.

**Same binary outside Kubernetes too.** `curl | bash` it onto any bare-metal box, EC2 instance, or systemd VM and `sudo kerno doctor` works exactly the same.

## Why Kerno?

It's 3am. PagerDuty fires. Latency is up, error budget is burning, and every dashboard you own is **green**.

@@ -739,3 +739,54 @@
If Kerno saved your on-call shift, consider leaving a **⭐**. It helps other engineers find the project.

</div>

## Performance and overhead

Kerno is designed to keep its own overhead bounded, even under sustained high event rates.
Three mechanisms enforce that bound:

### Ringbuf drop tracking
When `bpf_ringbuf_reserve()` fails because the kernel ringbuf is full, the BPF
program increments a dedicated per-CPU drop-count map. Userspace polls that map
every 5 seconds and exports the result as
`kerno_ringbuf_drops_total{program, cpu}`. Alert on this metric to know when a
node is producing events faster than kerno can drain them.
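
As a rough illustration of the polling step, the sketch below turns successive per-CPU readings of the drop-count map into per-CPU increments for the counter. It is not Kerno's actual poller: `DropTracker` and `Observe` are hypothetical names, and the map read itself is left out so the example has no eBPF dependencies.

```go
package main

// DropTracker is a hypothetical helper (not part of Kerno) that converts
// raw per-CPU drop counts, as read from a BPF per-CPU map on each poll,
// into increments that can be added to a monotonic counter.
type DropTracker struct {
	last []uint64 // value seen on the previous poll, indexed by CPU
}

// Observe takes the current per-CPU values (index = CPU number) and
// returns how many new drops each CPU recorded since the previous call.
func (t *DropTracker) Observe(perCPU []uint64) []uint64 {
	// Grow the baseline if more CPUs are reported than before.
	if len(t.last) < len(perCPU) {
		grown := make([]uint64, len(perCPU))
		copy(grown, t.last)
		t.last = grown
	}
	deltas := make([]uint64, len(perCPU))
	for cpu, cur := range perCPU {
		prev := t.last[cpu]
		if cur >= prev {
			deltas[cpu] = cur - prev
		} else {
			// Assumption: a value that went backwards means the program
			// was reloaded, so the whole current value counts as new.
			deltas[cpu] = cur
		}
		t.last[cpu] = cur
	}
	return deltas
}
```

Each returned delta would be added to the counter series for the matching `cpu` label, so the exported metric stays monotonic even if a program is reloaded.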

### Adaptive sampling
Each collector has a configurable events/sec budget (default 500K/s;
200K/s for `sched_delay`). Once the budget is exceeded, probabilistic sampling activates.
Histogram distribution accuracy is preserved within ±5% even at 80%
sampling. Configure via:

```yaml
collectors:
  rate_limits:
    syscall_latency: 500000
    sched_delay: 200000
  sampling:
    enabled: true
    target_overhead_pct: 1.0
```
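
One way a budget-driven sampler could behave is sketched below. `Sampler`, `NewSampler`, `Update`, and `Keep` are hypothetical names, and weighting each kept event by `1/p` is an assumption about how histogram counts might stay unbiased, not a description of Kerno's implementation.

```go
package main

import "math/rand"

// Sampler is a hypothetical per-collector sampler. It keeps every event
// while the observed rate is within budget and keeps events with
// probability budget/rate once the budget is exceeded.
type Sampler struct {
	budget   float64 // events per second allowed through
	keepProb float64 // current probability of keeping an event
	rng      *rand.Rand
}

// NewSampler starts in pass-through mode (keep probability 1).
func NewSampler(budgetPerSec float64, seed int64) *Sampler {
	return &Sampler{
		budget:   budgetPerSec,
		keepProb: 1,
		rng:      rand.New(rand.NewSource(seed)),
	}
}

// Update recomputes the keep probability from the rate seen in the
// most recent measurement window.
func (s *Sampler) Update(observedPerSec float64) {
	if observedPerSec <= s.budget {
		s.keepProb = 1
		return
	}
	s.keepProb = s.budget / observedPerSec
}

// Keep reports whether the next event should be kept and, if so, the
// weight it should carry in a histogram so totals remain unbiased.
func (s *Sampler) Keep() (bool, float64) {
	if s.keepProb >= 1 {
		return true, 1
	}
	if s.rng.Float64() < s.keepProb {
		return true, 1 / s.keepProb
	}
	return false, 0
}
```

With a 500K/s budget and an observed 2M events/s, roughly one event in four is kept and each kept event counts as four, so bucket totals approximate the unsampled distribution.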

### BPF-side backpressure
When overloaded, userspace sets a flag in the per-CPU `cpu_backpressure` eBPF map.
All six BPF programs check this flag before calling `bpf_ringbuf_reserve()` and,
while it is set, skip emission entirely, preventing overflow at the source.
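
How userspace decides that it is overloaded is not spelled out here. A minimal sketch, assuming the decision is driven by how full the userspace event queue is and uses two thresholds so the flag does not flap, could look like this. `Backpressure`, `High`, `Low`, and `Next` are hypothetical names.

```go
package main

// Backpressure is a hypothetical controller that decides which value
// userspace would write into the cpu_backpressure map. It uses hysteresis:
// the flag turns on at High and only turns off again at Low.
type Backpressure struct {
	High   float64 // fill ratio at which emission is paused
	Low    float64 // fill ratio at which emission resumes
	active bool
}

// Next takes the current queue fill ratio (0.0 to 1.0) and returns the
// flag value: 1 tells the BPF programs to skip emission, 0 to emit.
func (b *Backpressure) Next(fill float64) uint32 {
	switch {
	case !b.active && fill >= b.High:
		b.active = true
	case b.active && fill <= b.Low:
		b.active = false
	}
	if b.active {
		return 1
	}
	return 0
}
```

Because `cpu_backpressure` is a per-CPU array, a userspace update has to supply one value per possible CPU. The sketch only computes the value and leaves the map write out.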

### Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `kerno_ringbuf_drops_total` | Counter | Kernel ringbuf overflow events, by program and CPU |
| `kerno_collector_sampled_total` | Counter | Events dropped by userspace sampler, by collector |
| `kerno_overhead_pct` | Gauge | kerno CPU overhead — alert if > 2% |

### Recommended alert

```yaml
- alert: KernoOverloaded
  expr: kerno_overhead_pct > 2
  for: 5m
  annotations:
    summary: "kerno is the bottleneck on {{ $labels.instance }}"
```

5 changes: 5 additions & 0 deletions internal/bpf/c/disk_io.c
@@ -44,8 +44,13 @@ int tracepoint_block_rq_complete(struct trace_event_raw_block_rq_completion *ctx
__u64 latency = bpf_ktime_get_ns() - *start_ts;
bpf_map_delete_elem(&io_start, &sector);

/* Phase 9.2.4: skip emit when userspace collector is overloaded. */
if (KERNO_BACKPRESSURE())
return 0;

struct disk_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
	/* Braces are required: without them the return below runs unconditionally. */
	KERNO_RECORD_DROP();
	return 0;
}

e->timestamp_ns = bpf_ktime_get_ns();
10 changes: 10 additions & 0 deletions internal/bpf/c/fd_track.c
@@ -28,8 +28,13 @@ int tracepoint_sys_exit_openat(struct trace_event_raw_sys_exit *ctx)
if (ret < 0)
return 0; // Failed open — ignore.

/* Phase 9.2.4: skip emit when userspace collector is overloaded. */
if (KERNO_BACKPRESSURE())
return 0;

struct fd_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
	/* Braces are required: without them the return below runs unconditionally. */
	KERNO_RECORD_DROP();
	return 0;
}

__u64 pid_tgid = bpf_get_current_pid_tgid();
@@ -53,8 +58,13 @@ int tracepoint_sys_exit_close(struct trace_event_raw_sys_exit *ctx)
if (ctx->ret != 0)
return 0; // Failed close — ignore.

/* Phase 9.2.4: skip emit when userspace collector is overloaded. */
if (KERNO_BACKPRESSURE())
return 0;

struct fd_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
	/* Braces are required: without them the return below runs unconditionally. */
	KERNO_RECORD_DROP();
	return 0;
}

__u64 pid_tgid = bpf_get_current_pid_tgid();
43 changes: 43 additions & 0 deletions internal/bpf/c/headers/kerno.h
@@ -163,4 +163,47 @@ struct file_event {
__type(value, val_type); \
} name SEC(".maps")


// ─── Per-CPU Backpressure Guard Map (Phase 9.2.4) ──────────────────────────

struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__uint(max_entries, 1);
__type(key, __u32);
__type(value, __u32);
} cpu_backpressure SEC(".maps");

static __always_inline int kerno_backpressure_active(void)
{
__u32 key = 0;
__u32 *val = bpf_map_lookup_elem(&cpu_backpressure, &key);
return val && *val;
}

#define KERNO_BACKPRESSURE() kerno_backpressure_active()


// ─── Per-program Drop Counter Map (Phase 9.2.4) ────────────────────────────
//
// Incremented by BPF programs when bpf_ringbuf_reserve() returns NULL.
// Userspace polls this map and exports kerno_ringbuf_drops_total.
// Key 0 = drop count for this program instance.

struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__uint(max_entries, 1);
__type(key, __u32);
__type(value, __u64);
} kerno_drop_count SEC(".maps");

static __always_inline void kerno_record_drop(void)
{
__u32 key = 0;
__u64 *val = bpf_map_lookup_elem(&kerno_drop_count, &key);
if (val)
__sync_fetch_and_add(val, 1);
}

#define KERNO_RECORD_DROP() kerno_record_drop()

#endif // __KERNO_H__
5 changes: 5 additions & 0 deletions internal/bpf/c/oom_track.c
@@ -26,8 +26,13 @@ int BPF_KPROBE(kprobe_oom_kill, struct oom_control *oc, const char *message)
if (!victim)
return 0;

/* Phase 9.2.4: skip emit when userspace collector is overloaded. */
if (KERNO_BACKPRESSURE())
return 0;

struct oom_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
	/* Braces are required: without them the return below runs unconditionally. */
	KERNO_RECORD_DROP();
	return 0;
}

e->timestamp_ns = bpf_ktime_get_ns();
5 changes: 5 additions & 0 deletions internal/bpf/c/sched_delay.c
@@ -53,8 +53,13 @@ int tracepoint_sched_switch(struct trace_event_raw_sched_switch *ctx)
if (delay < 1000)
return 0;

/* Phase 9.2.4: skip emit when userspace collector is overloaded. */
if (KERNO_BACKPRESSURE())
return 0;

struct sched_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
	/* Braces are required: without them the return below runs unconditionally. */
	KERNO_RECORD_DROP();
	return 0;
}

e->timestamp_ns = bpf_ktime_get_ns();
5 changes: 5 additions & 0 deletions internal/bpf/c/syscall_latency.c
@@ -43,8 +43,13 @@ int tracepoint_sys_exit(struct trace_event_raw_sys_exit *ctx)
if (latency < 1000)
return 0;

/* Phase 9.2.4: skip emit when userspace collector is overloaded. */
if (KERNO_BACKPRESSURE())
return 0;

struct syscall_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
	/* Braces are required: without them the return below runs unconditionally. */
	KERNO_RECORD_DROP();
	return 0;
}

e->timestamp_ns = bpf_ktime_get_ns();
10 changes: 10 additions & 0 deletions internal/bpf/c/tcp_monitor.c
@@ -34,8 +34,13 @@ int tracepoint_tcp_retransmit(struct trace_event_raw_tcp_retransmit_skb *ctx)
if (ctx->family != AF_INET)
return 0;

/* Phase 9.2.4: skip emit when userspace collector is overloaded. */
if (KERNO_BACKPRESSURE())
return 0;

struct tcp_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
	/* Braces are required: without them the return below runs unconditionally. */
	KERNO_RECORD_DROP();
	return 0;
}

e->timestamp_ns = bpf_ktime_get_ns();
@@ -82,8 +87,13 @@ int tracepoint_inet_sock_set_state(struct trace_event_raw_inet_sock_set_state *c
return 0; // Skip intermediate states.
}

/* Phase 9.2.4: skip emit when userspace collector is overloaded. */
if (KERNO_BACKPRESSURE())
return 0;

struct tcp_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
if (!e) {
	/* Braces are required: without them the return below runs unconditionally. */
	KERNO_RECORD_DROP();
	return 0;
}

e->timestamp_ns = bpf_ktime_get_ns();
8 changes: 8 additions & 0 deletions internal/bpf/disk_io.go
@@ -109,6 +109,14 @@ func (l *DiskIOLoader) readLoop(ctx context.Context, ch chan<- RawEvent) {
}
}

// DropMap returns the per-CPU drop counter map for this program.
// Returns nil if the program has not been loaded.
func (l *DiskIOLoader) DropMap() *ebpf.Map {
if l.objs == nil {
return nil
}
return l.objs.KernoDropCount
}

func (l *DiskIOLoader) close() {
if l.reader != nil {
l.reader.Close()
8 changes: 8 additions & 0 deletions internal/bpf/fd_track.go
@@ -109,6 +109,14 @@ func (l *FDTrackLoader) readLoop(ctx context.Context, ch chan<- RawEvent) {
}
}

// DropMap returns the per-CPU drop counter map for this program.
// Returns nil if the program has not been loaded.
func (l *FDTrackLoader) DropMap() *ebpf.Map {
if l.objs == nil {
return nil
}
return l.objs.KernoDropCount
}

func (l *FDTrackLoader) close() {
if l.reader != nil {
l.reader.Close()
10 changes: 8 additions & 2 deletions internal/bpf/gen_stub.go
@@ -31,6 +31,7 @@ type syscallLatencyObjects struct {
TracepointSysEnter *ebpf.Program `ebpf:"tracepoint_sys_enter"`
TracepointSysExit *ebpf.Program `ebpf:"tracepoint_sys_exit"`
Events *ebpf.Map `ebpf:"events"`
KernoDropCount *ebpf.Map `ebpf:"kerno_drop_count"`
}

func loadSyscallLatencyObjects(obj *syscallLatencyObjects, opts *ebpf.CollectionOptions) error {
@@ -45,6 +46,7 @@ type tcpMonitorObjects struct {
TracepointTcpRetransmit *ebpf.Program `ebpf:"tracepoint_tcp_retransmit"`
TracepointInetSockSetState *ebpf.Program `ebpf:"tracepoint_inet_sock_set_state"`
Events *ebpf.Map `ebpf:"events"`
KernoDropCount *ebpf.Map `ebpf:"kerno_drop_count"`
}

func loadTcpMonitorObjects(obj *tcpMonitorObjects, opts *ebpf.CollectionOptions) error {
@@ -56,8 +58,9 @@ func (o *tcpMonitorObjects) Close() error { return nil }
// ─── OOM Track stubs ────────────────────────────────────────────────────────

type oomTrackObjects struct {
KprobeOomKill  *ebpf.Program `ebpf:"kprobe_oom_kill"`
Events         *ebpf.Map     `ebpf:"events"`
KernoDropCount *ebpf.Map     `ebpf:"kerno_drop_count"`
}

func loadOomTrackObjects(obj *oomTrackObjects, opts *ebpf.CollectionOptions) error {
@@ -72,6 +75,7 @@ type diskIOObjects struct {
TracepointBlockRqIssue *ebpf.Program `ebpf:"tracepoint_block_rq_issue"`
TracepointBlockRqComplete *ebpf.Program `ebpf:"tracepoint_block_rq_complete"`
Events *ebpf.Map `ebpf:"events"`
KernoDropCount *ebpf.Map `ebpf:"kerno_drop_count"`
}

func loadDiskIOObjects(obj *diskIOObjects, opts *ebpf.CollectionOptions) error {
@@ -86,6 +90,7 @@ type schedDelayObjects struct {
TracepointSchedWakeup *ebpf.Program `ebpf:"tracepoint_sched_wakeup"`
TracepointSchedSwitch *ebpf.Program `ebpf:"tracepoint_sched_switch"`
Events *ebpf.Map `ebpf:"events"`
KernoDropCount *ebpf.Map `ebpf:"kerno_drop_count"`
}

func loadSchedDelayObjects(obj *schedDelayObjects, opts *ebpf.CollectionOptions) error {
@@ -100,6 +105,7 @@ type fdTrackObjects struct {
TracepointSysExitOpenat *ebpf.Program `ebpf:"tracepoint_sys_exit_openat"`
TracepointSysExitClose *ebpf.Program `ebpf:"tracepoint_sys_exit_close"`
Events *ebpf.Map `ebpf:"events"`
KernoDropCount *ebpf.Map `ebpf:"kerno_drop_count"`
}

func loadFdTrackObjects(obj *fdTrackObjects, opts *ebpf.CollectionOptions) error {
8 changes: 8 additions & 0 deletions internal/bpf/loader.go
@@ -15,8 +15,16 @@ import (
"context"
"fmt"
"io"

"github.com/cilium/ebpf"
)

// DropMapper is implemented by loaders that expose a per-CPU BPF drop
// counter map. Userspace polls this to increment kerno_ringbuf_drops_total.
type DropMapper interface {
DropMap() *ebpf.Map
}

// Loader is the interface that all eBPF program loaders must implement.
// Each loader manages the lifecycle of one eBPF program: loading it into
// the kernel, attaching to hook points, and reading events from ring buffers.
8 changes: 8 additions & 0 deletions internal/bpf/oom_track.go
@@ -101,6 +101,14 @@ func (l *OOMTrackLoader) readLoop(ctx context.Context, ch chan<- RawEvent) {
}
}

// DropMap returns the per-CPU drop counter map for this program.
// Returns nil if the program has not been loaded.
func (l *OOMTrackLoader) DropMap() *ebpf.Map {
if l.objs == nil {
return nil
}
return l.objs.KernoDropCount
}

func (l *OOMTrackLoader) close() {
if l.reader != nil {
l.reader.Close()
8 changes: 8 additions & 0 deletions internal/bpf/sched_delay.go
@@ -109,6 +109,14 @@ func (l *SchedDelayLoader) readLoop(ctx context.Context, ch chan<- RawEvent) {
}
}

// DropMap returns the per-CPU drop counter map for this program.
// Returns nil if the program has not been loaded.
func (l *SchedDelayLoader) DropMap() *ebpf.Map {
if l.objs == nil {
return nil
}
return l.objs.KernoDropCount
}

func (l *SchedDelayLoader) close() {
if l.reader != nil {
l.reader.Close()
8 changes: 8 additions & 0 deletions internal/bpf/syscall_latency.go
@@ -109,6 +109,14 @@ func (l *SyscallLatencyLoader) readLoop(ctx context.Context, ch chan<- RawEvent)
}
}

// DropMap returns the per-CPU drop counter map for this program.
// Returns nil if the program has not been loaded.
func (l *SyscallLatencyLoader) DropMap() *ebpf.Map {
if l.objs == nil {
return nil
}
return l.objs.KernoDropCount
}

func (l *SyscallLatencyLoader) close() {
if l.reader != nil {
l.reader.Close()
8 changes: 8 additions & 0 deletions internal/bpf/tcp_monitor.go
@@ -109,6 +109,14 @@ func (l *TCPMonitorLoader) readLoop(ctx context.Context, ch chan<- RawEvent) {
}
}

// DropMap returns the per-CPU drop counter map for this program.
// Returns nil if the program has not been loaded.
func (l *TCPMonitorLoader) DropMap() *ebpf.Map {
if l.objs == nil {
return nil
}
return l.objs.KernoDropCount
}

func (l *TCPMonitorLoader) close() {
if l.reader != nil {
l.reader.Close()