Description
Add an optional mechanism to automatically sample (drop a configurable fraction of) incoming rows during streaming ingestion when the ingestion rate exceeds a configurable threshold. When the rate is below the threshold, all rows would be ingested as normal; when it exceeds the threshold, the system would apply sampling to reduce load.
This would allow users to maintain ingestion stability during traffic spikes or bursts, trading some data completeness for system stability when scaling out (via the existing task autoscaler) is not sufficient or desirable.
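The core decision could be sketched as follows. This is a minimal illustration, not a proposed API: the names `maxRowsPerSecond` (the rate threshold) and `sampleRatio` (the fraction of rows kept while over the threshold) are hypothetical and are not existing Druid tuning-config properties.

```java
import java.util.Random;

// Illustrative sketch only: maxRowsPerSecond and sampleRatio are hypothetical
// parameter names, not existing Druid configuration properties.
public class ThresholdSampler
{
  private final double maxRowsPerSecond; // rate above which sampling kicks in
  private final double sampleRatio;      // fraction of rows kept while over the threshold
  private final Random random;

  public ThresholdSampler(double maxRowsPerSecond, double sampleRatio, long seed)
  {
    this.maxRowsPerSecond = maxRowsPerSecond;
    this.sampleRatio = sampleRatio;
    this.random = new Random(seed);
  }

  // Returns true if the row should be ingested, given the currently observed rate.
  public boolean accept(double observedRowsPerSecond)
  {
    if (observedRowsPerSecond <= maxRowsPerSecond) {
      return true; // under threshold: ingest every row as normal
    }
    return random.nextDouble() < sampleRatio; // over threshold: keep only a fraction
  }
}
```

With `sampleRatio` set to 1.0 the sampler is a no-op even during overload, so the feature could default to "off" without a separate enable flag.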
Motivation
Use case: Streaming ingestion from Kafka or Kinesis can experience sudden spikes in event volume. When the incoming rate exceeds what the cluster can sustainably process, lag accumulates, tasks may fail, backpressure builds, and the system can become unstable. Today, Druid addresses this primarily by scaling out (adding more ingestion tasks via the autoscaler), but there are scenarios where:
- Additional task capacity is not immediately available (e.g., worker slots constrained)
- Users prefer to sample during bursts rather than risk ingestion failures or growing lag
- The use case tolerates approximate data (e.g., metrics, sampling-friendly analytics)
Rationale: Providing a built-in, configurable way to sample when the rate exceeds a threshold would give operators a predictable, observable fallback when the stream overwhelms available capacity. It would complement (not replace) the existing autoscaling approach: users could configure both and have sampling kick in only when scaling alone is insufficient.
Benefit: Users would have a declarative way to prioritize ingestion stability over data completeness during overload, with sampled events tracked in existing metrics (e.g., ingest/events/thrownAway with a distinct reason) for visibility.
Implementation considerations
Any implementation would likely build on existing building blocks:
- The row-level filtering applied during streaming ingestion (e.g., InputRowFilter, FilteringCloseableInputRowIterator)
- The throughput and thrown-away metrics already exposed per task (RowIngestionMeters, DropwizardRowIngestionMeters with its moving averages)
- The streaming tuning config, as a home for any new parameters (SeekableStreamIndexTaskTuningConfig)

The cost-based and lag-based autoscalers already consume task-level stats for scaling decisions; similar metrics could inform when sampling is warranted.
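As a rough illustration of how these pieces might compose, the sketch below tracks a simple windowed rate (a stand-in for the moving-average rates a real implementation would read from RowIngestionMeters) and exposes the decision as a `java.util.function.Predicate`, the shape that the existing row-filtering path consumes. All class, field, and parameter names here are hypothetical.

```java
import java.util.Random;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical sketch: a rate-aware predicate that could be composed with the
// existing row-filtering path. The windowed rate below stands in for the
// moving averages a real implementation would read from the ingestion meters.
public class RateAwareSamplingPredicate<T> implements Predicate<T>
{
  private static final long WINDOW_NANOS = 1_000_000_000L; // recompute rate once per second

  private final double maxRowsPerSecond; // hypothetical threshold parameter
  private final double sampleRatio;      // hypothetical keep-fraction while over threshold
  private final LongSupplier nanoClock;  // injectable clock, so the logic is testable
  private final Random random;

  private long windowStartNanos;
  private long windowCount;
  private double observedRate; // rows/sec measured over the last full window

  public RateAwareSamplingPredicate(
      double maxRowsPerSecond,
      double sampleRatio,
      LongSupplier nanoClock,
      long seed
  )
  {
    this.maxRowsPerSecond = maxRowsPerSecond;
    this.sampleRatio = sampleRatio;
    this.nanoClock = nanoClock;
    this.random = new Random(seed);
    this.windowStartNanos = nanoClock.getAsLong();
  }

  @Override
  public boolean test(T row)
  {
    long now = nanoClock.getAsLong();
    windowCount++;
    long elapsed = now - windowStartNanos;
    if (elapsed >= WINDOW_NANOS) {
      observedRate = windowCount * 1e9 / elapsed; // rows per second over the window
      windowStartNanos = now;
      windowCount = 0;
    }
    if (observedRate <= maxRowsPerSecond) {
      return true; // under threshold: keep everything
    }
    return random.nextDouble() < sampleRatio; // over threshold: probabilistic drop
  }
}
```

Rows rejected by such a predicate could then be counted as thrown away with a distinct reason, so the existing metrics pipeline surfaces how much sampling occurred.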