Merged
10 changes: 5 additions & 5 deletions blog/2026-06-04-tiering-service-part1.md
@@ -13,12 +13,12 @@ This three-part walkthrough aims to bring some clarity to the confusing parts of

**Part 1** builds the mental model from scratch and by the end of it you'll be able to describe, step by step, what happens between the moment a tiering timer fires and the moment a lake snapshot is committed.

-**[Part 2](/blog/fluss-tiering-service-deep-dive-part2) and Part 3** take that mental model and add the dials (parallelism, table kinds, freshness, multi-table behavior, scale-out) and then put it into a real production deployment (failures, pitfalls, monitoring).
+**[Part 2](/blog/fluss-tiering-service-deep-dive-part2) and [Part 3](/blog/fluss-tiering-service-deep-dive-part3)** take that mental model and add the dials (parallelism, table kinds, freshness, multi-table behavior, scale-out) and then put it into a real production deployment (failures, pitfalls, monitoring).

**Tiering Service Deep Dive, 3-parts:**
* **Part 1 - The Mental Model:** how one tiering round actually works, from timer fire to lake commit.
* **[Part 2 - Tuning](/blog/fluss-tiering-service-deep-dive-part2):** per-table dials, multi-table dynamics, and scaling out.
-* **Part 3 - In Production** failure modes, design pitfalls, and monitoring.
+* **[Part 3 - In Production](/blog/fluss-tiering-service-deep-dive-part3):** failure modes, design pitfalls, and monitoring.
<!-- truncate -->

# Why tiering exists in the first place
@@ -94,7 +94,7 @@ So the worst-case lag of **"up to 30 seconds"** applies to mid-round status, not

**Second, every message has an "epoch" number stamped on it**. Each time a table starts a new tiering attempt, its epoch goes up by one.
If a Flink job tries to report success with a stale epoch (because the coordinator already gave up on that attempt and handed the work to someone else), the coordinator just ignores the message.
-This is how the system stays consistent even when things go sideways and we'll come back to it in **Part 3**, in the failure-modes section.
+This is how the system stays consistent even when things go sideways and we'll come back to it in **[Part 3](/blog/fluss-tiering-service-deep-dive-part3)**, in the failure-modes section.

## The Life Of A Table: Four States, Walked Through
Every table that's enabled for lake tiering goes through the same lifecycle.
@@ -104,7 +104,7 @@ The coordinator tracks each table's current state. For our purposes, four states

> **Note:** The actual source code uses seven state names: `NEW`, `INITIALIZED`, `SCHEDULED`, `PENDING`, `TIERING`, `TIERED`, `FAILED`. Here we collapse them into four pedagogical states to keep the mental model small.

-**NEW** is only used for the very first time a lake-enabled table is created, and `INITIALIZED` is only used for tables the coordinator rediscovers after a restart. Both transition into `SCHEDULED` immediately, so we ignore them here. `FAILED` is the unhappy-path state we'll come back to in **Part 3**.
+**NEW** is only used for the very first time a lake-enabled table is created, and `INITIALIZED` is only used for tables the coordinator rediscovers after a restart. Both transition into `SCHEDULED` immediately, so we ignore them here. `FAILED` is the unhappy-path state we'll come back to in **[Part 3](/blog/fluss-tiering-service-deep-dive-part3)**.

**WAITING.** The table has been tiered recently and the freshness timer is counting down. Nothing to do.

@@ -124,7 +124,7 @@ Three concepts are important to understand when you're reading this:

### Let's walk through a single tiering round with a concrete example.

-We have one table, called `orders`, with four buckets. It's a log table (we'll talk about what that means in Part 2, when we cover table kinds. For now, just think **"an append-only stream of records"**). Freshness is configured to five minutes. The Flink tiering job is up and running.
+We have one table, called `orders`, with four buckets. It's a log table (we'll talk about what that means in [Part 2](/blog/fluss-tiering-service-deep-dive-part2), when we cover table kinds. For now, just think **"an append-only stream of records"**). Freshness is configured to five minutes. The Flink tiering job is up and running.

Here's what happens:

9 changes: 7 additions & 2 deletions blog/2026-06-09-tiering-service-part2.md
@@ -16,6 +16,11 @@ The freshness setting, the one knob most users actually touch, does two differen
Once a single job is handling many tables, queue position starts to dominate effective freshness more than any per-table setting.
And once that happens, you have a deployment-shape decision: stay with one job, or scale out. By the end, you'll know which levers matter most and how to use them.

+**Tiering Service Deep Dive, 3-parts:**
+* **[Part 1 - The Mental Model](/blog/fluss-tiering-service-deep-dive-part1):** how one tiering round actually works, from timer fire to lake commit.
+* **Part 2 - Tuning:** per-table dials, multi-table dynamics, and scaling out.
+* **[Part 3 - In Production](/blog/fluss-tiering-service-deep-dive-part3):** failure modes, design pitfalls, and monitoring.
+
<!-- truncate -->

## Buckets, Splits, And How The Work Gets Divided
@@ -274,7 +279,7 @@ So from the coordinator's perspective, every Flink tiering job is indistinguisha
There's no job ID, no database filter, no notion of "this table belongs to that job".
All registered jobs are undifferentiated workers reaching into the same queue, and the head of the queue goes to whoever happens to heartbeat first. If you have two jobs, two tables can be in `Tiering` at the same time. If you have five, five can. But you cannot pin clicks to job A; the coordinator wouldn't know how to honor that pin even if you asked.
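
To make the "undifferentiated workers" idea concrete, here is a minimal Python sketch of a shared FIFO queue. It is an illustration under stated assumptions, not the coordinator's actual code: the `SharedQueue` class, its `heartbeat` method, and the `payments` table are hypothetical names. Note that `heartbeat` takes no job identifier at all, which is exactly why a table cannot be pinned to a job.

```python
from collections import deque


class SharedQueue:
    """Hypothetical stand-in for the coordinator's single FIFO queue."""

    def __init__(self, tables):
        self.pending = deque(tables)  # tables waiting for a tiering job
        self.in_tiering = []          # tables currently handed out

    def heartbeat(self):
        """Hand the head of the queue to whichever job is asking.

        There is deliberately no job argument: the queue cannot tell
        callers apart, so arrival order alone decides the pairing.
        """
        if not self.pending:
            return None
        table = self.pending.popleft()
        self.in_tiering.append(table)
        return table


queue = SharedQueue(["orders", "clicks", "payments"])

# Two jobs heartbeat. The labels below exist only in the caller's head.
table_for_first_caller = queue.heartbeat()   # head of the queue
table_for_second_caller = queue.heartbeat()  # next in line
```

With two jobs heartbeating, two tables end up in `in_tiering` at once and the third stays queued until one of the jobs asks again.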

-The epoch mechanism from Part 1's heartbeat section keeps this safe regardless of how many jobs are running. Each table assignment carries a `tiering_epoch` stamped on it. If two jobs somehow ended up working on the same table (a rare edge case during coordinator failover, mainly), the coordinator only accepts the commit whose epoch matches its current record; the other is rejected with an epoch-fencing error. So multiple jobs running concurrently can never produce duplicate commits or corrupt lake state; the worst case is some wasted reader work that doesn't get committed.
+The epoch mechanism from [Part 1's heartbeat section](/blog/fluss-tiering-service-deep-dive-part1#the-heartbeat) keeps this safe regardless of how many jobs are running. Each table assignment carries a `tiering_epoch` stamped on it. If two jobs somehow ended up working on the same table (a rare edge case during coordinator failover, mainly), the coordinator only accepts the commit whose epoch matches its current record; the other is rejected with an epoch-fencing error. So multiple jobs running concurrently can never produce duplicate commits or corrupt lake state; the worst case is some wasted reader work that doesn't get committed.
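
The fencing rule can be sketched in a few lines of Python. This is a simplified model under assumptions, not the real coordinator: the `Coordinator` class, its method names, and `EpochFencedError` are hypothetical. Only the `tiering_epoch` field, the "goes up by one per attempt" rule, and the "accept only the matching epoch" rule come from the text.

```python
from dataclasses import dataclass, field


class EpochFencedError(Exception):
    """Hypothetical error for a commit whose epoch is no longer current."""


@dataclass
class Coordinator:
    # table name -> epoch of the attempt the coordinator currently recognizes
    current_epoch: dict = field(default_factory=dict)

    def start_attempt(self, table: str) -> int:
        """Begin a new tiering attempt; the table's epoch goes up by one."""
        epoch = self.current_epoch.get(table, 0) + 1
        self.current_epoch[table] = epoch
        return epoch

    def commit(self, table: str, tiering_epoch: int) -> None:
        """Accept the commit only if its epoch matches the current record."""
        if tiering_epoch != self.current_epoch.get(table):
            raise EpochFencedError(f"{table}: stale epoch {tiering_epoch}")


coordinator = Coordinator()
stale = coordinator.start_attempt("orders")  # first job is assigned the table
fresh = coordinator.start_attempt("orders")  # attempt abandoned, table reassigned

coordinator.commit("orders", fresh)          # matches the current record: accepted
try:
    coordinator.commit("orders", stale)      # older attempt reports late: fenced
    rejected = None
except EpochFencedError as err:
    rejected = str(err)
```

The late commit never touches lake state; the only cost is the reader work the fenced job already did.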

![Multiple tiering jobs pulling work from the coordinator's single shared FIFO queue](assets/tiering_service_dd_part2/fig7.png)

@@ -314,6 +319,6 @@ In practice the pairing tends to be sticky: once a job has been running short ro

## What's Next?
You now know all the dials, from per-table settings like bucket count and freshness, through the multi-table queue dynamics, to the deployment-shape choice between one tiering job and several.
-Everything you've read so far has been about how the system behaves. Part 3 is about what you do with it.
+Everything you've read so far has been about how the system behaves. [Part 3](/blog/fluss-tiering-service-deep-dive-part3) is about what you do with it.

