CDC-Cache

CDC-Cache is a research prototype of result-cache invalidation for distributed OLAP queries, triggered by change data capture (CDC). It connects PostgreSQL logical replication, Debezium, Kafka, Redis, Trino, and small Go services to evaluate table-level cache invalidation on TPC-H SF10.

Architecture

The stack contains:

PostgreSQL 16 with TPC-H SF10 tables and logical replication enabled
Debezium Server 2.6 streaming PostgreSQL WAL events to Kafka
Apache Kafka 3.7 in single-node KRaft mode
Trino 450 querying PostgreSQL
Redis 7 storing result-cache entries, table-version counters, and streams
Proxy, a Go HTTP cache-aside query proxy
Bridge, a Go Kafka consumer that increments Redis table-version counters
Shadow, a Go validator that checks sampled cache hits against PostgreSQL
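As a rough illustration of what the Shadow role involves, the sketch below samples a fraction of cache hits and compares the cached rows with freshly fetched rows. It is written in Python for brevity, not in Go like the actual service, and the names should_sample, results_match, and sample_rate are hypothetical. The comparison ignores row order, which is an assumption that would not suit queries with ORDER BY.

```python
import random
from collections import Counter


def should_sample(rng, sample_rate):
    """Decide whether a given cache hit is validated (hypothetical policy)."""
    return rng.random() < sample_rate


def results_match(cached_rows, fresh_rows):
    """Compare two result sets as multisets of rows, ignoring row order."""
    return Counter(map(tuple, cached_rows)) == Counter(map(tuple, fresh_rows))


rng = random.Random(0)
cached = [[1, "a"], [2, "b"]]
fresh = [(2, "b"), (1, "a")]   # same rows, different order
if should_sample(rng, sample_rate=1.0):
    stale = not results_match(cached, fresh)
```

A mismatch on a sampled hit would indicate that the cache served a result that no longer agrees with the source database.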

Source writes flow through:

PostgreSQL -> Debezium -> Kafka -> Bridge -> Redis version counters
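The last hop of this path can be sketched as follows: read the source table name from a Debezium change event and bump that table's version counter. This is a Python illustration under stated assumptions, not the Go Bridge itself. The dictionary stands in for Redis, the function names are hypothetical, and the event is assumed to be JSON with the usual Debezium source.table field, optionally wrapped in a payload field.

```python
import json
from collections import defaultdict

# In-memory stand-in for the Redis table-version counters (assumption:
# a real bridge would issue an atomic increment per table key).
table_versions = defaultdict(int)


def table_from_event(raw):
    """Return the source table name of a change event, or None if absent."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        return None  # e.g. a tombstone record with a null value
    # With schemas enabled, the envelope sits under "payload".
    envelope = event.get("payload") or event
    source = envelope.get("source") or {}
    return source.get("table")


def apply_event(raw):
    """Increment the version counter of the table named in the event."""
    table = table_from_event(raw)
    if table is not None:
        table_versions[table] += 1


sample = json.dumps(
    {"payload": {"op": "u", "source": {"schema": "public", "table": "lineitem"}}}
).encode()
apply_event(sample)
apply_event(sample)
```

After the two sample events, the counter for lineitem has advanced twice while all other tables remain at zero.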

Reads flow through:

client -> Proxy -> Redis cache or Trino -> PostgreSQL
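One common way to combine the two paths is to fold the current table versions into the cache key, so that a version increment makes older entries unreachable. The sketch below shows that idea in Python with dictionaries standing in for Redis. Whether Proxy builds its keys this way is an assumption, and cache_key, query, and fake_engine are hypothetical names.

```python
import hashlib

versions = {"lineitem": 0, "orders": 0}  # stand-in for Redis version counters
cache = {}                               # stand-in for Redis result entries


def cache_key(sql, tables):
    """Derive a key from the SQL text and the versions of the tables it reads."""
    stamp = ",".join(f"{t}:{versions.get(t, 0)}" for t in sorted(tables))
    return hashlib.sha256(f"{sql}|{stamp}".encode()).hexdigest()


def query(sql, tables, run_on_engine):
    """Cache-aside lookup: return (result, was_hit)."""
    key = cache_key(sql, tables)
    if key in cache:
        return cache[key], True
    result = run_on_engine(sql)  # would be a Trino call in the real stack
    cache[key] = result
    return result, False


calls = []


def fake_engine(sql):
    """Hypothetical engine that returns how many times it has been called."""
    calls.append(sql)
    return len(calls)


sql = "SELECT count(*) FROM lineitem"
first = query(sql, ["lineitem"], fake_engine)   # miss, runs the query
second = query(sql, ["lineitem"], fake_engine)  # hit, served from cache
versions["lineitem"] += 1                       # simulated CDC invalidation
third = query(sql, ["lineitem"], fake_engine)   # miss again under the new version
```

With table-level versions, any write to lineitem invalidates every cached query that reads lineitem, regardless of which rows changed.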

Repository Layout

postgres/      TPC-H schema, indexes, data loader, stack verification
debezium/      Debezium Server configuration
trino/         Trino coordinator and PostgreSQL catalog configuration
proxy/         Go HTTP result-cache proxy
bridge/        Go Kafka-to-Redis invalidation bridge
shadow/        Go cache-hit validator
replay/        Trace generation and live replay harness
analysis/      CDC-race labeling, latency joins, statistics, plots
tests/         Integration and unit tests
figures300/    Summary CSVs and multiple-comparison output for 300s sweeps

Requirements

Docker Desktop or Docker Engine with Compose
Go 1.22+
Python 3.12+
gcc, make, git, and sed for TPC-H dbgen

The Python dependencies are listed in pyproject.toml.

Quick Start

Start the stack:

make up

Load TPC-H SF10 into PostgreSQL:

make load

Verify the Trino/PostgreSQL stack:

make verify-stack

Build Go services:

make build

Run integration tests from the repository root after the stack is up and SF10 is loaded:

python3 -m pytest tests/integration

Running a Sweep

Build primary-key pools from the loaded database:

python3 replay/pk_pool.py --n 100000

Generate traces:

python3 replay/trace_gen.py --seed 1 --pattern poisson --duration 300 --rate 10
python3 replay/trace_gen.py --seed 1 --pattern mmpp --duration 300 --rate 10
python3 replay/trace_gen.py --seed 1 --pattern zipf --duration 300 --rate 10
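For orientation, a seeded Poisson arrival process can be generated by summing exponential inter-arrival gaps. The sketch below is not the code in replay/trace_gen.py. It assumes that --rate means events per second and --duration means seconds, and poisson_arrivals is a hypothetical name. The mmpp and zipf patterns are not covered here.

```python
import random


def poisson_arrivals(rate, duration, seed):
    """Return sorted arrival times in [0, duration) for a Poisson process."""
    rng = random.Random(seed)  # a fixed seed makes the trace reproducible
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(rate)  # exponential gap with mean 1 / rate
        if t >= duration:
            return times
        times.append(t)


arrivals = poisson_arrivals(rate=10, duration=300, seed=1)
```

At 10 events per second over 300 seconds, the expected number of arrivals is about 3000, and the same seed always yields the same trace.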

Replay a trace:

python3 replay/replay.py \
  --trace traces/trace_seed1_poisson_300s.parquet \
  --run-id sweep300_seed1_poisson

Aggregate a 5-seed x 3-pattern sweep:

python3 analysis/sweep_analysis.py \
  --runs-dir runs \
  --out-dir figures300 \
  --run-prefix sweep300_seed
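A full sweep implies 15 runs. The helper below enumerates the run IDs and trace paths that the naming in the commands above suggests. It assumes seeds 1 through 5 and the file-name pattern seen in the replay example, and both function names are hypothetical.

```python
from itertools import product

SEEDS = range(1, 6)                      # assumption: seeds 1..5
PATTERNS = ("poisson", "mmpp", "zipf")


def expected_run_ids(prefix="sweep300_seed"):
    """List one run ID per (seed, pattern) pair."""
    return [f"{prefix}{seed}_{pattern}" for seed, pattern in product(SEEDS, PATTERNS)]


def trace_path(seed, pattern, duration=300):
    """Build a trace path following the naming used in the replay example."""
    return f"traces/trace_seed{seed}_{pattern}_{duration}s.parquet"


run_ids = expected_run_ids()
```

Checking the runs directory against this list before aggregating is a cheap way to catch a missing or misnamed run.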

Data and Outputs

Large generated artifacts are intentionally not committed:

raw TPC-H .tbl data
Docker volumes
generated traces
raw replay run directories
sampled primary-key pools

The committed figures300/summary_table.csv and figures300/multcomp_results.txt summarize the 300-second SF10 sweep used by the paper.

License

This project is licensed under the Apache License 2.0. See LICENSE.
