
feat: Adjustment of the master-slave synchronization dump cleanup mechanism, optimizing storage space usage #3225

Open
chenbt-hz wants to merge 6 commits into OpenAtomFoundation:unstable from chenbt-hz:unstable-fixbug

Conversation

@chenbt-hz
Collaborator

@chenbt-hz chenbt-hz commented Mar 6, 2026

1. The hlen command causes a TTL anomaly
2. rate_limit anomaly
3. \x00 parsing anomaly
4. [Optimization] Adjust the dump directory cleanup mechanism during master-slave synchronization

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced backup snapshot tracking and ownership management for safer rsync transfers
    • New backup creation method that reduces time windows during checkpoint operations
  • Bug Fixes

    • Fixed integer overflow in default bandwidth rate limiter configuration
    • Improved error handling for directory operations and file transfers
    • Enhanced dump file integrity validation
  • Configuration Changes

    • Updated default bandwidth rate limiter settings
    • Adjusted wash-data default value

1. Each slave gets an exclusive dump directory (dump-YYYYMMDD-NN format)
2. Files are cleaned up immediately after transfer completes, freeing disk space
3. The number of concurrent dumps is limited to 3
4. Improved dump integrity checking and ownership management
Notes: 1. Multi-database scenarios are not yet supported. 2. Cleanup can misbehave when multiple slaves run a full sync at the same time. 3. Orphan-file cleanup is less effective when manual syncs are triggered multiple times on the same day.
@chenbt-hz chenbt-hz added bug ✏️ Feature New feature or request 4.0.2 labels Mar 6, 2026
Copilot AI review requested due to automatic review settings March 6, 2026 08:59
@github-actions github-actions bot added Invalid PR Title ☢️ Bug Something isn't working labels Mar 6, 2026
@chenbt-hz chenbt-hz changed the title Unstable fixbug to feat: Adjust the dump cleanup mechanism for master-slave synchronization to optimize storage usage Mar 6, 2026
@chenbt-hz
Collaborator Author

feat: independent dump + immediate cleanup mechanism, solving Pika's orphan-file problem during full synchronization:

  1. Each slave gets an exclusive dump directory (dump-YYYYMMDD-NN format)
  2. Files are cleaned up immediately after transfer completes, freeing disk space
  3. The number of concurrent dumps is limited to 3
  4. Improved dump integrity checking and ownership management
    Notes: 1. Multi-database scenarios are not yet supported. 2. Cleanup can misbehave when multiple slaves run a full sync at the same time. 3. Orphan-file cleanup is less effective when manual syncs are triggered multiple times on the same day.

@chenbt-hz
Collaborator Author

Pika Full Synchronization Design, Explained

About this document

This document describes Pika's new full synchronization mechanism (Scheme A) in detail, including the complete flow, state transitions, data movement, and known issues for each scenario.

  • Scheme name: Scheme A (independent dump + delayed cleanup)
  • Last updated: 2026-03-06

1. Architecture Overview

1.1 Core design

Scheme A follows these design principles:

  1. Each slave gets an exclusive dump directory: dump-YYYYMMDD-NN/db_name format
  2. Delayed cleanup after transfer completes: orphan files (nlink=1) are added to a delayed-cleanup queue once their transfer finishes (deleted after 10 minutes)
  3. Maximum concurrency limit: at most 3 concurrent dumps by default
  4. Fine-grained file protection: files in transfer are protected from accidental deletion
  5. Unified cleanup entry point: all orphan-file cleanup goes through RemoveTransferringFile

1.2 Key components

| Component | File | Responsibility |
|---|---|---|
| RsyncServer | rsync_server.cc | Handles slave file-sync requests |
| RsyncServerConn | rsync_server.cc | Maintains per-connection state |
| PikaServer | pika_server.cc | Manages dump ownership and snapshot registration |
| DB | pika_db.cc | Manages bgsave and dump metadata |

1.3 Key data structures

// Dump ownership information
struct DumpOwnerInfo {
    std::string conn_id;      // ID of the owning connection
    std::string dump_path;    // dump directory path
};
std::map<std::string, DumpOwnerInfo> dump_owners_;  // snapshot_uuid -> ownership info

// Protection for files in transfer
std::map<std::string, std::set<std::string>> rsync_transferring_files_;  // snapshot_uuid -> file set

// Active snapshots
std::set<std::string> active_rsync_snapshots_;  // protects against orphan-file cleanup

2. Single-slave, single-DB full sync flow

Using db0 as an example, this section walks through the state changes on the master and the slave.

2.1 Sequence diagrams

Phase 1: triggering full sync
┌─────────────┐                    ┌─────────────┐
│    Slave    │                    │    Master   │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Decide full sync is needed    │
       │  (repl_state: kTryConnect)        │
       │                                   │
       │  2. Send DBSync request           │
       │ ───────────────────────────────>│
       │                                   │
       │                              3. Check whether a bgsave is running
       │                              (IsBgSaving())
       │                                   │
       │                              4. If not, trigger a bgsave
       │                              (BgSaveDB())
       │                                   │
       │  5. Return kErr (wait for bgsave) │
       │ <───────────────────────────────│
       │                                   │
       │  6. Retry (loop)                  │
       │ ───────────────────────────────>│
       │                              7. If still in bgsave, return kErr
       │ <───────────────────────────────│
       │                                   │

Phase 2: bgsave execution
┌─────────────┐                    ┌─────────────┐
│  Background │                    │    Master   │
│    Thread   │                    │             │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Create the dump directory     │
       │  (InitBgsaveEnv)                  │
       │  dump-20260305-0/db0              │
       │                                   │
       │  2. Create a RocksDB checkpoint   │
       │  (creates hard links)             │
       │                                   │
       │  3. Generate the info file        │
       │                                   │
       │  4. bgsave completes              │
       │  (IsBgSaving() -> false)          │
       │                                   │

Phase 3: Meta request handling
┌─────────────┐                    ┌─────────────┐
│    Slave    │                    │    Master   │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Send the DBSync request again │
       │  (bgsave finished after retries)  │
       │ ───────────────────────────────>│
       │                                   │
       │                              2. Build the file list
       │                              (GetDumpMeta)
       │                              Scan dump-20260305-0/db0
       │                              Generate snapshot_uuid
       │                                   │
       │                              3. Check dump integrity
       │                              4. Check whether the dump is already owned
       │                              5. Check the concurrency limit
       │                              6. Mark the dump as in use
       │                              (MarkDumpInUse)
       │                              7. Register the snapshot
       │                              (RegisterSnapshot)
       │                              8. Pre-register all files
       │                              (AddTransferringFile)
       │                                   │
       │  9. Return the Meta response      │
       │  (snapshot_uuid + file list)      │
       │ <───────────────────────────────│
       │                                   │

Phase 4: file transfer
┌─────────────┐                    ┌─────────────┐
│    Slave    │                    │    Master   │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Download files (multi-threaded)
       │  ─────────────────────────────> │
       │                                   │
       │                              2. Check that the file exists
       │                              3. Register the file as in transfer
       │                              4. Read the file contents
       │                              5. Unregister the file
       │                              6. If this is the last chunk (is_eof):
       │                                 check whether the file is an orphan (nlink=1);
       │                                 if so, add it to the delayed-cleanup queue (10 minutes)
       │                                   │
       │  7. Return the file data          │
       │ <───────────────────────────────│
       │                                   │
       │  (repeat until all files are downloaded)

Phase 5: cleanup
┌─────────────┐                    ┌─────────────┐
│    Slave    │                    │    Master   │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Download complete, close the connection
       │ ───────X────────────────────────>│
       │                                   │
       │                              2. Connection closed, RsyncServerConn destroyed
       │                              3. Release the dump reservation
       │                              (ReleaseDump)
       │                              4. Unregister the snapshot
       │                              (UnregisterSnapshot)
       │                                   │
       │                              5. AutoDeleteExpiredDump runs periodically:
       │                              processes the delayed-cleanup queue (ProcessPendingCleanupFiles)
       │                              and deletes expired dump directories
       │                              (note: CleanupOrphanSstFiles has been removed; delayed cleanup handles everything)
       │                                   │
2.2 Master state changes

| Stage | State | Notes |
|---|---|---|
| T0 | no dump | initial state |
| T1 | bgsaving | creating dump-20260305-0/db0 |
| T2 | dump available | bgsave complete, waiting for a Meta request |
| T3 | dump in use | Meta request received, dump marked as owned |
| T4 | transferring | files being transferred, immediate cleanup in progress |
| T5 | dump released | slave disconnected, reservation released |
| T6 | dump expired | AutoDeleteExpiredDump removes the expired dump |

2.3 Slave state changes

| Stage | State | Notes |
|---|---|---|
| T0 | kTryConnect | trying to connect to the master |
| T1 | kWaitDBSync | waiting for the master's bgsave to finish |
| T2 | kWaitDBSync | file list received, download starting |
| T3 | kWaitDBSync | files downloading |
| T4 | kConnected | full sync finished, incremental sync starting |

2.4 Data changes

Master disk usage over time:

| Point in time | Data directory | Dump directory | Total |
|---|---|---|---|
| Initial | 100GB | 0 | 100GB |
| During bgsave | 100GB | 0 (hard links take no space) | 100GB |
| After compaction | 100GB | some orphan files | 100GB + orphans |
| Transferring | 100GB | 100GB (dump) | 200GB |
| Transfer complete | 100GB | orphans cleaned after a 10-minute delay | 100GB ~ 200GB |

3. Multi-slave synchronization flow

3.1 Scenario

  • The master holds 100GB of data
  • Slave-1 starts a full sync first
  • Slave-2 starts a full sync while Slave-1 is still syncing

3.2 Timeline

Timeline:
T0:
  Slave-1 ──DBSync──> Master
  Master: IsBgSaving? No
  Master: trigger BgSaveDB()
  Master: create dump-20260305-0/db0
  Slave-1 <──kErr─── Master (waiting for bgsave)

T30s:
  Master: bgsave complete
  Slave-1 ──DBSync──> Master
  Master: build file list (dump-0)
  Master: MarkDumpInUse(dump-0, Slave-1)
  Slave-1 <──file list── Master
  Slave-1 starts downloading...

T31s:
  Slave-2 ──DBSync──> Master
  Master: IsDumpInUse(dump-0)? Yes (owned by Slave-1)
  Master: trigger a new BgSaveDB()
  Master: create dump-20260305-1/db0
  Slave-2 <──kErr─── Master (waiting for the new bgsave)

T61s:
  Master: new bgsave complete
  Slave-2 ──DBSync──> Master
  Master: MarkDumpInUse(dump-1, Slave-2)
  Slave-2 <──file list── Master
  Slave-2 starts downloading...

T120s:
  Slave-1: download complete, disconnects
  Master: ReleaseDump(dump-0)
  Master: delete dump-0 (AutoDeleteExpiredDump)

T180s:
  Slave-2: download complete, disconnects
  Master: ReleaseDump(dump-1)
  Master: delete dump-1

3.3 Key limits

  • Maximum number of concurrent dumps: 3 (kMaxConcurrentDumps = 3)
  • When the limit is exceeded: return kErr and let the slave retry

4. Single-slave, multi-DB synchronization flow

4.1 Scenario

  • The master is configured with 3 DBs: db0, db1, db2
  • Each DB has independent RocksDB instances (db-instance-num=3)
  • The slave syncs all DBs at the same time

4.2 Directory layout

dump/dump-20260305-0/
├── db0/
│   ├── 0/          # RocksDB instance 0
│   │   ├── 000001.sst
│   │   └── 000002.sst
│   ├── 1/          # RocksDB instance 1
│   │   └── 000003.sst
│   ├── 2/          # RocksDB instance 2
│   │   └── 000004.sst
│   └── info        # dump metadata
├── db1/
│   ├── 0/
│   ├── 1/
│   ├── 2/
│   └── info
└── db2/
    ├── 0/
    ├── 1/
    ├── 2/
    └── info

4.3 File naming rules

  • Slave request format: {rocksdb_instance}/(unknown)
  • Examples: 0/000001.sst, 1/000003.sst
  • Note: the db0/db1/db2 prefix is not included

4.4 Synchronization flow

Each DB syncs independently:

  1. The slave sends a DBSync request for db0
  2. The master returns db0's file list
  3. The slave downloads all of db0's files
  4. Steps 1-3 repeat for db1 and db2

4.5 Potential issues

Issue: inconsistent info file location

  • AutoDeleteExpiredDump looks for: dump/dump-xxx/info
  • Actual location: dump/dump-xxx/db0/info

Fixed: try db0/info first, then fall back to info


5. Multi-slave, multi-DB synchronization flow

This is Scheme A's most complex scenario, combining multiple slaves with multiple DBs.

5.1 Scenario

  • Master: 3 DBs (db0, db1, db2)
  • Slave-1: syncs db0, db1, db2
  • Slave-2: syncs db0, db1, db2

5.2 Dump ownership

Scheme A design: each slave owns an entire dump directory (covering all DBs)

Slave-1 owns dump-20260305-0:
├── db0 (transferring)
├── db1 (transferring)
└── db2 (transferring)

Slave-2 owns dump-20260305-1:
├── db0 (transferring)
├── db1 (transferring)
└── db2 (transferring)

5.3 Ownership check

  • Granularity: the entire dump directory
  • While one slave is using dump-0, no other slave may use it
  • A new bgsave is triggered to create dump-1

5.4 Potential issues

Issue 1: DB-level vs dump-level granularity

  • Current design: ownership at the dump level
  • If Slave-1 only syncs db0, dump-0 still cannot be used by Slave-2
  • This wastes disk space

Issue 2: orphan-file cleanup with multiple DBs

  • AutoDeleteExpiredDump only checks db0/info
  • If db1 or db2 is still transferring, the dump may be misjudged as cleanable
6. Orphan-file cleanup mechanism (unified delayed cleanup)

6.1 Trigger conditions

Orphan file: an SST file with nlink=1 (referenced only by the dump, no longer by RocksDB)

How orphans arise:

  • A RocksDB compaction deletes an old SST
  • The dump's hard link to it becomes an orphan
6.2 Unified cleanup strategy

Design change: the CleanupOrphanSstFiles function is removed; everything goes through the delayed-cleanup queue

New cleanup flow:

1. When a file finishes transferring (RemoveTransferringFile)
   - Check is_eof=true (last chunk transferred)
   - stat the file and check its nlink
   - If nlink=1 (orphan file):
     * Add it to the delayed-cleanup queue (ScheduleFileForCleanup, 600-second delay)
     * Log "Scheduled orphan file for cleanup"
   - If nlink=2 (not an orphan):
     * Do nothing; RocksDB manages its lifecycle

2. AutoDeleteExpiredDump runs periodically (every 60 seconds)
   - Calls ProcessPendingCleanupFiles()
   - Checks the queue for files whose delay has elapsed
   - Deletes them and logs "Deleted delayed cleanup file"
   - Also checks for and deletes expired dump directories

6.3 Protection mechanisms

| Protection level | Description | Where implemented |
|---|---|---|
| In-transfer protection | Files being transferred are never cleaned up | rsync_transferring_files_ |
| Delay protection | Orphan files are deleted after a 10-minute delay, giving the slave time to retry | ScheduleFileForCleanup(filepath, 600) |
| nlink check | Only orphan files (nlink=1) are cleaned, avoiding accidental deletion | stat check |

6.4 Timing

T0: file transfer completes (is_eof=true)
  └─> RemoveTransferringFile checks nlink==1
      └─> ScheduleFileForCleanup(filepath, 600) adds it to the queue
          └─> log: "Scheduled orphan file for cleanup in 10min"

T0+10min: AutoDeleteExpiredDump runs
  └─> ProcessPendingCleanupFiles()
      └─> checks for files whose delay has elapsed
          └─> deletes them
              └─> log: "Deleted delayed cleanup file"

6.5 Comparison: old scheme vs new scheme

| Aspect | Old scheme (CleanupOrphanSstFiles) | New scheme (unified delayed cleanup) |
|---|---|---|
| Trigger | periodic scan of all dump directories | immediate check when a transfer completes |
| Cleanup delay | scan interval, unpredictable | fixed 10-minute delay |
| Race conditions | could race with the transfer process | none; single entry point |
| Code complexity | ~170-line standalone function | ~15 lines of integrated logic |
| Slave retry | could fail (file already deleted) | can retry within 10 minutes |

7. Bug list

7.1 Bugs to fix

| Bug | Impact | Severity | Fix |
|---|---|---|---|
| Orphan-file cleanup granularity in multi-DB scenarios | db1/db2 may be misjudged while still transferring | Medium | check every DB's info file |
| Disk waste with multiple slaves and multiple DBs | each slave owns an entire dump | Low | support DB-level ownership |

8. To-do list

8.1 High priority

  • Unify the orphan-file cleanup mechanism (done)

    • Remove the CleanupOrphanSstFiles function
    • Route everything through RemoveTransferringFile + the delayed-cleanup queue
    • Delay deletion by 10 minutes to give slaves time to retry
  • Fix the multi-DB orphan-file cleanup granularity issue

    • Currently only db0/info is checked
    • All DB subdirectories need to be checked
    • If any DB is in use, the entire dump should be protected

8.2 Medium priority

  • Reduce disk usage with multiple slaves and multiple DBs

    • Current: each slave owns an entire dump
    • Improvement: support DB-level ownership
    • Impact: the ownership-management logic must change
  • Improve monitoring metrics

    • Number of dumps in use
    • Orphan-file cleanup statistics
    • Transfer failure rate

8.3 Low priority

  • Support adjusting the concurrency limit dynamically

    • Current: compile-time constant kMaxConcurrentDumps=3
    • Improvement: support hot-reloading the setting
  • Compress dump transfers

    • Reduces network bandwidth
    • Trades CPU for network

9. Configuration suggestions

9.1 Key configuration items

# pika.conf

# dump directory prefix
dump-prefix : dump-

# dump directory path
dump-path : ./dump/

# dump expiry time (days)
# 0 means dumps never expire
dump-expire : 1

# number of RocksDB instances
db-instance-num : 3

# maximum number of concurrent dumps (compile-time setting)
# kMaxConcurrentDumps = 3

9.2 Deployment recommendations

  1. Disk space: reserve 3 × the data size
  2. Monitoring: watch the number of dump directories and disk usage
  3. Logs: watch for the [Rsync Meta], [RsyncTransfer], "Scheduled orphan file", and "Deleted delayed cleanup" log lines

10. Appendix

10.1 Key logs

# Meta request handling
grep "Rsync Meta" log/pika.INFO

# file transfers
grep "RsyncTransfer" log/pika.INFO

# orphan-file delayed-cleanup scheduling
grep "Scheduled orphan file" log/pika.INFO

# delayed-cleanup execution
grep "Deleted delayed cleanup file" log/pika.INFO

# dump ownership
grep "DumpOwnership" log/pika.INFO

# errors
grep "File no longer exists" log/pika.WARNING

10.2 Status codes

| Status code | Meaning | Handling |
|---|---|---|
| kOk | success | continue |
| kErr | error | slave retries |

10.3 File path conventions

| Type | Format | Example |
|---|---|---|
| Dump directory | dump-YYYYMMDD-NN/db_name | dump-20260305-0/db0 |
| RocksDB instance | {rocksdb_instance}/ | 0/, 1/, 2/ |
| SST file | {instance}/(unknown).sst | 0/000001.sst |
| Info file | db_name/info | db0/info |


Copilot AI left a comment

Pull request overview

This PR targets multiple stability issues in storage backup/rsync full-sync flows (orphan SST files, dump lifecycle/cleanup) and also adjusts some RocksDB-related defaults and build/test toggles.

Changes:

  • Add a “get checkpoint files + immediately create checkpoint” flow to reduce the compaction window that can produce orphan SSTs during bgsave.
  • Introduce rsync snapshot/dump ownership tracking plus delayed orphan-file cleanup to avoid deleting files still needed by syncing slaves.
  • Make test building optional via BUILD_TESTS, and adjust several config/default behaviors (rate limiter, wash-data, etc.).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/storage/src/backupable.cc Adds SetBackupContentAndCreate to minimize the gap between live-file listing and checkpoint creation.
src/storage/include/storage/backupable.h Declares the new SetBackupContentAndCreate API.
src/storage/include/storage/storage_define.h Fixes pointer advancement in EncodeUserKey when \x00 exists.
src/storage/CMakeLists.txt Makes storage tests conditional on BUILD_TESTS.
src/rsync_server.cc Adds dump reservation, integrity checks, snapshot/file transfer tracking, and delayed orphan cleanup scheduling.
include/rsync_server.h Exposes snapshot/file tracking APIs and adds per-connection tracking state.
src/rsync_client.cc Adds retry/error behavior for missing files and adjusts local tracking update logic.
src/pstd/src/env.cc Wraps GetChildren with exception handling and logging for filesystem errors.
src/pstd/CMakeLists.txt Makes pstd tests conditional on BUILD_TESTS.
src/pika_server.cc Implements global rsync snapshot/file tracking, dump ownership, delayed cleanup processing, and dump cleanup policy updates.
include/pika_server.h Declares new dump ownership / rsync tracking / delayed cleanup APIs and state.
src/pika_db.cc Uses the new immediate-checkpoint backup path and introduces unique dump directory naming with sequence suffixes.
src/pika_rm.cc Rate-limits rsync retry logs.
src/pika_conf.cc Fixes integer literal width for rate limiter default bandwidth.
src/pika_command.cc Adjusts HLEN command flags to avoid cache/TTL anomalies.
conf/pika.conf Changes sample/default settings for rate limiter and wash-data.
CMakeLists.txt Adds BUILD_TESTS option, gates enable_testing(), and adjusts build version source inclusion.


Comment on lines +125 to +153
if (!snapshot_uuid_.empty() && !filename.empty()) {
std::lock_guard<std::mutex> guard(mu_);
transferring_files_.insert(filename);
g_pika_server->RegisterRsyncTransferringFile(snapshot_uuid_, filename);
}
}

void RsyncServerConn::RemoveTransferringFile(const std::string& filename, bool is_eof) {
if (!snapshot_uuid_.empty() && !filename.empty()) {
std::lock_guard<std::mutex> guard(mu_);
transferring_files_.erase(filename);
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_, filename);

// Only process cleanup when file transfer is complete (is_eof=true)
if (is_eof) {
std::string dump_path = g_pika_server->GetDumpPathBySnapshot(snapshot_uuid_);
std::string filepath = dump_path + "/" + filename;

// Check if file is orphan (nlink=1, only referenced by dump, not by db)
struct stat st;
if (stat(filepath.c_str(), &st) == 0 && st.st_nlink == 1) {
// Orphan file: schedule for delayed cleanup (10 minutes)
// This allows Slave to retry if needed before actual deletion
g_pika_server->ScheduleFileForCleanup(filepath, 600);
LOG(INFO) << "[RsyncTransfer] Scheduled orphan file for cleanup: " << filename
<< " for snapshot: " << snapshot_uuid_;
}
// Non-orphan files (nlink=2) are still referenced by RocksDB, no cleanup needed
}
Copilot AI Mar 6, 2026

AddTransferringFile/RemoveTransferringFile hold mu_ while calling into g_pika_server and doing filesystem work (stat, path building, scheduling cleanup). This can unnecessarily block parallel rsync reads and increases deadlock risk due to lock-ordering across different mutexes. Limit mu_ to only protecting transferring_files_ (and snapshot state), then perform server calls and stat/cleanup decisions after releasing the lock using local copies of snapshot_uuid_/dump_path.

Suggested change
if (!snapshot_uuid_.empty() && !filename.empty()) {
std::lock_guard<std::mutex> guard(mu_);
transferring_files_.insert(filename);
g_pika_server->RegisterRsyncTransferringFile(snapshot_uuid_, filename);
}
}
void RsyncServerConn::RemoveTransferringFile(const std::string& filename, bool is_eof) {
if (!snapshot_uuid_.empty() && !filename.empty()) {
std::lock_guard<std::mutex> guard(mu_);
transferring_files_.erase(filename);
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_, filename);
// Only process cleanup when file transfer is complete (is_eof=true)
if (is_eof) {
std::string dump_path = g_pika_server->GetDumpPathBySnapshot(snapshot_uuid_);
std::string filepath = dump_path + "/" + filename;
// Check if file is orphan (nlink=1, only referenced by dump, not by db)
struct stat st;
if (stat(filepath.c_str(), &st) == 0 && st.st_nlink == 1) {
// Orphan file: schedule for delayed cleanup (10 minutes)
// This allows Slave to retry if needed before actual deletion
g_pika_server->ScheduleFileForCleanup(filepath, 600);
LOG(INFO) << "[RsyncTransfer] Scheduled orphan file for cleanup: " << filename
<< " for snapshot: " << snapshot_uuid_;
}
// Non-orphan files (nlink=2) are still referenced by RocksDB, no cleanup needed
}
if (filename.empty()) {
return;
}
std::string snapshot_uuid_copy;
{
std::lock_guard<std::mutex> guard(mu_);
snapshot_uuid_copy = snapshot_uuid_;
if (!snapshot_uuid_copy.empty()) {
transferring_files_.insert(filename);
}
}
if (!snapshot_uuid_copy.empty()) {
g_pika_server->RegisterRsyncTransferringFile(snapshot_uuid_copy, filename);
}
}
void RsyncServerConn::RemoveTransferringFile(const std::string& filename, bool is_eof) {
if (filename.empty()) {
return;
}
std::string snapshot_uuid_copy;
{
std::lock_guard<std::mutex> guard(mu_);
snapshot_uuid_copy = snapshot_uuid_;
if (!snapshot_uuid_copy.empty()) {
transferring_files_.erase(filename);
}
}
if (snapshot_uuid_copy.empty()) {
return;
}
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_copy, filename);
// Only process cleanup when file transfer is complete (is_eof=true)
if (is_eof) {
std::string dump_path = g_pika_server->GetDumpPathBySnapshot(snapshot_uuid_copy);
std::string filepath = dump_path + "/" + filename;
// Check if file is orphan (nlink=1, only referenced by dump, not by db)
struct stat st;
if (stat(filepath.c_str(), &st) == 0 && st.st_nlink == 1) {
// Orphan file: schedule for delayed cleanup (10 minutes)
// This allows Slave to retry if needed before actual deletion
g_pika_server->ScheduleFileForCleanup(filepath, 600);
LOG(INFO) << "[RsyncTransfer] Scheduled orphan file for cleanup: " << filename
<< " for snapshot: " << snapshot_uuid_copy;
}
// Non-orphan files (nlink=2) are still referenced by RocksDB, no cleanup needed

Copilot uses AI. Check for mistakes.
Comment on lines +302 to +315
// Compare filenames (from GetDumpMeta) with actual_files (re-scanned)
std::vector<std::string> missing_files;
for (const auto& expected : filenames) {
bool found = false;
for (const auto& actual : actual_files) {
if (actual == expected) {
found = true;
break;
}
}
if (!found && expected != "info") { // info file is handled separately
missing_files.push_back(expected);
}
}
Copilot AI Mar 6, 2026

The second dump integrity check does an O(N*M) nested loop to match filenames against actual_files, which can become expensive with large snapshots. Convert actual_files to a hash set (or sort both vectors and do a linear merge) so the missing-file check is O(N).

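As a rough illustration of the set-based lookup the reviewer suggests, a minimal sketch follows; the function name and its parameters are hypothetical stand-ins for the variables in HandleMetaRsyncRequest, not Pika's actual API:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Build a hash set from the re-scanned files once, then check each expected
// filename in O(1), giving O(N + M) overall instead of the nested O(N * M) scan.
std::vector<std::string> FindMissingFiles(const std::vector<std::string>& filenames,
                                          const std::vector<std::string>& actual_files) {
  std::unordered_set<std::string> actual_set(actual_files.begin(), actual_files.end());
  std::vector<std::string> missing_files;
  for (const auto& expected : filenames) {
    // info file is handled separately, as in the original loop
    if (expected != "info" && actual_set.count(expected) == 0) {
      missing_files.push_back(expected);
    }
  }
  return missing_files;
}
```

Sorting both vectors and doing a linear merge would give the same asymptotic win without the hash-set allocation, if memory is a concern.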
Comment on lines +242 to +251
// If Master returns empty response, it means the file was cleaned up during sync
// This is an error condition - the sync should fail and retry
if (ret_count == 0 && resp->file_resp().eof()) {
LOG(ERROR) << "File not available on Master at offset " << offset
<< ", filename: " << filename
<< ". Will retry and may trigger new bgsave.";
s = Status::IOError("File not available on Master");
return s;
}

Copilot AI Mar 6, 2026

Treating ret_count == 0 && eof == true as an error will break valid zero-length files: the first (and only) read would legitimately return 0 bytes with EOF set, but this code returns an IOError. Missing files are already signaled by resp->code() != kOk (and the server now explicitly returns kErr when the file is gone), so this check should be removed or tightened (e.g., only error when ret_count == 0 && eof == true AND offset > 0, or when an explicit server-side error is returned).

Suggested change
// If Master returns empty response, it means the file was cleaned up during sync
// This is an error condition - the sync should fail and retry
if (ret_count == 0 && resp->file_resp().eof()) {
LOG(ERROR) << "File not available on Master at offset " << offset
<< ", filename: " << filename
<< ". Will retry and may trigger new bgsave.";
s = Status::IOError("File not available on Master");
return s;
}

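One hedged way to tighten the condition along the lines the reviewer describes — treat a zero-byte first read with EOF as a valid empty file, and only error when the stream truncates mid-transfer or the server signals an explicit error — might look like this; the helper name and the kOk value are illustrative assumptions:

```cpp
#include <cstdint>

constexpr int kOk = 0;  // assumed value; matches "resp->code() != kOk" in the review

// A response indicates a missing file only when the server returned an explicit
// non-kOk code, or when we got zero bytes with EOF after having already read
// data (offset > 0). A zero-length file legitimately yields ret_count == 0 with
// EOF on its very first read and is NOT an error.
bool IsMissingFileError(int resp_code, int64_t ret_count, bool eof, uint64_t offset) {
  if (resp_code != kOk) {
    return true;  // server explicitly reported the file is gone
  }
  return ret_count == 0 && eof && offset > 0;  // truncated mid-transfer
}
```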
Comment on lines +2064 to +2082
// Utility function to ensure directory exists
// Returns true if directory exists or was created successfully
// Handles the special case where CreatePath returns 0 for both success and "already exists"
bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
// First check if directory already exists
if (pstd::FileExists(path)) {
return true;
}
// Directory doesn't exist, try to create it
int ret = pstd::CreatePath(path, mode);
// CreatePath returns 0 on success, -1 on failure
// Note: CreatePath also returns -1 if directory already exists (due to the
// !filesystem::create_directories check), but we already checked FileExists above
if (ret != 0) {
LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
return false;
}
// Verify directory was created successfully
return pstd::FileExists(path);
Copilot AI Mar 6, 2026

EnsureDirExists returns true for any existing path via pstd::FileExists(path), even if path is a regular file. Callers assume this guarantees a directory exists, so this can lead to later failures when treating a file as a directory. Use pstd::IsDir(path) == 0 for the fast path (and consider updating the comment above as it currently describes CreatePath behavior inaccurately).

Suggested change
// Utility function to ensure directory exists
// Returns true if directory exists or was created successfully
// Handles the special case where CreatePath returns 0 for both success and "already exists"
bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
// First check if directory already exists
if (pstd::FileExists(path)) {
return true;
}
// Directory doesn't exist, try to create it
int ret = pstd::CreatePath(path, mode);
// CreatePath returns 0 on success, -1 on failure
// Note: CreatePath also returns -1 if directory already exists (due to the
// !filesystem::create_directories check), but we already checked FileExists above
if (ret != 0) {
LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
return false;
}
// Verify directory was created successfully
return pstd::FileExists(path);
// Utility function to ensure a directory exists at the given path.
// Returns true if the path already refers to a directory, or if a directory
// was created (or found to already exist) successfully.
// Note: pstd::CreatePath returns 0 both when the directory is created and
// when it already exists, and a non-zero value on failure.
bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
// Fast path: check if a directory already exists at the given path.
if (pstd::IsDir(path) == 0) {
return true;
}
// Directory doesn't exist, try to create it.
int ret = pstd::CreatePath(path, mode);
if (ret != 0) {
LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
return false;
}
// Verify that the resulting path is indeed a directory.
return pstd::IsDir(path) == 0;

Comment on lines +1022 to +1036
void PikaServer::ProcessPendingCleanupFiles() {
std::lock_guard<std::mutex> lock(pending_cleanup_mutex_);
time_t now = time(nullptr);
int cleaned_count = 0;

for (auto it = pending_cleanup_files_.begin();
it != pending_cleanup_files_.end(); ) {
if (now >= it->second.cleanup_time) {
// Check if file still exists and is still an orphan (nlink=1)
if (pstd::FileExists(it->second.filepath)) {
struct stat st;
if (stat(it->second.filepath.c_str(), &st) == 0 && st.st_nlink == 1) {
pstd::DeleteFile(it->second.filepath);
cleaned_count++;
LOG(INFO) << "[Cleanup] Deleted delayed cleanup file: " << it->second.filepath;
Copilot AI Mar 6, 2026

ProcessPendingCleanupFiles holds pending_cleanup_mutex_ while doing potentially slow I/O (FileExists, stat, DeleteFile) and logging. This can block concurrent scheduling of new cleanup entries and prolong lock hold times during disk stalls. Collect the expired filepaths under the mutex, erase them from the map, then release the lock and perform stat/deletes/logging outside the critical section.

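A minimal sketch of the two-phase pattern the reviewer proposes — collect and erase expired entries under the mutex, then do the slow filesystem work after unlocking. The struct layout and container here are illustrative, not the PR's actual types:

```cpp
#include <ctime>
#include <map>
#include <mutex>
#include <string>
#include <vector>

struct PendingCleanupInfo {
  std::string filepath;
  time_t cleanup_time;  // absolute deadline
};

std::mutex pending_cleanup_mutex_;
std::map<std::string, PendingCleanupInfo> pending_cleanup_files_;

// Phase 1: under the lock, only move expired entries out of the map.
std::vector<std::string> CollectExpiredFiles(time_t now) {
  std::vector<std::string> expired;
  std::lock_guard<std::mutex> lock(pending_cleanup_mutex_);
  for (auto it = pending_cleanup_files_.begin(); it != pending_cleanup_files_.end();) {
    if (now >= it->second.cleanup_time) {
      expired.push_back(it->second.filepath);
      it = pending_cleanup_files_.erase(it);
    } else {
      ++it;
    }
  }
  return expired;
}

void ProcessPendingCleanupFiles() {
  std::vector<std::string> expired = CollectExpiredFiles(time(nullptr));
  // Phase 2: stat / nlink check / delete / logging happen here, with the mutex
  // already released, so schedulers are never blocked behind a disk stall.
  for (const std::string& filepath : expired) {
    (void)filepath;  // filesystem work would go here
  }
}
```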
Comment on lines 497 to +508
@@ -505,7 +505,7 @@ default-slot-num : 1024
# if auto_tuned is true: Enables dynamic adjustment of rate limit within the range
#`[rate-limiter-bandwidth / 20, rate-limiter-bandwidth]`, according to the recent demand for background I/O.
# rate limiter auto tune https://rocksdb.org/blog/2017/12/18/17-auto-tuned-rate-limiter.html. the default value is true.
#rate-limiter-auto-tuned : yes
rate-limiter-auto-tuned : no
Copilot AI Mar 6, 2026

The comments around the rate limiter settings no longer match the effective defaults in this file (e.g., it now enforces a non-default rate-limiter-bandwidth and sets rate-limiter-auto-tuned to no while the comment says the default is true/no-limit). Please update the surrounding comments to reflect the new defaults, or keep these lines commented-out if they are meant as optional overrides rather than shipped defaults.

@@ -673,7 +673,7 @@ internal-used-unfinished-full-sync :
# for wash data from 4.0.0 to 4.0.1
# https://github.com/OpenAtomFoundation/pika/issues/2886
# default value: true
Copilot AI Mar 6, 2026

The comment says wash-data default value is true, but the config now sets wash-data: false. Update the comment to match the new default (or keep the setting commented-out if this is only intended as an example override).

Suggested change
# default value: true
# default value: false

Comment on lines +81 to +121
{
std::lock_guard<std::mutex> guard(mu_);
for (int i = 0; i < readers_.size(); i++) {
readers_[i].reset();
}
}
// Release dump ownership when connection closes (Scheme A)
if (!snapshot_uuid_.empty()) {
LOG(INFO) << "[RsyncServerConn] Connection " << conn_id_ << " closing, releasing dump " << snapshot_uuid_;
g_pika_server->ReleaseDump(snapshot_uuid_);
}
// Unregister snapshot when connection closes (outside of mu_ lock)
UnregisterSnapshot();
}

void RsyncServerConn::RegisterSnapshot(const std::string& snapshot_uuid) {
if (!snapshot_uuid.empty() && snapshot_uuid_ != snapshot_uuid) {
// Unregister old snapshot if different
if (!snapshot_uuid_.empty()) {
UnregisterSnapshot();
}
snapshot_uuid_ = snapshot_uuid;
g_pika_server->RegisterRsyncSnapshot(snapshot_uuid_);
}
}

void RsyncServerConn::UnregisterSnapshot() {
if (!snapshot_uuid_.empty()) {
// Clear any remaining transferring files
std::set<std::string> remaining_files;
{
std::lock_guard<std::mutex> guard(mu_);
remaining_files = transferring_files_;
transferring_files_.clear();
}
for (const auto& file : remaining_files) {
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_, file);
}
g_pika_server->UnregisterRsyncSnapshot(snapshot_uuid_);
snapshot_uuid_.clear();
}
Copilot AI Mar 6, 2026

snapshot_uuid_ is read/written without holding mu_ (e.g., comparing/assigning in RegisterSnapshot, and checked in Add/RemoveTransferringFile). Since meta/file requests are scheduled on a thread pool, these methods can run concurrently on the same connection, which makes snapshot_uuid_ a data race (UB) on std::string. Guard all accesses to snapshot_uuid_ with a mutex (e.g., reuse mu_ or add a dedicated mutex) and avoid calling UnregisterSnapshot() while holding the same lock (copy the old uuid first, then unregister outside the lock).

Suggested change
{
std::lock_guard<std::mutex> guard(mu_);
for (int i = 0; i < readers_.size(); i++) {
readers_[i].reset();
}
}
// Release dump ownership when connection closes (Scheme A)
if (!snapshot_uuid_.empty()) {
LOG(INFO) << "[RsyncServerConn] Connection " << conn_id_ << " closing, releasing dump " << snapshot_uuid_;
g_pika_server->ReleaseDump(snapshot_uuid_);
}
// Unregister snapshot when connection closes (outside of mu_ lock)
UnregisterSnapshot();
}
void RsyncServerConn::RegisterSnapshot(const std::string& snapshot_uuid) {
if (!snapshot_uuid.empty() && snapshot_uuid_ != snapshot_uuid) {
// Unregister old snapshot if different
if (!snapshot_uuid_.empty()) {
UnregisterSnapshot();
}
snapshot_uuid_ = snapshot_uuid;
g_pika_server->RegisterRsyncSnapshot(snapshot_uuid_);
}
}
void RsyncServerConn::UnregisterSnapshot() {
if (!snapshot_uuid_.empty()) {
// Clear any remaining transferring files
std::set<std::string> remaining_files;
{
std::lock_guard<std::mutex> guard(mu_);
remaining_files = transferring_files_;
transferring_files_.clear();
}
for (const auto& file : remaining_files) {
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_, file);
}
g_pika_server->UnregisterRsyncSnapshot(snapshot_uuid_);
snapshot_uuid_.clear();
}
std::string snapshot_uuid_copy;
{
std::lock_guard<std::mutex> guard(mu_);
for (int i = 0; i < readers_.size(); i++) {
readers_[i].reset();
}
snapshot_uuid_copy = snapshot_uuid_;
}
// Release dump ownership when connection closes (Scheme A)
if (!snapshot_uuid_copy.empty()) {
LOG(INFO) << "[RsyncServerConn] Connection " << conn_id_ << " closing, releasing dump " << snapshot_uuid_copy;
g_pika_server->ReleaseDump(snapshot_uuid_copy);
}
// Unregister snapshot when connection closes (UnregisterSnapshot handles locking)
UnregisterSnapshot();
}
void RsyncServerConn::RegisterSnapshot(const std::string& snapshot_uuid) {
if (snapshot_uuid.empty()) {
return;
}
// Take a snapshot of the current uuid under lock
std::string current_snapshot_uuid;
{
std::lock_guard<std::mutex> guard(mu_);
current_snapshot_uuid = snapshot_uuid_;
}
if (current_snapshot_uuid == snapshot_uuid) {
return;
}
// Unregister old snapshot (if any) outside the lock
if (!current_snapshot_uuid.empty()) {
UnregisterSnapshot();
}
// Set the new snapshot uuid under lock
{
std::lock_guard<std::mutex> guard(mu_);
snapshot_uuid_ = snapshot_uuid;
}
// Register new snapshot using the provided uuid
g_pika_server->RegisterRsyncSnapshot(snapshot_uuid);
}
void RsyncServerConn::UnregisterSnapshot() {
// Copy state under lock, then perform unregister operations outside the lock
std::string snapshot_uuid_copy;
std::set<std::string> remaining_files;
{
std::lock_guard<std::mutex> guard(mu_);
if (snapshot_uuid_.empty()) {
return;
}
snapshot_uuid_copy = snapshot_uuid_;
remaining_files = transferring_files_;
transferring_files_.clear();
snapshot_uuid_.clear();
}
// Clear any remaining transferring files for this snapshot
for (const auto& file : remaining_files) {
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_copy, file);
}
g_pika_server->UnregisterRsyncSnapshot(snapshot_uuid_copy);

Added a file-type check in RemoveTransferringFile: only files with the .sst suffix can be orphan files (hard links).
Other files such as info, CURRENT, MANIFEST, OPTIONS, and .log are newly generated during the dump process,
so nlink=1 is normal for them and they should not be cleaned up.

Before the fix: the info file was misidentified as an orphan and scheduled for cleanup, leaving the dump directory without its protection mechanism.
After the fix: only SST files enter the orphan-file check and delayed-cleanup flow.
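The suffix check described above could be sketched roughly as follows; the helper name is hypothetical, since the commit does not show its exact code:

```cpp
#include <string>

// Only .sst files are hard-linked from the live db directory, so only they can
// be "orphans" (nlink == 1 after the db drops its link). Files written fresh
// into the dump (info, CURRENT, MANIFEST, OPTIONS, *.log) always have
// nlink == 1 and must never enter the delayed-cleanup path.
bool MayBeOrphanDumpFile(const std::string& filename) {
  static const std::string kSuffix = ".sst";
  if (filename.size() <= kSuffix.size()) {
    return false;
  }
  return filename.compare(filename.size() - kSuffix.size(), kSuffix.size(), kSuffix) == 0;
}
```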
@coderabbitai

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough


This PR extends Rsync snapshot and dump lifecycle management in Pika by introducing tracking APIs for active rsync snapshots and transferring files, dump ownership management with concurrent limits, delayed file cleanup mechanisms, and enhanced backup creation logic. Additionally, it refines CMake build configuration, adjusts default settings, and improves error handling in file operations.

Changes

Cohort / File(s) | Summary

• Build System Configuration — CMakeLists.txt, src/pstd/CMakeLists.txt, src/storage/CMakeLists.txt: Added BUILD_TESTS CMake option to conditionally enable/disable test compilation. Removed src/build_version.cc from source collection and simplified PIKA_BUILD_VERSION_CC variable declaration.
• Configuration Defaults — conf/pika.conf: Updated default values: rate-limiter-bandwidth reduced from 1099511627776 to 109951162, rate-limiter-auto-tuned changed from yes to no, wash-data changed from true to false.
• Rsync Server Connection Tracking — include/rsync_server.h, src/rsync_server.cc: Added snapshot lifecycle management (RegisterSnapshot, UnregisterSnapshot, GetSnapshotUuid) and per-connection file transfer tracking (AddTransferringFile, RemoveTransferringFile, IsFileTransferring, GetTransferringFiles). Introduced global transfer state queries and enhanced HandleMetaRsyncRequest/HandleFileRsyncRequest with dump ownership, integrity checks, and cleanup safeguards.
• PikaServer Dump & Rsync Management APIs — include/pika_server.h, src/pika_server.cc: Added comprehensive rsync snapshot tracking (RegisterRsyncSnapshot, UnregisterRsyncSnapshot, IsRsyncSnapshotActive, GetActiveRsyncSnapshots), per-snapshot transfer file tracking, dump ownership management with max concurrent limits (MarkDumpInUse, ReleaseDump, IsDumpInUse, GetDumpPathBySnapshot, GetActiveDumpCount), delayed cleanup scheduling (ScheduleFileForCleanup, ProcessPendingCleanupFiles), and utility method EnsureDirExists. Includes thread-safe state structures with corresponding mutexes and data holders.
• Database Bgsave & Dump Management — src/pika_db.cc: Replaced backup creation with SetBackupContentAndCreate for tighter timing. Implemented new daily dump directory naming scheme (dump-YYYYMMDDNN) with sequence number allocation. Enhanced GetBgSaveMetaData to scan subdirectories, detect orphan SST files (st_nlink == 1), and validate file existence with logging.
• Backup Engine Enhancement — src/storage/include/storage/backupable.h, src/storage/src/backupable.cc: Added new public method SetBackupContentAndCreate to perform content gathering and immediate checkpoint creation in a single operation, reducing the time window between GetLiveFiles and checkpoint creation.
• Rsync Client Error Handling — src/rsync_client.cc: Added error condition for empty master response (ret_count == 0 and eof) indicating file unavailability. Modified ComparisonUpdate to replace file_set_ with remote_file_set to align with master state and handle deletions correctly.
• Command & Configuration — src/pika_command.cc, src/pika_conf.cc: Removed kCmdFlagsUpdateCache flag from HLenCmd initialization. Changed rate-limiter bandwidth default to 64-bit literal (10LL * 1024 * 1024 * 1024) to prevent overflow.
• Rsync Activation & File Operations — src/pika_rm.cc, src/pstd/src/env.cc: Added rate-limited logging (once per 30 seconds per database) for rsync activation. Enhanced GetChildren with try/catch for filesystem_error, returning -1 on failure.
• Storage Key Encoding — src/storage/include/storage/storage_define.h: Fixed EncodeUserKey to advance dst_ptr by remaining length after copying trailing bytes with zero byte, preventing pointer misalignment before delimiter copy.
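The rate-limiter literal fix in src/pika_conf.cc comes down to integer width: without a 64-bit suffix, `10 * 1024 * 1024 * 1024` is evaluated in `int` and overflows on typical platforms where `int` is 32-bit. A small illustration of the corrected form (values only; the actual config code is not reproduced here):

```cpp
#include <cstdint>

// With the LL suffix the whole product is evaluated in long long, so the
// 10 GiB value (10737418240) survives intact. Without it, the multiplication
// would happen in 32-bit int and overflow (undefined behavior for signed int),
// which is exactly why the literal-width change matters.
const int64_t kRateLimiterBandwidth = 10LL * 1024 * 1024 * 1024;
```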

Sequence Diagram(s)

sequenceDiagram
    participant Client as RsyncClient
    participant Conn as RsyncServerConn
    participant Server as PikaServer
    participant Storage as Storage Layer
    participant Cleanup as Cleanup Worker

    Client->>Conn: HandleMetaRsyncRequest(snapshot)
    Conn->>Server: MarkDumpInUse(snapshot_uuid, conn_id, path)
    Server->>Server: Verify dump ownership & concurrent limits
    Server-->>Conn: Ownership acquired
    Conn->>Conn: RegisterSnapshot(snapshot_uuid)
    Conn->>Conn: Pre-register all files as transferring
    Conn-->>Client: Meta info + file list

    Client->>Conn: HandleFileRsyncRequest(filename)
    Conn->>Conn: AddTransferringFile(filename)
    Conn->>Storage: Read file content
    Storage-->>Conn: File data
    Conn->>Conn: RemoveTransferringFile(filename, is_eof=true)
    Conn->>Server: ScheduleFileForCleanup(if orphan)
    Conn-->>Client: File data

    Client->>Conn: Transfer complete
    Conn->>Server: ReleaseDump(snapshot_uuid)
    Conn->>Conn: UnregisterSnapshot()
    Server-->>Conn: Dump released

    Cleanup->>Server: ProcessPendingCleanupFiles()
    Server->>Storage: Delete orphan SST files
    Storage-->>Server: Cleanup complete
    Server->>Server: Log cleanup summary

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

🧹 Updates

Suggested reviewers

  • wangshao1
  • chejinge
  • Mixficsol

Poem

🐰 A snapshot here, a dump tracked there,
With transfers safe and cleanup care!
Rsync now knows what's in the air,
Files move freely, orphans—beware!
The rabbit hops through schemes with flair! 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

• Docstring Coverage — ⚠️ Warning: docstring coverage is 10.71%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

• Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
• Title check — ✅ Passed: the PR title clearly describes the main feature (adjustment of the dump cleanup mechanism for master-slave synchronization to optimize storage space usage), which directly aligns with the primary objective of implementing Scheme A for orphan file cleanup.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/rsync_client.cc (1)

323-332: ⚠️ Potential issue | 🟠 Major

Keep file_set_ as the delta to preserve resume.

expired_files already handles master-side deletions. Replacing newly_files with remote_file_set forces every retry to re-copy files that are already complete locally, which defeats interrupted full-sync resume and amplifies IO.

🛠️ Suggested fix
-    // Replace file_set_ with remote_file_set to ensure files deleted on Master
-    // are also removed from local tracking
-    file_set_ = remote_file_set;
+    file_set_ = newly_files;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rsync_client.cc` around lines 323 - 332, The current code replaces
file_set_ with remote_file_set which forces re-copy of all remote files on
retry; instead keep file_set_ as the delta so resume works: compute newly_files
(difference of remote_file_set and local_file_set) and assign file_set_ =
newly_files, leaving expired_files handling deletions; update the assignment
that currently sets file_set_ = remote_file_set to set file_set_ to the
newly_files set (using the existing newly_files variable) so only missing files
are retried.
src/rsync_server.cc (2)

437-456: ⚠️ Potential issue | 🔴 Critical

Serve file chunks from the connection-reserved snapshot.

The meta phase stores snapshot ownership on conn, but this handler ignores it and re-fetches the current GetDumpUUID(db_name) / db->bgsave_info().path on every file request. Once another slave triggers a newer bgsave, this connection starts reading the new dump mid-transfer and the client aborts on the snapshot UUID mismatch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rsync_server.cc` around lines 437 - 456, The handler currently re-queries
GetDumpUUID and db->bgsave_info().path causing it to switch snapshots
mid-transfer; instead read the snapshot UUID and bgsave path from the
connection's reserved snapshot metadata set during the meta phase (e.g. use the
connection's saved snapshot UUID and saved bgsave path fields/methods) and use
those values for response.set_snapshot_uuid(...) and filepath construction; if
the connection has no reserved snapshot metadata, return an error as before.
Update the section that calls g_pika_server->GetDumpUUID(db_name) and
g_pika_server->GetDB(db_name) to use conn's snapshot ownership fields (and keep
the existing error handling if the reserved snapshot is missing).

236-286: ⚠️ Potential issue | 🔴 Critical

Bind dump_path to the same snapshot returned by GetDumpMeta().

GetDumpMeta() gives this handler a specific snapshot_uuid and file list, but the code re-reads db_ptr->bgsave_info().path before the integrity checks and again when claiming ownership. In the concurrent full-sync flow this PR introduces, a newer bgsave can flip that path in between, so Line 273 can delete the wrong directory and Line 372 can associate that wrong path with the older snapshot_uuid. Please fetch {snapshot_uuid, dump_path, filenames} atomically and use that stable path throughout.

Also applies to: 370-372
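A rough sketch of the atomic fetch this comment asks for — one locked accessor returns the snapshot identity as a unit, and the handler then uses only that local copy for integrity checks, deletion, and ownership. The struct and class names are illustrative, not Pika's actual API:

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Everything the file-serving path needs, captured together under one lock so
// a concurrent bgsave cannot flip the dump path between the integrity check
// and a later delete or ownership claim.
struct DumpMeta {
  std::string snapshot_uuid;
  std::string dump_path;
  std::vector<std::string> filenames;
};

class DumpRegistry {
 public:
  void Update(DumpMeta meta) {
    std::lock_guard<std::mutex> lock(mu_);
    current_ = std::move(meta);
  }
  // Single atomic read: callers get a self-consistent {uuid, path, files} copy
  // and never re-read live state mid-request.
  DumpMeta GetDumpMeta() {
    std::lock_guard<std::mutex> lock(mu_);
    return current_;
  }

 private:
  std::mutex mu_;
  DumpMeta current_;
};
```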

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rsync_server.cc` around lines 236 - 286, GetDumpMeta returns
snapshot_uuid and filenames but code later re-reads db_ptr->bgsave_info().path
which can race with concurrent bgsave; capture the dump_path atomically right
after GetDumpMeta (e.g., read db_ptr->bgsave_info().path into a local dump_path
variable immediately after obtaining db_ptr and snapshot_uuid) and use that
stored dump_path for the integrity checks, pstd::DeleteDirIfExist, and when
setting the response/claiming ownership instead of re-reading
db_ptr->bgsave_info().path; ensure all references (the integrity loop that
builds full_path, the LOG messages, the delete call
pstd::DeleteDirIfExist(dump_path), and the final assignment to dump_path used by
the connection) use this stable local variable so the path cannot flip between
checks and deletion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.


---

Outside diff comments:
In `@src/rsync_client.cc`:
- Around line 323-332: The current code replaces file_set_ with remote_file_set
which forces re-copy of all remote files on retry; instead keep file_set_ as the
delta so resume works: compute newly_files (difference of remote_file_set and
local_file_set) and assign file_set_ = newly_files, leaving expired_files
handling deletions; update the assignment that currently sets file_set_ =
remote_file_set to set file_set_ to the newly_files set (using the existing
newly_files variable) so only missing files are retried.

In `@src/rsync_server.cc`:
- Around line 437-456: The handler currently re-queries GetDumpUUID and
db->bgsave_info().path causing it to switch snapshots mid-transfer; instead read
the snapshot UUID and bgsave path from the connection's reserved snapshot
metadata set during the meta phase (e.g. use the connection's saved snapshot
UUID and saved bgsave path fields/methods) and use those values for
response.set_snapshot_uuid(...) and filepath construction; if the connection has
no reserved snapshot metadata, return an error as before. Update the section
that calls g_pika_server->GetDumpUUID(db_name) and g_pika_server->GetDB(db_name)
to use conn's snapshot ownership fields (and keep the existing error handling if
the reserved snapshot is missing).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 74324e56-e82d-42af-8435-545104f4ca02

📥 Commits

Reviewing files that changed from the base of the PR and between 4484840 and 36d8f17.

📒 Files selected for processing (17)
  • CMakeLists.txt
  • conf/pika.conf
  • include/pika_server.h
  • include/rsync_server.h
  • src/pika_command.cc
  • src/pika_conf.cc
  • src/pika_db.cc
  • src/pika_rm.cc
  • src/pika_server.cc
  • src/pstd/CMakeLists.txt
  • src/pstd/src/env.cc
  • src/rsync_client.cc
  • src/rsync_server.cc
  • src/storage/CMakeLists.txt
  • src/storage/include/storage/backupable.h
  • src/storage/include/storage/storage_define.h
  • src/storage/src/backupable.cc

Comment on lines 673 to +676
# for wash data from 4.0.0 to 4.0.1
# https://github.com/OpenAtomFoundation/pika/issues/2886
# default value: true
-wash-data: true
+wash-data: false

⚠️ Potential issue | 🟠 Major

Disabling wash-data by default may break upgrades from 4.0.0 to 4.0.1.

The WashData() function (referenced in the comment at lines 673-675) is essential for migrating hash column family data to the correct internal format when upgrading. With wash-data: false as the default:

  1. Users upgrading from 4.0.0 won't automatically get their data migrated
  2. Hash values without the proper suffix encoding will remain inconsistent
  3. This could cause silent data corruption or read failures

Consider either:

  • Keeping the default as true and documenting that users should set it to false after the first successful startup post-upgrade
  • Adding prominent upgrade documentation warning users to set wash-data: true before their first 4.0.1 startup
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/pika.conf` around lines 673 - 676, The config currently sets wash-data:
false which prevents the WashData() migration from running during upgrades;
change the default to wash-data: true so WashData() runs automatically on first
startup after upgrading from 4.0.0, and add a clear comment next to the
wash-data entry instructing operators to set wash-data: false after the first
successful startup (or include alternative upgrade docs); specifically update
the wash-data default and the adjacent comment block referenced by WashData() to
reflect this behavior.

Comment on lines +246 to +247
void ScheduleFileForCleanup(const std::string& filepath, int delay_seconds);
void ProcessPendingCleanupFiles();

⚠️ Potential issue | 🟠 Major

Keep the same transfer identity in the delayed-cleanup queue.

PendingCleanupInfo only stores filepath, but transfer state is tracked by snapshot_uuid + filename. If a slave retries the same SST during the grace window, the cleanup worker has no stable key to ask whether that file became active again before deleting it.

Minimal shape change
-  void ScheduleFileForCleanup(const std::string& filepath, int delay_seconds);
+  void ScheduleFileForCleanup(const std::string& snapshot_uuid,
+                              const std::string& filename,
+                              const std::string& filepath,
+                              int delay_seconds);

   struct PendingCleanupInfo {
+    std::string snapshot_uuid;
+    std::string filename;
     std::string filepath;
     time_t cleanup_time;
   };

Also applies to: 677-682

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pika_server.h` around lines 246 - 247, PendingCleanupInfo currently
only holds filepath so the delayed-cleanup queue can't determine transfer
identity (snapshot_uuid + filename); modify PendingCleanupInfo to include
snapshot_uuid and filename (or a single transfer_id composed of them), update
ScheduleFileForCleanup to accept snapshot_uuid and filename (in addition to
filepath/delay_seconds) and ensure ProcessPendingCleanupFiles uses the
snapshot_uuid+filename identity when checking whether a file became active again
before deletion; touch all uses of ScheduleFileForCleanup,
ProcessPendingCleanupFiles, and any queue logic to push/pop the updated
PendingCleanupInfo structure so cleanup decisions use the stable transfer
identity.

Comment on lines +344 to +352
// Find first available sequence number
int seq = 0;
std::string time_sub_path;
std::string full_path;
do {
time_sub_path = prefix + "-" + std::to_string(seq);
full_path = base_path + time_sub_path + "/" + bgsave_sub_path_;
seq++;
} while (pstd::FileExists(full_path) && seq < 1000); // Max 1000 dumps per day

⚠️ Potential issue | 🟠 Major

Reserve the sequence at the dump root, not the DB path.

The availability check uses .../dump-YYYYMMDD-N/<db_name>. That lets another DB reuse the same dump-YYYYMMDD-N as long as its own subdir is missing, so independent syncs can end up sharing one dump root. That breaks the per-slave exclusive-dump model and makes later dump-level cleanup/ownership ambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 344 - 352, The loop currently checks existence
of base_path + time_sub_path + "/" + bgsave_sub_path_, which only reserves
per-DB subpaths and allows different DBs to reuse the same dump root; change the
check to reserve the dump root itself by testing base_path + time_sub_path (the
dump-YYYYMMDD-N directory) instead of including bgsave_sub_path_. Update the
construction of full_path used in the pstd::FileExists call (and any other place
that assumes the checked path) to point at the dump root (using time_sub_path
and base_path) so each sequence number is exclusively reserved for the entire
dump root.

Comment on lines +345 to 356
int seq = 0;
std::string time_sub_path;
std::string full_path;
do {
time_sub_path = prefix + "-" + std::to_string(seq);
full_path = base_path + time_sub_path + "/" + bgsave_sub_path_;
seq++;
} while (pstd::FileExists(full_path) && seq < 1000); // Max 1000 dumps per day

if (seq >= 1000) {
LOG(ERROR) << db_name_ << " too many dump directories for today";
return false;

⚠️ Potential issue | 🟡 Minor

The last daily slot is never used.

Because seq is incremented before the limit check, a free ...-999 directory still exits the loop with seq == 1000, and Line 354 reports "too many dump directories". This caps the code at 999 usable slots, not 1000.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 345 - 356, The loop increments seq before
checking the 1000-limit, so the final slot (-999) can be skipped; update the
search to stop after checking up to 1000 candidates by iterating with seq from 0
to <1000 and breaking when pstd::FileExists(full_path) is false (e.g., replace
the do/while with a for loop or move seq++ to after the FileExists check),
ensuring time_sub_path/full_path are constructed using seq and using
bgsave_sub_path_ and db_name_ as before so the final slot is considered and the
seq >= 1000 check correctly indicates exhaustion.

Comment on lines +431 to +461
int ret = pstd::GetChildren(dbPath, subDirs);
LOG(INFO) << "[GetBgSaveMetaData] GetChildren for dbPath returned " << ret
<< ", subDirs count=" << subDirs.size();
if (ret) {
LOG(WARNING) << "[GetBgSaveMetaData] Failed to read dbPath: " << dbPath;
return;
}

int db_instance_num = g_pika_conf->db_instance_num();
for (int index = 0; index < db_instance_num; index++) {
std::string instPath = dbPath + ((dbPath.back() != '/') ? "/" : "") + std::to_string(index);
if (!pstd::FileExists(instPath)) {
continue ;
for (const std::string& subDir : subDirs) {
std::string instPath = dbPath + "/" + subDir;
// Skip if not exists or is a file (not directory)
// Note: IsDir returns 0 for directory, 1 for file, -1 for error
if (!pstd::FileExists(instPath) || pstd::IsDir(instPath) != 0) {
continue;
}

std::vector<std::string> tmpFileNames;
int ret = pstd::GetChildren(instPath, tmpFileNames);
ret = pstd::GetChildren(instPath, tmpFileNames);
if (ret) {
LOG(WARNING) << dbPath << " read dump meta files failed, path " << instPath;
return;
LOG(WARNING) << "[GetBgSaveMetaData] Failed to read instPath: " << instPath;
continue;
}

for (const std::string fileName : tmpFileNames) {
fileNames -> push_back(std::to_string(index) + "/" + fileName);
for (const std::string& fileName : tmpFileNames) {
std::string fullPath = instPath + "/" + fileName;
struct stat st;
// Check if file exists and get its stat
if (stat(fullPath.c_str(), &st) != 0) {
// File doesn't exist, skip it
LOG(WARNING) << "[GetBgSaveMetaData] File does not exist: " << fullPath;
continue;

⚠️ Potential issue | 🟠 Major

Don't serve a partial dump manifest on scan errors.

These return/continues can drop an entire instance directory or even CURRENT/MANIFEST from fileNames. src/pika_server.cc:827-834 still returns OK to rsync, and src/rsync_server.cc:288-310 then treats the truncated list as the integrity baseline, so the slave can accept an incomplete snapshot instead of retrying a fresh bgsave. This path should surface an error, which likely means GetBgSaveMetaData needs to return a Status.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 431 - 461, GetBgSaveMetaData must not silently
drop files on scan errors; change its signature to return a Status (e.g., Status
GetBgSaveMetaData(...)) and replace the current silent continue/return behavior
so that any failure from pstd::GetChildren, pstd::IsDir (when it returns -1),
pstd::FileExists checks that indicate unexpected state, or stat(fullPath) != 0
returns a non-OK Status describing the problem. Update callers (the code path
that currently treats GetBgSaveMetaData as void and later returns OK to rsync)
to inspect and propagate the Status so rsync/replication will retry instead of
accepting a truncated manifest. Ensure you reference and update uses of
GetBgSaveMetaData, and keep logging but return error Status on any
directory/file scan failure.
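One possible shape for an error-propagating scan, sketched here with `std::filesystem` in place of the `pstd` helpers (the function name and error-string convention are illustrative only; the real fix would return Pika's `Status` type):

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Collect "<instance>/<file>" entries under each instance subdirectory.
// Returns an empty string on success and a non-empty error message on any
// scan failure, instead of silently serving a partial manifest.
std::string CollectDumpManifest(const std::string& root,
                                std::vector<std::string>* file_names) {
  std::error_code ec;
  for (const auto& inst : fs::directory_iterator(root, ec)) {
    if (!inst.is_directory()) continue;
    std::error_code inner_ec;
    for (const auto& f : fs::directory_iterator(inst.path(), inner_ec)) {
      file_names->push_back(inst.path().filename().string() + "/" +
                            f.path().filename().string());
    }
    if (inner_ec) return "scan failed: " + inst.path().string();  // propagate
  }
  if (ec) return "scan failed: " + root;  // propagate, never a partial list
  return "";  // OK
}
```

The caller would refuse to answer the rsync meta request on a non-empty error, forcing the slave to retry rather than accept a truncated snapshot.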

Comment on lines +1618 to +1627
// Read snapshot_uuid from info file for protection check
// TODO: For multi-DB setups, should check ALL db subdirectories (db0, db1, db2...)
// If any db is in use, the entire dump should be protected.
// Current simple approach only checks db0 for backward compatibility.
std::string snapshot_uuid;
std::string info_path = dump_file + "/db0/info";
if (!pstd::FileExists(info_path)) {
// Fallback to legacy path (directly under dump directory)
info_path = dump_file + "/info";
}

⚠️ Potential issue | 🟠 Major

Don't decide dump liveness from db0/info alone.

This only derives snapshot_uuid from db0/info or legacy /info. In a multi-DB dump, db0 can be absent while another dbN is still syncing, so snapshot_uuid stays empty and the directory can still be deleted through the "corrupted dump" or expiry paths even though that dump is still in use. Please scan all db*/info files, or at minimum fall back to dump_owners_ by dump_path before any delete.

Also applies to: 1647-1673, 1714-1718

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_server.cc` around lines 1618 - 1627, The code currently derives
snapshot_uuid using only db0/info (info_path) and legacy /info, which risks
treating multi-DB dumps as inactive; update the logic around
snapshot_uuid/dump_file to scan all db*/info files under the dump directory
(e.g., iterate directory entries matching "db*" and read each info file to
collect snapshot_uuid or any non-empty value) and only treat the dump as
inactive if no dbN/info yields a snapshot UUID; additionally, if no
snapshot_uuid is found, consult dump_owners_ by dump_path as a fallback before
allowing deletion to ensure active dumps tracked in dump_owners_ are protected
(apply same change to the other similar blocks referencing
snapshot_uuid/info_path at the indicated spots).
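The liveness check could be broadened along these lines. This is a hedged sketch using `std::filesystem` rather than the `pstd` helpers, and it treats mere presence of any `db*/info` entry as protection; the real code would additionally parse the snapshot UUID and consult `dump_owners_`:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// A dump is treated as possibly-active if ANY db*/info exists under it,
// not just db0/info. Falls back to the legacy layout with info directly
// under the dump directory.
bool DumpHasAnyDbInfo(const std::string& dump_path) {
  std::error_code ec;
  for (const auto& entry : fs::directory_iterator(dump_path, ec)) {
    const std::string name = entry.path().filename().string();
    if (name.rfind("db", 0) == 0 && fs::exists(entry.path() / "info")) {
      return true;  // some dbN may still be mid-transfer: protect the dump
    }
  }
  // Legacy layout: info directly under the dump directory.
  return fs::exists(fs::path(dump_path) / "info");
}
```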

Comment on lines +2067 to +2082
bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
// First check if directory already exists
if (pstd::FileExists(path)) {
return true;
}
// Directory doesn't exist, try to create it
int ret = pstd::CreatePath(path, mode);
// CreatePath returns 0 on success, -1 on failure
// Note: CreatePath also returns -1 if directory already exists (due to the
// !filesystem::create_directories check), but we already checked FileExists above
if (ret != 0) {
LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
return false;
}
// Verify directory was created successfully
return pstd::FileExists(path);

⚠️ Potential issue | 🟡 Minor

EnsureDirExists() should verify a directory, not just path existence.

pstd::FileExists(path) also succeeds for regular files, and the pre-check before CreatePath() is racy with concurrent creators. Both cases can make dump directory setup report the wrong result.

Suggested hardening
 bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
-  // First check if directory already exists
-  if (pstd::FileExists(path)) {
-    return true;
-  }
+  struct stat st;
+  if (stat(path.c_str(), &st) == 0) {
+    return S_ISDIR(st.st_mode);
+  }
   // Directory doesn't exist, try to create it
   int ret = pstd::CreatePath(path, mode);
-  // CreatePath returns 0 on success, -1 on failure
-  // Note: CreatePath also returns -1 if directory already exists (due to the
-  // !filesystem::create_directories check), but we already checked FileExists above
-  if (ret != 0) {
-    LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
-    return false;
+  if (ret == 0 && stat(path.c_str(), &st) == 0) {
+    return S_ISDIR(st.st_mode);
   }
-  // Verify directory was created successfully
-  return pstd::FileExists(path);
+  if (stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode)) {
+    return true;  // created concurrently
+  }
+  LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
+  return false;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_server.cc` around lines 2067 - 2082, EnsureDirExists currently uses
pstd::FileExists which returns true for regular files and does a racy pre-check;
change it to verify the path is a directory (use an is-directory check such as
pstd::IsDirectory or stat+S_ISDIR) instead of FileExists, avoid the race by not
returning success on the initial FileExists check (or always attempt CreatePath
and then re-check that the path is a directory), and after pstd::CreatePath
handle the -1 case by re-checking whether the path now exists and is a directory
(treat that as success) versus a real failure, referencing EnsureDirExists,
pstd::FileExists, and pstd::CreatePath in your changes.

Comment on lines +136 to +138
if (filesystem::is_empty(dir)) {
return -1;
}

⚠️ Potential issue | 🟠 Major

Return success for empty directories.

Line 136 changes GetChildren so an existing-but-empty directory returns -1. Callers in src/pika_stable_log.cc and src/pika_server.cc treat any non-zero as a real failure, so this will turn normal empty-directory states into warnings/early returns and can skip cleanup/scan paths. Keep result empty and return 0; reserve -1 for actual filesystem errors.

💡 Proposed fix
 int GetChildren(const std::string& dir, std::vector<std::string>& result) {
   result.clear();
   try {
-    if (filesystem::is_empty(dir)) {
-      return -1;
-    }
     for (auto& de : filesystem::directory_iterator(dir)) {
       result.emplace_back(de.path().filename());
     }
   } catch (const filesystem::filesystem_error& e) {
     LOG(WARNING) << "GetChildren failed for " << dir << ": " << e.what();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pstd/src/env.cc` around lines 136 - 138, GetChildren currently returns -1
when filesystem::is_empty(dir), which signals an error to callers like the Pika
scan/cleanup code; change the behavior so that when the directory exists but is
empty you leave the output container (result) empty and return 0, reserving -1
only for real filesystem errors. In short: in GetChildren, remove/replace the
early return -1 on filesystem::is_empty(dir) with logic that returns 0 and an
empty result, so callers of GetChildren see success with no children rather than
an error.

Comment on lines +242 to +250
// If Master returns empty response, it means the file was cleaned up during sync
// This is an error condition - the sync should fail and retry
if (ret_count == 0 && resp->file_resp().eof()) {
LOG(ERROR) << "File not available on Master at offset " << offset
<< ", filename: " << filename
<< ". Will retry and may trigger new bgsave.";
s = Status::IOError("File not available on Master");
return s;
}

⚠️ Potential issue | 🟠 Major

Don't treat a valid zero-byte file as "missing".

kOk + eof=true + count=0 is also the normal response for an existing empty file. Missing files are already reported via response.code() != kOk, so this block will make legitimate empty files fail full sync.

🛠️ Suggested fix
-      // If Master returns empty response, it means the file was cleaned up during sync
-      // This is an error condition - the sync should fail and retry
-      if (ret_count == 0 && resp->file_resp().eof()) {
-        LOG(ERROR) << "File not available on Master at offset " << offset
-                   << ", filename: " << filename
-                   << ". Will retry and may trigger new bgsave.";
-        s = Status::IOError("File not available on Master");
-        return s;
-      }
+      // A zero-byte file is also represented as kOk + eof=true + count=0.
+      // Missing files should be handled via resp->code() != kOk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rsync_client.cc` around lines 242 - 250, The current check in
rsync_client.cc incorrectly treats a valid zero-byte file as missing when
ret_count == 0 && resp->file_resp().eof(); change the logic so we only consider
the file missing when the RPC response indicates an error (response.code() !=
kOk or resp->code() != kOk) rather than when count==0+eof; update the
conditional around the LOG(ERROR) / Status::IOError("File not available on
Master") and return to only trigger on a non-OK response code while allowing the
kOk + eof + count==0 case to proceed as a legitimate empty file; adjust
references to ret_count, resp->file_resp().eof(), and
response.code()/resp->code() accordingly.
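The decision rule the review asks for can be stated as a tiny predicate. This is an illustrative reduction, not the real protobuf types (`RespCode` stands in for the RsyncService response code):

```cpp
#include <cstdint>

enum class RespCode { kOk, kErr };

// A file is "missing on master" only when the response code is non-OK.
// kOk + eof=true + count==0 is the legitimate encoding of an empty file
// and must be allowed to complete normally.
bool FileMissingOnMaster(RespCode code, uint64_t count, bool eof) {
  (void)count;
  (void)eof;
  return code != RespCode::kOk;
}
```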

Comment on lines +89 to +101
std::string backup_dir = GetSaveDirByIndex(dir, engine.first);
delete_dir(backup_dir.c_str());

s = engine.second->CreateCheckpointWithFiles(
backup_dir, bcontent.live_files, bcontent.live_wal_files,
bcontent.manifest_file_size, bcontent.sequence_number);

// 3. Re-enable file deletions regardless of success
// CreateCheckpointWithFiles already calls EnableFileDeletions in db_checkpoint.cc
if (!s.ok()) {
LOG(WARNING) << "CreateCheckpointWithFiles failed for index " << engine.first
<< ": " << s.ToString();
return s;

⚠️ Potential issue | 🟠 Major

Validate the checkpoint target before GetCheckpointFiles().

src/storage/src/db_checkpoint.cc:75-97 leaves file deletions disabled after GetCheckpointFiles, and src/storage/src/db_checkpoint.cc:98-104 returns immediately if the checkpoint dir already exists. Because delete_dir(backup_dir.c_str()) is unchecked here, a stale or undeletable target can strand the instance with file deletions still disabled. Clean up or validate backup_dir before calling GetCheckpointFiles, or explicitly re-enable deletions on this early-failure path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/backupable.cc` around lines 89 - 101, The code computes
backup_dir via GetSaveDirByIndex and calls delete_dir before calling
CreateCheckpointWithFiles, but if delete_dir fails or the checkpoint target
already exists earlier in the flow (e.g., in GetCheckpointFiles /
db_checkpoint.cc), file deletions can remain disabled; fix by validating or
ensuring the checkpoint target is removable before calling
GetCheckpointFiles/CreateCheckpointWithFiles (e.g., check delete_dir return and
retry/clear stale dir) and on any early-return path re-enable deletions by
calling the appropriate re-enable helper (same one used by db_checkpoint.cc,
e.g., EnableFileDeletions) so deletions are always restored even on error paths
in CreateCheckpointWithFiles or when the backup dir already exists.
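One idiomatic way to guarantee the "deletions always restored" invariant on every exit path is an RAII guard. This is a self-contained sketch with a callback standing in for RocksDB's `EnableFileDeletions`; the names are illustrative, not the patch's API:

```cpp
#include <functional>
#include <utility>

// Re-enables file deletions when it goes out of scope, covering early
// returns (stale backup dir, delete_dir failure) and error paths alike.
class FileDeletionGuard {
 public:
  explicit FileDeletionGuard(std::function<void()> enable)
      : enable_(std::move(enable)) {}
  ~FileDeletionGuard() { enable_(); }  // runs on success and on early return
 private:
  std::function<void()> enable_;
};

// Simulated backup flow: returns false on an early failure, but the guard
// still restores the deletions-enabled flag either way.
bool BackupWithGuard(bool fail_early, bool* deletions_enabled) {
  *deletions_enabled = false;  // stands in for DisableFileDeletions()
  FileDeletionGuard guard([deletions_enabled] { *deletions_enabled = true; });
  if (fail_early) {
    return false;  // early-return path: destructor still fires
  }
  return true;
}
```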

@chejinge chejinge changed the title feature: 主从同步的dump清理机制调整,优化存储空间占用 feat: 主从同步的dump清理机制调整,优化存储空间占用 Mar 12, 2026
@chejinge chejinge changed the title feat: 主从同步的dump清理机制调整,优化存储空间占用 feat: Adjustment of the master-slave synchronization dump cleanup mechanism, optimizing storage space usage Mar 12, 2026

Labels

4.0.2 bug ☢️ Bug Something isn't working ✏️ Feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants