
feat: Adjustment of the master-slave synchronization dump cleanup mechanism, optimizing storage space usage #3225

Open
chenbt-hz wants to merge 6 commits into OpenAtomFoundation:unstable from chenbt-hz:unstable-fixbug

Conversation

@chenbt-hz
Collaborator

@chenbt-hz chenbt-hz commented Mar 6, 2026

1. The hlen command causes a TTL anomaly
2. rate_limit anomaly
3. \x00 parsing anomaly
4. [Optimization] Adjust the dump directory cleanup mechanism during master-slave synchronization

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced backup snapshot tracking and ownership management for safer rsync transfers
    • New backup creation method that reduces time windows during checkpoint operations
  • Bug Fixes

    • Fixed integer overflow in default bandwidth rate limiter configuration
    • Improved error handling for directory operations and file transfers
    • Enhanced dump file integrity validation
  • Configuration Changes

    • Updated default bandwidth rate limiter settings
    • Adjusted wash-data default value

1. Each slave gets an exclusive dump directory (dump-YYYYMMDD-NN format)
2. Files are cleaned up immediately after transfer completes, freeing disk space
3. The number of concurrent dumps is limited to 3
4. Improved dump integrity checking and ownership management
Notes: 1. Multi-database scenarios are not yet supported. 2. Cleanup can misbehave when multiple slaves run a full sync at the same time. 3. Orphan-file cleanup is less effective when manual syncs are triggered multiple times on the same day.
@chenbt-hz chenbt-hz added bug ✏️ Feature New feature or request 4.0.2 labels Mar 6, 2026
Copilot AI review requested due to automatic review settings March 6, 2026 08:59
@github-actions github-actions bot added Invalid PR Title ☢️ Bug Something isn't working labels Mar 6, 2026
@chenbt-hz chenbt-hz changed the title Unstable fixbug to feat: Adjust the dump cleanup mechanism for master-slave synchronization to optimize storage usage Mar 6, 2026
@chenbt-hz
Collaborator Author

feat: independent dump + immediate cleanup mechanism, solving Pika's orphan-file problem during full synchronization:

  1. Each slave gets an exclusive dump directory (dump-YYYYMMDD-NN format)
  2. Files are cleaned up immediately after transfer completes, freeing disk space
  3. The number of concurrent dumps is limited to 3
  4. Improved dump integrity checking and ownership management
    Notes: 1. Multi-database scenarios are not yet supported. 2. Cleanup can misbehave when multiple slaves run a full sync at the same time. 3. Orphan-file cleanup is less effective when manual syncs are triggered multiple times on the same day.

@chenbt-hz
Collaborator Author

Pika Full Synchronization Design, Explained

About this document

This document describes Pika's new full synchronization mechanism (Scheme A) in detail, including the complete flow, state transitions, data movement, and known issues for each scenario.

  • Scheme name: Scheme A (independent dump + delayed cleanup)
  • Last updated: 2026-03-06

1. Architecture Overview

1.1 Core design

Scheme A follows these design principles:

  1. Each slave gets an exclusive dump directory: dump-YYYYMMDD-NN/db_name format
  2. Delayed cleanup after transfer completes: orphan files (nlink=1) are added to a delayed-cleanup queue once their transfer finishes (deleted after 10 minutes)
  3. Maximum concurrency limit: at most 3 concurrent dumps by default
  4. Fine-grained file protection: files in transfer are protected from accidental deletion
  5. Unified cleanup entry point: all orphan-file cleanup goes through RemoveTransferringFile

1.2 Key components

| Component | File | Responsibility |
|---|---|---|
| RsyncServer | rsync_server.cc | Handles slave file-sync requests |
| RsyncServerConn | rsync_server.cc | Maintains per-connection state |
| PikaServer | pika_server.cc | Manages dump ownership and snapshot registration |
| DB | pika_db.cc | Manages bgsave and dump metadata |

1.3 Key data structures

// Dump ownership information
struct DumpOwnerInfo {
    std::string conn_id;      // ID of the owning connection
    std::string dump_path;    // dump directory path
};
std::map<std::string, DumpOwnerInfo> dump_owners_;  // snapshot_uuid -> ownership info

// Protection for files in transfer
std::map<std::string, std::set<std::string>> rsync_transferring_files_;  // snapshot_uuid -> file set

// Active snapshots
std::set<std::string> active_rsync_snapshots_;  // protects against orphan-file cleanup

2. Single-slave, single-DB full sync flow

Using db0 as an example, this section walks through the state changes on the master and the slave.

2.1 Sequence diagrams

Phase 1: triggering full sync
┌─────────────┐                    ┌─────────────┐
│    Slave    │                    │    Master   │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Decide full sync is needed    │
       │  (repl_state: kTryConnect)        │
       │                                   │
       │  2. Send DBSync request           │
       │ ───────────────────────────────>│
       │                                   │
       │                              3. Check whether a bgsave is running
       │                              (IsBgSaving())
       │                                   │
       │                              4. If not, trigger a bgsave
       │                              (BgSaveDB())
       │                                   │
       │  5. Return kErr (wait for bgsave) │
       │ <───────────────────────────────│
       │                                   │
       │  6. Retry (loop)                  │
       │ ───────────────────────────────>│
       │                              7. If still in bgsave, return kErr
       │ <───────────────────────────────│
       │                                   │

Phase 2: bgsave execution
┌─────────────┐                    ┌─────────────┐
│  Background │                    │    Master   │
│    Thread   │                    │             │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Create the dump directory     │
       │  (InitBgsaveEnv)                  │
       │  dump-20260305-0/db0              │
       │                                   │
       │  2. Create a RocksDB checkpoint   │
       │  (creates hard links)             │
       │                                   │
       │  3. Generate the info file        │
       │                                   │
       │  4. bgsave completes              │
       │  (IsBgSaving() -> false)          │
       │                                   │

Phase 3: Meta request handling
┌─────────────┐                    ┌─────────────┐
│    Slave    │                    │    Master   │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Send the DBSync request again │
       │  (bgsave finished after retries)  │
       │ ───────────────────────────────>│
       │                                   │
       │                              2. Build the file list
       │                              (GetDumpMeta)
       │                              Scan dump-20260305-0/db0
       │                              Generate snapshot_uuid
       │                                   │
       │                              3. Check dump integrity
       │                              4. Check whether the dump is already owned
       │                              5. Check the concurrency limit
       │                              6. Mark the dump as in use
       │                              (MarkDumpInUse)
       │                              7. Register the snapshot
       │                              (RegisterSnapshot)
       │                              8. Pre-register all files
       │                              (AddTransferringFile)
       │                                   │
       │  9. Return the Meta response      │
       │  (snapshot_uuid + file list)      │
       │ <───────────────────────────────│
       │                                   │

Phase 4: file transfer
┌─────────────┐                    ┌─────────────┐
│    Slave    │                    │    Master   │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Download files (multi-threaded)
       │  ─────────────────────────────> │
       │                                   │
       │                              2. Check that the file exists
       │                              3. Register the file as in transfer
       │                              4. Read the file contents
       │                              5. Unregister the file
       │                              6. If this is the last chunk (is_eof):
       │                                 check whether the file is an orphan (nlink=1);
       │                                 if so, add it to the delayed-cleanup queue (10 minutes)
       │                                   │
       │  7. Return the file data          │
       │ <───────────────────────────────│
       │                                   │
       │  (repeat until all files are downloaded)

Phase 5: cleanup
┌─────────────┐                    ┌─────────────┐
│    Slave    │                    │    Master   │
└──────┬──────┘                    └──────┬──────┘
       │                                   │
       │  1. Download complete, close the connection
       │ ───────X────────────────────────>│
       │                                   │
       │                              2. Connection closed, RsyncServerConn destroyed
       │                              3. Release the dump reservation
       │                              (ReleaseDump)
       │                              4. Unregister the snapshot
       │                              (UnregisterSnapshot)
       │                                   │
       │                              5. AutoDeleteExpiredDump runs periodically:
       │                              processes the delayed-cleanup queue (ProcessPendingCleanupFiles)
       │                              and deletes expired dump directories
       │                              (note: CleanupOrphanSstFiles has been removed; delayed cleanup handles everything)
       │                                   │
2.2 Master state changes

| Stage | State | Notes |
|---|---|---|
| T0 | no dump | initial state |
| T1 | bgsaving | creating dump-20260305-0/db0 |
| T2 | dump available | bgsave complete, waiting for a Meta request |
| T3 | dump in use | Meta request received, dump marked as owned |
| T4 | transferring | files being transferred, immediate cleanup in progress |
| T5 | dump released | slave disconnected, reservation released |
| T6 | dump expired | AutoDeleteExpiredDump removes the expired dump |

2.3 Slave state changes

| Stage | State | Notes |
|---|---|---|
| T0 | kTryConnect | trying to connect to the master |
| T1 | kWaitDBSync | waiting for the master's bgsave to finish |
| T2 | kWaitDBSync | file list received, download starting |
| T3 | kWaitDBSync | files downloading |
| T4 | kConnected | full sync finished, incremental sync starting |

2.4 Data changes

Master disk usage over time:

| Point in time | Data directory | Dump directory | Total |
|---|---|---|---|
| Initial | 100GB | 0 | 100GB |
| During bgsave | 100GB | 0 (hard links take no space) | 100GB |
| After compaction | 100GB | some orphan files | 100GB + orphans |
| Transferring | 100GB | 100GB (dump) | 200GB |
| Transfer complete | 100GB | orphans cleaned after a 10-minute delay | 100GB ~ 200GB |

3. Multi-slave synchronization flow

3.1 Scenario

  • The master holds 100GB of data
  • Slave-1 starts a full sync first
  • Slave-2 starts a full sync while Slave-1 is still syncing

3.2 Timeline

Timeline:
T0:
  Slave-1 ──DBSync──> Master
  Master: IsBgSaving? No
  Master: trigger BgSaveDB()
  Master: create dump-20260305-0/db0
  Slave-1 <──kErr─── Master (waiting for bgsave)

T30s:
  Master: bgsave complete
  Slave-1 ──DBSync──> Master
  Master: build file list (dump-0)
  Master: MarkDumpInUse(dump-0, Slave-1)
  Slave-1 <──file list── Master
  Slave-1 starts downloading...

T31s:
  Slave-2 ──DBSync──> Master
  Master: IsDumpInUse(dump-0)? Yes (owned by Slave-1)
  Master: trigger a new BgSaveDB()
  Master: create dump-20260305-1/db0
  Slave-2 <──kErr─── Master (waiting for the new bgsave)

T61s:
  Master: new bgsave complete
  Slave-2 ──DBSync──> Master
  Master: MarkDumpInUse(dump-1, Slave-2)
  Slave-2 <──file list── Master
  Slave-2 starts downloading...

T120s:
  Slave-1: download complete, disconnects
  Master: ReleaseDump(dump-0)
  Master: delete dump-0 (AutoDeleteExpiredDump)

T180s:
  Slave-2: download complete, disconnects
  Master: ReleaseDump(dump-1)
  Master: delete dump-1

3.3 Key limits

  • Maximum number of concurrent dumps: 3 (kMaxConcurrentDumps = 3)
  • When the limit is exceeded: return kErr and let the slave retry

4. Single-slave, multi-DB synchronization flow

4.1 Scenario

  • The master is configured with 3 DBs: db0, db1, db2
  • Each DB has independent RocksDB instances (db-instance-num=3)
  • The slave syncs all DBs at the same time

4.2 Directory layout

dump/dump-20260305-0/
├── db0/
│   ├── 0/          # RocksDB instance 0
│   │   ├── 000001.sst
│   │   └── 000002.sst
│   ├── 1/          # RocksDB instance 1
│   │   └── 000003.sst
│   ├── 2/          # RocksDB instance 2
│   │   └── 000004.sst
│   └── info        # dump metadata
├── db1/
│   ├── 0/
│   ├── 1/
│   ├── 2/
│   └── info
└── db2/
    ├── 0/
    ├── 1/
    ├── 2/
    └── info

4.3 File naming rules

  • Slave request format: {rocksdb_instance}/(unknown)
  • Examples: 0/000001.sst, 1/000003.sst
  • Note: the db0/db1/db2 prefix is not included

4.4 Synchronization flow

Each DB syncs independently:

  1. The slave sends a DBSync request for db0
  2. The master returns db0's file list
  3. The slave downloads all of db0's files
  4. Steps 1-3 repeat for db1 and db2

4.5 Potential issues

Issue: inconsistent info file location

  • AutoDeleteExpiredDump looks for: dump/dump-xxx/info
  • Actual location: dump/dump-xxx/db0/info

Fixed: try db0/info first, then fall back to info


5. Multi-slave, multi-DB synchronization flow

This is Scheme A's most complex scenario, combining multiple slaves with multiple DBs.

5.1 Scenario

  • Master: 3 DBs (db0, db1, db2)
  • Slave-1: syncs db0, db1, db2
  • Slave-2: syncs db0, db1, db2

5.2 Dump ownership

Scheme A design: each slave owns an entire dump directory (covering all DBs)

Slave-1 owns dump-20260305-0:
├── db0 (transferring)
├── db1 (transferring)
└── db2 (transferring)

Slave-2 owns dump-20260305-1:
├── db0 (transferring)
├── db1 (transferring)
└── db2 (transferring)

5.3 Ownership check

  • Granularity: the entire dump directory
  • While one slave is using dump-0, no other slave may use it
  • A new bgsave is triggered to create dump-1

5.4 Potential issues

Issue 1: DB-level vs dump-level granularity

  • Current design: ownership at the dump level
  • If Slave-1 only syncs db0, dump-0 still cannot be used by Slave-2
  • This wastes disk space

Issue 2: orphan-file cleanup with multiple DBs

  • AutoDeleteExpiredDump only checks db0/info
  • If db1 or db2 is still transferring, the dump may be misjudged as cleanable
6. Orphan-file cleanup mechanism (unified delayed cleanup)

6.1 Trigger conditions

Orphan file: an SST file with nlink=1 (referenced only by the dump, no longer by RocksDB)

How orphans arise:

  • A RocksDB compaction deletes an old SST
  • The dump's hard link to it becomes an orphan
6.2 Unified cleanup strategy

Design change: the CleanupOrphanSstFiles function is removed; everything goes through the delayed-cleanup queue

New cleanup flow:

1. When a file finishes transferring (RemoveTransferringFile)
   - Check is_eof=true (last chunk transferred)
   - stat the file and check its nlink
   - If nlink=1 (orphan file):
     * Add it to the delayed-cleanup queue (ScheduleFileForCleanup, 600-second delay)
     * Log "Scheduled orphan file for cleanup"
   - If nlink=2 (not an orphan):
     * Do nothing; RocksDB manages its lifecycle

2. AutoDeleteExpiredDump runs periodically (every 60 seconds)
   - Calls ProcessPendingCleanupFiles()
   - Checks the queue for files whose delay has elapsed
   - Deletes them and logs "Deleted delayed cleanup file"
   - Also checks for and deletes expired dump directories

6.3 Protection mechanisms

| Protection level | Description | Where implemented |
|---|---|---|
| In-transfer protection | Files being transferred are never cleaned up | rsync_transferring_files_ |
| Delay protection | Orphan files are deleted after a 10-minute delay, giving the slave time to retry | ScheduleFileForCleanup(filepath, 600) |
| nlink check | Only orphan files (nlink=1) are cleaned, avoiding accidental deletion | stat check |

6.4 Timing

T0: file transfer completes (is_eof=true)
  └─> RemoveTransferringFile checks nlink==1
      └─> ScheduleFileForCleanup(filepath, 600) adds it to the queue
          └─> log: "Scheduled orphan file for cleanup in 10min"

T0+10min: AutoDeleteExpiredDump runs
  └─> ProcessPendingCleanupFiles()
      └─> checks for files whose delay has elapsed
          └─> deletes them
              └─> log: "Deleted delayed cleanup file"

6.5 Comparison: old scheme vs new scheme

| Aspect | Old scheme (CleanupOrphanSstFiles) | New scheme (unified delayed cleanup) |
|---|---|---|
| Trigger | periodic scan of all dump directories | immediate check when a transfer completes |
| Cleanup delay | scan interval, unpredictable | fixed 10-minute delay |
| Race conditions | could race with the transfer process | none; single entry point |
| Code complexity | ~170-line standalone function | ~15 lines of integrated logic |
| Slave retry | could fail (file already deleted) | can retry within 10 minutes |

7. Bug list

7.1 Bugs to fix

| Bug | Impact | Severity | Fix |
|---|---|---|---|
| Orphan-file cleanup granularity in multi-DB scenarios | db1/db2 may be misjudged while still transferring | Medium | check every DB's info file |
| Disk waste with multiple slaves and multiple DBs | each slave owns an entire dump | Low | support DB-level ownership |

8. To-do list

8.1 High priority

  • Unify the orphan-file cleanup mechanism (done)

    • Remove the CleanupOrphanSstFiles function
    • Route everything through RemoveTransferringFile + the delayed-cleanup queue
    • Delay deletion by 10 minutes to give slaves time to retry
  • Fix the multi-DB orphan-file cleanup granularity issue

    • Currently only db0/info is checked
    • All DB subdirectories need to be checked
    • If any DB is in use, the entire dump should be protected

8.2 Medium priority

  • Reduce disk usage with multiple slaves and multiple DBs

    • Current: each slave owns an entire dump
    • Improvement: support DB-level ownership
    • Impact: the ownership-management logic must change
  • Improve monitoring metrics

    • Number of dumps in use
    • Orphan-file cleanup statistics
    • Transfer failure rate

8.3 Low priority

  • Support adjusting the concurrency limit dynamically

    • Current: compile-time constant kMaxConcurrentDumps=3
    • Improvement: support hot-reloading the setting
  • Compress dump transfers

    • Reduces network bandwidth
    • Trades CPU for network

9. Configuration suggestions

9.1 Key configuration items

# pika.conf

# dump directory prefix
dump-prefix : dump-

# dump directory path
dump-path : ./dump/

# dump expiry time (days)
# 0 means dumps never expire
dump-expire : 1

# number of RocksDB instances
db-instance-num : 3

# maximum number of concurrent dumps (compile-time setting)
# kMaxConcurrentDumps = 3

9.2 Deployment recommendations

  1. Disk space: reserve 3 × the data size
  2. Monitoring: watch the number of dump directories and disk usage
  3. Logs: watch for the [Rsync Meta], [RsyncTransfer], "Scheduled orphan file", and "Deleted delayed cleanup" log lines

10. Appendix

10.1 Key logs

# Meta request handling
grep "Rsync Meta" log/pika.INFO

# file transfers
grep "RsyncTransfer" log/pika.INFO

# orphan-file delayed-cleanup scheduling
grep "Scheduled orphan file" log/pika.INFO

# delayed-cleanup execution
grep "Deleted delayed cleanup file" log/pika.INFO

# dump ownership
grep "DumpOwnership" log/pika.INFO

# errors
grep "File no longer exists" log/pika.WARNING

10.2 Status codes

| Status code | Meaning | Handling |
|---|---|---|
| kOk | success | continue |
| kErr | error | slave retries |

10.3 File path conventions

| Type | Format | Example |
|---|---|---|
| Dump directory | dump-YYYYMMDD-NN/db_name | dump-20260305-0/db0 |
| RocksDB instance | {rocksdb_instance}/ | 0/, 1/, 2/ |
| SST file | {instance}/(unknown).sst | 0/000001.sst |
| Info file | db_name/info | db0/info |


Copilot AI left a comment

Pull request overview

This PR targets multiple stability issues in storage backup/rsync full-sync flows (orphan SST files, dump lifecycle/cleanup) and also adjusts some RocksDB-related defaults and build/test toggles.

Changes:

  • Add a “get checkpoint files + immediately create checkpoint” flow to reduce the compaction window that can produce orphan SSTs during bgsave.
  • Introduce rsync snapshot/dump ownership tracking plus delayed orphan-file cleanup to avoid deleting files still needed by syncing slaves.
  • Make test building optional via BUILD_TESTS, and adjust several config/default behaviors (rate limiter, wash-data, etc.).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/storage/src/backupable.cc Adds SetBackupContentAndCreate to minimize the gap between live-file listing and checkpoint creation.
src/storage/include/storage/backupable.h Declares the new SetBackupContentAndCreate API.
src/storage/include/storage/storage_define.h Fixes pointer advancement in EncodeUserKey when \x00 exists.
src/storage/CMakeLists.txt Makes storage tests conditional on BUILD_TESTS.
src/rsync_server.cc Adds dump reservation, integrity checks, snapshot/file transfer tracking, and delayed orphan cleanup scheduling.
include/rsync_server.h Exposes snapshot/file tracking APIs and adds per-connection tracking state.
src/rsync_client.cc Adds retry/error behavior for missing files and adjusts local tracking update logic.
src/pstd/src/env.cc Wraps GetChildren with exception handling and logging for filesystem errors.
src/pstd/CMakeLists.txt Makes pstd tests conditional on BUILD_TESTS.
src/pika_server.cc Implements global rsync snapshot/file tracking, dump ownership, delayed cleanup processing, and dump cleanup policy updates.
include/pika_server.h Declares new dump ownership / rsync tracking / delayed cleanup APIs and state.
src/pika_db.cc Uses the new immediate-checkpoint backup path and introduces unique dump directory naming with sequence suffixes.
src/pika_rm.cc Rate-limits rsync retry logs.
src/pika_conf.cc Fixes integer literal width for rate limiter default bandwidth.
src/pika_command.cc Adjusts HLEN command flags to avoid cache/TTL anomalies.
conf/pika.conf Changes sample/default settings for rate limiter and wash-data.
CMakeLists.txt Adds BUILD_TESTS option, gates enable_testing(), and adjusts build version source inclusion.


Comment on lines +125 to +153
if (!snapshot_uuid_.empty() && !filename.empty()) {
std::lock_guard<std::mutex> guard(mu_);
transferring_files_.insert(filename);
g_pika_server->RegisterRsyncTransferringFile(snapshot_uuid_, filename);
}
}

void RsyncServerConn::RemoveTransferringFile(const std::string& filename, bool is_eof) {
if (!snapshot_uuid_.empty() && !filename.empty()) {
std::lock_guard<std::mutex> guard(mu_);
transferring_files_.erase(filename);
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_, filename);

// Only process cleanup when file transfer is complete (is_eof=true)
if (is_eof) {
std::string dump_path = g_pika_server->GetDumpPathBySnapshot(snapshot_uuid_);
std::string filepath = dump_path + "/" + filename;

// Check if file is orphan (nlink=1, only referenced by dump, not by db)
struct stat st;
if (stat(filepath.c_str(), &st) == 0 && st.st_nlink == 1) {
// Orphan file: schedule for delayed cleanup (10 minutes)
// This allows Slave to retry if needed before actual deletion
g_pika_server->ScheduleFileForCleanup(filepath, 600);
LOG(INFO) << "[RsyncTransfer] Scheduled orphan file for cleanup: " << filename
<< " for snapshot: " << snapshot_uuid_;
}
// Non-orphan files (nlink=2) are still referenced by RocksDB, no cleanup needed
}
Copilot AI Mar 6, 2026

AddTransferringFile/RemoveTransferringFile hold mu_ while calling into g_pika_server and doing filesystem work (stat, path building, scheduling cleanup). This can unnecessarily block parallel rsync reads and increases deadlock risk due to lock-ordering across different mutexes. Limit mu_ to only protecting transferring_files_ (and snapshot state), then perform server calls and stat/cleanup decisions after releasing the lock using local copies of snapshot_uuid_/dump_path.

Suggested change
if (!snapshot_uuid_.empty() && !filename.empty()) {
std::lock_guard<std::mutex> guard(mu_);
transferring_files_.insert(filename);
g_pika_server->RegisterRsyncTransferringFile(snapshot_uuid_, filename);
}
}
void RsyncServerConn::RemoveTransferringFile(const std::string& filename, bool is_eof) {
if (!snapshot_uuid_.empty() && !filename.empty()) {
std::lock_guard<std::mutex> guard(mu_);
transferring_files_.erase(filename);
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_, filename);
// Only process cleanup when file transfer is complete (is_eof=true)
if (is_eof) {
std::string dump_path = g_pika_server->GetDumpPathBySnapshot(snapshot_uuid_);
std::string filepath = dump_path + "/" + filename;
// Check if file is orphan (nlink=1, only referenced by dump, not by db)
struct stat st;
if (stat(filepath.c_str(), &st) == 0 && st.st_nlink == 1) {
// Orphan file: schedule for delayed cleanup (10 minutes)
// This allows Slave to retry if needed before actual deletion
g_pika_server->ScheduleFileForCleanup(filepath, 600);
LOG(INFO) << "[RsyncTransfer] Scheduled orphan file for cleanup: " << filename
<< " for snapshot: " << snapshot_uuid_;
}
// Non-orphan files (nlink=2) are still referenced by RocksDB, no cleanup needed
}
if (filename.empty()) {
return;
}
std::string snapshot_uuid_copy;
{
std::lock_guard<std::mutex> guard(mu_);
snapshot_uuid_copy = snapshot_uuid_;
if (!snapshot_uuid_copy.empty()) {
transferring_files_.insert(filename);
}
}
if (!snapshot_uuid_copy.empty()) {
g_pika_server->RegisterRsyncTransferringFile(snapshot_uuid_copy, filename);
}
}
void RsyncServerConn::RemoveTransferringFile(const std::string& filename, bool is_eof) {
if (filename.empty()) {
return;
}
std::string snapshot_uuid_copy;
{
std::lock_guard<std::mutex> guard(mu_);
snapshot_uuid_copy = snapshot_uuid_;
if (!snapshot_uuid_copy.empty()) {
transferring_files_.erase(filename);
}
}
if (snapshot_uuid_copy.empty()) {
return;
}
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_copy, filename);
// Only process cleanup when file transfer is complete (is_eof=true)
if (is_eof) {
std::string dump_path = g_pika_server->GetDumpPathBySnapshot(snapshot_uuid_copy);
std::string filepath = dump_path + "/" + filename;
// Check if file is orphan (nlink=1, only referenced by dump, not by db)
struct stat st;
if (stat(filepath.c_str(), &st) == 0 && st.st_nlink == 1) {
// Orphan file: schedule for delayed cleanup (10 minutes)
// This allows Slave to retry if needed before actual deletion
g_pika_server->ScheduleFileForCleanup(filepath, 600);
LOG(INFO) << "[RsyncTransfer] Scheduled orphan file for cleanup: " << filename
<< " for snapshot: " << snapshot_uuid_copy;
}
// Non-orphan files (nlink=2) are still referenced by RocksDB, no cleanup needed

Copilot uses AI. Check for mistakes.
Comment on lines +302 to +315
// Compare filenames (from GetDumpMeta) with actual_files (re-scanned)
std::vector<std::string> missing_files;
for (const auto& expected : filenames) {
bool found = false;
for (const auto& actual : actual_files) {
if (actual == expected) {
found = true;
break;
}
}
if (!found && expected != "info") { // info file is handled separately
missing_files.push_back(expected);
}
}
Copilot AI Mar 6, 2026

The second dump integrity check does an O(N*M) nested loop to match filenames against actual_files, which can become expensive with large snapshots. Convert actual_files to a hash set (or sort both vectors and do a linear merge) so the missing-file check is O(N).

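As a rough illustration of the set-based lookup the reviewer suggests, a minimal sketch follows; the function name and its parameters are hypothetical stand-ins for the variables in HandleMetaRsyncRequest, not Pika's actual API:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Build a hash set from the re-scanned files once, then check each expected
// filename in O(1), giving O(N + M) overall instead of the nested O(N * M) scan.
std::vector<std::string> FindMissingFiles(const std::vector<std::string>& filenames,
                                          const std::vector<std::string>& actual_files) {
  std::unordered_set<std::string> actual_set(actual_files.begin(), actual_files.end());
  std::vector<std::string> missing_files;
  for (const auto& expected : filenames) {
    // info file is handled separately, as in the original loop
    if (expected != "info" && actual_set.count(expected) == 0) {
      missing_files.push_back(expected);
    }
  }
  return missing_files;
}
```

Sorting both vectors and doing a linear merge would give the same asymptotic win without the hash-set allocation, if memory is a concern.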
Comment on lines +242 to +251
// If Master returns empty response, it means the file was cleaned up during sync
// This is an error condition - the sync should fail and retry
if (ret_count == 0 && resp->file_resp().eof()) {
LOG(ERROR) << "File not available on Master at offset " << offset
<< ", filename: " << filename
<< ". Will retry and may trigger new bgsave.";
s = Status::IOError("File not available on Master");
return s;
}

Copilot AI Mar 6, 2026

Treating ret_count == 0 && eof == true as an error will break valid zero-length files: the first (and only) read would legitimately return 0 bytes with EOF set, but this code returns an IOError. Missing files are already signaled by resp->code() != kOk (and the server now explicitly returns kErr when the file is gone), so this check should be removed or tightened (e.g., only error when ret_count == 0 && eof == true AND offset > 0, or when an explicit server-side error is returned).

Suggested change
// If Master returns empty response, it means the file was cleaned up during sync
// This is an error condition - the sync should fail and retry
if (ret_count == 0 && resp->file_resp().eof()) {
LOG(ERROR) << "File not available on Master at offset " << offset
<< ", filename: " << filename
<< ". Will retry and may trigger new bgsave.";
s = Status::IOError("File not available on Master");
return s;
}

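One hedged way to tighten the condition along the lines the reviewer describes — treat a zero-byte first read with EOF as a valid empty file, and only error when the stream truncates mid-transfer or the server signals an explicit error — might look like this; the helper name and the kOk value are illustrative assumptions:

```cpp
#include <cstdint>

constexpr int kOk = 0;  // assumed value; matches "resp->code() != kOk" in the review

// A response indicates a missing file only when the server returned an explicit
// non-kOk code, or when we got zero bytes with EOF after having already read
// data (offset > 0). A zero-length file legitimately yields ret_count == 0 with
// EOF on its very first read and is NOT an error.
bool IsMissingFileError(int resp_code, int64_t ret_count, bool eof, uint64_t offset) {
  if (resp_code != kOk) {
    return true;  // server explicitly reported the file is gone
  }
  return ret_count == 0 && eof && offset > 0;  // truncated mid-transfer
}
```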
Comment on lines +2064 to +2082
// Utility function to ensure directory exists
// Returns true if directory exists or was created successfully
// Handles the special case where CreatePath returns 0 for both success and "already exists"
bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
// First check if directory already exists
if (pstd::FileExists(path)) {
return true;
}
// Directory doesn't exist, try to create it
int ret = pstd::CreatePath(path, mode);
// CreatePath returns 0 on success, -1 on failure
// Note: CreatePath also returns -1 if directory already exists (due to the
// !filesystem::create_directories check), but we already checked FileExists above
if (ret != 0) {
LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
return false;
}
// Verify directory was created successfully
return pstd::FileExists(path);
Copilot AI Mar 6, 2026

EnsureDirExists returns true for any existing path via pstd::FileExists(path), even if path is a regular file. Callers assume this guarantees a directory exists, so this can lead to later failures when treating a file as a directory. Use pstd::IsDir(path) == 0 for the fast path (and consider updating the comment above as it currently describes CreatePath behavior inaccurately).

Suggested change
// Utility function to ensure directory exists
// Returns true if directory exists or was created successfully
// Handles the special case where CreatePath returns 0 for both success and "already exists"
bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
// First check if directory already exists
if (pstd::FileExists(path)) {
return true;
}
// Directory doesn't exist, try to create it
int ret = pstd::CreatePath(path, mode);
// CreatePath returns 0 on success, -1 on failure
// Note: CreatePath also returns -1 if directory already exists (due to the
// !filesystem::create_directories check), but we already checked FileExists above
if (ret != 0) {
LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
return false;
}
// Verify directory was created successfully
return pstd::FileExists(path);
// Utility function to ensure a directory exists at the given path.
// Returns true if the path already refers to a directory, or if a directory
// was created (or found to already exist) successfully.
// Note: pstd::CreatePath returns 0 both when the directory is created and
// when it already exists, and a non-zero value on failure.
bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
// Fast path: check if a directory already exists at the given path.
if (pstd::IsDir(path) == 0) {
return true;
}
// Directory doesn't exist, try to create it.
int ret = pstd::CreatePath(path, mode);
if (ret != 0) {
LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
return false;
}
// Verify that the resulting path is indeed a directory.
return pstd::IsDir(path) == 0;

Comment on lines +1022 to +1036
void PikaServer::ProcessPendingCleanupFiles() {
std::lock_guard<std::mutex> lock(pending_cleanup_mutex_);
time_t now = time(nullptr);
int cleaned_count = 0;

for (auto it = pending_cleanup_files_.begin();
it != pending_cleanup_files_.end(); ) {
if (now >= it->second.cleanup_time) {
// Check if file still exists and is still an orphan (nlink=1)
if (pstd::FileExists(it->second.filepath)) {
struct stat st;
if (stat(it->second.filepath.c_str(), &st) == 0 && st.st_nlink == 1) {
pstd::DeleteFile(it->second.filepath);
cleaned_count++;
LOG(INFO) << "[Cleanup] Deleted delayed cleanup file: " << it->second.filepath;
Copilot AI Mar 6, 2026

ProcessPendingCleanupFiles holds pending_cleanup_mutex_ while doing potentially slow I/O (FileExists, stat, DeleteFile) and logging. This can block concurrent scheduling of new cleanup entries and prolong lock hold times during disk stalls. Collect the expired filepaths under the mutex, erase them from the map, then release the lock and perform stat/deletes/logging outside the critical section.

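A minimal sketch of the two-phase pattern the reviewer proposes — collect and erase expired entries under the mutex, then do the slow filesystem work after unlocking. The struct layout and container here are illustrative, not the PR's actual types:

```cpp
#include <ctime>
#include <map>
#include <mutex>
#include <string>
#include <vector>

struct PendingCleanupInfo {
  std::string filepath;
  time_t cleanup_time;  // absolute deadline
};

std::mutex pending_cleanup_mutex_;
std::map<std::string, PendingCleanupInfo> pending_cleanup_files_;

// Phase 1: under the lock, only move expired entries out of the map.
std::vector<std::string> CollectExpiredFiles(time_t now) {
  std::vector<std::string> expired;
  std::lock_guard<std::mutex> lock(pending_cleanup_mutex_);
  for (auto it = pending_cleanup_files_.begin(); it != pending_cleanup_files_.end();) {
    if (now >= it->second.cleanup_time) {
      expired.push_back(it->second.filepath);
      it = pending_cleanup_files_.erase(it);
    } else {
      ++it;
    }
  }
  return expired;
}

void ProcessPendingCleanupFiles() {
  std::vector<std::string> expired = CollectExpiredFiles(time(nullptr));
  // Phase 2: stat / nlink check / delete / logging happen here, with the mutex
  // already released, so schedulers are never blocked behind a disk stall.
  for (const std::string& filepath : expired) {
    (void)filepath;  // filesystem work would go here
  }
}
```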
Comment on lines 497 to +508
@@ -505,7 +505,7 @@ default-slot-num : 1024
# if auto_tuned is true: Enables dynamic adjustment of rate limit within the range
#`[rate-limiter-bandwidth / 20, rate-limiter-bandwidth]`, according to the recent demand for background I/O.
# rate limiter auto tune https://rocksdb.org/blog/2017/12/18/17-auto-tuned-rate-limiter.html. the default value is true.
#rate-limiter-auto-tuned : yes
rate-limiter-auto-tuned : no
Copilot AI Mar 6, 2026

The comments around the rate limiter settings no longer match the effective defaults in this file (e.g., it now enforces a non-default rate-limiter-bandwidth and sets rate-limiter-auto-tuned to no while the comment says the default is true/no-limit). Please update the surrounding comments to reflect the new defaults, or keep these lines commented-out if they are meant as optional overrides rather than shipped defaults.

@@ -673,7 +673,7 @@ internal-used-unfinished-full-sync :
# for wash data from 4.0.0 to 4.0.1
# https://github.com/OpenAtomFoundation/pika/issues/2886
# default value: true
Copilot AI Mar 6, 2026

The comment says wash-data default value is true, but the config now sets wash-data: false. Update the comment to match the new default (or keep the setting commented-out if this is only intended as an example override).

Suggested change
# default value: true
# default value: false

Comment on lines +81 to +121
{
std::lock_guard<std::mutex> guard(mu_);
for (int i = 0; i < readers_.size(); i++) {
readers_[i].reset();
}
}
// Release dump ownership when connection closes (Scheme A)
if (!snapshot_uuid_.empty()) {
LOG(INFO) << "[RsyncServerConn] Connection " << conn_id_ << " closing, releasing dump " << snapshot_uuid_;
g_pika_server->ReleaseDump(snapshot_uuid_);
}
// Unregister snapshot when connection closes (outside of mu_ lock)
UnregisterSnapshot();
}

void RsyncServerConn::RegisterSnapshot(const std::string& snapshot_uuid) {
if (!snapshot_uuid.empty() && snapshot_uuid_ != snapshot_uuid) {
// Unregister old snapshot if different
if (!snapshot_uuid_.empty()) {
UnregisterSnapshot();
}
snapshot_uuid_ = snapshot_uuid;
g_pika_server->RegisterRsyncSnapshot(snapshot_uuid_);
}
}

void RsyncServerConn::UnregisterSnapshot() {
if (!snapshot_uuid_.empty()) {
// Clear any remaining transferring files
std::set<std::string> remaining_files;
{
std::lock_guard<std::mutex> guard(mu_);
remaining_files = transferring_files_;
transferring_files_.clear();
}
for (const auto& file : remaining_files) {
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_, file);
}
g_pika_server->UnregisterRsyncSnapshot(snapshot_uuid_);
snapshot_uuid_.clear();
}
Copilot AI Mar 6, 2026

snapshot_uuid_ is read/written without holding mu_ (e.g., comparing/assigning in RegisterSnapshot, and checked in Add/RemoveTransferringFile). Since meta/file requests are scheduled on a thread pool, these methods can run concurrently on the same connection, which makes snapshot_uuid_ a data race (UB) on std::string. Guard all accesses to snapshot_uuid_ with a mutex (e.g., reuse mu_ or add a dedicated mutex) and avoid calling UnregisterSnapshot() while holding the same lock (copy the old uuid first, then unregister outside the lock).

Suggested change
{
std::lock_guard<std::mutex> guard(mu_);
for (int i = 0; i < readers_.size(); i++) {
readers_[i].reset();
}
}
// Release dump ownership when connection closes (Scheme A)
if (!snapshot_uuid_.empty()) {
LOG(INFO) << "[RsyncServerConn] Connection " << conn_id_ << " closing, releasing dump " << snapshot_uuid_;
g_pika_server->ReleaseDump(snapshot_uuid_);
}
// Unregister snapshot when connection closes (outside of mu_ lock)
UnregisterSnapshot();
}
void RsyncServerConn::RegisterSnapshot(const std::string& snapshot_uuid) {
if (!snapshot_uuid.empty() && snapshot_uuid_ != snapshot_uuid) {
// Unregister old snapshot if different
if (!snapshot_uuid_.empty()) {
UnregisterSnapshot();
}
snapshot_uuid_ = snapshot_uuid;
g_pika_server->RegisterRsyncSnapshot(snapshot_uuid_);
}
}
void RsyncServerConn::UnregisterSnapshot() {
if (!snapshot_uuid_.empty()) {
// Clear any remaining transferring files
std::set<std::string> remaining_files;
{
std::lock_guard<std::mutex> guard(mu_);
remaining_files = transferring_files_;
transferring_files_.clear();
}
for (const auto& file : remaining_files) {
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_, file);
}
g_pika_server->UnregisterRsyncSnapshot(snapshot_uuid_);
snapshot_uuid_.clear();
}
std::string snapshot_uuid_copy;
{
std::lock_guard<std::mutex> guard(mu_);
for (int i = 0; i < readers_.size(); i++) {
readers_[i].reset();
}
snapshot_uuid_copy = snapshot_uuid_;
}
// Release dump ownership when connection closes (Scheme A)
if (!snapshot_uuid_copy.empty()) {
LOG(INFO) << "[RsyncServerConn] Connection " << conn_id_ << " closing, releasing dump " << snapshot_uuid_copy;
g_pika_server->ReleaseDump(snapshot_uuid_copy);
}
// Unregister snapshot when connection closes (UnregisterSnapshot handles locking)
UnregisterSnapshot();
}
void RsyncServerConn::RegisterSnapshot(const std::string& snapshot_uuid) {
if (snapshot_uuid.empty()) {
return;
}
// Take a snapshot of the current uuid under lock
std::string current_snapshot_uuid;
{
std::lock_guard<std::mutex> guard(mu_);
current_snapshot_uuid = snapshot_uuid_;
}
if (current_snapshot_uuid == snapshot_uuid) {
return;
}
// Unregister old snapshot (if any) outside the lock
if (!current_snapshot_uuid.empty()) {
UnregisterSnapshot();
}
// Set the new snapshot uuid under lock
{
std::lock_guard<std::mutex> guard(mu_);
snapshot_uuid_ = snapshot_uuid;
}
// Register new snapshot using the provided uuid
g_pika_server->RegisterRsyncSnapshot(snapshot_uuid);
}
void RsyncServerConn::UnregisterSnapshot() {
// Copy state under lock, then perform unregister operations outside the lock
std::string snapshot_uuid_copy;
std::set<std::string> remaining_files;
{
std::lock_guard<std::mutex> guard(mu_);
if (snapshot_uuid_.empty()) {
return;
}
snapshot_uuid_copy = snapshot_uuid_;
remaining_files = transferring_files_;
transferring_files_.clear();
snapshot_uuid_.clear();
}
// Clear any remaining transferring files for this snapshot
for (const auto& file : remaining_files) {
g_pika_server->UnregisterRsyncTransferringFile(snapshot_uuid_copy, file);
}
g_pika_server->UnregisterRsyncSnapshot(snapshot_uuid_copy);

Added a file-type check in RemoveTransferringFile: only files with the .sst suffix can be orphan files (hard links).
Other files such as info, CURRENT, MANIFEST, OPTIONS, and .log are newly generated during the dump process,
so nlink=1 is normal for them and they should not be cleaned up.

Before the fix: the info file was misidentified as an orphan and scheduled for cleanup, leaving the dump directory without its protection mechanism.
After the fix: only SST files enter the orphan-file check and delayed-cleanup flow.
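The suffix check described above could be sketched roughly as follows; the helper name is hypothetical, since the commit does not show its exact code:

```cpp
#include <string>

// Only .sst files are hard-linked from the live db directory, so only they can
// be "orphans" (nlink == 1 after the db drops its link). Files written fresh
// into the dump (info, CURRENT, MANIFEST, OPTIONS, *.log) always have
// nlink == 1 and must never enter the delayed-cleanup path.
bool MayBeOrphanDumpFile(const std::string& filename) {
  static const std::string kSuffix = ".sst";
  if (filename.size() <= kSuffix.size()) {
    return false;
  }
  return filename.compare(filename.size() - kSuffix.size(), kSuffix.size(), kSuffix) == 0;
}
```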
@coderabbitai

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough


This PR extends Rsync snapshot and dump lifecycle management in Pika by introducing tracking APIs for active rsync snapshots and transferring files, dump ownership management with concurrent limits, delayed file cleanup mechanisms, and enhanced backup creation logic. Additionally, it refines CMake build configuration, adjusts default settings, and improves error handling in file operations.

Changes

Cohort / File(s) | Summary

• Build System Configuration — CMakeLists.txt, src/pstd/CMakeLists.txt, src/storage/CMakeLists.txt: Added BUILD_TESTS CMake option to conditionally enable/disable test compilation. Removed src/build_version.cc from source collection and simplified PIKA_BUILD_VERSION_CC variable declaration.
• Configuration Defaults — conf/pika.conf: Updated default values: rate-limiter-bandwidth reduced from 1099511627776 to 109951162, rate-limiter-auto-tuned changed from yes to no, wash-data changed from true to false.
• Rsync Server Connection Tracking — include/rsync_server.h, src/rsync_server.cc: Added snapshot lifecycle management (RegisterSnapshot, UnregisterSnapshot, GetSnapshotUuid) and per-connection file transfer tracking (AddTransferringFile, RemoveTransferringFile, IsFileTransferring, GetTransferringFiles). Introduced global transfer state queries and enhanced HandleMetaRsyncRequest/HandleFileRsyncRequest with dump ownership, integrity checks, and cleanup safeguards.
• PikaServer Dump & Rsync Management APIs — include/pika_server.h, src/pika_server.cc: Added comprehensive rsync snapshot tracking (RegisterRsyncSnapshot, UnregisterRsyncSnapshot, IsRsyncSnapshotActive, GetActiveRsyncSnapshots), per-snapshot transfer file tracking, dump ownership management with max concurrent limits (MarkDumpInUse, ReleaseDump, IsDumpInUse, GetDumpPathBySnapshot, GetActiveDumpCount), delayed cleanup scheduling (ScheduleFileForCleanup, ProcessPendingCleanupFiles), and utility method EnsureDirExists. Includes thread-safe state structures with corresponding mutexes and data holders.
• Database Bgsave & Dump Management — src/pika_db.cc: Replaced backup creation with SetBackupContentAndCreate for tighter timing. Implemented new daily dump directory naming scheme (dump-YYYYMMDDNN) with sequence number allocation. Enhanced GetBgSaveMetaData to scan subdirectories, detect orphan SST files (st_nlink == 1), and validate file existence with logging.
• Backup Engine Enhancement — src/storage/include/storage/backupable.h, src/storage/src/backupable.cc: Added new public method SetBackupContentAndCreate to perform content gathering and immediate checkpoint creation in a single operation, reducing the time window between GetLiveFiles and checkpoint creation.
• Rsync Client Error Handling — src/rsync_client.cc: Added error condition for empty master response (ret_count == 0 and eof) indicating file unavailability. Modified ComparisonUpdate to replace file_set_ with remote_file_set to align with master state and handle deletions correctly.
• Command & Configuration — src/pika_command.cc, src/pika_conf.cc: Removed kCmdFlagsUpdateCache flag from HLenCmd initialization. Changed rate-limiter bandwidth default to 64-bit literal (10LL * 1024 * 1024 * 1024) to prevent overflow.
• Rsync Activation & File Operations — src/pika_rm.cc, src/pstd/src/env.cc: Added rate-limited logging (once per 30 seconds per database) for rsync activation. Enhanced GetChildren with try/catch for filesystem_error, returning -1 on failure.
• Storage Key Encoding — src/storage/include/storage/storage_define.h: Fixed EncodeUserKey to advance dst_ptr by remaining length after copying trailing bytes with zero byte, preventing pointer misalignment before delimiter copy.
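The rate-limiter literal fix in src/pika_conf.cc comes down to integer width: without a 64-bit suffix, `10 * 1024 * 1024 * 1024` is evaluated in `int` and overflows on typical platforms where `int` is 32-bit. A small illustration of the corrected form (values only; the actual config code is not reproduced here):

```cpp
#include <cstdint>

// With the LL suffix the whole product is evaluated in long long, so the
// 10 GiB value (10737418240) survives intact. Without it, the multiplication
// would happen in 32-bit int and overflow (undefined behavior for signed int),
// which is exactly why the literal-width change matters.
const int64_t kRateLimiterBandwidth = 10LL * 1024 * 1024 * 1024;
```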

Sequence Diagram(s)

sequenceDiagram
    participant Client as RsyncClient
    participant Conn as RsyncServerConn
    participant Server as PikaServer
    participant Storage as Storage Layer
    participant Cleanup as Cleanup Worker

    Client->>Conn: HandleMetaRsyncRequest(snapshot)
    Conn->>Server: MarkDumpInUse(snapshot_uuid, conn_id, path)
    Server->>Server: Verify dump ownership & concurrent limits
    Server-->>Conn: Ownership acquired
    Conn->>Conn: RegisterSnapshot(snapshot_uuid)
    Conn->>Conn: Pre-register all files as transferring
    Conn-->>Client: Meta info + file list

    Client->>Conn: HandleFileRsyncRequest(filename)
    Conn->>Conn: AddTransferringFile(filename)
    Conn->>Storage: Read file content
    Storage-->>Conn: File data
    Conn->>Conn: RemoveTransferringFile(filename, is_eof=true)
    Conn->>Server: ScheduleFileForCleanup(if orphan)
    Conn-->>Client: File data

    Client->>Conn: Transfer complete
    Conn->>Server: ReleaseDump(snapshot_uuid)
    Conn->>Conn: UnregisterSnapshot()
    Server-->>Conn: Dump released

    Cleanup->>Server: ProcessPendingCleanupFiles()
    Server->>Storage: Delete orphan SST files
    Storage-->>Server: Cleanup complete
    Server->>Server: Log cleanup summary

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

🧹 Updates

Suggested reviewers

  • wangshao1
  • chejinge
  • Mixficsol

Poem

🐰 A snapshot here, a dump tracked there,
With transfers safe and cleanup care!
Rsync now knows what's in the air,
Files move freely, orphans—beware!
The rabbit hops through schemes with flair! 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

• Docstring Coverage — ⚠️ Warning: docstring coverage is 10.71%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

• Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
• Title check — ✅ Passed: the PR title clearly describes the main feature (adjustment of the dump cleanup mechanism for master-slave synchronization to optimize storage space usage), which directly aligns with the primary objective of implementing Scheme A for orphan file cleanup.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/rsync_client.cc (1)

323-332: ⚠️ Potential issue | 🟠 Major

Keep file_set_ as the delta to preserve resume.

expired_files already handles master-side deletions. Replacing newly_files with remote_file_set forces every retry to re-copy files that are already complete locally, which defeats interrupted full-sync resume and amplifies IO.

🛠️ Suggested fix
-    // Replace file_set_ with remote_file_set to ensure files deleted on Master
-    // are also removed from local tracking
-    file_set_ = remote_file_set;
+    file_set_ = newly_files;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rsync_client.cc` around lines 323 - 332, The current code replaces
file_set_ with remote_file_set which forces re-copy of all remote files on
retry; instead keep file_set_ as the delta so resume works: compute newly_files
(difference of remote_file_set and local_file_set) and assign file_set_ =
newly_files, leaving expired_files handling deletions; update the assignment
that currently sets file_set_ = remote_file_set to set file_set_ to the
newly_files set (using the existing newly_files variable) so only missing files
are retried.
src/rsync_server.cc (2)

437-456: ⚠️ Potential issue | 🔴 Critical

Serve file chunks from the connection-reserved snapshot.

The meta phase stores snapshot ownership on conn, but this handler ignores it and re-fetches the current GetDumpUUID(db_name) / db->bgsave_info().path on every file request. Once another slave triggers a newer bgsave, this connection starts reading the new dump mid-transfer and the client aborts on the snapshot UUID mismatch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rsync_server.cc` around lines 437 - 456, The handler currently re-queries
GetDumpUUID and db->bgsave_info().path causing it to switch snapshots
mid-transfer; instead read the snapshot UUID and bgsave path from the
connection's reserved snapshot metadata set during the meta phase (e.g. use the
connection's saved snapshot UUID and saved bgsave path fields/methods) and use
those values for response.set_snapshot_uuid(...) and filepath construction; if
the connection has no reserved snapshot metadata, return an error as before.
Update the section that calls g_pika_server->GetDumpUUID(db_name) and
g_pika_server->GetDB(db_name) to use conn's snapshot ownership fields (and keep
the existing error handling if the reserved snapshot is missing).

236-286: ⚠️ Potential issue | 🔴 Critical

Bind dump_path to the same snapshot returned by GetDumpMeta().

GetDumpMeta() gives this handler a specific snapshot_uuid and file list, but the code re-reads db_ptr->bgsave_info().path before the integrity checks and again when claiming ownership. In the concurrent full-sync flow this PR introduces, a newer bgsave can flip that path in between, so Line 273 can delete the wrong directory and Line 372 can associate that wrong path with the older snapshot_uuid. Please fetch {snapshot_uuid, dump_path, filenames} atomically and use that stable path throughout.

Also applies to: 370-372
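A rough sketch of the atomic fetch this comment asks for — one locked accessor returns the snapshot identity as a unit, and the handler then uses only that local copy for integrity checks, deletion, and ownership. The struct and class names are illustrative, not Pika's actual API:

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Everything the file-serving path needs, captured together under one lock so
// a concurrent bgsave cannot flip the dump path between the integrity check
// and a later delete or ownership claim.
struct DumpMeta {
  std::string snapshot_uuid;
  std::string dump_path;
  std::vector<std::string> filenames;
};

class DumpRegistry {
 public:
  void Update(DumpMeta meta) {
    std::lock_guard<std::mutex> lock(mu_);
    current_ = std::move(meta);
  }
  // Single atomic read: callers get a self-consistent {uuid, path, files} copy
  // and never re-read live state mid-request.
  DumpMeta GetDumpMeta() {
    std::lock_guard<std::mutex> lock(mu_);
    return current_;
  }

 private:
  std::mutex mu_;
  DumpMeta current_;
};
```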

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rsync_server.cc` around lines 236 - 286, GetDumpMeta returns
snapshot_uuid and filenames but code later re-reads db_ptr->bgsave_info().path
which can race with concurrent bgsave; capture the dump_path atomically right
after GetDumpMeta (e.g., read db_ptr->bgsave_info().path into a local dump_path
variable immediately after obtaining db_ptr and snapshot_uuid) and use that
stored dump_path for the integrity checks, pstd::DeleteDirIfExist, and when
setting the response/claiming ownership instead of re-reading
db_ptr->bgsave_info().path; ensure all references (the integrity loop that
builds full_path, the LOG messages, the delete call
pstd::DeleteDirIfExist(dump_path), and the final assignment to dump_path used by
the connection) use this stable local variable so the path cannot flip between
checks and deletion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.


---

Outside diff comments:
In `@src/rsync_client.cc`:
- Around line 323-332: The current code replaces file_set_ with remote_file_set
which forces re-copy of all remote files on retry; instead keep file_set_ as the
delta so resume works: compute newly_files (difference of remote_file_set and
local_file_set) and assign file_set_ = newly_files, leaving expired_files
handling deletions; update the assignment that currently sets file_set_ =
remote_file_set to set file_set_ to the newly_files set (using the existing
newly_files variable) so only missing files are retried.

In `@src/rsync_server.cc`:
- Around line 437-456: The handler currently re-queries GetDumpUUID and
db->bgsave_info().path causing it to switch snapshots mid-transfer; instead read
the snapshot UUID and bgsave path from the connection's reserved snapshot
metadata set during the meta phase (e.g. use the connection's saved snapshot
UUID and saved bgsave path fields/methods) and use those values for
response.set_snapshot_uuid(...) and filepath construction; if the connection has
no reserved snapshot metadata, return an error as before. Update the section
that calls g_pika_server->GetDumpUUID(db_name) and g_pika_server->GetDB(db_name)
to use conn's snapshot ownership fields (and keep the existing error handling if
the reserved snapshot is missing).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 74324e56-e82d-42af-8435-545104f4ca02

📥 Commits

Reviewing files that changed from the base of the PR and between 4484840 and 36d8f17.

📒 Files selected for processing (17)
  • CMakeLists.txt
  • conf/pika.conf
  • include/pika_server.h
  • include/rsync_server.h
  • src/pika_command.cc
  • src/pika_conf.cc
  • src/pika_db.cc
  • src/pika_rm.cc
  • src/pika_server.cc
  • src/pstd/CMakeLists.txt
  • src/pstd/src/env.cc
  • src/rsync_client.cc
  • src/rsync_server.cc
  • src/storage/CMakeLists.txt
  • src/storage/include/storage/backupable.h
  • src/storage/include/storage/storage_define.h
  • src/storage/src/backupable.cc

Comment on lines 673 to +676
# for wash data from 4.0.0 to 4.0.1
# https://github.com/OpenAtomFoundation/pika/issues/2886
# default value: true
-wash-data: true
+wash-data: false

⚠️ Potential issue | 🟠 Major

Disabling wash-data by default may break upgrades from 4.0.0 to 4.0.1.

The WashData() function (referenced in the comment at lines 673-675) is essential for migrating hash column family data to the correct internal format when upgrading. With wash-data: false as the default:

  1. Users upgrading from 4.0.0 won't automatically get their data migrated
  2. Hash values without the proper suffix encoding will remain inconsistent
  3. This could cause silent data corruption or read failures

Consider either:

  • Keeping the default as true and documenting that users should set it to false after the first successful startup post-upgrade
  • Adding prominent upgrade documentation warning users to set wash-data: true before their first 4.0.1 startup
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/pika.conf` around lines 673 - 676, The config currently sets wash-data:
false which prevents the WashData() migration from running during upgrades;
change the default to wash-data: true so WashData() runs automatically on first
startup after upgrading from 4.0.0, and add a clear comment next to the
wash-data entry instructing operators to set wash-data: false after the first
successful startup (or include alternative upgrade docs); specifically update
the wash-data default and the adjacent comment block referenced by WashData() to
reflect this behavior.

Comment on lines +246 to +247
void ScheduleFileForCleanup(const std::string& filepath, int delay_seconds);
void ProcessPendingCleanupFiles();

⚠️ Potential issue | 🟠 Major

Keep the same transfer identity in the delayed-cleanup queue.

PendingCleanupInfo only stores filepath, but transfer state is tracked by snapshot_uuid + filename. If a slave retries the same SST during the grace window, the cleanup worker has no stable key to ask whether that file became active again before deleting it.

Minimal shape change
-  void ScheduleFileForCleanup(const std::string& filepath, int delay_seconds);
+  void ScheduleFileForCleanup(const std::string& snapshot_uuid,
+                              const std::string& filename,
+                              const std::string& filepath,
+                              int delay_seconds);

   struct PendingCleanupInfo {
+    std::string snapshot_uuid;
+    std::string filename;
     std::string filepath;
     time_t cleanup_time;
   };

Also applies to: 677-682

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pika_server.h` around lines 246 - 247, PendingCleanupInfo currently
only holds filepath so the delayed-cleanup queue can't determine transfer
identity (snapshot_uuid + filename); modify PendingCleanupInfo to include
snapshot_uuid and filename (or a single transfer_id composed of them), update
ScheduleFileForCleanup to accept snapshot_uuid and filename (in addition to
filepath/delay_seconds) and ensure ProcessPendingCleanupFiles uses the
snapshot_uuid+filename identity when checking whether a file became active again
before deletion; touch all uses of ScheduleFileForCleanup,
ProcessPendingCleanupFiles, and any queue logic to push/pop the updated
PendingCleanupInfo structure so cleanup decisions use the stable transfer
identity.

Comment on lines +344 to +352
// Find first available sequence number
int seq = 0;
std::string time_sub_path;
std::string full_path;
do {
time_sub_path = prefix + "-" + std::to_string(seq);
full_path = base_path + time_sub_path + "/" + bgsave_sub_path_;
seq++;
} while (pstd::FileExists(full_path) && seq < 1000); // Max 1000 dumps per day

⚠️ Potential issue | 🟠 Major

Reserve the sequence at the dump root, not the DB path.

The availability check uses .../dump-YYYYMMDD-N/<db_name>. That lets another DB reuse the same dump-YYYYMMDD-N as long as its own subdir is missing, so independent syncs can end up sharing one dump root. That breaks the per-slave exclusive-dump model and makes later dump-level cleanup/ownership ambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 344 - 352, The loop currently checks existence
of base_path + time_sub_path + "/" + bgsave_sub_path_, which only reserves
per-DB subpaths and allows different DBs to reuse the same dump root; change the
check to reserve the dump root itself by testing base_path + time_sub_path (the
dump-YYYYMMDD-N directory) instead of including bgsave_sub_path_. Update the
construction of full_path used in the pstd::FileExists call (and any other place
that assumes the checked path) to point at the dump root (using time_sub_path
and base_path) so each sequence number is exclusively reserved for the entire
dump root.

Comment on lines +345 to 356
int seq = 0;
std::string time_sub_path;
std::string full_path;
do {
time_sub_path = prefix + "-" + std::to_string(seq);
full_path = base_path + time_sub_path + "/" + bgsave_sub_path_;
seq++;
} while (pstd::FileExists(full_path) && seq < 1000); // Max 1000 dumps per day

if (seq >= 1000) {
LOG(ERROR) << db_name_ << " too many dump directories for today";
return false;

⚠️ Potential issue | 🟡 Minor

The last daily slot is never used.

Because seq is incremented before the limit check, a free ...-999 directory still exits the loop with seq == 1000, and Line 354 reports "too many dump directories". This caps the code at 999 usable slots, not 1000.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 345 - 356, The loop increments seq before
checking the 1000-limit, so the final slot (-999) can be skipped; update the
search to stop after checking up to 1000 candidates by iterating with seq from 0
to <1000 and breaking when pstd::FileExists(full_path) is false (e.g., replace
the do/while with a for loop or move seq++ to after the FileExists check),
ensuring time_sub_path/full_path are constructed using seq and using
bgsave_sub_path_ and db_name_ as before so the final slot is considered and the
seq >= 1000 check correctly indicates exhaustion.

Comment on lines +431 to +461
int ret = pstd::GetChildren(dbPath, subDirs);
LOG(INFO) << "[GetBgSaveMetaData] GetChildren for dbPath returned " << ret
<< ", subDirs count=" << subDirs.size();
if (ret) {
LOG(WARNING) << "[GetBgSaveMetaData] Failed to read dbPath: " << dbPath;
return;
}

int db_instance_num = g_pika_conf->db_instance_num();
for (int index = 0; index < db_instance_num; index++) {
std::string instPath = dbPath + ((dbPath.back() != '/') ? "/" : "") + std::to_string(index);
if (!pstd::FileExists(instPath)) {
continue ;
for (const std::string& subDir : subDirs) {
std::string instPath = dbPath + "/" + subDir;
// Skip if not exists or is a file (not directory)
// Note: IsDir returns 0 for directory, 1 for file, -1 for error
if (!pstd::FileExists(instPath) || pstd::IsDir(instPath) != 0) {
continue;
}

std::vector<std::string> tmpFileNames;
int ret = pstd::GetChildren(instPath, tmpFileNames);
ret = pstd::GetChildren(instPath, tmpFileNames);
if (ret) {
LOG(WARNING) << dbPath << " read dump meta files failed, path " << instPath;
return;
LOG(WARNING) << "[GetBgSaveMetaData] Failed to read instPath: " << instPath;
continue;
}

for (const std::string fileName : tmpFileNames) {
fileNames -> push_back(std::to_string(index) + "/" + fileName);
for (const std::string& fileName : tmpFileNames) {
std::string fullPath = instPath + "/" + fileName;
struct stat st;
// Check if file exists and get its stat
if (stat(fullPath.c_str(), &st) != 0) {
// File doesn't exist, skip it
LOG(WARNING) << "[GetBgSaveMetaData] File does not exist: " << fullPath;
continue;

⚠️ Potential issue | 🟠 Major

Don't serve a partial dump manifest on scan errors.

These return/continues can drop an entire instance directory or even CURRENT/MANIFEST from fileNames. src/pika_server.cc:827-834 still returns OK to rsync, and src/rsync_server.cc:288-310 then treats the truncated list as the integrity baseline, so the slave can accept an incomplete snapshot instead of retrying a fresh bgsave. This path should surface an error, which likely means GetBgSaveMetaData needs to return a Status.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 431 - 461, GetBgSaveMetaData must not silently
drop files on scan errors; change its signature to return a Status (e.g., Status
GetBgSaveMetaData(...)) and replace the current silent continue/return behavior
so that any failure from pstd::GetChildren, pstd::IsDir (when it returns -1),
pstd::FileExists checks that indicate unexpected state, or stat(fullPath) != 0
returns a non-OK Status describing the problem. Update callers (the code path
that currently treats GetBgSaveMetaData as void and later returns OK to rsync)
to inspect and propagate the Status so rsync/replication will retry instead of
accepting a truncated manifest. Ensure you reference and update uses of
GetBgSaveMetaData, and keep logging but return error Status on any
directory/file scan failure.
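One possible shape for an error-propagating scan, sketched here with `std::filesystem` in place of the `pstd` helpers (the function name and error-string convention are illustrative only; the real fix would return Pika's `Status` type):

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Collect "<instance>/<file>" entries under each instance subdirectory.
// Returns an empty string on success and a non-empty error message on any
// scan failure, instead of silently serving a partial manifest.
std::string CollectDumpManifest(const std::string& root,
                                std::vector<std::string>* file_names) {
  std::error_code ec;
  for (const auto& inst : fs::directory_iterator(root, ec)) {
    if (!inst.is_directory()) continue;
    std::error_code inner_ec;
    for (const auto& f : fs::directory_iterator(inst.path(), inner_ec)) {
      file_names->push_back(inst.path().filename().string() + "/" +
                            f.path().filename().string());
    }
    if (inner_ec) return "scan failed: " + inst.path().string();  // propagate
  }
  if (ec) return "scan failed: " + root;  // propagate, never a partial list
  return "";  // OK
}
```

The caller would refuse to answer the rsync meta request on a non-empty error, forcing the slave to retry rather than accept a truncated snapshot.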

Comment on lines +1618 to +1627
// Read snapshot_uuid from info file for protection check
// TODO: For multi-DB setups, should check ALL db subdirectories (db0, db1, db2...)
// If any db is in use, the entire dump should be protected.
// Current simple approach only checks db0 for backward compatibility.
std::string snapshot_uuid;
std::string info_path = dump_file + "/db0/info";
if (!pstd::FileExists(info_path)) {
// Fallback to legacy path (directly under dump directory)
info_path = dump_file + "/info";
}

⚠️ Potential issue | 🟠 Major

Don't decide dump liveness from db0/info alone.

This only derives snapshot_uuid from db0/info or legacy /info. In a multi-DB dump, db0 can be absent while another dbN is still syncing, so snapshot_uuid stays empty and the directory can still be deleted through the "corrupted dump" or expiry paths even though that dump is still in use. Please scan all db*/info files, or at minimum fall back to dump_owners_ by dump_path before any delete.

Also applies to: 1647-1673, 1714-1718

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_server.cc` around lines 1618 - 1627, The code currently derives
snapshot_uuid using only db0/info (info_path) and legacy /info, which risks
treating multi-DB dumps as inactive; update the logic around
snapshot_uuid/dump_file to scan all db*/info files under the dump directory
(e.g., iterate directory entries matching "db*" and read each info file to
collect snapshot_uuid or any non-empty value) and only treat the dump as
inactive if no dbN/info yields a snapshot UUID; additionally, if no
snapshot_uuid is found, consult dump_owners_ by dump_path as a fallback before
allowing deletion to ensure active dumps tracked in dump_owners_ are protected
(apply same change to the other similar blocks referencing
snapshot_uuid/info_path at the indicated spots).
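The liveness check could be broadened along these lines. This is a hedged sketch using `std::filesystem` rather than the `pstd` helpers, and it treats mere presence of any `db*/info` entry as protection; the real code would additionally parse the snapshot UUID and consult `dump_owners_`:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// A dump is treated as possibly-active if ANY db*/info exists under it,
// not just db0/info. Falls back to the legacy layout with info directly
// under the dump directory.
bool DumpHasAnyDbInfo(const std::string& dump_path) {
  std::error_code ec;
  for (const auto& entry : fs::directory_iterator(dump_path, ec)) {
    const std::string name = entry.path().filename().string();
    if (name.rfind("db", 0) == 0 && fs::exists(entry.path() / "info")) {
      return true;  // some dbN may still be mid-transfer: protect the dump
    }
  }
  // Legacy layout: info directly under the dump directory.
  return fs::exists(fs::path(dump_path) / "info");
}
```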

Comment on lines +2067 to +2082
bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
// First check if directory already exists
if (pstd::FileExists(path)) {
return true;
}
// Directory doesn't exist, try to create it
int ret = pstd::CreatePath(path, mode);
// CreatePath returns 0 on success, -1 on failure
// Note: CreatePath also returns -1 if directory already exists (due to the
// !filesystem::create_directories check), but we already checked FileExists above
if (ret != 0) {
LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
return false;
}
// Verify directory was created successfully
return pstd::FileExists(path);

⚠️ Potential issue | 🟡 Minor

EnsureDirExists() should verify a directory, not just path existence.

pstd::FileExists(path) also succeeds for regular files, and the pre-check before CreatePath() is racy with concurrent creators. Both cases can make dump directory setup report the wrong result.

Suggested hardening
 bool PikaServer::EnsureDirExists(const std::string& path, mode_t mode) {
-  // First check if directory already exists
-  if (pstd::FileExists(path)) {
-    return true;
-  }
+  struct stat st;
+  if (stat(path.c_str(), &st) == 0) {
+    return S_ISDIR(st.st_mode);
+  }
   // Directory doesn't exist, try to create it
   int ret = pstd::CreatePath(path, mode);
-  // CreatePath returns 0 on success, -1 on failure
-  // Note: CreatePath also returns -1 if directory already exists (due to the
-  // !filesystem::create_directories check), but we already checked FileExists above
-  if (ret != 0) {
-    LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
-    return false;
+  if (ret == 0 && stat(path.c_str(), &st) == 0) {
+    return S_ISDIR(st.st_mode);
   }
-  // Verify directory was created successfully
-  return pstd::FileExists(path);
+  if (stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode)) {
+    return true;  // created concurrently
+  }
+  LOG(WARNING) << "Failed to create directory: " << path << ", error code: " << ret;
+  return false;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_server.cc` around lines 2067 - 2082, EnsureDirExists currently uses
pstd::FileExists which returns true for regular files and does a racy pre-check;
change it to verify the path is a directory (use an is-directory check such as
pstd::IsDirectory or stat+S_ISDIR) instead of FileExists, avoid the race by not
returning success on the initial FileExists check (or always attempt CreatePath
and then re-check that the path is a directory), and after pstd::CreatePath
handle the -1 case by re-checking whether the path now exists and is a directory
(treat that as success) versus a real failure, referencing EnsureDirExists,
pstd::FileExists, and pstd::CreatePath in your changes.

Comment on lines +136 to +138
if (filesystem::is_empty(dir)) {
return -1;
}

⚠️ Potential issue | 🟠 Major

Return success for empty directories.

Line 136 changes GetChildren so an existing-but-empty directory returns -1. Callers in src/pika_stable_log.cc and src/pika_server.cc treat any non-zero as a real failure, so this will turn normal empty-directory states into warnings/early returns and can skip cleanup/scan paths. Keep result empty and return 0; reserve -1 for actual filesystem errors.

💡 Proposed fix
 int GetChildren(const std::string& dir, std::vector<std::string>& result) {
   result.clear();
   try {
-    if (filesystem::is_empty(dir)) {
-      return -1;
-    }
     for (auto& de : filesystem::directory_iterator(dir)) {
       result.emplace_back(de.path().filename());
     }
   } catch (const filesystem::filesystem_error& e) {
     LOG(WARNING) << "GetChildren failed for " << dir << ": " << e.what();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pstd/src/env.cc` around lines 136 - 138, GetChildren currently returns -1
when filesystem::is_empty(dir), which signals an error to callers like the Pika
scan/cleanup code; change the behavior so that when the directory exists but is
empty you leave the output container (result) empty and return 0, reserving -1
only for real filesystem errors. In short: in GetChildren, remove/replace the
early return -1 on filesystem::is_empty(dir) with logic that returns 0 and an
empty result, so callers of GetChildren see success with no children rather than
an error.

Comment on lines +242 to +250
// If Master returns empty response, it means the file was cleaned up during sync
// This is an error condition - the sync should fail and retry
if (ret_count == 0 && resp->file_resp().eof()) {
LOG(ERROR) << "File not available on Master at offset " << offset
<< ", filename: " << filename
<< ". Will retry and may trigger new bgsave.";
s = Status::IOError("File not available on Master");
return s;
}

⚠️ Potential issue | 🟠 Major

Don't treat a valid zero-byte file as "missing".

kOk + eof=true + count=0 is also the normal response for an existing empty file. Missing files are already reported via response.code() != kOk, so this block will make legitimate empty files fail full sync.

🛠️ Suggested fix
-      // If Master returns empty response, it means the file was cleaned up during sync
-      // This is an error condition - the sync should fail and retry
-      if (ret_count == 0 && resp->file_resp().eof()) {
-        LOG(ERROR) << "File not available on Master at offset " << offset
-                   << ", filename: " << filename
-                   << ". Will retry and may trigger new bgsave.";
-        s = Status::IOError("File not available on Master");
-        return s;
-      }
+      // A zero-byte file is also represented as kOk + eof=true + count=0.
+      // Missing files should be handled via resp->code() != kOk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rsync_client.cc` around lines 242 - 250, The current check in
rsync_client.cc incorrectly treats a valid zero-byte file as missing when
ret_count == 0 && resp->file_resp().eof(); change the logic so we only consider
the file missing when the RPC response indicates an error (response.code() !=
kOk or resp->code() != kOk) rather than when count==0+eof; update the
conditional around the LOG(ERROR) / Status::IOError("File not available on
Master") and return to only trigger on a non-OK response code while allowing the
kOk + eof + count==0 case to proceed as a legitimate empty file; adjust
references to ret_count, resp->file_resp().eof(), and
response.code()/resp->code() accordingly.
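The decision rule the review asks for can be stated as a tiny predicate. This is an illustrative reduction, not the real protobuf types (`RespCode` stands in for the RsyncService response code):

```cpp
#include <cstdint>

enum class RespCode { kOk, kErr };

// A file is "missing on master" only when the response code is non-OK.
// kOk + eof=true + count==0 is the legitimate encoding of an empty file
// and must be allowed to complete normally.
bool FileMissingOnMaster(RespCode code, uint64_t count, bool eof) {
  (void)count;
  (void)eof;
  return code != RespCode::kOk;
}
```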

Comment on lines +89 to +101
std::string backup_dir = GetSaveDirByIndex(dir, engine.first);
delete_dir(backup_dir.c_str());

s = engine.second->CreateCheckpointWithFiles(
backup_dir, bcontent.live_files, bcontent.live_wal_files,
bcontent.manifest_file_size, bcontent.sequence_number);

// 3. Re-enable file deletions regardless of success
// CreateCheckpointWithFiles already calls EnableFileDeletions in db_checkpoint.cc
if (!s.ok()) {
LOG(WARNING) << "CreateCheckpointWithFiles failed for index " << engine.first
<< ": " << s.ToString();
return s;

⚠️ Potential issue | 🟠 Major

Validate the checkpoint target before GetCheckpointFiles().

src/storage/src/db_checkpoint.cc:75-97 leaves file deletions disabled after GetCheckpointFiles, and src/storage/src/db_checkpoint.cc:98-104 returns immediately if the checkpoint dir already exists. Because delete_dir(backup_dir.c_str()) is unchecked here, a stale or undeletable target can strand the instance with file deletions still disabled. Clean up or validate backup_dir before calling GetCheckpointFiles, or explicitly re-enable deletions on this early-failure path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/storage/src/backupable.cc` around lines 89 - 101, The code computes
backup_dir via GetSaveDirByIndex and calls delete_dir before calling
CreateCheckpointWithFiles, but if delete_dir fails or the checkpoint target
already exists earlier in the flow (e.g., in GetCheckpointFiles /
db_checkpoint.cc), file deletions can remain disabled; fix by validating or
ensuring the checkpoint target is removable before calling
GetCheckpointFiles/CreateCheckpointWithFiles (e.g., check delete_dir return and
retry/clear stale dir) and on any early-return path re-enable deletions by
calling the appropriate re-enable helper (same one used by db_checkpoint.cc,
e.g., EnableFileDeletions) so deletions are always restored even on error paths
in CreateCheckpointWithFiles or when the backup dir already exists.
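One idiomatic way to guarantee the "deletions always restored" invariant on every exit path is an RAII guard. This is a self-contained sketch with a callback standing in for RocksDB's `EnableFileDeletions`; the names are illustrative, not the patch's API:

```cpp
#include <functional>
#include <utility>

// Re-enables file deletions when it goes out of scope, covering early
// returns (stale backup dir, delete_dir failure) and error paths alike.
class FileDeletionGuard {
 public:
  explicit FileDeletionGuard(std::function<void()> enable)
      : enable_(std::move(enable)) {}
  ~FileDeletionGuard() { enable_(); }  // runs on success and on early return
 private:
  std::function<void()> enable_;
};

// Simulated backup flow: returns false on an early failure, but the guard
// still restores the deletions-enabled flag either way.
bool BackupWithGuard(bool fail_early, bool* deletions_enabled) {
  *deletions_enabled = false;  // stands in for DisableFileDeletions()
  FileDeletionGuard guard([deletions_enabled] { *deletions_enabled = true; });
  if (fail_early) {
    return false;  // early-return path: destructor still fires
  }
  return true;
}
```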

@chejinge chejinge changed the title feature: 主从同步的dump清理机制调整,优化存储空间占用 feat: 主从同步的dump清理机制调整,优化存储空间占用 Mar 12, 2026
@chejinge chejinge changed the title feat: 主从同步的dump清理机制调整,优化存储空间占用 feat: Adjustment of the master-slave synchronization dump cleanup mechanism, optimizing storage space usage Mar 12, 2026

Labels

4.0.2 bug ☢️ Bug Something isn't working ✏️ Feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants