13 changes: 9 additions & 4 deletions CMakeLists.txt
@@ -8,7 +8,13 @@ endif()
set(CMAKE_CXX_STANDARD 17)
project(pika)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
enable_testing()

# Option to control whether tests are built
option(BUILD_TESTS "Build tests" ON)

if(BUILD_TESTS)
enable_testing()
endif()

if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
# using Clang
@@ -760,6 +766,7 @@ if (USE_PIKA_TOOLS)
add_subdirectory(tools)
endif()
aux_source_directory(src DIR_SRCS)
list(REMOVE_ITEM DIR_SRCS "src/build_version.cc")

# # generate version
string(TIMESTAMP TS "%Y-%m-%d %H:%M:%S")
@@ -790,9 +797,7 @@ message("pika GIT_DATE = ${PIKA_GIT_DATE}")
message("pika GIT_TAG = ${PIKA_GIT_TAG}")
message("pika BUILD_DATE = ${PIKA_BUILD_DATE}")

set(PIKA_BUILD_VERSION_CC ${CMAKE_BINARY_DIR}/pika_build_version.cc
src/pika_cache_load_thread.cc
)
set(PIKA_BUILD_VERSION_CC ${CMAKE_BINARY_DIR}/pika_build_version.cc)
message("PIKA_BUILD_VERSION_CC : " ${PIKA_BUILD_VERSION_CC})
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/src/build_version.cc.in ${PIKA_BUILD_VERSION_CC} @ONLY)

6 changes: 3 additions & 3 deletions conf/pika.conf
@@ -496,7 +496,7 @@ default-slot-num : 1024

# rate limiter bandwidth, units in bytes, default 1024GB/s (No limit)
# [Support Dynamically changeable] sending 'rate-limiter-bandwidth' to a running pika can change its value dynamically
#rate-limiter-bandwidth : 1099511627776
rate-limiter-bandwidth : 109951162

#rate-limiter-refill-period-us : 100000
#
@@ -505,7 +505,7 @@ default-slot-num : 1024
# if auto_tuned is true: Enables dynamic adjustment of rate limit within the range
#`[rate-limiter-bandwidth / 20, rate-limiter-bandwidth]`, according to the recent demand for background I/O.
# rate limiter auto tune https://rocksdb.org/blog/2017/12/18/17-auto-tuned-rate-limiter.html. the default value is true.
#rate-limiter-auto-tuned : yes
rate-limiter-auto-tuned : no
Comment on lines 497 to +508

Copilot AI Mar 6, 2026

The comments around the rate limiter settings no longer match the effective defaults in this file: it now ships a non-default rate-limiter-bandwidth and sets rate-limiter-auto-tuned to no, while the surrounding comments still describe the defaults as "no limit" and true. Update the comments to reflect the new defaults, or keep these lines commented out if they are meant as optional overrides rather than shipped defaults.

################################## RocksDB Blob Configure #####################
# rocksdb blob configure
Expand Down Expand Up @@ -673,7 +673,7 @@ internal-used-unfinished-full-sync :
# for wash data from 4.0.0 to 4.0.1
# https://github.com/OpenAtomFoundation/pika/issues/2886
# default value: true
Copilot AI Mar 6, 2026

The comment says the wash-data default value is true, but the config now sets wash-data: false. Update the comment to match the new default, or keep the setting commented out if it is only intended as an example override.

Suggested change:
-# default value: true
+# default value: false

wash-data: true
wash-data: false
Comment on lines 673 to +676
⚠️ Potential issue | 🟠 Major

Disabling wash-data by default may break upgrades from 4.0.0 to 4.0.1.

The WashData() function (referenced in the comment at lines 673-675) is essential for migrating hash column family data to the correct internal format when upgrading. With wash-data: false as the default:

  1. Users upgrading from 4.0.0 won't automatically get their data migrated
  2. Hash values without the proper suffix encoding will remain inconsistent
  3. This could cause silent data corruption or read failures

Consider either:

  • Keeping the default as true and documenting that users should set it to false after the first successful startup post-upgrade
  • Adding prominent upgrade documentation warning users to set wash-data: true before their first 4.0.1 startup
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conf/pika.conf` around lines 673 - 676, The config currently sets wash-data:
false which prevents the WashData() migration from running during upgrades;
change the default to wash-data: true so WashData() runs automatically on first
startup after upgrading from 4.0.0, and add a clear comment next to the
wash-data entry instructing operators to set wash-data: false after the first
successful startup (or include alternative upgrade docs); specifically update
the wash-data default and the adjacent comment block referenced by WashData() to
reflect this behavior.


# Pika automatic compact compact strategy, a complement to rocksdb compact.
# Trigger the compact background task periodically according to `compact-interval`
76 changes: 76 additions & 0 deletions include/pika_server.h
@@ -212,6 +212,40 @@ class PikaServer : public pstd::noncopyable {
pstd::Status GetDumpMeta(const std::string& db_name, std::vector<std::string>* files, std::string* snapshot_uuid);
void TryDBSync(const std::string& ip, int port, const std::string& db_name, int32_t top);

/*
* Rsync snapshot tracking (for orphan file cleanup protection)
*/
void RegisterRsyncSnapshot(const std::string& snapshot_uuid);
void UnregisterRsyncSnapshot(const std::string& snapshot_uuid);
bool IsRsyncSnapshotActive(const std::string& snapshot_uuid);
std::set<std::string> GetActiveRsyncSnapshots();

/*
* Rsync file transfer tracking (for safe orphan file cleanup during sync)
*/
void RegisterRsyncTransferringFile(const std::string& snapshot_uuid, const std::string& filename);
void UnregisterRsyncTransferringFile(const std::string& snapshot_uuid, const std::string& filename);
bool IsRsyncFileTransferring(const std::string& snapshot_uuid, const std::string& filename);
std::set<std::string> GetRsyncTransferringFiles(const std::string& snapshot_uuid);

/*
* Dump ownership management (Scheme A: each slave has exclusive dump)
*/
bool MarkDumpInUse(const std::string& snapshot_uuid, const std::string& conn_id, const std::string& dump_path);
void ReleaseDump(const std::string& snapshot_uuid);
bool IsDumpInUse(const std::string& snapshot_uuid) const;
std::string GetDumpPathBySnapshot(const std::string& snapshot_uuid) const;
size_t GetActiveDumpCount() const;
static constexpr size_t kMaxConcurrentDumps = 3; // Max concurrent dumps allowed

/*
* Delayed file cleanup for orphan SST files (Scheme A)
* Files are scheduled for cleanup 10 minutes after transfer completes
* to allow for retries and ensure data consistency
*/
void ScheduleFileForCleanup(const std::string& filepath, int delay_seconds);
void ProcessPendingCleanupFiles();
Comment on lines +246 to +247

⚠️ Potential issue | 🟠 Major

Keep the same transfer identity in the delayed-cleanup queue.

PendingCleanupInfo only stores filepath, but transfer state is tracked by snapshot_uuid + filename. If a slave retries the same SST during the grace window, the cleanup worker has no stable key to ask whether that file became active again before deleting it.

Minimal shape change
-  void ScheduleFileForCleanup(const std::string& filepath, int delay_seconds);
+  void ScheduleFileForCleanup(const std::string& snapshot_uuid,
+                              const std::string& filename,
+                              const std::string& filepath,
+                              int delay_seconds);

   struct PendingCleanupInfo {
+    std::string snapshot_uuid;
+    std::string filename;
     std::string filepath;
     time_t cleanup_time;
   };

Also applies to: 677-682

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/pika_server.h` around lines 246 - 247, PendingCleanupInfo currently
only holds filepath so the delayed-cleanup queue can't determine transfer
identity (snapshot_uuid + filename); modify PendingCleanupInfo to include
snapshot_uuid and filename (or a single transfer_id composed of them), update
ScheduleFileForCleanup to accept snapshot_uuid and filename (in addition to
filepath/delay_seconds) and ensure ProcessPendingCleanupFiles uses the
snapshot_uuid+filename identity when checking whether a file became active again
before deletion; touch all uses of ScheduleFileForCleanup,
ProcessPendingCleanupFiles, and any queue logic to push/pop the updated
PendingCleanupInfo structure so cleanup decisions use the stable transfer
identity.


/*
* Keyscan used
*/
@@ -498,6 +532,13 @@ class PikaServer : public pstd::noncopyable {
*/
void DisableCompact();

/*
* Utility function to ensure directory exists
* Returns true if directory exists or was created successfully
* Handles the special case where CreatePath returns 0 for both success and "already exists"
*/
static bool EnsureDirExists(const std::string& path, mode_t mode = 0755);

/*
* lastsave used
*/
@@ -605,6 +646,41 @@
std::unique_ptr<PikaRsyncService> pika_rsync_service_;
std::unique_ptr<rsync::RsyncServer> rsync_server_;

/*
* Rsync snapshot tracking used (for orphan file cleanup protection)
*/
std::set<std::string> active_rsync_snapshots_;
std::mutex active_rsync_snapshots_mutex_;

/*
* Rsync file transfer tracking used (for safe orphan file cleanup during sync)
* Tracks which files are currently being transferred for each snapshot
*/
std::map<std::string, std::set<std::string>> rsync_transferring_files_;
std::mutex rsync_transferring_files_mutex_;

/*
* Dump ownership tracking used (Scheme A: each slave has exclusive dump)
* snapshot_uuid -> {connection id, dump path}
*/
struct DumpOwnerInfo {
std::string conn_id;
std::string dump_path;
};
std::map<std::string, DumpOwnerInfo> dump_owners_;
mutable std::mutex dump_owners_mutex_;

/*
* Pending cleanup tracking for delayed file deletion (Scheme A)
* filepath -> {cleanup_time}
*/
struct PendingCleanupInfo {
std::string filepath;
time_t cleanup_time;
};
std::map<std::string, PendingCleanupInfo> pending_cleanup_files_;
mutable std::mutex pending_cleanup_mutex_;

/*
* Pubsub used
*/
22 changes: 21 additions & 1 deletion include/rsync_server.h
@@ -54,10 +54,30 @@ class RsyncServerConn : public net::PbConn {
int DealMessage() override;
static void HandleMetaRsyncRequest(void* arg);
static void HandleFileRsyncRequest(void* arg);

// Snapshot tracking for orphan file cleanup protection
void RegisterSnapshot(const std::string& snapshot_uuid);
void UnregisterSnapshot();
std::string GetSnapshotUuid() const { return snapshot_uuid_; }

// File transfer tracking for safe orphan file cleanup during sync
void AddTransferringFile(const std::string& filename);
// Remove file from transfer tracking, optionally cleanup if transfer is complete (is_eof=true)
void RemoveTransferringFile(const std::string& filename, bool is_eof = false);
bool IsFileTransferring(const std::string& filename) const;
std::set<std::string> GetTransferringFiles() const;
// Global check if a file is being transferred by any connection
static bool IsFileTransferringGlobally(const std::string& snapshot_uuid, const std::string& filename);

// Public member for dump ownership tracking (Scheme A)
std::string conn_id_; // Connection ID for dump ownership tracking

private:
std::vector<std::shared_ptr<RsyncReader> > readers_;
std::mutex mu_;
mutable std::mutex mu_;
void* data_ = nullptr;
std::string snapshot_uuid_; // Current snapshot being synced
std::set<std::string> transferring_files_; // Files currently being read
};

class RsyncServerThread : public net::HolyThread {
2 changes: 1 addition & 1 deletion src/pika_command.cc
@@ -417,7 +417,7 @@ void InitCmdTable(CmdTable* cmd_table) {
cmd_table->insert(std::pair<std::string, std::unique_ptr<Cmd>>(kCmdNameHKeys, std::move(hkeysptr)));
////HLenCmd
std::unique_ptr<Cmd> hlenptr =
std::make_unique<HLenCmd>(kCmdNameHLen, 2, kCmdFlagsRead | kCmdFlagsHash | kCmdFlagsUpdateCache | kCmdFlagsDoThroughDB | kCmdFlagsFast | kCmdFlagsReadCache);
std::make_unique<HLenCmd>(kCmdNameHLen, 2, kCmdFlagsRead | kCmdFlagsHash | kCmdFlagsDoThroughDB | kCmdFlagsFast | kCmdFlagsReadCache);
cmd_table->insert(std::pair<std::string, std::unique_ptr<Cmd>>(kCmdNameHLen, std::move(hlenptr)));
////HMgetCmd
std::unique_ptr<Cmd> hmgetptr =
2 changes: 1 addition & 1 deletion src/pika_conf.cc
@@ -451,7 +451,7 @@ int PikaConf::Load() {
rate_limiter_auto_tuned_ = at == "yes" || at.empty();
// if rate limiter autotune enable, `rate_limiter_bandwidth_` will still be respected as an upper-bound.
if (rate_limiter_auto_tuned_) {
rate_limiter_bandwidth_ = 10 * 1024 * 1024 * 1024; // 10GB/s
rate_limiter_bandwidth_ = 10LL * 1024 * 1024 * 1024; // 10GB/s
}

// max_write_buffer_num
128 changes: 103 additions & 25 deletions src/pika_db.cc
@@ -3,6 +3,7 @@
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.

#include <sys/stat.h>
#include <fstream>
#include <utility>

@@ -300,8 +301,9 @@ bool DB::RunBgsaveEngine() {
LOG(INFO) << db_name_ << " bgsave_info: path=" << info.path << ", filenum=" << info.offset.b_offset.filenum
<< ", offset=" << info.offset.b_offset.offset;

// Backup to tmp dir
rocksdb::Status s = bgsave_engine_->CreateNewBackup(info.path);
// Use SetBackupContentAndCreate to minimize time window between GetLiveFiles and CreateCheckpoint
// This reduces the chance of compaction occurring and creating orphan files
rocksdb::Status s = bgsave_engine_->SetBackupContentAndCreate(info.path);

if (!s.ok()) {
LOG(WARNING) << db_name_ << " create new backup failed :" << s.ToString();
@@ -324,29 +326,61 @@ void DB::FinishBgsave() {
}

// Prepare engine, need bgsave_protector protect
// Scheme A: Each slave has exclusive dump, so we need unique dump directories
bool DB::InitBgsaveEnv() {
std::lock_guard l(bgsave_protector_);
// Prepare for bgsave dir
bgsave_info_.start_time = time(nullptr);
char s_time[32];
int len = static_cast<int32_t>(strftime(s_time, sizeof(s_time), "%Y%m%d%H%M%S", localtime(&bgsave_info_.start_time)));
bgsave_info_.s_start_time.assign(s_time, len);
std::string time_sub_path = g_pika_conf->bgsave_prefix() + std::string(s_time, 8);
bgsave_info_.path = g_pika_conf->bgsave_path() + time_sub_path + "/" + bgsave_sub_path_;
if (!pstd::DeleteDirIfExist(bgsave_info_.path)) {
LOG(WARNING) << db_name_ << " remove exist bgsave dir failed";

// Scheme A: Use unique directory name with sequence number
// Format: dump-YYYYMMDD-NN/db_name where NN is sequence number
std::string base_path = g_pika_conf->bgsave_path();
std::string date_str(s_time, 8);
std::string prefix = g_pika_conf->bgsave_prefix() + date_str;

// Find first available sequence number
int seq = 0;
std::string time_sub_path;
std::string full_path;
do {
time_sub_path = prefix + "-" + std::to_string(seq);
full_path = base_path + time_sub_path + "/" + bgsave_sub_path_;
seq++;
} while (pstd::FileExists(full_path) && seq < 1000); // Max 1000 dumps per day
Comment on lines +344 to +352

⚠️ Potential issue | 🟠 Major

Reserve the sequence at the dump root, not the DB path.

The availability check uses .../dump-YYYYMMDD-N/<db_name>. That lets another DB reuse the same dump-YYYYMMDD-N as long as its own subdir is missing, so independent syncs can end up sharing one dump root. That breaks the per-slave exclusive-dump model and makes later dump-level cleanup/ownership ambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 344 - 352, The loop currently checks existence
of base_path + time_sub_path + "/" + bgsave_sub_path_, which only reserves
per-DB subpaths and allows different DBs to reuse the same dump root; change the
check to reserve the dump root itself by testing base_path + time_sub_path (the
dump-YYYYMMDD-N directory) instead of including bgsave_sub_path_. Update the
construction of full_path used in the pstd::FileExists call (and any other place
that assumes the checked path) to point at the dump root (using time_sub_path
and base_path) so each sequence number is exclusively reserved for the entire
dump root.


if (seq >= 1000) {
LOG(ERROR) << db_name_ << " too many dump directories for today";
return false;
Comment on lines +345 to 356

⚠️ Potential issue | 🟡 Minor

The last daily slot is never used.

Because seq is incremented before the limit check, a free ...-999 directory still exits the loop with seq == 1000, and Line 354 reports "too many dump directories". This caps the code at 999 usable slots, not 1000.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 345 - 356, The loop increments seq before
checking the 1000-limit, so the final slot (-999) can be skipped; update the
search to stop after checking up to 1000 candidates by iterating with seq from 0
to <1000 and breaking when pstd::FileExists(full_path) is false (e.g., replace
the do/while with a for loop or move seq++ to after the FileExists check),
ensuring time_sub_path/full_path are constructed using seq and using
bgsave_sub_path_ and db_name_ as before so the final slot is considered and the
seq >= 1000 check correctly indicates exhaustion.

}
pstd::CreatePath(bgsave_info_.path, 0755);
// Prepare for failed dir
if (!pstd::DeleteDirIfExist(bgsave_info_.path + "_FAILED")) {
LOG(WARNING) << db_name_ << " remove exist fail bgsave dir failed :";

bgsave_info_.path = full_path;
LOG(INFO) << db_name_ << " preparing bgsave dir: " << bgsave_info_.path;

// Note: In Scheme A, we don't delete existing directories
// because other slaves may be using them
// Just create the new path
if (!PikaServer::EnsureDirExists(bgsave_info_.path, 0755)) {
LOG(WARNING) << db_name_ << " create bgsave dir failed: " << bgsave_info_.path
<< ", errno=" << errno << ", error=" << strerror(errno);
// Clear the path on failure to avoid using invalid path in GetDumpMeta
bgsave_info_.path.clear();
return false;
}

// Prepare for failed dir
std::string failed_dir = bgsave_info_.path + "_FAILED";
if (pstd::FileExists(failed_dir)) {
pstd::DeleteDirIfExist(failed_dir);
}
return true;
}

// Prepare bgsave env, need bgsave_protector protect
// Note: SetBackupContent is now done in RunBgsaveEngine using SetBackupContentAndCreate
// to minimize time window between GetLiveFiles and CreateCheckpoint
bool DB::InitBgsaveEngine() {
bgsave_engine_.reset();
rocksdb::Status s = storage::BackupEngine::Open(storage().get(), bgsave_engine_, g_pika_conf->db_instance_num());
@@ -371,11 +405,7 @@ bool DB::InitBgsaveEngine() {
std::lock_guard l(bgsave_protector_);
bgsave_info_.offset = bgsave_offset;
}
s = bgsave_engine_->SetBackupContent();
if (!s.ok()) {
LOG(WARNING) << db_name_ << " set backup content failed " << s.ToString();
return false;
}
// SetBackupContent is now done in RunBgsaveEngine to minimize time window
}
return true;
}
@@ -390,25 +420,73 @@ void DB::Init() {

void DB::GetBgSaveMetaData(std::vector<std::string>* fileNames, std::string* snapshot_uuid) {
const std::string dbPath = bgsave_info().path;
size_t total_sst_files = 0;
size_t orphan_sst_files = 0;

LOG(INFO) << "[GetBgSaveMetaData] Starting scan, dbPath=" << dbPath;

// dbPath is already the specific DB path (e.g., .../dump/dump-9454-20260302/db0)
// We need to scan its subdirectories (0, 1, 2 for rocksdb instances)
std::vector<std::string> subDirs;
int ret = pstd::GetChildren(dbPath, subDirs);
LOG(INFO) << "[GetBgSaveMetaData] GetChildren for dbPath returned " << ret
<< ", subDirs count=" << subDirs.size();
if (ret) {
LOG(WARNING) << "[GetBgSaveMetaData] Failed to read dbPath: " << dbPath;
return;
}

int db_instance_num = g_pika_conf->db_instance_num();
for (int index = 0; index < db_instance_num; index++) {
std::string instPath = dbPath + ((dbPath.back() != '/') ? "/" : "") + std::to_string(index);
if (!pstd::FileExists(instPath)) {
continue ;
for (const std::string& subDir : subDirs) {
std::string instPath = dbPath + "/" + subDir;
// Skip if not exists or is a file (not directory)
// Note: IsDir returns 0 for directory, 1 for file, -1 for error
if (!pstd::FileExists(instPath) || pstd::IsDir(instPath) != 0) {
continue;
}

std::vector<std::string> tmpFileNames;
int ret = pstd::GetChildren(instPath, tmpFileNames);
ret = pstd::GetChildren(instPath, tmpFileNames);
if (ret) {
LOG(WARNING) << dbPath << " read dump meta files failed, path " << instPath;
return;
LOG(WARNING) << "[GetBgSaveMetaData] Failed to read instPath: " << instPath;
continue;
}

for (const std::string fileName : tmpFileNames) {
fileNames -> push_back(std::to_string(index) + "/" + fileName);
for (const std::string& fileName : tmpFileNames) {
std::string fullPath = instPath + "/" + fileName;
struct stat st;
// Check if file exists and get its stat
if (stat(fullPath.c_str(), &st) != 0) {
// File doesn't exist, skip it
LOG(WARNING) << "[GetBgSaveMetaData] File does not exist: " << fullPath;
continue;
Comment on lines +431 to +461

⚠️ Potential issue | 🟠 Major

Don't serve a partial dump manifest on scan errors.

These return/continues can drop an entire instance directory or even CURRENT/MANIFEST from fileNames. src/pika_server.cc:827-834 still returns OK to rsync, and src/rsync_server.cc:288-310 then treats the truncated list as the integrity baseline, so the slave can accept an incomplete snapshot instead of retrying a fresh bgsave. This path should surface an error, which likely means GetBgSaveMetaData needs to return a Status.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pika_db.cc` around lines 431 - 461, GetBgSaveMetaData must not silently
drop files on scan errors; change its signature to return a Status (e.g., Status
GetBgSaveMetaData(...)) and replace the current silent continue/return behavior
so that any failure from pstd::GetChildren, pstd::IsDir (when it returns -1),
pstd::FileExists checks that indicate unexpected state, or stat(fullPath) != 0
returns a non-OK Status describing the problem. Update callers (the code path
that currently treats GetBgSaveMetaData as void and later returns OK to rsync)
to inspect and propagate the Status so rsync/replication will retry instead of
accepting a truncated manifest. Ensure you reference and update uses of
GetBgSaveMetaData, and keep logging but return error Status on any
directory/file scan failure.

}

// Check if it's an SST file and if it's an orphan (Links=1)
if (fileName.size() > 4 && fileName.substr(fileName.size() - 4) == ".sst") {
total_sst_files++;
if (st.st_nlink == 1) {
// This is an orphan file, but we need to include it in the meta
// to ensure data consistency. The file will be cleaned up after
// a delay to allow for retries.
orphan_sst_files++;
LOG(INFO) << "[GetBgSaveMetaData] Including orphan SST file: " << fullPath
<< ", size=" << st.st_size;
// NOTE: We no longer skip orphan files here. They will be included
// in the file list and cleaned up with a delay after transfer.
}
}
// Construct relative path like "0/xxx.sst" or "1/xxx.sst"
fileNames->push_back(subDir + "/" + fileName);
}
}

if (orphan_sst_files > 0) {
LOG(INFO) << "[GetBgSaveMetaData] Summary for " << dbPath
<< ": total_sst=" << total_sst_files
<< ", orphan_included=" << orphan_sst_files
<< ", returned=" << fileNames->size();
}

fileNames->push_back(kBgsaveInfoFile);
pstd::Status s = GetBgSaveUUID(snapshot_uuid);
if (!s.ok()) {