Spr 865 fs snapshot validation#573
SPR-865 Filesystem check utility
Create a utility that walks the system tables and verifies that every extent in every live table is reachable. For now we don't need to read in each row of each extent, as the extent header has a checksum that validates the extent data. In the future we may want to validate the schema of the extent against each row. To start with, let's validate the tables found in the tableroots table by iterating through the tablenames table (see system_tables.hh for table definitions). The tablenames table contains:
If an exists flag is false it indicates that the table was deleted at that XID and all prior XIDs are no longer valid, so should not be checked. The exists flag may become true at a later XID if the table is recreated. The tableroots table contains:
Validate a table as follows:
In the future, we can also validate the other system tables and the indexes for a table.
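The exists-flag rule described above can be sketched as follows. This is only an illustration under stated assumptions: `NameEntry` and `first_checkable_xid` are hypothetical names, not the definitions in system_tables.hh, and the entries for one table are assumed to be sorted by ascending XID.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of one tablenames row for a single table.
struct NameEntry {
    uint64_t xid;   // XID at which this row took effect
    bool exists;    // false means the table was deleted at this XID
};

// Returns the earliest XID from which the table should be checked, or
// std::nullopt if the most recent entry marks the table as deleted.
// Assumes `entries` is sorted by ascending xid.
std::optional<uint64_t> first_checkable_xid(const std::vector<NameEntry>& entries) {
    std::optional<uint64_t> start;
    uint64_t prev_xid = 0;
    for (const auto& e : entries) {
        assert(e.xid >= prev_xid && "entries must be sorted by xid");
        prev_xid = e.xid;
        if (!e.exists) {
            start.reset();      // a deletion invalidates all prior XIDs
        } else if (!start) {
            start = e.xid;      // first live entry since the last deletion
        }
    }
    return start;
}
```

A table that was created, dropped, and recreated is therefore checked only from the XID of its recreation onward.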
Looks good. I tried running it on my system with vacuum and could see that we validate all the metadata, but for data we verify extents only at the last XID. Can we verify extents between cutoff_xid and the last XID, maybe as a separate PR?
Basically we have to re-run FSCheck with all xids between cutoff_xid/0 and max_xid.
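A rough sketch of what re-running the check across that XID range could look like. `check_xid_range` and its `check` callback are hypothetical placeholders for a single per-XID validation pass; they are not the utility's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Runs `check` once per XID in [start_xid, max_xid] and collects the XIDs
// for which it reported a failure. `check` stands in for one validation
// pass at a given XID; it is a placeholder, not the real FSCheck API.
std::vector<uint64_t> check_xid_range(uint64_t start_xid, uint64_t max_xid,
                                      const std::function<bool(uint64_t)>& check) {
    assert(start_xid <= max_xid);
    std::vector<uint64_t> failed;
    for (uint64_t xid = start_xid;; ++xid) {
        if (!check(xid)) {
            failed.push_back(xid);
        }
        if (xid == max_xid) {
            break;  // explicit exit avoids overflow when max_xid is UINT64_MAX
        }
    }
    return failed;
}
```

Here `start_xid` would be cutoff_xid when the vacuumer is in use and 0 otherwise, as suggested in the comment above.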
Just adding a note here based on my observations:
craigsoules
left a comment
A couple of minor comments... let me know once you've reviewed and I can approve.
if (!vacuumer_enabled || _max_xid >= cutoff_xid) {
    _db_id_to_cutoff_xid.insert(std::make_pair(db_id, cutoff_xid));
}
It looks like other parts of this code assume that this map is always populated via calls to _db_id_to_cutoff_xid.at(), but it seems like it might be possible that this doesn't get populated under the right conditions?
Yes, this is correct. Basically we set up max_xid, and the vacuumer gives us a cutoff XID per database. If max_xid is less than the cutoff XID for some database, then that database won't be evaluated because we won't have the data to work with.
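To illustrate the concern: `.at()` on a standard map throws `std::out_of_range` for a missing key, so any database skipped by the condition above needs a lookup that tolerates absence. The following is a minimal sketch, assuming the map is keyed by db_id with 64-bit values; `CutoffMap`, `record_cutoff`, and `find_cutoff` are hypothetical names, not the PR's code.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>

using CutoffMap = std::unordered_map<uint64_t, uint64_t>;  // db_id -> cutoff_xid

// Mirrors the condition in the reviewed snippet: a cutoff is recorded only
// when the vacuumer is disabled or max_xid has reached the cutoff.
void record_cutoff(CutoffMap& cutoffs, uint64_t db_id, uint64_t cutoff_xid,
                   uint64_t max_xid, bool vacuumer_enabled) {
    if (!vacuumer_enabled || max_xid >= cutoff_xid) {
        cutoffs.insert(std::make_pair(db_id, cutoff_xid));
    }
}

// Lookup that reports "no entry" instead of throwing as .at() would.
std::optional<uint64_t> find_cutoff(const CutoffMap& cutoffs, uint64_t db_id) {
    auto it = cutoffs.find(db_id);
    if (it == cutoffs.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

A caller can then skip a database when `find_cutoff` returns no value rather than relying on the entry always being present.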
@craigsoules do you think we want to run the check when the system is not running?
@rsdcbabu The plan was to run the check only when the system is not running, as the vacuumer could move past that point while the check is running, unless we can block the vacuumer while a check is running.
@garthgoodson @ella-springtail OK, in that case we may not be able to use the method in the vacuumer, as it requires xid_client to get the last committed XID. Could we instead pass the cutoff XID as a parameter?
Two potential issues:
@garthgoodson Maybe I should keep track of the cutoff XID used by the vacuumer in Redis, per database? That may simplify this, but there is the overhead of a Redis call.
LOG_INFO("Verifying database {}:{} with first_xid = {} and cuttoff_xid = {}",
         db_id, db_name, first_xid, cutoff_xid);

uint64_t start_xid = (first_xid < cutoff_xid) ? cutoff_xid : first_xid;
If all_xids is false, start_xid is initialized with LATEST_XID, resulting in the following:
./src/sys_tbl_mgr/file_system_check | grep "Verifying data"
[2025-08-23 04:09:07.771 +00:00] [info] [file_system_check.cc:397:_check_db] [thread 254662] Verifying database 1:springtail with first_xid = 18446744073709551615 and cuttoff_xid = 1115
[2025-08-23 04:09:07.771 +00:00] [info] [file_system_check.cc:402:_check_db] [thread 254662] Verifying database 1:springtail iteration max_xid = 18446744073709551615
and so extents are not validated.
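One possible way to avoid this, sketched under assumptions: resolve the sentinel to a concrete XID before applying the cutoff clamp. `resolve_start_xid` is a hypothetical helper, and `LATEST_XID` is assumed to be the all-ones 64-bit value, which matches the 18446744073709551615 printed in the log. This is not necessarily the fix that was committed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: LATEST_XID is the all-ones sentinel seen in the log output.
constexpr uint64_t LATEST_XID = std::numeric_limits<uint64_t>::max();

// Hypothetical helper: choose the XID at which validation starts.
// `max_xid` must already be a concrete XID, not the sentinel.
uint64_t resolve_start_xid(uint64_t first_xid, uint64_t cutoff_xid, uint64_t max_xid) {
    assert(max_xid != LATEST_XID);
    // Resolve the sentinel before clamping, so it cannot win the comparison.
    if (first_xid == LATEST_XID) {
        first_xid = max_xid;
    }
    // Same clamp as the reviewed expression: never start below the cutoff.
    return (first_xid < cutoff_xid) ? cutoff_xid : first_xid;
}
```

With the values from the log, and a made-up max_xid of 1200, a first_xid equal to the sentinel resolves to 1200 instead of remaining at the sentinel value.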
rsdcbabu
left a comment
Saw a minor bug with start_xid when all_xids is not passed; added my comment inline.
Fixed in the last commit, it works:



Finished snapshot validation.