Removes leftover ODF images based on an identifier, referred to as the LAB GUID.
- OpenShift CNV + ODF
- GUID: A unique identifier used to group and track resources belonging to a specific lab environment
- Appears in OpenShift namespace names as: sandbox-{GUID}-*
- Appears in ODF volume names as: ocp4-cluster-{GUID}-{UUID}
- The GUID itself is a unique identifier string (e.g., "abc123")
- Volumes: the LAB GUID is embedded in the volume name
- CSI Snapshots: named "csi-snap-..." but connected to volumes through parent-child relationships
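The naming patterns above can be matched with a pair of regular expressions. The following is a minimal sketch, not the script's actual parser: the helper names `guid_from_volume` and `guid_from_namespace` are hypothetical, and it assumes that `{UUID}` is a canonical lowercase 8-4-4-4-12 hex UUID and that the GUID in a namespace name contains no hyphen.

```python
import re
from typing import Optional

# Assumption: {UUID} is a canonical lowercase 8-4-4-4-12 hex UUID.
_UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
_VOLUME_RE = re.compile(rf"^ocp4-cluster-(?P<guid>.+)-{_UUID}$")
# Assumption: the GUID in a namespace name contains no hyphen.
_NAMESPACE_RE = re.compile(r"^sandbox-(?P<guid>[^-]+)-")


def guid_from_volume(name: str) -> Optional[str]:
    """Return the GUID embedded in an ODF volume name, or None if no match."""
    match = _VOLUME_RE.match(name)
    return match.group("guid") if match else None


def guid_from_namespace(name: str) -> Optional[str]:
    """Return the GUID embedded in a sandbox namespace name, or None."""
    match = _NAMESPACE_RE.match(name)
    return match.group("guid") if match else None
```

CSI snapshot names ("csi-snap-...") carry no GUID, so both helpers return None for them; those images have to be reached through their parent volumes instead.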
The script uses a three-phase approach with direct matching, comprehensive descendant analysis, and dependency resolution:
def _find_images_by_criteria(self, source: str, guid_check: bool, csi_only: bool) -> List[OdfImage]:
    """Find images from pool or trash filtered by GUID and type"""
    # source = "pool" or "trash"
    # Filters by GUID presence in name and csi-snap vs regular volume

# Called three times: pool volumes, trash volumes, pool csi-snaps
pool_images = self._find_images_by_criteria("pool", guid_check=True, csi_only=False)
trash_images = self._find_images_by_criteria("trash", guid_check=True, csi_only=False)
csi_snaps = self._find_images_by_criteria("pool", guid_check=True, csi_only=True)

def _discover_descendants_and_dependencies() -> Tuple[List[OdfImage], Dict[str, List[str]]]:
    """Recursively scan for missing descendants and track trash dependencies"""
    # Single scan handles both:
    # 1. Active descendant discovery (for tree hierarchy)
    # 2. Trash dependency tracking (for cleanup strategy)
    for desc in descendants:
        if desc.get('trash', False):
            # Track trash dependency only
            active_to_trash_deps[image.name].append(desc_name)
        else:
            # Add active descendant to discovery
            current_batch.append(new_image)

Key Benefits:
- Eliminates "still has descendants" errors by finding ALL blocking children
- Optimized Performance - Single RBD API scan instead of duplicate calls
- Complete Hierarchy - Recursive discovery ensures no missing depth levels
- Dual Purpose - Discovers descendants AND tracks trash dependencies simultaneously
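The single-scan idea described above can be sketched as a breadth-first walk. This is an illustration under stated assumptions, not the script's implementation: `discover` and the `list_descendants` callable are hypothetical stand-ins for whatever RBD lookup the script performs, and each descendant is assumed to be a dict with a `name` key and an optional `trash` flag, as in the excerpt.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Descendant = Dict[str, object]


def discover(
    initial: Iterable[str],
    list_descendants: Callable[[str], List[Descendant]],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Walk descendants level by level until no new active images appear."""
    known = set(initial)
    additional: List[str] = []          # active images found beyond the initial set
    active_to_trash = defaultdict(list)  # image name -> blocking trash descendants
    current = list(known)
    while current:
        next_batch: List[str] = []
        for name in current:
            for desc in list_descendants(name):
                desc_name = str(desc["name"])
                if desc.get("trash", False):
                    # Trash items are recorded as dependencies, not scanned further.
                    active_to_trash[name].append(desc_name)
                elif desc_name not in known:
                    known.add(desc_name)
                    additional.append(desc_name)
                    next_batch.append(desc_name)
        current = next_batch
    return additional, dict(active_to_trash)
```

Each image is queried once, and the same pass yields both the extra active descendants and the map of trash items that may block removal.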
# Complete discovery with trash csi-snaps
trash_csi_snaps = self._find_trash_csi_snaps()
all_discovered = initial_images + additional_images + trash_csi_snaps
# Build hierarchical tree with proper relationships
self.build_tree(all_discovered)

The script uses a two-phase execution strategy with automatic retry and verification:
def execute_cleanup(self, removal_order: List[OdfImage]):
    # Initial cleanup attempt
    initial_failed_count = self._execute_removal_batch(removal_order, "Initial cleanup")

The script first attempts to remove all discovered items in the calculated dependency order (children → parents).
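A children-before-parents order is what a post-order traversal of the image tree produces. The sketch below shows that technique on plain dictionaries; `removal_order` and the `children` mapping are hypothetical names for illustration, and the script's own tree structure may differ.

```python
from typing import Dict, Iterable, List


def removal_order(children: Dict[str, List[str]], roots: Iterable[str]) -> List[str]:
    """Return image names so that every child precedes its parent (post-order)."""
    order: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for child in children.get(name, []):
            visit(child)
        # A parent is appended only after all of its descendants.
        order.append(name)

    for root in roots:
        visit(root)
    return order
```

For a volume with two snapshots, one of which has a clone, the clone is removed first and the volume last.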
    # If we had failures, try trash purge and retry
    if initial_failed_count > 0:
        print("RETRY STRATEGY - FAILURES DETECTED")
        print("Attempting trash purge to clear blocking items...")
        if self._purge_expired_trash():
            # Get only the failed items from the last attempt
            failed_items = [item for item in removal_order
                            if item.name in self.removal_stats['failed_removals']]
            # Clear previous failures for retry
            self.removal_stats['failed_removals'] = []
            # Retry only failed items
            retry_failed_count = self._execute_removal_batch(failed_items, "Retry after purge")

Key Benefits:
- Performance: Only runs trash purge when actually needed
- Efficiency: Only retries items that actually failed
- Resilience: Handles blocking trash dependencies automatically
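The retry behavior described above reduces to a small control-flow pattern: purge only on failure, then retry only what failed. The following is a simplified sketch with hypothetical `remove` and `purge` callables standing in for the script's RBD operations; it is not the script's code.

```python
from typing import Callable, Iterable, List


def cleanup_with_retry(
    items: Iterable[str],
    remove: Callable[[str], bool],
    purge: Callable[[], bool],
) -> List[str]:
    """Try every item once; on failures, purge trash and retry only those."""
    failed = [item for item in items if not remove(item)]
    if failed and purge():
        # The purge is skipped entirely when the first pass succeeds.
        failed = [item for item in failed if not remove(item)]
    return failed
```

An empty return value means every item was removed, which is the condition under which a final verification makes sense.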
def _final_verification(self):
    """Final verification that no objects with the GUID remain in the pool"""
    # Check active pool images
    all_rbd_images = rbd.RBD().list(self.ioctx)
    remaining_active = [img for img in all_rbd_images if self.lab_guid in img]
    # Check trash items
    trash_items = rbd.RBD().trash_list(self.ioctx)
    remaining_trash = [item['name'] for item in trash_items if self.lab_guid in item['name']]

Final verification only runs when:
- Zero failed removals
- Zero failed trash restorations
- Complete cleanup success
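The gating conditions and the name check can be separated from the RBD calls so they are easy to test. This is a sketch under assumptions: `should_verify` and `find_remaining` are hypothetical helpers, and the `failed_restorations` key is an invented name for the failed trash restoration counter (only `failed_removals` appears in the excerpts above).

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def should_verify(stats: Mapping[str, list]) -> bool:
    """Verify only after a clean run: no failed removals or restorations."""
    # 'failed_restorations' is a hypothetical key name used for illustration.
    return not stats.get("failed_removals") and not stats.get("failed_restorations")


def find_remaining(
    guid: str,
    active_names: Iterable[str],
    trash_items: Iterable[Dict[str, str]],
) -> Tuple[List[str], List[str]]:
    """Return (active, trash) names that still contain the GUID."""
    active = [name for name in active_names if guid in name]
    trash = [item["name"] for item in trash_items if guid in item["name"]]
    return active, trash
```

Because this is a plain substring test, a GUID that is a prefix of another GUID (for example "abc12" and "abc123") would match both labs' images; anchoring on the full naming pattern is a possible tightening.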
execute_cleanup()
├── _execute_removal_batch() [Initial attempt]
├── Check failures?
│   ├── No failures → _final_verification() → Done
│   └── Failures detected
│       ├── _purge_expired_trash()
│       ├── _execute_removal_batch() [Retry failed items]
│       └── _final_verification() [If retry successful]
└── _generate_report()
Successful first attempt:
Initial cleanup for 5 items...
[All items removed successfully]
FINAL VERIFICATION - Checking for remaining objects...
SUCCESS: No objects with GUID found in pool
Cleanup completed successfully for LAB GUID: abc123
With intelligent retry:
Initial cleanup for 5 items...
[2 items fail due to blocking dependencies]
============================================================
RETRY STRATEGY - FAILURES DETECTED
============================================================
Initial cleanup had 2 failures
Attempting trash purge to clear blocking items...
Purging expired trash items from pool 'ocpv-tenants'...
SUCCESS: Purged 3 expired trash items
Retry after purge for 2 items...
[Previously failed items now succeed]
All previously failed items successfully removed after trash purge!
FINAL VERIFICATION - Checking for remaining objects...
SUCCESS: No objects with GUID found in pool
- Python 3.6+
- Ceph/ODF tools installed and accessible
- OpenShift CLI (oc) or kubectl (for the comparison tool)
- Valid kubeconfig with cluster access (for comparison tool)
# For both scripts
pip install rados rbd
# Additional for comparison tool
pip install kubernetes
- ODF configuration file (ceph.conf)
- ODF keyring with sufficient privileges to list and manage RBD images
- CL_POOL: ODF pool name (e.g., "ocpv-tenants")
- CL_CONF: Path to Ceph configuration file
- CL_KEYRING: Path to Ceph keyring file
- CL_LAB: LAB GUID to clean up
- DRY_RUN: Enable dry-run mode (default: "true")
- DEBUG: Enable debug output (default: "false")
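Reading these variables could look like the sketch below. It is an assumption-laden illustration, not the script's loader: `load_config` and its `require_lab` flag are hypothetical, and treating only the string "true" (case-insensitive) as enabled is a guess at how the boolean flags are parsed.

```python
import os
from typing import Dict, Mapping, Optional, Union


def load_config(
    env: Optional[Mapping[str, str]] = None,
    require_lab: bool = True,
) -> Dict[str, Union[str, bool]]:
    """Collect CL_* settings and fail early if a required one is missing."""
    env = os.environ if env is None else env
    required = ["CL_POOL", "CL_CONF", "CL_KEYRING"]
    if require_lab:
        # The comparison and monitor tools do not need CL_LAB.
        required.append("CL_LAB")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variable: " + ", ".join(missing))
    config: Dict[str, Union[str, bool]] = {name: env[name] for name in required}
    # Assumption: only the string "true" (any case) enables a flag.
    config["DRY_RUN"] = env.get("DRY_RUN", "true").lower() == "true"
    config["DEBUG"] = env.get("DEBUG", "false").lower() == "true"
    return config
```

Defaulting `DRY_RUN` to true mirrors the documented default, so forgetting to set it cannot trigger a destructive run.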
git clone https://github.com/yordangit/odf-cleanup.git odf-cleanup
cd odf-cleanup
# Edit env.sh with your specific values, then load it
source env.sh
export CL_LAB="your-lab-guid"
python3 odf-cleanup.py

cd odf-cleanup
# Edit env.sh with your specific values (CL_LAB not needed), then load it
source env.sh
python3 utils/odf-oc-compare.py
# If needed, move the generated script next to odf-cleanup.py

cd odf-cleanup
# Edit env.sh with your specific values (CL_LAB not needed), then load it
source env.sh
python3 utils/odf-cleanup-monitor.py --format csv --csv failures.csv

"Missing environment variable" error:
# Make sure all required variables are set
source env.sh
env | grep CL_
"Error connecting to ODF cluster":
- Verify ODF configuration file path exists:
ls -la $CL_CONF
- Check keyring file permissions:
ls -la $CL_KEYRING
- Test ODF connectivity:
rbd -p $CL_POOL list
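The two file checks above can also be run from Python before attempting a connection. This is a small pre-flight sketch; `preflight` is a hypothetical helper and is not part of the repository.

```python
import os
from typing import List, Mapping


def preflight(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems with the Ceph file settings."""
    problems: List[str] = []
    for var in ("CL_CONF", "CL_KEYRING"):
        path = env.get(var)
        if not path:
            problems.append(f"{var} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{var}: {path} does not exist or is not a file")
        elif not os.access(path, os.R_OK):
            problems.append(f"{var}: {path} is not readable")
    return problems
```

Calling `preflight(os.environ)` and printing the result gives a quick hint about whether a connection error comes from a bad path or from the cluster itself.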
"Error discovering namespaces" (comparison tool):
- Verify kubeconfig is valid:
oc whoami   # or: kubectl auth can-i get namespaces
- Check cluster connectivity:
oc get projects   # or: kubectl get namespaces
"RBD image not found" warnings:
- These are normal for orphaned CSI snapshots
- Enable debug mode for more details:
export DEBUG="true"
Performance Issues:
- Large clusters may take longer for initial discovery
- Script uses caching to optimize repeated operations
- Consider running during off-peak hours for large cleanups
Enable detailed logging for troubleshooting:
export DEBUG="true"
python3 odf-oc-compare.py
This shows:
- Connection details
- GUID extraction process
- CSI snapshot parent analysis
- Detailed error messages
- Feel free to open a PR
- Open an Issue