Skip to content

configurable artifact retention with background cleanup and dry-run CLI#728

Closed
ishitaajain22-tech wants to merge 25 commits into
utksh1:mainfrom
ishitaajain22-tech:ishitaajain22-tech-filesCheck
Closed

configurable artifact retention with background cleanup and dry-run CLI#728
ishitaajain22-tech wants to merge 25 commits into
utksh1:mainfrom
ishitaajain22-tech:ishitaajain22-tech-filesCheck

Conversation

@ishitaajain22-tech

Copy link
Copy Markdown

Description

Adds production-grade artifact retention for scan tasks and their raw output files.

  • Configurable cleanup by age (--max-age-days) and count (--max-task-count) with a status guard that never auto-purges running or queued tasks
  • Background cleanup loop wired into the FastAPI lifespan, mirroring the existing WorkflowScheduler pattern
  • One-shot dry-run CLI command (python -m backend.secuscan.cli cleanup --dry-run) that reports what would be removed without touching the DB or disk
  • Audit log entry (retention_purge) written for every real deletion
  • Child rows (findings, reports, audit_log) explicitly removed before the task row
  • .env.example extended with commented-out retention knobs — all policies default to 0 (disabled), so existing deployments are unaffected

Related Issues

Closes #217

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

27 new unit tests added in testing/backend/unit/test_retention.py.

Run the new tests:

$env:SECUSCAN_VAULT_KEY="test"
python -m pytest testing/backend/unit/test_retention.py -v
# Expected: 27 passed

Run full unit suite to confirm no regressions:

python -m pytest testing/backend/unit/ -v
# Expected: all previously passing tests still pass

Manually test dry-run CLI:

python -m backend.secuscan.cli cleanup --max-age-days 30 --dry-run
# Expected:
# [DRY-RUN] Tasks would be removed: 0
# [DRY-RUN] Files  would be removed: 0

Test coverage includes:

  • Dry-run: no DB writes, correct counts reported, no audit entries written
  • Age threshold: removes old tasks, respects cutoff boundary, disabled when 0
  • Count threshold: keeps newest N tasks, no-op when within limit, disabled when 0
  • keep_statuses guard: running and queued never purged regardless of policy
  • File deletion: existing raw output file removed; missing file does not raise
  • Failed deletion: error captured in result.errors, cleanup continues for remaining tasks
  • Audit entries: retention_purge written per deleted task; not written on dry-run
  • DB references: findings and reports deleted with the task; surviving task's children untouched
  • Scheduler lifecycle: start/stop, idempotent double-start, stop-before-start safe, tick error tolerance

Checklist

  • My code follows the code style of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.

@utksh1 utksh1 added level:advanced 55 pts difficulty label for advanced contributor PRs type:devops DevOps or infrastructure work category bonus label type:feature Feature work category bonus label area:backend Backend API, database, or service work gssoc:invalid Admin validation: invalid for GSSoC scoring labels Jun 9, 2026
@utksh1

utksh1 commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Closing as invalid/duplicate of the previously reviewed and closed artifact-retention PR #475. This still has failing formatting-hygiene checks and is too large/risky to merge in its current form. Please open a smaller focused PR after addressing the earlier review feedback.

@utksh1 utksh1 closed this Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:backend Backend API, database, or service work gssoc:invalid Admin validation: invalid for GSSoC scoring level:advanced 55 pts difficulty label for advanced contributor PRs type:devops DevOps or infrastructure work category bonus label type:feature Feature work category bonus label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BACKEND] Add scan artifact retention and cleanup policies

2 participants