refactor(backup): remove built-in WuKongIM backup module (#139)#167
refactor(backup): remove built-in WuKongIM backup module (#139)#167an9xyz wants to merge 2 commits into
Conversation
…OSS#139) Delete modules/backup (cron tar+COS uploader, /v1/manager/backup/* API, ~1.6k LoC). Infrastructure-layer backup (volume snapshots, sidecar sync) is the right home for this, not the business service. Keep backup_config / backup_history tables — no drop migration — so existing production rows survive. Add PurgeRemovedMigrationIDs startup shim to delete the now-orphaned gorp_migrations rows (both legacy backup-YYYYMMDD-NN.sql and rewritten *_backup_legacy01.sql forms). Without this, sql-migrate's PlanMigration panics with "unknown migration in database" on existing deployments. Wired into main.go after RewriteLegacyMigrationIDs / ReconcileThreadSchemaRecords.
Address review on Mininglamp-OSS#139: deleting the SQL files + purging gorp_migrations rows would silently re-run the original CREATE TABLE on rollback to a pre-removal release (the tables are intentionally left in place, so the unconditional CREATE would fail at startup with MySQL 1050). Restore the two backup SQL files with their original contents and a minimal package backup/1module.go that registers only the SQLDir (SetupAPI=nil — no HTTP surface). All Go business code (cron scheduler, API, tar+COS storage, disk precheck, models, service) stays deleted. Revert PurgeRemovedMigrationIDs + its tests + the main.go wire-up and restore the two mapping JSON entries (total back to 124). sql-migrate continues to recognise both backup IDs, fresh installs run them as historical migrations, and a rollback finds the IDs already recorded in gorp_migrations so the old module's CREATE TABLE is skipped as expected.
lml2468
left a comment
There was a problem hiding this comment.
[APPROVE] Clean module removal with correct sql-migrate tombstone pattern. -1564 LoC, zero risk.
Verification
| Check | Status | Evidence |
|---|---|---|
| All Go code deleted (8 files) | ✅ | api_manager, db, diskcheck, diskcheck_test, model, scheduler, service, storage, storage_cos — all removed |
1module.go retains only SQLDir registration |
✅ | SetupAPI field omitted (nil), no HTTP routes, no scheduler, no service init |
config.Context import removed |
✅ | No longer needed — factory param is interface{} and only register is used |
| SQL migration files kept as tombstones | ✅ | Not in diff — unchanged, IDs remain in embedded FS for sql-migrate |
Tables backup_config / backup_history not dropped |
✅ | No drop migration — existing production data preserved |
| Rollback safety | ✅ | Pre-removal release finds IDs already in gorp_migrations → skips CREATE TABLE; tables still on disk |
| Fresh install safety | ✅ | Tombstone migrations run as historical no-ops creating unused empty tables |
go build ./... clean (per PR) |
✅ | No dangling references to deleted types/functions |
No findings. The tombstone pattern is exactly right for sql-migrate — the package-level godoc in 1module.go explains the "why" thoroughly enough that a future maintainer won't accidentally delete the SQL files.
✅ Highlight: The godoc in 1module.go is exemplary for a tombstone — it explains why the package still exists, why the SQL files can't be deleted, and what would break if someone tried.
CI note: Build / Lint / Vet / Test pending at review time.
lml2468
left a comment
There was a problem hiding this comment.
Review: APPROVED ✅
PR #167 removes the built-in WuKongIM backup module (~1,564 LoC deleted), implementing issue #139's removal plan.
What remains (tombstone pattern)
| File | Purpose |
|---|---|
1module.go |
Tombstone — registers module with SQLDir only, no SetupAPI (zero HTTP surface) |
sql/20260331000001_backup_legacy01.sql |
Historical migration — keeps sql-migrate happy |
sql/20260401000001_backup_legacy01.sql |
Same |
What's deleted
9 Go files: api_manager.go, db.go, diskcheck.go, diskcheck_test.go, model.go, scheduler.go, service.go, storage.go, storage_cos.go.
Correctness
- sql-migrate compat: Migration IDs stay in embedded FS →
PlanMigrationwon't panic with "unknown migration in database" on existing deployments. ✅ - Rollback safety: Keeping the IDs prevents a pre-removal release from re-running
CREATE TABLE(noIF NOT EXISTS) and hitting MySQL 1050. ✅ - Data preservation:
backup_config/backup_historytables intentionally left in DB per issue #139. ✅ - No dangling references: Module registers with
Name: "backup"+SQLDironly. No other module importsbackup. ✅ - Package comment: Thorough explanation of the tombstone rationale. ✅
No blocking findings.
Jerry-Xin
left a comment
There was a problem hiding this comment.
Project relevance gate passed. This PR is in scope: it removes an internal modules/backup API/scheduler implementation while preserving migration compatibility for octo-server.
💬 Non-blocking
- 🟡 Warning:
go.mod:25andgo.mod:37still listgithub.com/google/uuidandgithub.com/robfig/cron/v3as direct dependencies even though the direct imports were removed with the backup module.go mod tidy -diffmoves both to the indirect block. This is cleanup-only, not a blocker. - 🔵 Suggestion: Add a small regression test around
modules/backup/1module.go:35confirming the backup module remains registered withSQLDirand noSetupAPI. The risk here is future cleanup accidentally deleting the tombstone and reintroducing sql-migrate startup failures.
✅ Highlights
modules/backup/1module.go:35keeps the module registered, andmodules/backup/1module.go:38keeps the embedded SQL migrations discoverable.modules/backup/1module.go:37omitsSetupAPI, which matches the intended route removal; the module setup path supports nilSetupAPI.- The historical SQL files are still present, preserving migration ID compatibility for existing deployments and rollback safety.
go build ./...passes.go test ./modules/backup ./pkg/dbpasses. Fullgo test ./...was attempted but this local environment lacks MySQL/Redis on127.0.0.1, so unrelated integration-style packages failed with connection refused.
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #167 (octo-server)
Verdict: APPROVED
The PR is a careful, well-reasoned removal. The tombstone approach correctly addresses the upgrade/rollback hazard surfaced in the prior review round. No P0/P1 issues found.
1. Verification
| Check | Result | Evidence |
|---|---|---|
go build ./... |
✅ pass | local build clean |
go vet ./... |
✅ pass | clean |
go test ./pkg/db/ -race |
✅ pass | ok github.com/Mininglamp-OSS/octo-server/pkg/db 1.077s |
External symbol references to modules/backup |
✅ none | only _ blank imports remain (internal/modules.go:24, modules/category/testinit_test.go:6) |
BackupConfig / BackupHistory / BackupService Go-symbol leakage |
✅ none | grep across repo returns 0 hits |
/v1/manager/backup/* route bindings |
✅ removed | only doc-comment mention left in modules/backup/1module.go:2 |
register.Module{ SetupAPI: nil, SQLDir: …} is loader-safe |
✅ confirmed | octo-lib/module/module.go:35 guards with if m.SetupAPI != nil (and the same nil-guard for Start/Stop/SetupTask) |
| Migration ID mapping consistency | ✅ consistent | both pkg/db/migration_id_mapping.json and tools/migrate-rename/mapping.json restored to total: 124 with the two backup-* entries (verified vs. the prior-commit revert) |
2. Tombstone approach — correctness analysis
The core tradeoff (delete SQL + purge gorp_migrations rows vs. keep SQL as no-op tombstone) is correctly resolved in favour of the tombstone. The four scenarios all behave safely:
- Existing deployment, post-rename IDs (
20260331000001_backup_legacy01.sqlalready ingorp_migrations) —sql-migratefinds the embedded file, marks already-applied, skips. ✅ - Existing deployment, legacy IDs (
backup-20260331-01.sql) —RewriteLegacyMigrationIDs(pkg/db/migrate_compat.go) renames the row first, then case 1 applies. ✅ - Fresh install — both migrations run as no-ops, creating the (unused)
backup_config/backup_historytables. Inelegant but harmless. ✅ - Rollback to pre-removal release — the old module's
CREATE TABLE(noIF NOT EXISTS, seemodules/backup/sql/20260331000001_backup_legacy01.sql:4) finds the ID already recorded → skipped → no MySQL 1050. ✅
The reverted approach (PurgeRemovedMigrationIDs + delete SQL) would have broken case 4 silently, so reverting it is the right call.
3. Security review (PR is security_sensitive)
This is an attack-surface reduction PR — net positive for security:
- ✅ Removes the
/v1/manager/backup/*superAdmin API surface (config/trigger/history/download/status). Any client request to those paths now 404s at the router (no module registers them). - ✅ Removes the cron scheduler that could trigger arbitrary tar+upload jobs against the WuKongIM data directory.
- ✅ Removes COS credential handling code (
storage_cos.gousedcredentials.NewStaticV4(cfg.AccessKey, cfg.SecretKey, "")). Fewer in-process places where COS access keys are materialised. - ✅ Removes filesystem read of
data_dir(the WuKongIM data directory) — historically a path-traversal-shaped feature that's now simply gone. ⚠️ Data residue (acknowledged, not a blocker):backup_configandbackup_historyrows are intentionally kept on disk per issue #139.backup_history.error_messageisTEXTand could in principle contain operational details from prior runs. There's no API surface to read them anymore, so the residue is dormant — but worth flagging for the human reviewer if the deployment has any compliance constraints around stale operational logs. The follow-up drop migration mentioned in the PR description is the right place to address this.⚠️ Operational regression (documented): Operators previously relying on the built-in backup lose it. The release notes section in the PR body documents the migration paths (k8sVolumeSnapshot, host-level cron, cloud disk snapshot) — please verify the deployment/runbooks are updated downstream before/at release.
Nothing here justifies CHANGES_REQUESTED — the PR is a security improvement.
4. Findings
P0 / P1
None.
P2 (non-blocking, suggestions)
- Smoke-deploy items in the test plan are still unchecked (
Smoke-deploy against a database whose gorp_migrations already contains the two backup IDs,Smoke-deploy against a fresh database,Confirm /v1/manager/backup/* returns 404). These are exactly the cases the tombstone approach exists to make safe — they should be performed at least once before this lands in a release that ships to production. - Frontend audit: this PR removes a public superAdmin API. If
dmwork-web(or any internal admin UI / runbook scripts) has a "Backup" page that hits/v1/manager/backup/configor/trigger, it will start showing 404 errors. Worth a quick grep on the web repo before release. Out of scope for this PR but worth coordinating in the release ticket. - Drop migration follow-up: as noted, please track the eventual
DROP TABLE backup_config / backup_historymigration in a follow-up so the schema doesn't carry dead tables forever. Tying it to an explicit operator opt-in window (e.g. "kept for 2 releases") would be reasonable. - Doc comment in
modules/backup/1module.gois unusually long (24 lines) but it's load-bearing institutional knowledge about a non-obvious sql-migrate failure mode — keeping it is the right call. No change requested.
5. Additional observations
- The two-commit history is well-shaped: commit 1 is the deletion + the (later reverted)
PurgeRemovedMigrationIDsshim, commit 2 is the tombstone correction. The PR description accurately describes the final state. A squash merge would lose the failure-mode reasoning encoded in the second commit message; the maintainers should decide whether that's worth keeping. check-sprint / check-sprintis failing on this PR but it's a project-board metadata check ("Sprint not set"), not a code or build issue.backup_historyschema includesstarted_at/finished_atmigrated fromTIMESTAMPtoDATETIMEin the second SQL file; both migrations are preserved verbatim, so no schema drift between the original module's expectations and the tombstones. ✅
Summary
Approved. The tombstone design is the right answer to the rollback-safety concern, and the security posture improves on net. Please complete the smoke-deploy checks and audit any frontend admin UI before release.
yujiawei
left a comment
There was a problem hiding this comment.
/codex Review — PR #167
Verdict: LGTM ✅
逐条对照派单的 6 个场景化检查点:
1. callers 全清理 ✅
grep -rn 确认仓库中仅剩两处 modules/backup 引用,均为 blank import(_ "...")保留 tombstone 注册:
internal/modules.go:24— 保留,使init()触发register.AddModule注册SQLDirmodules/category/testinit_test.go:6— 测试 init 链保留
无任何 Go 代码 import backup 的类型/函数/变量。go build ./... clean(PR test plan 已验证)。
2. 1module.go +23/-5 残留 ✅ 设计正确
保留内容:
- 24 行 godoc 注释解释 tombstone 存在理由
//go:embed sql+register.AddModule只注册Name: "backup"+SQLDirSetupAPI为 nil:零 HTTP surface,octo-lib/module/module.go:35有 nil-guard 不会 panic
这不是"该删没删",而是 sql-migrate 兼容性要求的最小占位。删掉 SQL 文件 → PlanMigration panic "unknown migration in database"。设计合理。
3. API 兼容性 ✅ 无阻塞
原 /v1/manager/backup/* 路由组(config/trigger/history/download/status)全部随 api_manager.go 删除。SetupAPI=nil → 路由不注册 → 请求 404。
这是 superAdmin API(非用户端),issue #139 body 已明确讨论 deprecation vs outright removal。PR body release notes 段给出了三个替代方案(VolumeSnapshot / host cron / cloud snapshot)。未发现 dmwork-web 有 backup 管理页(grep repo 无命中),但 PR 描述 test plan 有 "confirm returns 404" 待勾。建议在 merge 前完成该 smoke check。非阻塞。
4. 数据库 schema ✅ 故意不 drop
backup_config / backup_history 两表 故意保留在 DB 中,无 drop migration。这与 issue #139 的两步方案一致(step 1 = 删代码保表,step 2 = 后续确认无依赖后再 drop)。
两个 SQL tombstone 文件内容与原始 migration 完全一致(CREATE TABLE + ALTER TIMESTAMP→DATETIME),ID 在 gorp_migrations 中已有 → 跳过。
四种部署场景验证(与 yujiawei review 一致):
- 已有部署(post-rename ID)→ skip ✅
- 已有部署(legacy ID)→
RewriteLegacyMigrationIDs先重命名 → skip ✅ - 全新部署 → 两个 migration 正常跑,建出空表,无害 ✅
- 回滚到旧版本 → ID 已在 gorp_migrations → 旧
CREATE TABLE(无IF NOT EXISTS)被 skip,不会 MySQL 1050 ✅
5. storage_cos.go 删除 ✅ 无生产影响
COS 集成(credentials.NewStaticV4 凭证处理)随模块一起删除。该功能仅在 backup cron 路径中使用,无其他模块 import COS storage 相关类型。删除 = 减少攻击面(COS access key 不再在进程内物化)。
6. 关联 issue #139 ✅ 替代方案明确
Issue #139 body 详细说明了五个移除理由(scope mismatch / deployment fragility / consistency risk / better alternatives / maintenance cost)+ 三个替代方案(K8s VolumeSnapshot / host cron / cloud snapshot)。PR body release notes 段复述了同样的 migration guidance。
汇总
| 分类 | 描述 |
|---|---|
| ✅ LGTM | callers 清理完整,tombstone 设计正确,四种部署场景安全 |
| 🟡 建议 | merge 前完成 PR test plan 中三项未勾 smoke check(特别是 "existing DB + tombstone IDs" 场景) |
| 🟡 建议 | Jerry-Xin 提的 go mod tidy 清理 google/uuid + robfig/cron/v3 可顺手做 |
无阻塞项。
Summary
Closes #139.
Remove the built-in
modules/backupmodule — a cron job that tar+uploaded the WuKongIM data directory to COS, plus a/v1/manager/backup/*superAdmin API surface (~1.6k LoC). Infrastructure-layer backup (volume snapshots, sidecar sync, the service's own tooling) is the right home for this, not the business service. See the issue for the full rationale (scope mismatch, deployment fragility, live-tar consistency risk, maintenance cost).Removed
modules/backup/{api_manager,db,diskcheck,model,scheduler,service,storage,storage_cos}.goanddiskcheck_test.go/v1/manager/backup/*route group (config / trigger / history / download / status)NewManagerKept (rollback-safe tombstone)
modules/backup/sql/20260331000001_backup_legacy01.sqland20260401000001_backup_legacy01.sql— unchanged contentsmodules/backup/1module.gothat registers only the embeddedSQLDir(SetupAPI=nil, no HTTP routes, no scheduler, no services)backup_configandbackup_historytables on disk — no drop migration — so existing production rows are not lostWhy tombstones instead of deleting the SQL files
Deleting the SQL files would leave the corresponding rows in
gorp_migrationsorphaned, and sql-migrate's PlanMigration stage panics withunknown migration in databaseon the next startup of an existing deployment. Purging those rows on startup is also unsafe: rolling back to a pre-removal release would let sql-migrate re-run the originalCREATE TABLE backup_config / backup_history(noIF NOT EXISTS) against the still-present tables and fail at startup with MySQL 1050.The tombstone keeps both IDs recognised by sql-migrate, so:
gorp_migrations→ skipped as applied.CREATE TABLEfinds the ID already recorded and is skipped.Migration guidance for operators (release notes)
VolumeSnapshotagainst the WuKongIM PVC.cronjob or use cloud disk snapshots.backup_config/backup_historyonce we are confident no deployment depends on the data.Test plan
go build ./...go vet ./...go test ./pkg/db/ -racegorp_migrationsalready contains the two backup IDs (post-rewrite form) and confirm sql-migrate plans cleanly with no panic./v1/manager/backup/*returns 404 after deploy.