v0.3.0: WeChat API proxy + vault hardening + mmp resume #2
Merged
## P0a: WECHAT_API_PROXY env support

Real-account verification of v0.2 surfaced split-routing as a recurring pain: home / mobile / cafe / office IPs all change, and split routing can send `api.weixin.qq.com` traffic via different egress than `ipinfo.io` sees. WeChat's 50-IP whitelist eventually fills.

This change routes WeChat API calls through a user-chosen static-IP bastion. Whitelist the bastion once; stop chasing IP drift.

- `_api_base()` reads `WECHAT_API_PROXY` env per call (not at import)
- Strips trailing slash; appends `/cgi-bin`
- Empty / whitespace-only env falls back to `api.weixin.qq.com`
- Module-level `API_BASE` constant kept for back-compat

`docs/wechat-api-proxy.md` walks through bastion options (Cloudflare Workers, VPS, Vercel, Tailscale), ships a complete `worker.js` template (~30 lines), explains IP whitelist setup, security considerations, troubleshooting, and verification flow.

5 new tests in `test_api_proxy.py`.

## P0b: Vault hardening (3 fixes from Plan 1 review I1+I2 + lost-key)

1. **Atomic write**: `_atomic_write_bytes()` writes to `<vault>.tmp`, `os.fsync`, then `os.replace`. Prevents zero-byte vault on crash / ^C / OOM mid-write.
2. **Concurrent set protection**: `_vault_lock()` is a context manager over `fcntl.flock` on `vault.lock` (separate file from vault, so `os.replace` isn't blocked by the lock handle). The whole read-modify-write cycle in `CredentialStore.set` / `delete` runs inside the lock via the new `Backend.update(mutator)` API. Verified via 4-thread test: previously caused `pyrage.DecryptError: failed to fill whole buffer` AND lost updates; now all 4 writes survive.
3. **Lost-key UX**: if `age-key.txt` is deleted but `credentials.json.age` remains, `_read_or_create_key` no longer silently regenerates a new identity (which would permanently lock the user out). Instead raises new `VaultIntegrityError(MMPError)` with remediation guidance: "restore key from backup or delete vault to start over". First-use path (no vault exists yet) still auto-generates the key normally; it only refuses when vault data is at risk.

4 new tests in `test_credentials.py`:

- `test_atomic_write_no_zero_byte_on_crash`
- `test_concurrent_set_no_lost_update` (4 threads)
- `test_lost_key_with_existing_vault_raises`
- `test_lost_key_with_no_vault_regenerates`
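To make the first two vault fixes concrete, here is a minimal sketch of the write-temp, fsync, replace pattern combined with an exclusive `flock` on a separate lock file. The names `_atomic_write_bytes` and `_vault_lock` come from the notes above, but their signatures here are assumptions, and the free function `update` is only a hypothetical stand-in for `Backend.update(mutator)`. Encryption is left out: the vault content is treated as opaque bytes. `fcntl` is Unix-only.

```python
import contextlib
import fcntl
import os
from pathlib import Path
from typing import Callable, Iterator


def _atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to <path>.tmp, fsync, then atomically swap it into place."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())  # force the bytes to disk before the rename
    os.replace(tmp, path)  # readers see the old file or the new one, never a partial write


@contextlib.contextmanager
def _vault_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive flock on a lock file that is separate from the vault."""
    with open(lock_path, "a") as lock_fh:
        fcntl.flock(lock_fh, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(lock_fh, fcntl.LOCK_UN)


def update(vault: Path, mutator: Callable[[bytes], bytes]) -> None:
    """Hypothetical stand-in for Backend.update: read-modify-write under the lock."""
    with _vault_lock(vault.with_name("vault.lock")):
        current = vault.read_bytes() if vault.exists() else b""
        _atomic_write_bytes(vault, mutator(current))
```

Keeping the lock on its own file means the lock handle never points at the inode that `os.replace` swaps out, which is the reason the notes give for the separate `vault.lock`.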
v0.3 baseline implementation of `mmp resume <run-dir> [--target X]`. Re-runs prepare+execute for any target whose previous status != ok. Targets already at status=ok are skipped (logged as RESUME_SKIP) and their previous result is carried forward into the new result.json so external_id / mode_actual aren't lost.

This is the simple version. Future v0.4+ work: step-level resume using Run.checkpoint() to skip already-uploaded thumbs etc. The data model (ProviderExecutionError.retryable, Run.checkpoint/read_checkpoint) is already in place from Plan 1; v0.3 just doesn't consume it yet.

CLI:

    mmp resume <run-dir>              # rerun every non-ok target
    mmp resume <run-dir> --target X   # rerun only target X

Tests in tests/integration/test_resume_e2e.py:

- skips ok targets, reruns failed
- --target filter touches only chosen target
- missing run dir → exit 2 with clear error
- run dir missing manifest.yaml or result.json → exit 2

Log markers added: RESUME_START, RESUME_SKIP, RESUME_RETRY (alongside existing RUN/TARGET markers).
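The target-level selection described above can be sketched as a small pure function. This is an illustration only: the function name `plan_resume`, the shape of the previous results (a mapping from target name to a dict with a `status` key), and the target names in the usage note are all hypothetical, since the notes do not show the `result.json` schema. It also assumes that a target already at status ok is skipped even when it is named with `--target`.

```python
from typing import Dict, List, Optional, Tuple


def plan_resume(
    previous: Dict[str, dict],
    only_target: Optional[str] = None,
) -> Tuple[List[str], Dict[str, dict]]:
    """Split previous results into targets to rerun and results to carry forward."""
    retry: List[str] = []
    carried: Dict[str, dict] = {}
    for name, result in previous.items():
        if result.get("status") == "ok":
            # Would be logged as RESUME_SKIP; keeps external_id / mode_actual.
            carried[name] = result
        elif only_target is None or name == only_target:
            # Would be logged as RESUME_RETRY; prepare+execute runs again.
            retry.append(name)
        else:
            # Non-ok target excluded by --target: previous result left as-is.
            carried[name] = result
    return retry, carried
```

With hypothetical previous results `{"wechat": {"status": "ok"}, "blog": {"status": "failed"}}`, calling `plan_resume(...)` without a filter would schedule only `blog` for a rerun and carry the `wechat` result forward unchanged.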
Nowhitestar added a commit that referenced this pull request on May 7, 2026
* feat(v0.3-P0): WeChat API proxy + vault hardening
* feat(v0.3-P1a): mmp resume - target-level retry on existing run dir
## Summary

Three independent improvements landed together as the v0.3.0 release:

### P0a: WeChat API proxy support

Real-account verification of v0.2 surfaced split-routing as recurring user pain: home / mobile / cafe / office IPs all change, and split routing can send `api.weixin.qq.com` traffic via a different egress than `ipinfo.io` sees. WeChat's 50-IP whitelist eventually fills.

`WECHAT_API_PROXY=https://my-bastion.com` routes all WeChat API calls through a user-chosen static-IP bastion. Whitelist the bastion once; stop chasing IP drift.

`docs/wechat-api-proxy.md` ships a complete Cloudflare Worker template (~30 lines), discusses bastion options (Workers / VPS / Vercel / Tailscale), security considerations, troubleshooting.

### P0b: Vault hardening

3 fixes to known v0.2 vault concerns (Plan 1 review I1 + I2 + lost-key UX):

1. **Atomic write**: `_atomic_write_bytes()` writes to `<vault>.tmp`, fsyncs, `os.replace`s. Prevents zero-byte vault on crash / OOM mid-write.
2. **Concurrent set protection**: `fcntl.flock` on a separate `vault.lock` file wraps the whole read-modify-write cycle in `CredentialStore.set` / `delete`. Verified via 4-thread concurrent-write test (previously caused `pyrage.DecryptError`; now all 4 writes survive).
3. **Lost-key UX**: if `age-key.txt` is deleted but `credentials.json.age` remains, `_read_or_create_key` no longer silently regenerates a new identity (which would permanently lock the vault). Raises `VaultIntegrityError` with remediation guidance instead. First-use path (no vault yet) still auto-generates normally.

### P1a: `mmp resume <run-dir>`

Replaces the v0.2 stub. Target-level retry: re-runs `prepare + execute` for any target whose previous status != ok. Already-ok targets are skipped (logged `RESUME_SKIP`) and their previous result carried forward into the new `result.json` so `external_id` / `mode_actual` aren't lost.

Future v0.4+ work: step-level resume using `Run.checkpoint()` for finer granularity (skip already-uploaded thumbs etc). The data model (`ProviderExecutionError.retryable`, `Run.checkpoint` / `read_checkpoint`) is in place from Plan 1; v0.3 just doesn't consume it yet.

## Test Plan

- `make test`: ruff check + ruff format check + mypy + 111 pytest + smoke
- `_api_base()` proxy semantics
- `mmp resume` (skip-ok, target filter, missing-dir, missing-result)

## v0.3.x roadmap (not in this PR)
🤖 Generated with Claude Code