v0.3.0: WeChat API proxy + vault hardening + mmp resume #2

Merged: Nowhitestar merged 2 commits into main from v0.3-wechat-proxy on May 6, 2026

@Nowhitestar (Owner)

Summary

Three independent improvements landed together as the v0.3.0 release:

P0a — WeChat API proxy support

Real-account verification of v0.2 surfaced split routing as a recurring user pain point: home / mobile / cafe / office IPs all change, and split routing can send api.weixin.qq.com traffic via a different egress IP than the one ipinfo.io reports. WeChat's 50-IP whitelist eventually fills.

WECHAT_API_PROXY=https://my-bastion.com routes all WeChat API calls through a user-chosen static-IP bastion. Whitelist the bastion once; stop chasing IP drift.

docs/wechat-api-proxy.md ships a complete Cloudflare Worker template (~30 lines) and discusses bastion options (Workers / VPS / Vercel / Tailscale), security considerations, and troubleshooting.

P0b — Vault hardening

3 fixes to known v0.2 vault concerns (Plan 1 review I1 + I2 + lost-key UX):

  1. Atomic write: _atomic_write_bytes() writes to <vault>.tmp, calls os.fsync, then os.replace. Prevents a zero-byte vault on crash / OOM mid-write.
  2. Concurrent set protection: fcntl.flock on a separate vault.lock file wraps the whole read-modify-write cycle in CredentialStore.set/delete. Verified via 4-thread concurrent-write test (previously caused pyrage.DecryptError; now all 4 writes survive).
  3. Lost-key UX: if age-key.txt is deleted but credentials.json.age remains, _read_or_create_key no longer silently regenerates a new identity (which would permanently lock the vault). Raises VaultIntegrityError with remediation guidance instead. First-use path (no vault yet) still auto-generates normally.

P1a — mmp resume <run-dir>

Replaces the v0.2 stub. Target-level retry: re-runs prepare + execute for any target whose previous status != ok. Already-ok targets are skipped (logged RESUME_SKIP) and their previous result carried forward into the new result.json so external_id / mode_actual aren't lost.

mmp resume <run-dir>              # rerun every non-ok target
mmp resume <run-dir> --target X   # rerun only target X

Future v0.4+ work: step-level resume using Run.checkpoint() for finer granularity (skip already-uploaded thumbs etc). The data model (ProviderExecutionError.retryable, Run.checkpoint/read_checkpoint) is in place from Plan 1; v0.3 just doesn't consume it yet.

Test Plan

  • make test — ruff check + ruff format check + mypy + 111 pytest + smoke
  • 5 new unit tests for _api_base() proxy semantics
  • 4 new unit tests for vault hardening (atomic write, concurrent set, lost-key)
  • 4 new integration tests for mmp resume (skip-ok, target filter, missing-dir, missing-result)
  • CI matrix green (ubuntu + macos × py3.10/3.11/3.12)

v0.3.x roadmap (not in this PR)

  • Real connectors for x-article + substack (browser automation; v0.3.1)
  • Drop deprecation shims (6 scripts; v0.3.x)
  • User-folder provider trust prompt (closes safety-policy.md / provider-contract.md doc-vs-impl gap; v0.3.x)
  • Manual verification of remaining providers (wechat-image guide accuracy, etc.)

🤖 Generated with Claude Code

Nowhitestar merged commit d5d0508 into main on May 6, 2026
6 checks passed
Nowhitestar added a commit that referenced this pull request May 7, 2026
* feat(v0.3-P0): WeChat API proxy + vault hardening

## P0a — WECHAT_API_PROXY env support

Real-account verification of v0.2 surfaced split routing as a recurring
pain point: home / mobile / cafe / office IPs all change, and split routing
can send `api.weixin.qq.com` traffic via a different egress IP than the one
`ipinfo.io` reports. WeChat's 50-IP whitelist eventually fills.

This change routes WeChat API calls through a user-chosen static-IP
bastion. Whitelist the bastion once; stop chasing IP drift.

- `_api_base()` reads `WECHAT_API_PROXY` env per call (not at import)
- Strips trailing slash; appends `/cgi-bin`
- Empty / whitespace-only env falls back to `api.weixin.qq.com`
- Module-level `API_BASE` constant kept for back-compat

`docs/wechat-api-proxy.md` walks through bastion options (Cloudflare
Workers, VPS, Vercel, Tailscale), ships a complete `worker.js`
template (~30 lines), explains IP whitelist setup, security
considerations, troubleshooting, and verification flow.

5 new tests in `test_api_proxy.py`.

## P0b — Vault hardening (3 fixes from Plan 1 review I1+I2 + lost-key)

1. **Atomic write**: `_atomic_write_bytes()` writes to `<vault>.tmp`,
   calls `os.fsync`, then `os.replace`. Prevents a zero-byte vault on
   crash / ^C / OOM mid-write.

2. **Concurrent set protection**: `_vault_lock()` is a context manager
   over `fcntl.flock` on `vault.lock` (separate file from vault, so
   `os.replace` isn't blocked by the lock handle). The whole
   read-modify-write cycle in `CredentialStore.set` / `delete` runs
   inside the lock via the new `Backend.update(mutator)` API.

   Verified via 4-thread test: previously caused
   `pyrage.DecryptError: failed to fill whole buffer` AND lost
   updates; now all 4 writes survive.

3. **Lost-key UX**: if `age-key.txt` is deleted but `credentials.json.age`
   remains, `_read_or_create_key` no longer silently regenerates a new
   identity (which would permanently lock the user out). Instead raises
   new `VaultIntegrityError(MMPError)` with remediation guidance: "restore
   key from backup or delete vault to start over".

   First-use path (no vault exists yet) still auto-generates the key
   normally — only refuses when vault data is at risk.
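
To make fix 2 concrete, here is a sketch of a lock-wrapped read-modify-write cycle. It is simplified under stated assumptions: the vault is plain JSON here (the real one is age-encrypted), `update` is a free function rather than a `Backend` method, and the write skips the fsync step.

```python
import contextlib
import fcntl
import json
import os
from pathlib import Path
from typing import Callable, Iterator


@contextlib.contextmanager
def _vault_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive advisory lock on a file separate from the vault."""
    with open(lock_path, "a+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def update(vault: Path, mutator: Callable[[dict], None]) -> None:
    """Run the whole read-modify-write cycle under the lock (hypothetical shape)."""
    with _vault_lock(vault.with_name("vault.lock")):
        data = json.loads(vault.read_text()) if vault.exists() else {}
        mutator(data)  # caller edits the dict in place
        tmp = vault.with_name(vault.name + ".tmp")
        tmp.write_text(json.dumps(data))
        os.replace(tmp, vault)  # the lock file is untouched by the rename
```

Because each caller opens its own handle on `vault.lock`, `flock` serializes threads within one process as well as separate processes, so no writer reads a vault that another is midway through replacing.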

4 new tests in `test_credentials.py`:
- `test_atomic_write_no_zero_byte_on_crash`
- `test_concurrent_set_no_lost_update` (4 threads)
- `test_lost_key_with_existing_vault_raises`
- `test_lost_key_with_no_vault_regenerates`

* feat(v0.3-P1a): mmp resume — target-level retry on existing run dir

v0.3 baseline implementation of `mmp resume <run-dir> [--target X]`.
Re-runs prepare+execute for any target whose previous status != ok.
Targets already at status=ok are skipped (logged as RESUME_SKIP) and
their previous result is carried forward into the new result.json so
external_id / mode_actual aren't lost.
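
The target selection described above can be sketched as a pure function. The `plan_resume` name, the shape of the per-target result dicts, and the handling of non-selected targets under `--target` (kept as they were) are all assumptions for illustration.

```python
from typing import Optional


def plan_resume(
    previous: dict[str, dict], only: Optional[str] = None
) -> tuple[list[str], dict[str, dict]]:
    """Split targets into those to rerun and those whose result is carried forward."""
    retry: list[str] = []
    carried: dict[str, dict] = {}
    for name, result in previous.items():
        selected = only is None or name == only
        if result.get("status") == "ok" or not selected:
            # Keep the old result so fields like external_id survive.
            carried[name] = result
        else:
            retry.append(name)
    return retry, carried


prev = {
    "wechat": {"status": "ok", "external_id": "abc"},
    "x-article": {"status": "failed"},
}
retry, carried = plan_resume(prev)
print(retry)          # targets that would go through prepare + execute again
print(list(carried))  # targets copied into the new result.json unchanged
```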

This is the simple version. Future v0.4+ work: step-level resume using
Run.checkpoint() to skip already-uploaded thumbs etc. The data model
(ProviderExecutionError.retryable, Run.checkpoint/read_checkpoint) is
already in place from Plan 1; v0.3 just doesn't consume it yet.

CLI:
  mmp resume <run-dir>              # rerun every non-ok target
  mmp resume <run-dir> --target X   # rerun only target X

Tests in tests/integration/test_resume_e2e.py:
- skips ok targets, reruns failed
- --target filter touches only chosen target
- missing run dir → exit 2 with clear error
- run dir missing manifest.yaml or result.json → exit 2

Log markers added: RESUME_START, RESUME_SKIP, RESUME_RETRY (alongside
existing RUN/TARGET markers).