feat(server): implement multi-tenancy provider with file and HTTP backends#979
feat(server): implement multi-tenancy provider with file and HTTP backends#979Pangjiping wants to merge 9 commits into
Conversation
Pangjiping
left a comment
There was a problem hiding this comment.
Solid architecture — TenantProvider protocol, ContextVar propagation, namespace-scoped isolation all look correct. One critical issue (sync HTTP blocking the event loop) should be fixed before merge; the rest are improvements.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d02b0be6c3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Pangjiping
left a comment
There was a problem hiding this comment.
Thanks for the thorough review. All feedback addressed:
Critical (sync HTTP): Already using asyncio.to_thread() in auth middleware to offload lookup() from the event loop.
Thundering herd: Already implemented per-key singleflight with threading.Event in _fetch_and_cache.
HTTPS enforcement: Warning logged at startup for non-HTTPS endpoints.
Naming / tests / frozen dataclass: resolve_tenants_path renamed, validate_tenant_config extracted and tested, api_keys changed to Tuple[str, ...].
Additional fixes in follow-up commit:
- Helm chart: removed
subPathfrom tenants ConfigMap mount → directory mount enables hot-reload - HTTP provider: leader errors propagated to singleflight waiters (no more 401 for provider outages)
- K8s service: cross-namespace sandbox lookup fallback for background renew workers and proxy path
- Spec: namespace validation deferred to sandbox creation time
e49fab8 to
835d150
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 835d150352
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 65ed62764b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…kends Wire TenantProvider interface through full request chain: - TenantProvider Protocol + FileTenantProvider + HTTPTenantProvider - Auth middleware multi-tenant/single-tenant mode branching - ContextVar-based tenant propagation (tenants/context.py) - Dynamic namespace resolution in KubernetesSandboxService - [tenants] config section in server.toml (provider=file|http) - Startup guards: docker+tenants fatal, api_key+tenants mutual exclusion - HTTP provider: per-key TTL cache, sync fetch, max_stale, 401 eviction - 27 e2e tests covering both providers end-to-end - Update OSEP-0012 with formal interface spec and HTTP contract Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New script scripts/python-k8s-e2e-multi-tenant.sh with file/http variants - Add multi-tenant-file and multi-tenant-http matrix entries to nightly workflow - Helm chart: support tenantsToml value for mounting tenants.toml ConfigMap - HTTP variant: mock Python provider on host, accessible via Kind bridge IP - Both variants verify auth (valid key→200, invalid→401, missing→401) then run full SDK kubernetes-mini test suite with tenant key Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use Kind control-plane container's gateway IP as primary detection, with fallbacks. Add pre-deploy reachability check for the mock endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix sync HTTP blocking event loop: use asyncio.to_thread() in auth middleware - Add per-key singleflight pattern to prevent thundering herd on TTL expiry - Log warning at startup when HTTP tenant endpoint is not HTTPS - Rename _resolve_tenants_path to resolve_tenants_path (public API) - Use tuple for TenantEntry.api_keys for true frozen immutability - Extract validate_tenant_config() so startup guards are testable - Add [tenants] section to configuration.md, README, and example configs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Helm chart: remove subPath from tenants ConfigMap mount, enabling hot-reload via directory mount with kubelet-driven updates - HTTP provider: propagate leader fetch errors to singleflight waiters so provider outages return 503 instead of 401 INVALID_API_KEY - K8s service: add cross-namespace sandbox lookup fallback for background workers (renew, proxy) that lack tenant context - Spec: defer namespace validation from startup to sandbox creation - Wire tenant_provider into sandbox_service after import Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two tenant-isolation fixes: 1. Store namespace on SnapshotRecord, filter listings and verify ownership on get/delete/restore so cross-tenant snapshot access is blocked. 2. Pass tenant-resolved namespace into SnapshotRuntime create_snapshot and delete_snapshot so SandboxSnapshot CRs are created in the correct tenant namespace instead of the static config default. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
65ed627 to
ca2802f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca2802ffe8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| record = SnapshotRecord( | ||
| id=str(uuid4()), | ||
| source_sandbox_id=sandbox_id, | ||
| namespace=namespace, |
There was a problem hiding this comment.
Pass snapshot namespaces during recovery
Because new snapshot records now store the tenant namespace, restart recovery also needs to use that stored namespace. The recovery path still calls inspect_snapshot(record.id, ...) and delete_snapshot(record.id, ...) without namespace=record.namespace, so a server restart while a tenant snapshot is CREATING/DELETING makes the Kubernetes runtime inspect/delete the CR in its configured default namespace instead of the tenant namespace, causing ready snapshots to be marked failed or deleting snapshots to leave their CR behind.
Useful? React with 👍 / 👎.
| egress_mode: str = EGRESS_MODE_DNS, | ||
| credential_proxy_enabled: bool = False, | ||
| ) -> Dict[str, Any]: |
There was a problem hiding this comment.
Restore BatchSandbox credential proxy parameter
In Kubernetes deployments using the default batchsandbox workload provider, sandbox creation now fails before reaching Kubernetes because KubernetesSandboxService.create_sandbox() still always passes the credential_proxy_enabled keyword to create_workload, but this signature no longer accepts it. That TypeError is converted into a 500 for every BatchSandbox create request; restoring this parameter and forwarding it to the egress/main-container setup also preserves credential-proxy behavior.
Useful? React with 👍 / 👎.
- Use sandbox_service.namespace as fallback for single-tenant snapshot creation instead of hardcoded "default" - Track snapshot_id -> namespace in KubernetesSnapshotRuntime so inspect/wait/delete use the creation namespace, not static config - Use .get() instead of .pop() for inflight_error so all waiters see the leader's error, not just the first one - Convert K8s namespace-not-found errors to 400 instead of 500 in sandbox creation path Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Summary
Wire TenantProvider interface through full request chain:
Testing
Breaking Changes
Checklist