Support persisting sandbox metadata to database #730
Open
zhangjaycee wants to merge 13 commits into alibaba:master from
Conversation
Add DatabaseConfig dataclass (url field) to rock/config.py and wire it into RockConfig both as a field and in the from_env() YAML parser.
- Add Base(DeclarativeBase) as the single SQLAlchemy declarative base
- Add SandboxRecord ORM model with all sandbox metadata columns
- Add LIST_BY_ALLOWLIST and _NOT_NULL_DEFAULTS class-level constants
- Add DatabaseProvider with async engine/session factory
- Add DatabaseConfig dataclass to RockConfig
- _convert_url handles sqlite://, postgresql://, and the postgres:// (Heroku) shorthand; URLs with an existing driver specifier pass through unchanged
- Default state column value uses the string literal "pending" instead of the State.PENDING enum instance for explicit column semantics
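A minimal sketch of the URL conversion described above. It assumes the async drivers are aiosqlite and asyncpg (the commit message does not name them), and `convert_url` is a hypothetical standalone stand-in for `_convert_url`, not the PR's actual code:

```python
def convert_url(url: str) -> str:
    """Map a plain database URL onto an async-driver URL.

    The driver names (aiosqlite, asyncpg) are assumptions for illustration.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a database URL: {url!r}")
    if "+" in scheme:
        # A driver is already specified (e.g. postgresql+asyncpg): pass through.
        return url
    async_schemes = {
        "sqlite": "sqlite+aiosqlite",
        "postgresql": "postgresql+asyncpg",
        "postgres": "postgresql+asyncpg",  # Heroku-style shorthand
    }
    # Unknown schemes are returned unchanged rather than guessed at.
    return f"{async_schemes.get(scheme, scheme)}://{rest}"
```

Leaving unknown schemes untouched keeps the helper from silently rewriting URLs it does not understand.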
- Add SandboxTable with insert/get/update/delete/list_by/list_by_in
- _filter_data strips unknown keys; _NOT_NULL_DEFAULTS fills NOT NULL cols
- LIST_BY_ALLOWLIST prevents arbitrary column queries (injection guard)
- _record_to_sandbox_info uses lru_cache to avoid repeated get_type_hints calls in bulk list_by scenarios
- Add SandboxInfoField generated type and generation script
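The filtering and allowlist behaviour can be sketched roughly as follows. The column set, the default values, and the function names `filter_data` and `check_list_by_field` are hypothetical; only the idea (drop unknown keys, fill NOT NULL columns, reject non-allowlisted filter fields) comes from the commit message:

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-ins for the class-level constants on SandboxRecord.
COLUMNS = frozenset({"sandbox_id", "user_id", "state", "image"})
NOT_NULL_DEFAULTS: dict[str, Any] = {"state": "pending", "user_id": "default"}
LIST_BY_ALLOWLIST = frozenset({"user_id", "state", "image"})


def filter_data(data: dict[str, Any]) -> dict[str, Any]:
    """Drop keys that are not columns, then fill NOT NULL columns that are missing or None."""
    row = {key: value for key, value in data.items() if key in COLUMNS}
    for column, default in NOT_NULL_DEFAULTS.items():
        if row.get(column) is None:
            row[column] = default
    return row


def check_list_by_field(field: str) -> str:
    """Reject any filter field that is not explicitly allowlisted."""
    if field not in LIST_BY_ALLOWLIST:
        raise ValueError(f"list_by does not support field {field!r}")
    return field
```

Because the field name is checked against a fixed set before it reaches a query, a caller-supplied string can never select an arbitrary column.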
- Redis alive/timeout keys remain the source of truth for live state
- DB writes are fire-and-forget via asyncio.create_task + _safe_db_call
- batch_get: Redis hits served directly; DB fallback uses a single
list_by_in("sandbox_id", miss_ids) query instead of N serial gets,
so each missing row is resolved through the primary key index
- iter_alive_sandbox_ids queries DB by state IN (running, pending)
instead of Redis scan_iter, enabling indexed filtering
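A rough sketch of the two patterns above: fire-and-forget DB writes and a batched DB fallback for Redis misses. Redis is modelled as a plain dict and the table as an in-memory class; `InMemoryTable`, `safe_db_call`, `fire_and_forget`, and the module-level task set are hypothetical illustrations. This version returns only the sandboxes it finds, which is one possible choice:

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable

# Keep strong references so pending tasks are not garbage-collected early.
_background: set[asyncio.Task] = set()


class InMemoryTable:
    """Hypothetical stand-in for SandboxTable that counts its queries."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.rows = {row["sandbox_id"]: row for row in rows}
        self.queries = 0

    async def list_by_in(self, field: str, values: list[str]) -> list[dict[str, Any]]:
        self.queries += 1
        wanted = set(values)
        return [row for row in self.rows.values() if row.get(field) in wanted]

    async def insert(self, row: dict[str, Any]) -> None:
        self.rows[row["sandbox_id"]] = row


async def safe_db_call(coro: Awaitable[Any]) -> None:
    """Await a DB coroutine and swallow failures so the hot path is unaffected."""
    try:
        await coro
    except Exception as exc:
        print(f"db write failed: {exc!r}")


def fire_and_forget(coro: Awaitable[Any]) -> asyncio.Task:
    """Schedule a DB write without awaiting it on the request path."""
    task = asyncio.create_task(safe_db_call(coro))
    _background.add(task)
    task.add_done_callback(_background.discard)
    return task


async def batch_get(
    redis: dict[str, dict[str, Any]], table: InMemoryTable, sandbox_ids: list[str]
) -> list[dict[str, Any]]:
    """Serve Redis hits directly, then fetch all misses with one DB query."""
    found = {sid: redis[sid] for sid in sandbox_ids if sid in redis}
    miss_ids = [sid for sid in sandbox_ids if sid not in found]
    if miss_ids:
        for row in await table.list_by_in("sandbox_id", miss_ids):
            found[row["sandbox_id"]] = row
    # Preserve request order and skip ids found in neither store.
    return [found[sid] for sid in sandbox_ids if sid in found]
```

Holding task references in a set matters because the event loop itself keeps only weak references to tasks created with `asyncio.create_task`.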
…e to meta_repo
- Replace MetaStore with SandboxRepository throughout SandboxManager, GemManager, BaseManager, and SandboxProxyService
- Wire SandboxRepository (Redis + SandboxTable) in admin/main.py startup
- stop(): add early return after archive() in the ValueError except branch to prevent double archive when the Ray actor is already gone
Made-with: Cursor
- Add TestSandboxTableWithSQLite: full CRUD coverage using SQLite in-memory database (no external dependencies, runs in fast CI) including list_by_in, NOT NULL defaults, and noop-on-missing-id cases
- Add TestSandboxTableWithPostgres: PostgreSQL-specific tests (JSONB, real container) marked need_docker + need_database
- Add comprehensive SandboxRepository tests: create/update/delete/archive/get/exists/batch_get/list_by/refresh_timeout/is_expired
- Consistent lowercase "stopped" state string throughout test data, matching the State enum value convention (running/pending)
- Add single-column indexes on all commonly queried fields (user_id, state, namespace, experiment_id, cluster_name, image, host_ip, host_name, create_user_gray_flag)
- Add scripts/gen_ddl.py to emit CREATE TABLE / CREATE INDEX DDL
- Add *.db and ddl/ to .gitignore (generated artifacts)
OperatorContext was missing redis_provider, leaving RayOperator._redis_provider as None. This caused the use_rocklet get_status path to crash with 'NoneType object has no attribute get' because build_sandbox_from_redis skips the lookup entirely when redis_provider is None.
StephenRi
reviewed
Apr 2, 2026
- Rename class SandboxRepository to SandboxMetaStore to better reflect its role as a coordinator for Redis (hot path) + DB (query path) dual-write
- Rename _meta_repo to _meta_store across all files
- Rename sandbox_repository.py to sandbox_meta_store.py
- Update all imports and references
- Use legacy states (_TERMINAL_STATES, _LIST_BY_BLACKLIST) from SandboxMetaStore as the authoritative source; removed duplicate definitions elsewhere
- Add rock/sandbox/utils/timeout.py with SandboxTimeoutHelper: pure calculation helpers (make_timeout_info, refresh_timeout, is_expired) with no I/O dependency
- Add SandboxMetaStore.update_timeout() for raw Redis set of timeout key; remove refresh_timeout() and is_expired() from MetaStore (not its responsibility)
- SandboxManager._refresh_timeout / _is_expired: own the I/O (get_timeout + update_timeout) and delegate calculation to SandboxTimeoutHelper
- SandboxProxyService._update_expire_time: same pattern
- Replace inline auto_clear_time_dict construction in start_async with SandboxTimeoutHelper.make_timeout_info()
- Update tests: replace TestRefreshTimeout/TestIsExpired in test_sandbox_meta_store with TestUpdateTimeout; add test_sandbox_timeout.py for pure unit tests
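To illustrate the split between pure calculation and I/O, here is a hedged sketch of what such a helper might look like. The dict keys `auto_clear_time` and `expire_time`, the minute-based unit, and the `now` parameter are assumptions made for the example; the class name `TimeoutHelper` is a stand-in, not the PR's SandboxTimeoutHelper:

```python
from __future__ import annotations

import time


class TimeoutHelper:
    """Pure timeout calculations: no Redis, no DB, only values in and out."""

    @staticmethod
    def make_timeout_info(auto_clear_minutes: int, now: float | None = None) -> dict[str, str]:
        """Build the timeout record for a newly started sandbox."""
        now = time.time() if now is None else now
        return {
            "auto_clear_time": str(auto_clear_minutes),  # assumed key and unit
            "expire_time": str(int(now) + auto_clear_minutes * 60),
        }

    @staticmethod
    def refresh_timeout(info: dict[str, str], now: float | None = None) -> dict[str, str]:
        """Return a copy of the record with the expiry pushed forward from now."""
        now = time.time() if now is None else now
        minutes = int(info["auto_clear_time"])
        return {**info, "expire_time": str(int(now) + minutes * 60)}

    @staticmethod
    def is_expired(info: dict[str, str], now: float | None = None) -> bool:
        """True once the current time is past the stored expiry."""
        now = time.time() if now is None else now
        return int(now) > int(info["expire_time"])
```

Passing `now` explicitly is what makes the helper unit-testable without patching the clock; the caller (a manager or proxy service) stays responsible for reading and writing the record.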
StephenRi
reviewed
Apr 2, 2026
Schema
- Add spec (JSONB): DockerDeploymentConfig.model_dump() snapshot, written once at creation, never updated
- Add status (JSONB): full SandboxInfo snapshot, overwritten on every update
SandboxRecordData
- New TypedDict in schema.py extending SandboxInfo with spec/status fields
- Used as the unified I/O type for SandboxTable (replaces plain SandboxInfo)
SandboxTable
- create(): writes spec from caller; auto-populates status from data
- update(): always overwrites status with latest SandboxInfo snapshot
- list_by / list_by_in / get return SandboxRecordData
SandboxMetaStore
- create() gains spec: dict | None parameter; constructs SandboxRecordData before passing to SandboxTable; Redis path unchanged
SandboxManager
- start_async passes spec=docker_deployment_config.model_dump() to meta_store
Cleanup
- Remove SandboxInfoField generated Literal type and its generation script
- Replace SandboxInfoField with plain str in SandboxTable / SandboxMetaStore (LIST_BY_ALLOWLIST already enforces valid column names at runtime)
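The write-once spec and overwrite-always status semantics can be sketched with an in-memory table. `InMemorySandboxTable` is a hypothetical stand-in, and rebuilding status from the merged row (rather than from the update payload alone) is an assumption of this example:

```python
from __future__ import annotations

from typing import Any

_SNAPSHOT_KEYS = ("spec", "status")


class InMemorySandboxTable:
    """Hypothetical illustration of spec (immutable) vs. status (latest snapshot)."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, Any]] = {}

    def create(self, sandbox_id: str, info: dict[str, Any], spec: dict[str, Any] | None = None) -> None:
        # spec comes from the caller and is stored exactly once;
        # status starts as a copy of the initial sandbox info.
        self._rows[sandbox_id] = {**info, "spec": spec, "status": dict(info)}

    def update(self, sandbox_id: str, changes: dict[str, Any]) -> None:
        row = self._rows.get(sandbox_id)
        if row is None:
            return  # no-op when the id does not exist
        # Ignore any attempt to modify spec or status directly.
        row.update({k: v for k, v in changes.items() if k not in _SNAPSHOT_KEYS})
        # status is always replaced with the latest full snapshot.
        row["status"] = {k: v for k, v in row.items() if k not in _SNAPSHOT_KEYS}

    def get(self, sandbox_id: str) -> dict[str, Any] | None:
        return self._rows.get(sandbox_id)
```

Keeping spec immutable preserves the configuration the sandbox was originally launched with, while status always reflects the most recent state.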
… in SandboxTable
- Remove async_sessionmaker, _session_factory, and session() factory from DatabaseProvider
- Add engine property that raises RuntimeError if not initialised
- Update all SandboxTable methods to use AsyncSession(self._db.engine) directly
- Simpler, more explicit session lifecycle with no factory indirection
StephenRi
reviewed
Apr 3, 2026
async def list_sandboxes(self, query_params: SandboxQueryParams) -> SandboxListResponse:
    if self._redis_provider is None:
        logger.warning("Redis provider is not available, list_sandboxes returning empty result")
async def list_sandboxes(
Collaborator
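Compatibility design for the get interface: if the sandbox is stopped, it should still return an error. Phase 1 stays consistent with the original logic; phase 2 adds a parameter to control this.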
Collaborator
Author
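Before this PR, the get interface already raised an error for a stopped sandbox, so no compatibility handling is added there for now. The list interface stays compatible with the original two states (PENDING/RUNNING) by default; adding use_legacy_states=false to the query params additionally returns sandboxes in the STOPPED state.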
StephenRi
reviewed
Apr 3, 2026
logger.info(f"list sandboxes with filters: {query_params}, page: {page}, page_size: {page_size}")
try:
    all_sandbox_data = await self.list_all_sandboxes_by_query_params(query_params)
    all_sandbox_data = await self.list_all_sandboxes_by_query_params(query_params, use_legacy_states)
Collaborator
Author
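Requests can control this with the use_legacy_states parameter. The default, use_legacy_states=true, keeps the original interface behaviour; use_legacy_states=false returns a sandbox list that also includes stopped sandboxes.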
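As a rough illustration of the use_legacy_states switch, the sketch below filters rows by state. The `State` enum values follow the lowercase convention mentioned in the commits; `visible_states` and `filter_sandboxes` are hypothetical helper names, not functions from the PR:

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class State(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"


# The two states the list interface returned before this PR.
LEGACY_STATES = (State.PENDING, State.RUNNING)


def visible_states(use_legacy_states: bool = True) -> tuple[State, ...]:
    """Default keeps the original behaviour; False also exposes stopped sandboxes."""
    if use_legacy_states:
        return LEGACY_STATES
    return LEGACY_STATES + (State.STOPPED,)


def filter_sandboxes(rows: list[dict[str, Any]], use_legacy_states: bool = True) -> list[dict[str, Any]]:
    """Keep only rows whose state is visible under the chosen mode."""
    allowed = {state.value for state in visible_states(use_legacy_states)}
    return [row for row in rows if row["state"] in allowed]
```

Defaulting the flag to true means existing clients see exactly the same results as before unless they opt in.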
StephenRi
reviewed
Apr 3, 2026
rock/sandbox/sandbox_meta_store.py
Outdated
    sandbox_table: SandboxTable | None = None,
) -> None:
    self._redis: RedisProvider | None = redis_provider
    self._db: SandboxTable | None = sandbox_table
Collaborator
The db always exists, so remove this check.
Later, refactor redis the same way so that it always exists too.
Collaborator
Author
When db_url is not specified, sqlite-memory mode is used; when the redis config is not specified, the fakeredis library is used.
- Fallback to sqlite-memory when database.url is not configured
- Fallback to FakeRedis when redis.host is not configured
- SandboxMetaStore now requires both providers (no more None checks)
- batch_get() returns only found sandboxes (no positional None slots)
- list_by() raises ValueError for non-allowlisted fields (no Redis fallback)
- list_sandboxes and batch_get_status expose use_legacy_states param
- Update tests to reflect new behaviour
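The fallback rule can be summarised in a few lines. This is only a sketch of the decision: `ResolvedBackends` and `resolve_backends` are hypothetical names, and the in-memory SQLite URL shown is the standard SQLAlchemy form rather than a value confirmed by the PR:

```python
from __future__ import annotations

from dataclasses import dataclass

# Standard SQLAlchemy URL for an in-memory SQLite database (assumed default here).
IN_MEMORY_SQLITE_URL = "sqlite:///:memory:"


@dataclass(frozen=True)
class ResolvedBackends:
    database_url: str
    use_fake_redis: bool


def resolve_backends(database_url: str | None, redis_host: str | None) -> ResolvedBackends:
    """Pick concrete backends so that neither provider is ever None."""
    return ResolvedBackends(
        database_url=database_url or IN_MEMORY_SQLITE_URL,
        use_fake_redis=not redis_host,
    )
```

Resolving the fallbacks once at startup is what lets the store drop its `None` checks: downstream code can assume both providers are present.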
close #729