Conversation
Implements COE-228: Session lifecycle, credentials, and harness rendering Key features: - Session manager service with lifecycle transitions - Session-scoped proxy credentials with unique aliases - Harness profile env rendering (shell, dotenv, json) - Git metadata capture from active repository - Outcome states and artifact registry All 41 tests pass.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80bcacb8c0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| repository = InMemorySessionRepository() | ||
| manager = SessionManager(settings=settings, session_repository=repository) |
There was a problem hiding this comment.
Persist sessions across CLI invocations
Each subcommand instantiates its own InMemorySessionRepository (create here, and the same pattern repeats in finalize, note, show, and list). After bench session create exits, that in-memory store is discarded, so a later bench session finalize <id> or bench session show <id> cannot retrieve the session and the documented multi-step workflow is unusable from the CLI.
Useful? React with 👍 / 👎.
| session_obj = await manager.finalize_session( | ||
| UUID(session_id), | ||
| outcome=outcome, | ||
| ) |
There was a problem hiding this comment.
Pass a SessionFinalize model to finalize_session
SessionManager.finalize_session takes a single SessionFinalize argument, but this call passes a UUID plus an outcome keyword. Running bench session finalize will therefore raise a TypeError before it even attempts to load the session, so the finalize command cannot succeed.
Useful? React with 👍 / 👎.
| session = Session( | ||
| operator_label=create_input.operator_label, | ||
| git_metadata=git_metadata, | ||
| ) |
There was a problem hiding this comment.
Preserve chosen variant/task metadata on new sessions
SessionCreate carries experiment_name, variant_name, task_card_name, and harness_profile_name, but create_session only copies operator_label and git_metadata into the saved Session. As a result every created session loses the benchmark configuration it was launched with, leaving the core correlation fields unset and making later comparisons/reporting unable to distinguish one variant/task selection from another.
Useful? React with 👍 / 👎.
| lines.append("# Anthropic-surface harness") | ||
| lines.append("export ANTHROPIC_BASE_URL=\"${STACKPERF_PROXY_BASE_URL}/v1\"") | ||
| lines.append("export ANTHROPIC_API_KEY=\"${STACKPERF_SESSION_API_KEY}\"") | ||
| lines.append("") | ||
| lines.append("# OpenAI-surface harness") |
There was a problem hiding this comment.
Render env from the selected harness profile
bench session create ignores --harness and --variant here and emits a hard-coded Anthropic/OpenAI snippet instead of loading the chosen configs through HarnessRenderer. That means harness-specific variable names, model aliases, and variant overrides never reach the operator's env file, so launching a non-default harness or a variant with overrides will use the wrong settings.
Useful? React with 👍 / 👎.
| # Transition to pending | ||
| session.status = SessionStatus.PENDING |
There was a problem hiding this comment.
Provide an activation path before completion
create_session leaves every new record in pending, and I checked src/cli/session.py but there is no command that calls activate_session. Because SessionManager.finalize_session only allows the default completed transition from ACTIVE, the normal create → run harness → finalize workflow can never record a successful session.
Useful? React with 👍 / 👎.
- Consolidated config files to use plural naming (providers, harnesses, etc.) - Added description and metadata fields to plural config models - Updated test_renderer.py to include all required benchmark tags - Removed duplicate singular config files (provider.py, harness.py, etc.) - All 63 tests passing
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6e0768cd84
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
|
|
||
| # Import and register commands after groups are defined | ||
| from . import session as session_commands # noqa: E402, F401 |
There was a problem hiding this comment.
Register the config command group on the CLI entry point
src/cli/main.py only defines/imports session; I also checked src/cli/ for any add_command(config) or from . import config wiring and found none. That makes every new bench config ... command introduced in src/cli/config.py unreachable from the installed bench script, so operators cannot run the documented config validation/list/show workflow at all.
Useful? React with 👍 / 👎.
| gitignore_path = Path(".gitignore") | ||
| if gitignore_path.exists(): | ||
| gitignore_content = gitignore_path.read_text() | ||
| if output_dir not in gitignore_content: | ||
| with open(gitignore_path, "a") as f: | ||
| f.write(f"\n# StackPerf session outputs\n{output_dir}/\n.env.local\n") |
There was a problem hiding this comment.
Ensure rendered session secrets are written under an ignored path
This only appends ignore rules when .gitignore already exists, but the command always writes session-env.* with the raw API key immediately afterward. In repositories without a pre-existing .gitignore, a normal git add . will stage the generated credential file, which violates the repo’s “do not write secrets into tracked files” requirement.
Useful? React with 👍 / 👎.
| session.status = SessionStatus.ACTIVE | ||
| session.updated_at = datetime.utcnow() |
There was a problem hiding this comment.
Start timing when the session becomes active
The architecture here creates sessions before the harness is launched, but activate_session() only flips the status and leaves started_at at the timestamp assigned during create_session(). Any delay between bench session create and the actual harness start will therefore inflate session duration and any later rollups/comparisons with pre-launch idle time instead of benchmark runtime.
Useful? React with 👍 / 👎.
Summary
Implements COE-228: Session lifecycle, session-scoped credentials, and harness env rendering.
Key Features
Session Manager Service
CLI Commands
bench session create- Creates session with benchmark metadatabench session finalize- Records status and end timebench session note- Adds notes to sessionsbench session artifact- Registers exported artifactsHarness Profiles
Configuration
Acceptance Criteria Verified
Test Results
Test Plan Coverage
Unit Tests
Integration Tests