- Create scripts/run_production.sh with nohup launch, crash detection, stall monitoring, log rotation, and throughput estimation
- Add enriched, preclassified, deep_processing, docker_active counters to ProgressCounters and ProgressSnapshot
- Emit per-stage throughput metrics in progress log output
- Update tests for new counter fields
…options
Add two new configurable parameters that flow through CLI -> orchestrator -> pipeline:
- concurrency_preclassify: controls pre-classification semaphore (default: 25, was 10)
- backlog_multiplier: controls deep processing backlog semaphore multiplier (default: 5, was 3)
Both are available as --concurrency-preclassify and --backlog-multiplier CLI args on swe mine and swe benchmark commands.
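As a rough illustration of how these two parameters could bound concurrency, here is a minimal std-only sketch. Everything in it is an assumption made for illustration: the `ConcurrencyConfig` struct, the `backlog_permits` helper, the idea that the multiplier scales a deep-processing concurrency value, and the hand-rolled `Semaphore`. Only the parameter names and the defaults (25 and 5) come from the commit message; the actual pipeline code may size and implement its semaphores differently.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical config holding the two parameters named in the commit message.
#[derive(Debug, Clone, Copy)]
pub struct ConcurrencyConfig {
    pub concurrency_preclassify: usize,
    pub backlog_multiplier: usize,
}

impl Default for ConcurrencyConfig {
    fn default() -> Self {
        // Defaults stated in the commit message: 25 (was 10) and 5 (was 3).
        Self { concurrency_preclassify: 25, backlog_multiplier: 5 }
    }
}

impl ConcurrencyConfig {
    /// Assumption: the backlog semaphore holds
    /// `backlog_multiplier * concurrency_deep` permits.
    pub fn backlog_permits(&self, concurrency_deep: usize) -> usize {
        self.backlog_multiplier.saturating_mul(concurrency_deep)
    }
}

/// Minimal counting semaphore, standing in for whatever the pipeline uses.
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    pub fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Take a permit only if one is free right now.
    pub fn try_acquire(&self) -> bool {
        let mut permits = self.permits.lock().unwrap();
        if *permits == 0 {
            return false;
        }
        *permits -= 1;
        true
    }

    /// Return a permit and wake one waiter, if any.
    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}
```

Under these assumptions, a pre-classification stage would call `Semaphore::new(cfg.concurrency_preclassify)` and hold a permit per in-flight item, while the backlog semaphore would be created with `cfg.backlog_permits(n)` for some deep-processing concurrency `n`.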
…ker_sandbox, and pipeline
…ontainerized extraction, tests, production runner
Changes:
- Docker containers: --memory=32g, no --cpus flag, --network=host
- Resource limits (resources.rs, difficulty/mod.rs, runner/sandbox.rs): 32GB RAM, 0.0 CPU
- Pipeline: pre-classification semaphore 25 (was 10), backlog multiplier 5 (was 3)
- CLI: --concurrency-preclassify and --backlog-multiplier args
- Extractor: Docker-first diff extraction, reqwest API fallback
- Progress monitor: enriched, preclassified, docker_active counters + throughput
- Early cancellation checks between pipeline stages
- 33 new unit tests (harness, orchestrator, docker_sandbox, pipeline, progress)
- scripts/run_production.sh: nohup, monitoring, crash detection, log rotation
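To make the container flags concrete, the sketch below shows one way a resource-limit value could be turned into `docker run` arguments so that a CPU value of 0.0 means "emit no --cpus flag". The `ResourceLimits` struct and `docker_run_args` function are hypothetical names invented for this example and are not taken from resources.rs or runner/sandbox.rs; only the flag values (--memory=32g, --network=host, no --cpus) come from the commit message.

```rust
/// Hypothetical limits type; the real constants live in resources.rs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ResourceLimits {
    pub memory_gb: u32,
    /// 0.0 is treated as "unlimited": no --cpus flag is emitted.
    pub cpus: f64,
}

impl Default for ResourceLimits {
    fn default() -> Self {
        // Values from the commit message: 32GB RAM, 0.0 CPU.
        Self { memory_gb: 32, cpus: 0.0 }
    }
}

/// Build the leading `docker run` arguments for the given limits.
pub fn docker_run_args(limits: &ResourceLimits) -> Vec<String> {
    let mut args = vec![
        "run".to_string(),
        format!("--memory={}g", limits.memory_gb),
        "--network=host".to_string(),
    ];
    // Only cap CPU when a positive limit is requested.
    if limits.cpus > 0.0 {
        args.push(format!("--cpus={}", limits.cpus));
    }
    args
}
```

Skipping the flag rather than passing `--cpus=0.0` keeps "unlimited" explicit in the code instead of relying on how Docker interprets a zero value.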
Summary
Configures all Docker containers to use 32GB RAM with unlimited CPU and host networking, adds configurable pipeline concurrency parameters, containerizes the patch extraction path, enhances progress monitoring, and adds 33 new unit tests.
Changes
Docker Resource Configuration
- All containers run with --memory=32g and --network=host; the --cpus flag is removed entirely
- Applies to DockerSandbox::start(), harness::evaluate_task(), and runner/sandbox.rs container creation
- Set EASY_LIMITS, MEDIUM_LIMITS, HARD_LIMITS, and Default constants in resources.rs to 32GB RAM / 0.0 CPU (unlimited)
- Updated difficulty/mod.rs resource limit mappings
Pipeline Concurrency & Optimization
- Pre-classification semaphore raised from 10 to 25 (configurable via --concurrency-preclassify)
- Deep processing backlog multiplier raised from 3 to 5 (configurable via --backlog-multiplier)
- Both args are available on the swe mine and swe benchmark commands
- Values flow through SwePipelineConfig → SweOrchestratorConfig → pipeline semaphore creation
Containerized Patch Extraction
- PatchExtractor now uses Docker-first diff extraction via DockerSandbox
- reqwest API call retained as fallback only
Progress Monitoring Enhancements
- Added enriched, preclassified, deep_processing, and docker_active counters to ProgressCounters
- ProgressMonitor now emits per-stage throughput metrics
New Unit Tests (33 total)
- harness.rs: HarnessConfig::default, HarnessStatus Display, container_name() variants, truncate() edge cases, result/summary serialization, discover_tasks() with temp dirs
- orchestrator.rs: DifficultyTargets::parse (valid, invalid, edge cases), total_tasks, is_empty, SweOrchestratorConfig::default, SweRunResult serialization
- docker_sandbox.rs: truncate() helper (empty, short, exact boundary, long), SandboxOutput construction
- pipeline.rs: SwePipelineConfig::default, ExportConfig construction
- progress.rs: counter increment, clone-shares-state, monitor start/stop
Production Runner Script
- scripts/run_production.sh with nohup launch, crash detection, stall monitoring, log rotation, throughput estimation, and graceful shutdown via signal traps
Testing
All 356 unit tests pass (cargo test --lib). Clippy and fmt checks clean.
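The clone-shares-state property covered by the progress.rs tests can be pictured with a small sketch. This is a guess at the general shape, not the PR's actual code: the field names match the counters listed in the description, but the `Inner` struct, the `inc_*` and `snapshot` methods, the atomic-behind-`Arc` layout, and the `per_minute` helper are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical shared storage for the counters named in the description.
#[derive(Debug, Default)]
struct Inner {
    enriched: AtomicU64,
    preclassified: AtomicU64,
    deep_processing: AtomicU64,
    docker_active: AtomicU64,
}

/// Cloning copies the Arc, so every clone updates the same counters.
#[derive(Debug, Clone, Default)]
pub struct ProgressCounters {
    inner: Arc<Inner>,
}

/// Plain-value copy of the counters at one point in time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProgressSnapshot {
    pub enriched: u64,
    pub preclassified: u64,
    pub deep_processing: u64,
    pub docker_active: u64,
}

impl ProgressCounters {
    pub fn inc_enriched(&self) {
        self.inner.enriched.fetch_add(1, Ordering::Relaxed);
    }

    pub fn inc_preclassified(&self) {
        self.inner.preclassified.fetch_add(1, Ordering::Relaxed);
    }

    pub fn snapshot(&self) -> ProgressSnapshot {
        ProgressSnapshot {
            enriched: self.inner.enriched.load(Ordering::Relaxed),
            preclassified: self.inner.preclassified.load(Ordering::Relaxed),
            deep_processing: self.inner.deep_processing.load(Ordering::Relaxed),
            docker_active: self.inner.docker_active.load(Ordering::Relaxed),
        }
    }
}

/// Assumed throughput formula: items per minute over the elapsed time.
pub fn per_minute(count: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        0.0
    } else {
        count as f64 * 60.0 / secs
    }
}
```

With this layout, a monitor task can hold one clone and call `snapshot()` periodically while pipeline stages increment counters through their own clones.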