Autoresearch/scoring mar27 #4

Merged: Surge77 merged 2 commits into main from autoresearch/scoring-mar27 on Mar 27, 2026.

Surge77 (Owner) commented Mar 27, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added an autoresearch framework that enables guarded, iterative improvements through scoring and routing tracks, with constrained file edits and automated evaluation.
    • Added a performance benchmarking tool that measures page load times and paint metrics across application paths.
  • Documentation

    • Added comprehensive guides for autoresearch workflow configuration and usage.
  • Chores

    • Added npm commands for benchmarking and running autoresearch evaluations.
    • Updated dependencies with Puppeteer for performance monitoring.
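The "constrained file edits" guard comes down to a set-membership check: before an iteration is committed and evaluated, every changed file has to appear on the active track's allowlist. Below is a minimal TypeScript sketch of that check. It assumes a hypothetical `Track` shape with an exact-path `allowedFiles` list and a hypothetical `findDisallowedEdits` helper. The real `autoresearch/manifest.json` schema and `src/lib/autoresearch/manifest.mjs` loader may be structured differently, for example by supporting globs.

```typescript
// Hypothetical track shape; the real autoresearch/manifest.json fields may differ.
interface Track {
  name: string;
  allowedFiles: string[]; // exact repo-relative paths this track may edit
}

// Returns every changed file that is NOT on the track's allowlist.
function findDisallowedEdits(track: Track, changedFiles: string[]): string[] {
  const allowed = new Set(track.allowedFiles);
  return changedFiles.filter((file) => !allowed.has(file));
}

// Example allowlist for a scoring track (illustrative only).
const scoringTrack: Track = {
  name: "scoring",
  allowedFiles: ["src/lib/scoring/adaptive-weights.ts"],
};

const violations = findDisallowedEdits(scoringTrack, [
  "src/lib/scoring/adaptive-weights.ts",
  "package.json",
]);

if (violations.length > 0) {
  // A guarded runner would refuse to commit or evaluate this iteration.
  console.log(`Rejected edits outside the ${scoringTrack.name} track: ${violations.join(", ")}`);
}
```

Exact-path matching keeps the guard strict. A glob-based allowlist would be more flexible, but it makes it easier to accidentally widen what a track may touch.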

vercel (bot) commented Mar 27, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| devtrends | Ready | Preview, Comment | Mar 28, 2026 5:59am |

Surge77 merged commit d6f0e92 into main on Mar 27, 2026 (2 of 4 checks passed).
coderabbitai (bot) commented Mar 27, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a2b1228a-e664-4e4d-a65b-afb0c0a76da4

📥 Commits

Reviewing files that changed from the base of the PR, between 3cb1d82 and 52288cf.

⛔ Files ignored due to path filters (2)
  • autoresearch/results.template.tsv is excluded by !**/*.tsv
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (22)
  • .gitignore
  • autoresearch/README.md
  • autoresearch/devtrends-program.md
  • autoresearch/fixtures/routing/baseline.json
  • autoresearch/fixtures/scoring/baseline.json
  • autoresearch/manifest.json
  • package.json
  • scripts/autoresearch-eval-routing.mjs
  • scripts/autoresearch-eval-scoring.mjs
  • scripts/autoresearch-runner.mjs
  • scripts/benchmark.mjs
  • src/lib/__tests__/benchmark-script.test.ts
  • src/lib/autoresearch/__tests__/manifest.test.ts
  • src/lib/autoresearch/__tests__/routing-evaluator.test.ts
  • src/lib/autoresearch/__tests__/runner-auto.test.ts
  • src/lib/autoresearch/__tests__/scoring-evaluator.test.ts
  • src/lib/autoresearch/manifest.mjs
  • src/lib/autoresearch/module-loader.mjs
  • src/lib/autoresearch/routing-evaluator.mjs
  • src/lib/autoresearch/runner.mjs
  • src/lib/autoresearch/scoring-evaluator.mjs
  • src/lib/scoring/adaptive-weights.ts

📝 Walkthrough

Added a comprehensive autoresearch framework, including fixture-based evaluation for the scoring and routing tracks, manifest configuration, evaluator scripts, a git-integrated runner for automated iteration loops, Puppeteer-based performance benchmarking, and supporting documentation and tests.
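Fixture-based evaluation here means running the code under test against a fixed set of cases, checking each case's assertion, and reducing the results to a single metric plus a list of failures. The TypeScript sketch below shows that idea for routing. Everything in it is hypothetical: the `RoutingCase` fixture shape, the toy `selectProvider` rule, and the provider names. It also assumes the metric is simply the pass rate. The actual `autoresearch/fixtures/routing/baseline.json` format and `routing-evaluator.mjs` logic may be quite different.

```typescript
// Hypothetical fixture shape; the real baseline.json files may differ.
interface RoutingCase {
  id: string;
  input: { task: string; maxLatencyMs: number };
  expectedProvider: string;
}

interface Report {
  metric: number; // fraction of cases whose assertion passed, in [0, 1]
  failures: { id: string; expected: string; actual: string }[];
}

// Stand-in for the code under test: a toy routing rule, not the project's router.
function selectProvider(input: RoutingCase["input"]): string {
  return input.maxLatencyMs < 500 ? "fast-provider" : "quality-provider";
}

// Runs every case, records assertion failures, and reports the pass rate.
function evaluate(cases: RoutingCase[]): Report {
  const failures: Report["failures"] = [];
  for (const c of cases) {
    const actual = selectProvider(c.input);
    if (actual !== c.expectedProvider) {
      failures.push({ id: c.id, expected: c.expectedProvider, actual });
    }
  }
  const metric = cases.length === 0 ? 0 : (cases.length - failures.length) / cases.length;
  return { metric, failures };
}

const report = evaluate([
  { id: "chat-low-latency", input: { task: "chat", maxLatencyMs: 300 }, expectedProvider: "fast-provider" },
  { id: "summary-batch", input: { task: "summarize", maxLatencyMs: 5000 }, expectedProvider: "quality-provider" },
  { id: "code-review", input: { task: "review", maxLatencyMs: 200 }, expectedProvider: "quality-provider" },
]);

// JSON output, in the spirit of the evaluator CLI scripts' reports.
console.log(JSON.stringify(report, null, 2));
```

Returning the failures together with the metric lets a runner log why an iteration scored lower, not just that it did.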

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Configuration & Documentation | .gitignore, autoresearch/README.md, autoresearch/devtrends-program.md, autoresearch/manifest.json | Added autoresearch configuration; ignored local clones and runtime outputs while un-ignoring Markdown docs; documented the guarded autoresearch workflow, including the scoring and routing track specifications. |
| Test Fixtures | autoresearch/fixtures/routing/baseline.json, autoresearch/fixtures/scoring/baseline.json | Introduced baseline fixtures for routing provider-selection scenarios and for scoring component evaluation (weights, momentum, and ranking cases). |
| Evaluator Scripts | scripts/autoresearch-eval-scoring.mjs, scripts/autoresearch-eval-routing.mjs | Added CLI entry points that invoke the scoring and routing evaluators, parse fixture paths, and output JSON reports. |
| Orchestration & Benchmarking | scripts/autoresearch-runner.mjs, scripts/benchmark.mjs, package.json | Introduced a runner for autoresearch iterations with track, dry-run, and status flags; a Puppeteer-based performance benchmark with configurable paths and metrics; and the corresponding npm scripts plus the puppeteer dependency. |
| Core Autoresearch Modules | src/lib/autoresearch/manifest.mjs, src/lib/autoresearch/module-loader.mjs, src/lib/autoresearch/routing-evaluator.mjs, src/lib/autoresearch/scoring-evaluator.mjs, src/lib/autoresearch/runner.mjs | Implemented a manifest loader with Zod validation, a dynamic module bundler, routing and scoring evaluators that check fixture assertions, and a central runner that orchestrates git operations, evaluation, result logging, and commit acceptance or rejection. |
| Evaluator Tests | src/lib/autoresearch/__tests__/routing-evaluator.test.ts, src/lib/autoresearch/__tests__/scoring-evaluator.test.ts, src/lib/autoresearch/__tests__/runner-auto.test.ts | Added tests validating evaluator report structure, metric computation, provider selection, and runner control flow (keep/discard decisions based on metric improvement). |
| Integration Tests & Utilities | src/lib/__tests__/benchmark-script.test.ts, src/lib/autoresearch/__tests__/manifest.test.ts | Added tests for benchmark CLI argument parsing and output formats, and for manifest loading with track configuration and file-allowlist validation. |
| Scoring Adjustment | src/lib/scoring/adaptive-weights.ts | Fine-tuned the low-completeness weight scaling factors for the jobs, ecosystem, github, and community categories. |
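The runner is described as taking track, dry-run, and status flags. The sketch below parses those flags from an argv-style array. The spellings `--track <name>`, `--track=<name>`, `--dry-run`, and `--status` are assumptions based on the summary and the diagram's "Parse --track argument" step. So is the rule that `--track` is required unless `--status` is given. The hypothetical `parseRunnerArgs` takes the array as a parameter rather than reading `process.argv`, which keeps it easy to test. Check `scripts/autoresearch-runner.mjs` for the real flags and behavior.

```typescript
// Assumed flag spellings; the real runner may name or combine them differently.
interface RunnerOptions {
  track?: string; // e.g. "scoring" or "routing"
  dryRun: boolean;
  status: boolean;
}

function parseRunnerArgs(argv: string[]): RunnerOptions {
  const options: RunnerOptions = { dryRun: false, status: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--track") {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error("--track requires a value (e.g. scoring or routing)");
      }
      options.track = value;
      i++; // skip the value we just consumed
    } else if (arg.startsWith("--track=")) {
      options.track = arg.slice("--track=".length);
    } else if (arg === "--dry-run") {
      options.dryRun = true;
    } else if (arg === "--status") {
      options.status = true;
    } else {
      throw new Error(`Unknown argument: ${arg}`);
    }
  }
  // Assumption: reporting status needs no track, but every other run does.
  if (!options.status && options.track === undefined) {
    throw new Error("--track is required unless --status is given");
  }
  return options;
}

console.log(parseRunnerArgs(["--track", "scoring", "--dry-run"]));
```

Failing fast on unknown or incomplete arguments matters more than usual here, because a mistyped track name would otherwise let the runner commit against the wrong allowlist.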

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner as Autoresearch Runner
    participant Git as Git Operations
    participant Evaluator as Track Evaluator
    participant Logger as Result Logger
    participant TSV as results.tsv

    Runner->>Runner: Parse --track argument
    Runner->>Git: Fetch branch name & changed files
    Git-->>Runner: Branch, changed files
    Runner->>Runner: Validate files in track allowlist

    Runner->>Git: Stage changed files
    Runner->>Git: Commit with track-prefixed message
    Git-->>Runner: Commit hash

    Runner->>Logger: Read previous best metric
    Logger->>TSV: Query best "keep" metric
    TSV-->>Logger: Best metric (if exists)
    Logger-->>Runner: Previous best

    Runner->>Evaluator: Evaluate fixture set
    Evaluator-->>Runner: Report with metric & failures

    Runner->>Runner: Compare metric vs previous best

    alt Metric improved or no previous best
        Runner->>Logger: Append "keep" result row
        Logger->>TSV: Write result
        Runner->>Logger: Write JSON report
    else Metric did not improve
        Runner->>Git: Reset HEAD to parent
        Runner->>Logger: Append "discard" result row
        Logger->>TSV: Write result
        Runner->>Logger: Write JSON report
    end
```
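The keep/discard branch of the diagram works in two steps. First, find the best metric among the "keep" rows already in results.tsv. Then keep the new commit only if there is no previous best or the new metric beats it. The sketch below assumes a hypothetical results.tsv layout (header row, then `commit`, `metric`, `status`, `description` separated by tabs). It also assumes that higher metrics are better and that a tie does not count as an improvement. The real `results.template.tsv` columns and comparison rule may differ, and the sample rows are invented for illustration.

```typescript
// Assumed results.tsv layout: commit<TAB>metric<TAB>status<TAB>description, with a header row.
type Status = "keep" | "discard";

// Returns the highest metric among "keep" rows, or undefined if there are none.
function bestKeptMetric(tsv: string): number | undefined {
  let best: number | undefined;
  const rows = tsv.trim().split("\n").slice(1); // drop the header row
  for (const row of rows) {
    const [, metricText, status] = row.split("\t");
    const metric = Number(metricText);
    if (status === "keep" && Number.isFinite(metric)) {
      best = best === undefined ? metric : Math.max(best, metric);
    }
  }
  return best;
}

// Assumes higher is better; a tie is treated as "did not improve".
function decide(metric: number, previousBest: number | undefined): Status {
  return previousBest === undefined || metric > previousBest ? "keep" : "discard";
}

// Invented sample rows, for illustration only.
const results = [
  "commit\tmetric\tstatus\tdescription",
  "a1b2c3d\t0.82\tkeep\tbaseline",
  "e4f5a6b\t0.79\tdiscard\ttweak jobs weight",
  "c7d8e9f\t0.85\tkeep\tscale community weight",
].join("\n");

const previous = bestKeptMetric(results); // 0.85
console.log(decide(0.84, previous)); // "discard": the runner would then reset HEAD to the parent commit
console.log(decide(0.9, previous)); // "keep"
```

Reading the previous best only from "keep" rows means a discarded high score can never become the bar for later iterations, which matches the diagram's "Query best 'keep' metric" step.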

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A manifest of tracks so true,
Scoring weights and routing through,
Fixtures guide the autoresearch dance,
Git commits rise at iteration's chance,
Benchmarks bloom with Puppeteer's pace! 🚀
