Releases: webtech-network/autograder
0.4.0
New Features
- Remote Sandbox Manager API: The Sandbox Manager can now run as a standalone FastAPI service, decoupling sandbox orchestration from the main Autograder API. A new `remote` mode lets the main service communicate with the Sandbox Manager over REST, allowing independent scaling, resource isolation, and shared sandbox pools across multiple API instances. Includes Docker/Compose configuration, health checks, and timeout handling.
- AI Algorithm Verification Tests: A new category of static analysis tests uses LLMs to verify that student code faithfully implements a specific algorithm (e.g., Quick Sort, Binary Search, Dijkstra). Three test functions are available: `ai_sorting_algorithm`, `ai_search_algorithm`, and `ai_graph_algorithm`. Each validates algorithmic logic and complexity characteristics, and ensures that no built-in library shortcuts are used. Requires an active LLM provider configured via `OPENAI_API_KEY`.
- `total_score` GitHub Action Output: The final numeric score (0–100) is now exposed as an action output (`total_score`) via `GITHUB_OUTPUT`, allowing downstream workflow steps to consume the grading result.
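GitHub Actions reads step outputs from `name=value` lines appended to the file named by the `GITHUB_OUTPUT` environment variable. A minimal sketch of that mechanism follows; the helper name `write_action_output` is hypothetical and is not taken from the project's code.

```python
import os


def write_action_output(name: str, value: object) -> bool:
    """Append `name=value` to the file named by GITHUB_OUTPUT.

    Hypothetical helper for illustration only. Returns False when the
    variable is unset, e.g. when running outside GitHub Actions.
    """
    output_path = os.environ.get("GITHUB_OUTPUT")
    if not output_path:
        return False
    # Append rather than overwrite: earlier outputs of the step share this file.
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")
    return True
```

A later workflow step would then read the value with an expression such as `${{ steps.<step_id>.outputs.total_score }}`, where `<step_id>` is whatever id the workflow assigns to the grading step.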
Bug Fixes
- PreFlight crash with no language: Fixed `AttributeError` when using the `webdev` template preset without specifying `submission-language`. Returns an empty `LanguageSetupConfig` instead of crashing on `None.value`.
- Submission persistence after creation: Fixed `SubmissionRepository.create()` not flushing/refreshing the entity, which caused auto-generated IDs to be missing and triggered `IntegrityError` in downstream submission results.
- Test suite schema drift: Migrated test fixtures from the old `language` string field to the current `languages` list field in `GradingConfiguration`, resolving widespread test failures.
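The PreFlight fix amounts to guarding against a missing language before reading `.value`. The sketch below shows the general pattern only; the `Language` enum, the fields of `LanguageSetupConfig`, and the function `resolve_setup_config` are simplified stand-ins assumed for illustration, not the project's actual definitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Language(Enum):
    """Hypothetical stand-in for the project's language enum."""
    PYTHON = "python"
    JAVA = "java"


@dataclass
class LanguageSetupConfig:
    """Simplified stand-in; the real fields are not listed in these notes."""
    language: Optional[str] = None
    setup_commands: List[str] = field(default_factory=list)


def resolve_setup_config(submission_language: Optional[Language]) -> LanguageSetupConfig:
    # Without this guard, `submission_language.value` raises AttributeError
    # whenever no language was supplied.
    if submission_language is None:
        return LanguageSetupConfig()
    return LanguageSetupConfig(language=submission_language.value)
```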
What's Changed
- fix: handle None submission_language in PreFlightStep by @ArthurCRodrigues in #325
- Add total_score output and GITHUB_OUTPUT write by @matheusmra in #326
- docs: refactor GitHub Action documentation by @ArthurCRodrigues in #327
- Refresh submission on add; update tests by @matheusmra in #329
- Add Sandbox Manager API and remote mode by @matheusmra in #330
- feat(static_analysis): add AI algorithm tests (#332) by @ArthurCRodrigues in #333
Full Changelog: 0.3.1...0.4.0
0.3.1
What's Changed
- fix: prevent event-loop blocking in deliberate execution sandbox lifecycle by @ArthurCRodrigues in #317
- Rename CRITERIA_ASSETS_* to EXTERNAL_ASSETS_* by @matheusmra in #320
- fix(security): prevent shell command injection in sandbox file ops (#313) by @ArthurCRodrigues in #321
- fix: secure command construction in SandboxContainer by @ArthurCRodrigues in #322
- feat: enable integration tests on CI PRs by @ArthurCRodrigues in #323
Full Changelog: 0.3.0...0.3.1
0.3.0
New Features
- AST-based Structural Analysis: Introduced a new `StructuralAnalysisStep` in the grading pipeline using `ast-grep` to parse submission files once and pass structural roots downstream for static checks.
- Forbidden Keyword Structural Test: Added `forbidden_keyword` test support backed by AST queries for reliable construct detection.
- Multiple Template Support: Pipeline/template loading now supports multiple templates (list or comma-separated), allowing mixed capabilities in a single grading config.
- Template Split for Static vs Execution Tests: Static code analysis tests were moved to a dedicated `static_analysis` template, while execution-oriented tests remain in execution templates (e.g., `input_output`).
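Accepting templates either as a list or as a comma-separated string can be sketched as a small normalization step. The function name `parse_template_names` is hypothetical, and the lowercasing and de-duplication shown here are assumptions rather than documented behaviour.

```python
from typing import Iterable, List, Union


def parse_template_names(raw: Union[str, Iterable[str]]) -> List[str]:
    """Normalize a template spec (list or comma-separated string) to a list.

    Hypothetical helper: trims whitespace, lowercases, drops empty entries
    and duplicates while preserving the original order.
    """
    items = raw.split(",") if isinstance(raw, str) else list(raw)
    names: List[str] = []
    for item in items:
        name = item.strip().lower()
        if name and name not in names:
            names.append(name)
    if not names:
        raise ValueError("at least one template name is required")
    return names
```

With this sketch, `"input_output, static_analysis"` and `["input_output", "static_analysis"]` normalize to the same list.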
Refactors
- Grader Architecture Cleanup: Extracted `SubmissionGrader` from `GraderService` and reorganized grader services for clearer separation of responsibilities and maintainability.
Bug Fixes
- Multi-template Regression Fixes: Fixed `submission_language` collision in grading execution, improved template-name normalization/validation, and resolved stale template state in criteria tree service.
- Static Analysis Reliability: Hardened structural-analysis availability handling and fixed regex escaping edge cases in forbidden-import checks.
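Regex escaping matters in import checks because a module name such as `os.path` contains `.`, which matches any character unless escaped. The sketch below illustrates this class of bug for Python-style imports only; `imports_module` is a hypothetical function, and the project's actual pattern is not shown in these notes.

```python
import re


def imports_module(source: str, module: str) -> bool:
    """Return True if `source` imports `module` (Python import syntax only).

    Hypothetical illustration: `re.escape` makes the dot in names such as
    "os.path" match literally instead of matching any character.
    """
    name = re.escape(module)
    pattern = rf"^\s*(?:import\s+{name}|from\s+{name}\s+import)\b"
    return re.search(pattern, source, flags=re.MULTILINE) is not None
```

Without `re.escape`, a line such as `import osxpath` would be reported as an import of `os.path`.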
What's Changed
- Abstract syntax tree + forbidden keyword test by @jaoppb in #300
- refactor: extract SubmissionGrader from GraderService by @jaoppb in #304
- [Feature] Extract static analysis tests and add multi-template support (#305) by @ArthurCRodrigues in #306
Full Changelog: 0.2.0...0.3.0
0.2.0
New Features
- External Execution Mode: Added a new `external` execution mode to the GitHub Action, allowing grading configurations to be managed centrally in Autograder Cloud instead of requiring config files in student repositories. The action fetches the grading config by ID from the cloud API, runs the pipeline locally, and exports results back for submission tracking and analytics. The existing `repo` mode remains the default and is fully backward-compatible.
- Cloud API Contract: Introduced two new protected endpoints: `GET /api/v1/configs/id/{config_id}` for fetching grading configurations by internal ID, and `POST /api/v1/submissions/external-results` for ingesting externally computed grading results. Results are immediately visible through existing submission retrieval endpoints.
- M2M Bearer Token Authentication: Protected external mode endpoints with machine-to-machine bearer token auth using `AUTOGRADER_INTEGRATION_TOKEN`. Validation uses `hmac.compare_digest` to prevent timing attacks. Existing public endpoints are unaffected.
- Cloud Client and Exporter: Added `cloud_client.py` for HTTP communication with the cloud API (config fetch + result submission) and `cloud_exporter.py` for mode-specific result export. The `Exporter` abstract class now supports `export_with_context` for richer pipeline-aware exports.
- Structured Logging Middleware: Added HTTP middleware for `X-Request-ID` propagation, per-request `ContextVar` logging, and a JSON formatter with `service`, `env`, `method`, `path`, `status`, and `duration_ms` fields. Configurable via `LOG_LEVEL`, `SERVICE_NAME`, and `APP_ENV` environment variables.
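The bearer token check can be illustrated with the standard library alone. This is a sketch under stated assumptions: the function `is_authorized` and its header parsing are hypothetical, and only the use of `hmac.compare_digest` and the `AUTOGRADER_INTEGRATION_TOKEN` variable come from the notes above.

```python
import hmac
import os
from typing import Optional


def is_authorized(authorization_header: Optional[str],
                  expected_token: Optional[str] = None) -> bool:
    """Validate an `Authorization: Bearer <token>` header value.

    Hypothetical helper. The expected token defaults to the
    AUTOGRADER_INTEGRATION_TOKEN environment variable.
    """
    if expected_token is None:
        expected_token = os.environ.get("AUTOGRADER_INTEGRATION_TOKEN", "")
    if not expected_token or not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    presented = presented.strip()
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest is designed so that timing does not reveal how much of
    # the token matched, unlike an ordinary `==` comparison.
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_token.encode("utf-8"))
```

Encoding both values to bytes before comparing avoids the `TypeError` that `hmac.compare_digest` raises for non-ASCII `str` arguments.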
Bug Fixes
- Grading Step model_config: Fixed `model_config` being set to `forbid`, which caused errors during the grading step.
- Deliberate Execution Error Handling: Restored the `except` block that was accidentally removed during the asset injection refactor, which caused unhandled exceptions to propagate as raw 500 errors instead of structured `SYSTEM_ERROR` responses. Error result count now matches the number of requested test cases.
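The restored error handling can be pictured as a catch-all that converts an unexpected failure into one structured error result per requested test case. Everything below is an illustrative assumption (the `CaseResult` type, the `run_all` function, and the all-or-nothing policy); only the `SYSTEM_ERROR` status and the matching result count come from the notes above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CaseResult:
    """Hypothetical result record for one test case."""
    status: str
    output: str = ""
    error: str = ""


def run_all(test_inputs: Sequence[str],
            execute: Callable[[str], str]) -> List[CaseResult]:
    """Run every test case, returning structured errors on failure."""
    try:
        return [CaseResult(status="OK", output=execute(item))
                for item in test_inputs]
    except Exception as exc:
        # One SYSTEM_ERROR per requested case, so callers always receive
        # as many results as the number of test cases they submitted.
        return [CaseResult(status="SYSTEM_ERROR", error=str(exc))
                for _ in test_inputs]
```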
What's Changed
- fix: fixing grading step model_config by @matheusmra in #294
- feat: external mode web API contract, config fetch, and external result ingestion by @matheusmra in #295
- feat: add external mode for GitHub Action by @matheusmra in #297
- fix: restore error handling in deliberate execution service by @ArthurCRodrigues in #298
- feat: add request correlation and structured logging middleware by @ArthurCRodrigues in #299
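As a rough illustration of the request correlation added in #299, a per-request `ContextVar` can feed a JSON log formatter. The names `request_id_var` and `JsonFormatter`, and the `level`, `message`, and `request_id` keys, are assumptions; only the `service`, `env`, `method`, `path`, `status`, and `duration_ms` fields are named in the release notes.

```python
import json
import logging
from contextvars import ContextVar

# Hypothetical per-request variable; middleware would set it from X-Request-ID.
request_id_var: ContextVar[str] = ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render log records as single-line JSON objects."""

    def __init__(self, service: str, env: str) -> None:
        super().__init__()
        self.service = service
        self.env = env

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "service": self.service,
            "env": self.env,
            "request_id": request_id_var.get(),
        }
        # Request fields are present only when passed via `extra=...`.
        for key in ("method", "path", "status", "duration_ms"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)
```

A handler would use it via `handler.setFormatter(JsonFormatter(service, env))`, with `service` and `env` presumably read from `SERVICE_NAME` and `APP_ENV`.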
Full Changelog: 0.1.3...0.2.0
0.1.3
New Features
- S3-Compatible Asset Injection: Integrated support for grader-owned static assets (datasets, fixtures) stored in S3 buckets. These files are injected into the sandbox using a secure Base64-encoded process, ensuring full compatibility with high-isolation gVisor environments.
- File Artifact Validation: Introduced the `expect_file_artifact` test to the `Input/Output` template. This allows the autograder to extract and validate files generated by student programs during execution, supporting exact, partial, and regex-based matching.
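The three matching strategies can be sketched as a single comparison function. The name `artifact_matches` and the mode strings `"exact"`, `"partial"`, and `"regex"` are assumptions for illustration; the actual parameters of `expect_file_artifact` are not given in these notes.

```python
import re


def artifact_matches(content: str, expected: str, mode: str = "exact") -> bool:
    """Compare an extracted file's content against an expectation.

    Hypothetical helper covering three modes:
      exact   - content must equal `expected`
      partial - `expected` must occur somewhere in content
      regex   - `expected` is a pattern searched for in content
    """
    if mode == "exact":
        return content == expected
    if mode == "partial":
        return expected in content
    if mode == "regex":
        return re.search(expected, content) is not None
    raise ValueError(f"unknown match mode: {mode!r}")
```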
What's Changed
- fix: use logger.exception instead of logger.error in except blocks by @xyf25 in #271
- chore: remove dead parsers package and AutograderResponse dataclass by @xyf25 in #272
- docs: add MkDocs portal and core concept guides by @ArthurCRodrigues in #274
- docs: make getting started a real grading workflow by @ArthurCRodrigues in #275
- Destroy grading sandboxes at pipeline cleanup by @ArthurCRodrigues in #279
- (feat) new I/O test : Expect File Output by @ArthurCRodrigues in #290
- fix: removing aibatch false success on pipeline by @matheusmra in #291
- feat: implement static asset placement for sandbox executions by @jaoppb in #292
Full Changelog: 1.1.2...0.1.3
0.1.2
What's Changed
- refactor: decouple UpstashDriver from env loading (Item 15) by @koladefaj in #262
- Add test weight and propagate to service by @matheusmra in #264
- feat: Deterministic container naming for traceability #265 by @LuisHenriqueDC301 in #266
- enable feedback step by @matheusmra in #269
- refactor: replace print statements with logging in LanguagePool by @xyf25 in #270
New Contributors
- @koladefaj made their first contribution in #262
- @xyf25 made their first contribution in #270
Full Changelog: 0.1.1...1.1.2
0.1.1
New Features
- i18n support
- AI Execution Refactor
What's Changed
- 226 td standardize language prep i18n by @matheusmra in #258
- 233 td formalize exporter plugin interface by @matheusmra in #259
- refactoring AI batch by @matheusmra in #260
Full Changelog: 0.1.0...0.1.1
0.1.0
Non-Official Stable Release
What's Changed
- fix: Penalty logic by @matheusmra in #250
- 234 td separate pipelineexecution summary logic by @matheusmra in #253
- 235 td standardize step error handling by @matheusmra in #255
- 225 td split web dev template monolith by @matheusmra in #256
- 220 td decouple sandbox lifecycle from preflight by @matheusmra in #257
Full Changelog: 0.0.9...0.1.0
0.0.9
What's Changed
- [TD][Template ABC] Standardize template contract and validation (#231) by @ArthurCRodrigues in #247
- [TD][PipelineExecution] Add typed accessors and step adoption (#224) by @ArthurCRodrigues in #246
- [TD][GraderService] Make grading flow stateless (#221) by @ArthurCRodrigues in #244
- [TD][Templates] Unify template registration in service layer (#223) by @ArthurCRodrigues in #245
- Fix: Github Module #241 by @Sl3nc in #248
Full Changelog: 0.0.8...0.0.9
0.0.8
What's Changed
- 222 td resolve language before running tests by @matheusmra in #240
- 229 td replace print calls with logging by @LuisHenriqueDC301 in #242
- 216 td fix inverted reporter mode assignment by @matheusmra in #243
New Contributors
- @LuisHenriqueDC301 made their first contribution in #242
Full Changelog: 0.0.7...0.0.8