Skip to content

refactor: overhaul unit tests and achieve 100% coverage on core modules#347

Merged
ArthurCRodrigues merged 7 commits into
mainfrom
test-harnessing
May 30, 2026
Merged

refactor: overhaul unit tests and achieve 100% coverage on core modules#347
ArthurCRodrigues merged 7 commits into
mainfrom
test-harnessing

Conversation

@ArthurCRodrigues

Copy link
Copy Markdown
Member

Context

The Autograder project required a significant overhaul of its unit and integration testing suite to ensure high-fidelity verification of core domain logic, recursive data structures, and failure recovery mechanisms. Previous tests relied too heavily on superficial mocking, which bypassed critical algorithmic paths such as score balancing and state transition validations.

Solution

This PR refactors and expands the testing suite to achieve near 100% coverage on core modules while enforcing deep, meaningful assertions:

  1. Refactored SubmissionGrader Tests: Replaced heavy MagicMock usage in tests/unit/services/grader/test_submission_grader.py with real SubjectNode and TestNode objects. This allows for actual verification of the recursive weight-balancing algorithm and score calculation logic.
  2. CriteriaTree Coverage: Added tests/unit/models/test_criteria_tree.py to achieve 100% coverage on recursive tree traversal methods (get_all_tests).
  3. SandboxService Robustness: Added tests/unit/services/test_sandbox_service.py with comprehensive coverage for success paths and critical failure recovery (e.g., ensuring sandboxes are released back to the pool if workdir preparation fails).
  4. PipelineExecution State Machine: Enhanced tests/unit/pipeline/test_pipeline_execution_accessors.py to cover all state transition branches and strict data requirement validations.
  5. SandboxStep Isolation: Verified 100% coverage for SandboxStep, including branches for skipping execution when no sandbox is required by templates.

Further clarifications

  • Tests were verified with pytest --cov=autograder --cov-branch to ensure maximal path coverage.
  • No changes were made to the core application logic; this PR focuses purely on test infrastructure and reliability.

Related issues

Addresses gaps identified in testing coverage audit.

Checklist

  • I linked the related issue(s) and explained the motivation.
  • I kept this PR focused and scoped to a single concern.
  • I added or updated tests for changed behavior.
  • I ran the relevant tests locally.
  • I updated documentation when needed (No documentation changes required).

Copilot AI review requested due to automatic review settings May 28, 2026 11:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

- Add missing docstrings to core models and services.
- Fix critical pylint errors (missing arguments, not-callable false positives).
- Move imports to top level where possible.
- Address broad exception catching with selective pylint-disable.
- Fix signatures of abstract methods for consistency.
- Refactor sandbox and pre-flight services for better adherence to standards.
- Improve logging with lazy formatting.
- Resolve various minor warnings (unused imports, reimports, etc.).

@ArthurCRodrigues ArthurCRodrigues left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@ArthurCRodrigues ArthurCRodrigues merged commit d7ae6ba into main May 30, 2026
2 checks passed
@ArthurCRodrigues ArthurCRodrigues deleted the test-harnessing branch May 30, 2026 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants