feat(static_analysis): add AI algorithm tests (#332) by ArthurCRodrigues · Pull Request #333 · webtech-network/autograder

ArthurCRodrigues · 2026-05-26T10:45:07Z

Context

While the autograder already supported basic static analysis (forbidden imports and keyword detection), it lacked the ability to verify complex algorithm implementations. Simple I/O matching or structural checks are often insufficient to distinguish between a faithful implementation of a specific algorithm (like Quick Sort) and a wrapper around a built-in library function.
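To illustrate the gap, here is a minimal sketch (both function names are hypothetical and not taken from the repository). The two functions return identical output for every input, so an I/O-based test cannot tell the faithful implementation from the wrapper.

```python
def merge_sort(items):
    """Faithful top-down merge sort: split, recurse, merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    # Merge the two sorted halves, taking the smaller head each time.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort_wrapper(items):
    """Named like merge sort, but simply delegates to the built-in."""
    return sorted(items)


sample = [5, 2, 9, 1, 5, 6]
# Same observable behavior, very different implementations.
print(merge_sort(sample) == merge_sort_wrapper(sample))
```

A keyword check for `sorted` would catch this particular wrapper, but not variants such as `list.sort()` hidden behind a helper, which is the case the AI-based check is meant to cover.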

Solution

This PR introduces AI-based algorithm verification tests to the Static Analysis template:

AI Algorithm Test Suite: Added a base class AiAlgorithmTestBase and three specialized test functions:
- ai_sorting_algorithm
- ai_search_algorithm
- ai_graph_algorithm
Strict Prompting Logic: Implemented shared prompting logic that instructs the LLM to verify the algorithm's complexity and logic characteristics, and to confirm that no standard-library shortcuts are used.
Template Registration: Registered the new AI tests within the existing StaticAnalysisTemplate.
Internationalization: Added English and Portuguese translations for the new AI test descriptions and parameters.
Refactoring & Bug Fixes:
- Fixed broken mocks and updated web service tests to match the current GradingRequest API.
- Resolved import ambiguities caused by duplicate files in the repository root.
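The shape of the test suite can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `AiTestFunction` is reduced to an empty stand-in, the three subclass names and the `algorithm_family` values are hypothetical, and a plain dict stands in for registration in `StaticAnalysisTemplate`. Only the base class attributes and the three test names come from the PR.

```python
class AiTestFunction:
    """Empty stand-in for the autograder's real base class (assumption)."""


class AiAlgorithmTestBase(AiTestFunction):
    algorithm_family: str = ""
    test_name: str = ""

    @property
    def name(self) -> str:
        return self.test_name


# Hypothetical subclass names; only the test_name values appear in the PR.
class SortingAlgorithmTest(AiAlgorithmTestBase):
    algorithm_family = "sorting"  # assumed value
    test_name = "ai_sorting_algorithm"


class SearchAlgorithmTest(AiAlgorithmTestBase):
    algorithm_family = "search"  # assumed value
    test_name = "ai_search_algorithm"


class GraphAlgorithmTest(AiAlgorithmTestBase):
    algorithm_family = "graph"  # assumed value
    test_name = "ai_graph_algorithm"


# A dict keyed by test name stands in for template registration.
registry = {
    test.name: test
    for test in (SortingAlgorithmTest(), SearchAlgorithmTest(), GraphAlgorithmTest())
}
print(sorted(registry))
```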

Further clarifications

AI algorithm tests require an active LLM provider (e.g., OpenAI) configured via OPENAI_API_KEY.
These tests are designed to be used alongside existing structural checks for a multi-layered verification approach.
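A deployment might guard against a missing key before enabling the AI tests. The helper below is a hypothetical sketch (the function name and its use are assumptions, not part of the autograder); it only checks that `OPENAI_API_KEY` is set and non-blank.

```python
import os


def llm_provider_configured(env=None) -> bool:
    """Return True when OPENAI_API_KEY is present and non-blank.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    return bool((env.get("OPENAI_API_KEY") or "").strip())


print(llm_provider_configured({"OPENAI_API_KEY": "sk-example"}))
print(llm_provider_configured({}))
```

A presence check like this cannot tell whether the key is valid; a dummy key passes it and only fails once the provider rejects the request.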

Related issues

Closes #332

Checklist

I linked the related issue(s) and explained the motivation.
I kept this PR focused and scoped to a single concern.
I added or updated tests for changed behavior.
I ran the relevant tests locally.
I updated documentation when needed.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds new AI-driven algorithm verification tests (sorting/search/graph) to the Static Analysis template, including configuration schema, translations, and unit tests.

Changes:

Added AiAlgorithmTestBase + three concrete AI algorithm tests and registered them in StaticAnalysisTemplate.
Added EN/PT-BR translations for the new tests’ descriptions/params.
Added unit tests covering registration, metadata, config validation, and prompt generation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
tests/unit/test_ai_algorithm_tests.py	Adds unit coverage for AI algorithm tests’ registration/config/prompt behavior.
autograder/translations/en.json	Adds English i18n strings for new AI algorithm tests.
autograder/translations/pt_br.json	Adds PT-BR i18n strings for new AI algorithm tests.
autograder/template_library/static_analysis.py	Introduces AI algorithm test classes/config and registers them in the static analysis template.

+class AiAlgorithmConfig(BaseModel):
+    algorithm_name: str = Field(..., min_length=1)


+        files: Optional[List[SubmissionFile]],
+        **kwargs,
+    ) -> str:
+        algorithm_name = (kwargs.get("algorithm_name") or "").strip()


+        algo_label = algorithm_name or "Unknown algorithm"
+
+        return (
+            f"You are verifying a {self.algorithm_family} algorithm implementation.\n"
+            f"Requested algorithm: {algo_label}.\n"


+            f"Use subject '{algo_label}'."
+        )
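The label fallback in the hunks above can be exercised in isolation. The function name `resolve_algo_label` is hypothetical; its two lines mirror the `algorithm_name` and `algo_label` expressions from the diff.

```python
def resolve_algo_label(**kwargs) -> str:
    """Mirror of the diff's fallback: strip the name, default if empty."""
    algorithm_name = (kwargs.get("algorithm_name") or "").strip()
    return algorithm_name or "Unknown algorithm"


print(resolve_algo_label(algorithm_name="  merge sort "))
print(resolve_algo_label(algorithm_name="   "))
print(resolve_algo_label())
```

One consequence worth noting: a whitespace-only name has length greater than zero, so, assuming no whitespace stripping is configured on `AiAlgorithmConfig`, it would satisfy `min_length=1` and still end up as "Unknown algorithm" in the prompt.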


+class AiAlgorithmTestBase(AiTestFunction):
+    algorithm_family: str = ""
+    test_name: str = ""
+
+    @property
+    def name(self) -> str:
+        return self.test_name


+    def build_prompt(
+        self,
+        files: Optional[List[SubmissionFile]],
+        **kwargs,
+    ) -> str:
+        algorithm_name = (kwargs.get("algorithm_name") or "").strip()
+        file_names = ", ".join(f.filename for f in files) if files else ""
+
+        if file_names:
+            file_scope = f"Focus only on these files: {file_names}."
+        else:
+            file_scope = "No submission files were provided for this test."
+
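The file-scope branch of `build_prompt` can be tried on its own with a small sketch. `SubmissionFile` here is a stand-in dataclass with only a `filename` field (the real class is not shown in this PR excerpt), and `file_scope` is a hypothetical free function wrapping the same logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubmissionFile:
    """Stand-in with only the attribute the prompt logic reads (assumption)."""
    filename: str


def file_scope(files: Optional[List[SubmissionFile]]) -> str:
    """Same branching as the diff: list the files, or say none were given."""
    file_names = ", ".join(f.filename for f in files) if files else ""
    if file_names:
        return f"Focus only on these files: {file_names}."
    return "No submission files were provided for this test."


print(file_scope([SubmissionFile("sort.py"), SubmissionFile("utils.py")]))
print(file_scope(None))
```

Both `None` and an empty list take the second branch, since the conditional expression treats them alike.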


ArthurCRodrigues · 2026-05-27T00:08:18Z

The E2E setup via Dockerfile.api (FastAPI) with the remote sandbox manager is running. I created three grading configs using static_analysis plus the new AI algorithm tests and ran 5 scenarios (2 foggy). All submissions completed, but AI evaluation failed with a 401 invalid API key error, so every test returned “AI evaluation produced no result.” and scored 0.0. This blocks verifying algorithm correctness in this environment.

Configs created

ai-sort-merge: ai_sorting_algorithm (algorithm_name: merge sort)
ai-search-binary: ai_search_algorithm (algorithm_name: binary search)
ai-graph-bfs: ai_graph_algorithm (algorithm_name: breadth-first search)

Scenarios

Scenario	Assignment	Submission ID	Expected	Result
sort-correct (merge sort)	ai-sort-merge	220376	Pass	Completed, score 0.0, AI eval produced no result
sort-foggy (built-in sorted)	ai-sort-merge	220377	Fail	Completed, score 0.0, AI eval produced no result
search-correct (binary search)	ai-search-binary	220378	Pass	Completed, score 0.0, AI eval produced no result
search-foggy (partial binary/linear)	ai-search-binary	220379	Fail	Completed, score 0.0, AI eval produced no result
graph-correct (BFS)	ai-graph-bfs	220380	Pass	Completed, score 0.0, AI eval produced no result

Logs

AI batch request failed: Error code: 401 ... invalid_api_key (dummy key in test env).

Conclusion
The new AI algorithm tests are registered, invoked, and integrated into the pipeline, but I cannot confirm correctness scoring without a valid OpenAI key (all 5 scenarios scored 0 due to auth failure).
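The reason this run is inconclusive, even for the scenarios expected to fail, can be shown with a short sketch over the table above. The tuple layout and variable names are illustrative assumptions; the scenario names, expectations, scores, and feedback message are taken from the run.

```python
NO_RESULT = "AI evaluation produced no result."

# (scenario, expected, score, feedback) from the first E2E run.
run_1 = [
    ("sort-correct", "Pass", 0.0, NO_RESULT),
    ("sort-foggy", "Fail", 0.0, NO_RESULT),
    ("search-correct", "Pass", 0.0, NO_RESULT),
    ("search-foggy", "Fail", 0.0, NO_RESULT),
    ("graph-correct", "Pass", 0.0, NO_RESULT),
]

# Comparing scores alone makes the foggy scenarios look as if they matched.
naive_matches = [
    name for name, expected, score, _ in run_1
    if (score > 0) == (expected == "Pass")
]

# Only results that actually came from the LLM say anything about correctness.
conclusive = [name for name, _, _, feedback in run_1 if feedback != NO_RESULT]

print(naive_matches)
print(conclusive)
```

The foggy scenarios score 0.0 as expected, but only because the evaluation never ran, so none of the five results counts as evidence.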

Unrelated issue discovered

The API fails to start in remote sandbox mode without a Docker socket mount. Tracked in #334 ("API fails to start in remote sandbox mode without docker socket").

ArthurCRodrigues · 2026-05-27T01:35:41Z

Reran E2E using Dockerfile.api in remote sandbox mode with a valid OpenAI key from .env. AI algorithm tests now return results as expected.

Configs created

ai-sort-merge-2 → ai_sorting_algorithm (algorithm_name: merge sort)
ai-search-binary-2 → ai_search_algorithm (algorithm_name: binary search)
ai-graph-bfs-2 → ai_graph_algorithm (algorithm_name: breadth-first search)

Scenarios (5, incl. 2 foggy)

Scenario	Assignment	Submission ID	Expected	Result
sort-correct (merge sort)	ai-sort-merge-2	220381	Pass	✅ Completed, score 100.0
sort-foggy (built-in sorted)	ai-sort-merge-2	220382	Fail	✅ Completed, score 0.0
search-correct (binary search)	ai-search-binary-2	220383	Pass	✅ Completed, score 100.0
search-foggy (partial binary/linear)	ai-search-binary-2	220384	Fail	✅ Completed, score 0.0
graph-correct (BFS)	ai-graph-bfs-2	220385	Pass	✅ Completed, score 100.0

Conclusion
With a valid key, the new AI algorithm tests behave correctly: correct implementations pass and foggy/incorrect ones fail. PR 333 successfully implements the AI algorithm test flow end‑to‑end in these scenarios.
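The comparison behind this conclusion can be written down as a small check over the table above. The record layout and the mapping from expectation to score are assumptions for illustration (the scores in these scenarios happen to be all-or-nothing); the submission IDs and scores are those reported in the run.

```python
# Assumed mapping for these scenarios, where scores were all-or-nothing.
EXPECTED_SCORE = {"Pass": 100.0, "Fail": 0.0}

# (scenario, submission_id, expected, score) from the second E2E run.
run_2 = [
    ("sort-correct", 220381, "Pass", 100.0),
    ("sort-foggy", 220382, "Fail", 0.0),
    ("search-correct", 220383, "Pass", 100.0),
    ("search-foggy", 220384, "Fail", 0.0),
    ("graph-correct", 220385, "Pass", 100.0),
]

# Collect any scenario whose score differs from its expectation.
mismatches = [
    (name, submission_id)
    for name, submission_id, expected, score in run_2
    if score != EXPECTED_SCORE[expected]
]

print(mismatches)
```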

Note
The API container still requires /var/run/docker.sock even in remote mode (tracked in #334).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


ArthurCRodrigues

OK

feat(static_analysis): add AI algorithm tests (#332)

38a297a

Copilot AI review requested due to automatic review settings May 26, 2026 10:45

ArthurCRodrigues mentioned this pull request May 26, 2026

[Feature] AI-based Algorithm Implementation Verification #332

Closed

Copilot AI reviewed May 26, 2026


ci: trigger checks

d60a137

ArthurCRodrigues closed this May 26, 2026

ArthurCRodrigues reopened this May 26, 2026

trigger

4033534

ArthurCRodrigues and others added 3 commits May 26, 2026 22:43

docs: document static analysis tests

abfc917

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test: fix broken mocks and update to new GradingRequest API

f3df862

docs: add detailed static analysis documentation and fix cross-references

aec3492

ArthurCRodrigues commented May 27, 2026


ArthurCRodrigues merged commit 5c645e9 into main May 27, 2026
3 checks passed

ArthurCRodrigues deleted the feat/issue-332-ai-algorithm-tests branch May 27, 2026 02:02

ArthurCRodrigues merged 6 commits into main from feat/issue-332-ai-algorithm-tests

3 participants