feat(static_analysis): add AI algorithm tests (#332) #333

Merged
ArthurCRodrigues merged 6 commits into main from feat/issue-332-ai-algorithm-tests on May 27, 2026

@ArthurCRodrigues ArthurCRodrigues commented May 26, 2026

Member

Context

While the autograder already supported basic static analysis (forbidden imports and keyword detection), it lacked the ability to verify complex algorithm implementations. Simple I/O matching or structural checks are often insufficient to distinguish between a faithful implementation of a specific algorithm (like Quick Sort) and a wrapper around a built-in library function.

Solution

This PR introduces AI-based algorithm verification tests to the Static Analysis template:

  1. AI Algorithm Test Suite: Added a base class AiAlgorithmTestBase and three specialized test functions:
    • ai_sorting_algorithm
    • ai_search_algorithm
    • ai_graph_algorithm
  2. Strict Prompting Logic: Implemented shared prompting logic that instructs the LLM to verify algorithm complexity and logic characteristics, and to ensure that no standard library shortcuts are used.
  3. Template Registration: Registered the new AI tests within the existing StaticAnalysisTemplate.
  4. Internationalization: Added English and Portuguese translations for the new AI test descriptions and parameters.
  5. Refactoring & Bug Fixes:
    • Fixed broken mocks and updated web service tests to match the current GradingRequest API.
    • Resolved import ambiguities caused by duplicate files in the repository root.
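
As a rough illustration of item 1, the three test functions could be thin subclasses that only set class attributes on the shared base. This is a minimal sketch, not the PR's actual code: `AiTestFunction` is replaced by a stand-in abstract class, and the concrete class names and the `algorithm_family` values are assumptions. Only `AiAlgorithmTestBase`, its two attributes, the `name` property, and the three test names come from this PR.

```python
from abc import ABC, abstractmethod


class AiTestFunction(ABC):
    """Stand-in for the autograder's real base class (assumed interface)."""

    @property
    @abstractmethod
    def name(self) -> str:
        ...


class AiAlgorithmTestBase(AiTestFunction):
    # Subclasses override these two class attributes.
    algorithm_family: str = ""
    test_name: str = ""

    @property
    def name(self) -> str:
        return self.test_name


# Hypothetical concrete classes; names and family values are illustrative.
class AiSortingAlgorithmTest(AiAlgorithmTestBase):
    algorithm_family = "sorting"
    test_name = "ai_sorting_algorithm"


class AiSearchAlgorithmTest(AiAlgorithmTestBase):
    algorithm_family = "search"
    test_name = "ai_search_algorithm"


class AiGraphAlgorithmTest(AiAlgorithmTestBase):
    algorithm_family = "graph"
    test_name = "ai_graph_algorithm"


# A plain dict keyed by test name, standing in for template registration.
registry = {
    test.name: test
    for test in (
        AiSortingAlgorithmTest(),
        AiSearchAlgorithmTest(),
        AiGraphAlgorithmTest(),
    )
}
```

Keeping the per-family differences in class attributes means the prompting logic can live once in the base class.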

Further clarifications

  • AI algorithm tests require an active LLM provider (e.g., OpenAI) configured via OPENAI_API_KEY.
  • These tests are designed to be used alongside existing structural checks for a multi-layered verification approach.
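
Because the AI tests depend on `OPENAI_API_KEY`, a preflight check before scheduling them can make a missing key visible early. The helper below is a hypothetical sketch and not part of this PR. It only checks that the variable is set and non-blank, so a present but invalid key would still fail at request time.

```python
import os
from typing import Mapping, Optional


def llm_provider_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when OPENAI_API_KEY is set to a non-blank value.

    Hypothetical helper: it does not validate the key against the provider.
    """
    source = os.environ if env is None else env
    return bool((source.get("OPENAI_API_KEY") or "").strip())


if __name__ == "__main__":
    if not llm_provider_configured():
        print("OPENAI_API_KEY is not set; AI algorithm tests cannot run.")
```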

Related issues

Closes #332

Checklist

  • I linked the related issue(s) and explained the motivation.
  • I kept this PR focused and scoped to a single concern.
  • I added or updated tests for changed behavior.
  • I ran the relevant tests locally.
  • I updated documentation when needed.

Copilot AI review requested due to automatic review settings May 26, 2026 10:45

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds new AI-driven algorithm verification tests (sorting/search/graph) to the Static Analysis template, including configuration schema, translations, and unit tests.

Changes:

  • Added AiAlgorithmTestBase + three concrete AI algorithm tests and registered them in StaticAnalysisTemplate.
  • Added EN/PT-BR translations for the new tests’ descriptions/params.
  • Added unit tests covering registration, metadata, config validation, and prompt generation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/unit/test_ai_algorithm_tests.py | Adds unit coverage for AI algorithm tests’ registration/config/prompt behavior. |
| autograder/translations/en.json | Adds English i18n strings for new AI algorithm tests. |
| autograder/translations/pt_br.json | Adds PT-BR i18n strings for new AI algorithm tests. |
| autograder/template_library/static_analysis.py | Introduces AI algorithm test classes/config and registers them in the static analysis template. |

Comment on lines +334 to +335
class AiAlgorithmConfig(BaseModel):
    algorithm_name: str = Field(..., min_length=1)

        files: Optional[List[SubmissionFile]],
        **kwargs,
    ) -> str:
        algorithm_name = (kwargs.get("algorithm_name") or "").strip()
Comment on lines +377 to +381
        algo_label = algorithm_name or "Unknown algorithm"

        return (
            f"You are verifying a {self.algorithm_family} algorithm implementation.\n"
            f"Requested algorithm: {algo_label}.\n"
Comment on lines +398 to +399
            f"Use subject '{algo_label}'."
        )
Comment on lines +338 to +344
class AiAlgorithmTestBase(AiTestFunction):
    algorithm_family: str = ""
    test_name: str = ""

    @property
    def name(self) -> str:
        return self.test_name
Comment on lines +364 to +376
    def build_prompt(
        self,
        files: Optional[List[SubmissionFile]],
        **kwargs,
    ) -> str:
        algorithm_name = (kwargs.get("algorithm_name") or "").strip()
        file_names = ", ".join(f.filename for f in files) if files else ""

        if file_names:
            file_scope = f"Focus only on these files: {file_names}."
        else:
            file_scope = "No submission files were provided for this test."
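
The fallback behavior in the excerpts above can be exercised in isolation. The sketch below is a standalone approximation under two assumptions: `SubmissionFile` is a stand-in dataclass that only carries a `filename`, and `describe_scope` is a hypothetical helper that is not in the PR. It reproduces only the label and file-scope logic shown in the diff, not the full prompt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SubmissionFile:
    """Stand-in for the autograder's type; only `filename` is used here."""

    filename: str


def describe_scope(
    files: Optional[List[SubmissionFile]],
    algorithm_name: Optional[str],
) -> Tuple[str, str]:
    # A blank or missing algorithm name falls back to a generic label.
    algo_label = (algorithm_name or "").strip() or "Unknown algorithm"

    # A missing or empty file list yields an empty string of names.
    file_names = ", ".join(f.filename for f in files) if files else ""

    if file_names:
        file_scope = f"Focus only on these files: {file_names}."
    else:
        file_scope = "No submission files were provided for this test."
    return algo_label, file_scope


print(describe_scope([SubmissionFile("sort.py")], " merge sort "))
print(describe_scope(None, None))
```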

@ArthurCRodrigues

Member Author

E2E via Dockerfile.api (FastAPI) with remote sandbox manager is running. I created three grading configs using static_analysis + the new AI algorithm tests and ran 5 scenarios (2 foggy). All submissions completed, but AI evaluation failed with 401 invalid API key, so every test returned “AI evaluation produced no result.” and scored 0.0. This blocks verifying algorithm correctness in this environment.

Configs created

  • ai-sort-merge: ai_sorting_algorithm (algorithm_name: merge sort)
  • ai-search-binary: ai_search_algorithm (algorithm_name: binary search)
  • ai-graph-bfs: ai_graph_algorithm (algorithm_name: breadth-first search)

Scenarios

| Scenario | Assignment | Submission ID | Expected | Result |
| --- | --- | --- | --- | --- |
| sort-correct (merge sort) | ai-sort-merge | 220376 | Pass | Completed, score 0.0, AI eval produced no result |
| sort-foggy (built-in sorted) | ai-sort-merge | 220377 | Fail | Completed, score 0.0, AI eval produced no result |
| search-correct (binary search) | ai-search-binary | 220378 | Pass | Completed, score 0.0, AI eval produced no result |
| search-foggy (partial binary/linear) | ai-search-binary | 220379 | Fail | Completed, score 0.0, AI eval produced no result |
| graph-correct (BFS) | ai-graph-bfs | 220380 | Pass | Completed, score 0.0, AI eval produced no result |

Logs

  • AI batch request failed: Error code: 401 ... invalid_api_key (dummy key in test env).

Conclusion
The new AI algorithm tests are registered, invoked, and integrated into the pipeline, but I cannot confirm correctness scoring without a valid OpenAI key (all 5 scenarios scored 0 due to auth failure).

Unrelated issue discovered

@ArthurCRodrigues

Member Author

Reran E2E using Dockerfile.api in remote sandbox mode with a valid OpenAI key from .env. AI algorithm tests now return results as expected.

Configs created

  • ai-sort-merge-2: ai_sorting_algorithm (algorithm_name: merge sort)
  • ai-search-binary-2: ai_search_algorithm (algorithm_name: binary search)
  • ai-graph-bfs-2: ai_graph_algorithm (algorithm_name: breadth-first search)

Scenarios (5, incl. 2 foggy)

| Scenario | Assignment | Submission ID | Expected | Result |
| --- | --- | --- | --- | --- |
| sort-correct (merge sort) | ai-sort-merge-2 | 220381 | Pass | ✅ Completed, score 100.0 |
| sort-foggy (built-in sorted) | ai-sort-merge-2 | 220382 | Fail | ✅ Completed, score 0.0 |
| search-correct (binary search) | ai-search-binary-2 | 220383 | Pass | ✅ Completed, score 100.0 |
| search-foggy (partial binary/linear) | ai-search-binary-2 | 220384 | Fail | ✅ Completed, score 0.0 |
| graph-correct (BFS) | ai-graph-bfs-2 | 220385 | Pass | ✅ Completed, score 100.0 |

Conclusion
With a valid key, the new AI algorithm tests behave correctly: correct implementations pass and foggy/incorrect ones fail. PR #333 successfully implements the AI algorithm test flow end‑to‑end in these scenarios.
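
The comparison behind this conclusion can be written as a small check over the scenario results from the rerun. This is an illustrative sketch only: the scenario names, expectations, and scores are copied from the table above, while the `matches_expectation` helper and its 100.0 pass threshold are assumptions, not the autograder's scoring rule.

```python
# (scenario, expected outcome, reported score) from the rerun table.
scenarios = [
    ("sort-correct", "Pass", 100.0),
    ("sort-foggy", "Fail", 0.0),
    ("search-correct", "Pass", 100.0),
    ("search-foggy", "Fail", 0.0),
    ("graph-correct", "Pass", 100.0),
]


def matches_expectation(expected: str, score: float, pass_threshold: float = 100.0) -> bool:
    """Hypothetical rule: a submission passes when score >= pass_threshold."""
    passed = score >= pass_threshold
    return passed == (expected == "Pass")


# Scenarios whose score disagrees with the expected outcome.
mismatches = [
    name
    for name, expected, score in scenarios
    if not matches_expectation(expected, score)
]
print(mismatches)
```

Under the same rule, the first run (every score 0.0) would flag the three "correct" scenarios as mismatches, which is consistent with the auth failure reported there.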

Note
The API container still requires /var/run/docker.sock even in remote mode (tracked in #334).

@ArthurCRodrigues ArthurCRodrigues left a comment

Member Author

OK

@ArthurCRodrigues ArthurCRodrigues merged commit 5c645e9 into main May 27, 2026
3 checks passed
@ArthurCRodrigues ArthurCRodrigues deleted the feat/issue-332-ai-algorithm-tests branch May 27, 2026 02:02

Development

Successfully merging this pull request may close these issues.

[Feature] AI-based Algorithm Implementation Verification

3 participants