Description
Currently, the autograder can verify the behavior of an algorithm (via Input/Output tests) and forbidden constructs (via Static Analysis), but it cannot reliably verify the semantic intent of an algorithm implementation. For example, a student might pass sorting tests by calling a built-in sort() method or by implementing Bubble Sort when Quick Sort was requested.
This issue proposes a new set of AI-powered test functions that use Large Language Models (LLMs) to verify if a specific algorithm has been implemented correctly according to its theoretical definition.
Proposed Implementation
1. New Base AI Test Classes
Create specialized classes in a new or existing template (likely StaticAnalysisTemplate) that inherit from AiTestFunction:
AiSortingAlgorithmTest: Verifies sorting algorithms (Quick Sort, Merge Sort, etc.).
AiSearchAlgorithmTest: Verifies search algorithms (Binary Search, DFS, BFS, etc.).
AiGraphAlgorithmTest: Verifies graph algorithms (Dijkstra, Prim, etc.).
2. Test Configuration & Parameters
Each test will accept an algorithm_name parameter via the TestConfig protocol.
- Strictness: The AI prompt will be configured to be strict.
- Scoring: Binary (100 for a correct implementation of the requested algorithm, 0 otherwise).
- Targeting: The test will focus on the code provided in the
file attribute.
3. Configuration Example (JSON)
Following the docs/criteria_example.json protocol:
{
"file": "sort.py",
"name": "ai_sorting_algorithm",
"parameters": [
{
"name": "algorithm_name",
"value": "Quick Sort"
}
],
"weight": 50
}
4. Prompt Strategy
The build_prompt method for these classes should follow a template similar to:
"Analyze the following code and determine if it is a correct and faithful implementation of the {algorithm_name} algorithm.
Criteria:
- The implementation must follow the specific logic and complexity characteristics of {algorithm_name}.
- It must NOT be a wrapper around a built-in library function.
- If it implements a different algorithm (e.g., Bubble Sort instead of Quick Sort), it is considered incorrect.
Output Format:
Return a JSON object with:
- 'success': boolean
- 'reason': a brief explanation of the decision (especially if it failed)."
5. Workflow Integration
These tests will be picked up by the AiBatchStep, ensuring that multiple algorithm checks are performed in a single, cost-efficient API call.
Expected Outcome
Instructors can now verify that students are actually learning the requested algorithms rather than just 'passing the tests' by any means necessary.
Description
Currently, the autograder can verify the behavior of an algorithm (via Input/Output tests) and forbidden constructs (via Static Analysis), but it cannot reliably verify the semantic intent of an algorithm implementation. For example, a student might pass sorting tests by calling a built-in
sort()method or by implementing Bubble Sort when Quick Sort was requested.This issue proposes a new set of AI-powered test functions that use Large Language Models (LLMs) to verify if a specific algorithm has been implemented correctly according to its theoretical definition.
Proposed Implementation
1. New Base AI Test Classes
Create specialized classes in a new or existing template (likely
StaticAnalysisTemplate) that inherit fromAiTestFunction:AiSortingAlgorithmTest: Verifies sorting algorithms (Quick Sort, Merge Sort, etc.).AiSearchAlgorithmTest: Verifies search algorithms (Binary Search, DFS, BFS, etc.).AiGraphAlgorithmTest: Verifies graph algorithms (Dijkstra, Prim, etc.).2. Test Configuration & Parameters
Each test will accept an
algorithm_nameparameter via theTestConfigprotocol.fileattribute.3. Configuration Example (JSON)
Following the
docs/criteria_example.jsonprotocol:{ "file": "sort.py", "name": "ai_sorting_algorithm", "parameters": [ { "name": "algorithm_name", "value": "Quick Sort" } ], "weight": 50 }4. Prompt Strategy
The
build_promptmethod for these classes should follow a template similar to:5. Workflow Integration
These tests will be picked up by the
AiBatchStep, ensuring that multiple algorithm checks are performed in a single, cost-efficient API call.Expected Outcome
Instructors can now verify that students are actually learning the requested algorithms rather than just 'passing the tests' by any means necessary.