Skip to content

[Feature] AI-based Algorithm Implementation Verification #332

@ArthurCRodrigues

Description

@ArthurCRodrigues

Description

Currently, the autograder can verify the behavior of an algorithm (via Input/Output tests) and forbidden constructs (via Static Analysis), but it cannot reliably verify the semantic intent of an algorithm implementation. For example, a student might pass sorting tests by calling a built-in sort() method or by implementing Bubble Sort when Quick Sort was requested.

This issue proposes a new set of AI-powered test functions that use Large Language Models (LLMs) to verify if a specific algorithm has been implemented correctly according to its theoretical definition.

Proposed Implementation

1. New Base AI Test Classes

Create specialized classes in a new or existing template (likely StaticAnalysisTemplate) that inherit from AiTestFunction:

  • AiSortingAlgorithmTest: Verifies sorting algorithms (Quick Sort, Merge Sort, etc.).
  • AiSearchAlgorithmTest: Verifies search algorithms (Binary Search, DFS, BFS, etc.).
  • AiGraphAlgorithmTest: Verifies graph algorithms (Dijkstra, Prim, etc.).

2. Test Configuration & Parameters

Each test will accept an algorithm_name parameter via the TestConfig protocol.

  • Strictness: The AI prompt will be configured to be strict.
  • Scoring: Binary (100 for a correct implementation of the requested algorithm, 0 otherwise).
  • Targeting: The test will focus on the code provided in the file attribute.

3. Configuration Example (JSON)

Following the docs/criteria_example.json protocol:

{
  "file": "sort.py",
  "name": "ai_sorting_algorithm",
  "parameters": [
    {
      "name": "algorithm_name",
      "value": "Quick Sort"
    }
  ],
  "weight": 50
}

4. Prompt Strategy

The build_prompt method for these classes should follow a template similar to:

"Analyze the following code and determine if it is a correct and faithful implementation of the {algorithm_name} algorithm.

Criteria:

  1. The implementation must follow the specific logic and complexity characteristics of {algorithm_name}.
  2. It must NOT be a wrapper around a built-in library function.
  3. If it implements a different algorithm (e.g., Bubble Sort instead of Quick Sort), it is considered incorrect.

Output Format:
Return a JSON object with:

  • 'success': boolean
  • 'reason': a brief explanation of the decision (especially if it failed)."

5. Workflow Integration

These tests will be picked up by the AiBatchStep, ensuring that multiple algorithm checks are performed in a single, cost-efficient API call.

Expected Outcome

Instructors can now verify that students are actually learning the requested algorithms rather than just 'passing the tests' by any means necessary.

Metadata

Metadata

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions