Alpsource/SQM_Test

AI Code Agents for Python: A Quality-Driven Approach 🤖🐍

This repository contains the framework, benchmarking data, and comparative analysis for a research project investigating the efficacy of autonomous AI agents in software development. The study evaluates whether an agentic framework can match or exceed the code quality of junior-level developers.

🏗️ System Architecture

The core of this project is a self-improving AI agent that writes, executes, and refines Python code autonomously based on terminal outputs and quality feedback.

Figure 1: Architecture of the autonomous agentic framework.

Core Components:

  • Code Designer: Generates technical requirements and function compliance standards.
  • Code Generator (Gemini): Produces the initial implementation using the Gemini 2.0 Flash model.
  • Script Evaluator: Runs the script, analyzes outcomes, and triggers refinement loops if requirements aren't met.
  • Web Search Tool: Enables the agent to retrieve up-to-date API information and fill training gaps.
  • Library Installer: Automatically resolves and installs dependencies in isolated environments.
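The interaction between the Code Generator and the Script Evaluator can be pictured as a generate-run-refine loop. The following is a minimal sketch under stated assumptions, not the framework's actual code: `run_script`, `refine_until_passing`, and `fake_generate` are hypothetical names, and the stub generator stands in for the Gemini call so the example runs without an API key.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(source: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Write the candidate source to a temp file and execute it in a subprocess."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source, encoding="utf-8")
        return subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, timeout=timeout,
        )


def refine_until_passing(task, generate, max_rounds=3):
    """Generate code, run it, and feed stderr back until it exits cleanly.

    `generate(task, feedback)` is a hypothetical callable standing in for the
    Code Generator; a real framework would call the LLM API here.
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        source = generate(task, feedback)
        result = run_script(source)
        if result.returncode == 0:
            return source, round_no
        feedback = result.stderr  # terminal output drives the next attempt
    raise RuntimeError(f"no passing script after {max_rounds} rounds")


def fake_generate(task, feedback):
    """Stub generator: the first attempt has a bug, the second fixes it."""
    if not feedback:
        return "print(undefined_name)"
    return "print('hello')"


source, rounds = refine_until_passing("say hello", fake_generate)
print(f"passed after {rounds} round(s)")
```

Here the only quality signal is the exit code. The Script Evaluator described above also checks requirements, which would presumably replace the simple `returncode == 0` test.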

📊 Research Methodology & Results

The study benchmarked AI-generated scripts against human-authored counterparts across five real-world scenarios in four categories: CLI utilities (two scenarios), a data parser, an HTTP server, and an AI interface.

Comparative Quality Metrics:

On average, the AI agent scored better than the human baseline on several industry-standard metrics:

| Metric | AI Agent Mean | Human Mean | Difference ($\Delta$) |
|---|---|---|---|
| Pylint Score | 7.74 | 7.27 | +0.47 |
| Maintainability Index | 76.59 | 64.39 | +12.20 |
| Bug Density | 0.30 | 0.43 | -0.13 |
| Lines of Code (LLOC) | 59.8 | 95.6 | -35.8 |
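The Difference column is the AI mean minus the human mean. As a small sketch of that arithmetic, the snippet below recomputes each difference from the reported means and adds the change relative to the human baseline. The relative figure is not part of the original table, and `delta_and_relative` is a hypothetical helper name.

```python
# Means copied from the table above: (AI agent, human).
means = {
    "Pylint Score": (7.74, 7.27),
    "Maintainability Index": (76.59, 64.39),
    "Bug Density": (0.30, 0.43),
    "LLOC": (59.8, 95.6),
}


def delta_and_relative(ai, human):
    """Return (absolute difference, relative change in percent vs. human)."""
    diff = ai - human
    return round(diff, 2), round(100.0 * diff / human, 1)


summary = {name: delta_and_relative(ai, human) for name, (ai, human) in means.items()}
for name, (diff, pct) in summary.items():
    print(f"{name}: {diff:+.2f} ({pct:+.1f}%)")
```

Read this way, the LLOC gap corresponds to roughly 37% fewer logical lines in the AI-generated scripts. Note that a negative difference is an improvement for Bug Density and LLOC but would be a regression for Pylint Score or Maintainability Index.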

Key Findings:

  • Structural Excellence: AI-generated code achieved a higher Maintainability Index, indicating more modular and self-documenting designs.
  • Complexity Trade-off: AI code showed higher mean Cyclomatic Complexity (3.91 vs 1.63), which the study attributes to a deliberate choice to use discrete functions rather than monolithic, nested loops.
  • Cost Efficiency: Once developed, the AI agent operates at less than 5% of the cost of a junior developer's salary.
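To make the Cyclomatic Complexity figures concrete: the metric starts at 1 for each function and adds one for every decision point (branches, loops, boolean operators, and similar). The sketch below approximates it with the standard-library `ast` module. It is an illustration only: the study used Radon, whose exact counting rules may differ, and `approx_complexity` and `complexity_by_function` are hypothetical names.

```python
import ast

# Node types that each add one decision point (an approximation of McCabe's
# metric; Radon's exact counting rules may differ in edge cases).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension, ast.Assert)


def approx_complexity(func: ast.AST) -> int:
    """Return 1 plus the number of decision points inside one function."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra and/or adds a path
    return score


def complexity_by_function(source: str) -> dict:
    """Map each function name in `source` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: approx_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SAMPLE = '''
def parse(line):
    if not line or line.startswith("#"):
        return None
    return line.split(",")

def total(rows):
    return sum(len(r) for r in rows)
'''

print(complexity_by_function(SAMPLE))
```

In the sample, `parse` scores 3 (base 1, one `if`, one `or`) and `total` scores 2 (base 1, one comprehension).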

📈 Functional Validation: Notch Filter Case Study

A critical test of the framework involved designing a digital notch filter to attenuate specific frequencies.

Figure 2: Magnitude spectrum and time series of the AI-generated filter results.

Using the Web Search component, the agent autonomously corrected its coefficient calculations and achieved a 90% reduction in power at the target frequency (about 10 dB of attenuation), which demonstrates its potential in technical and academic fields.
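The README does not show the filter the agent produced, so the following is only a sketch of one standard design: a second-order (biquad) notch using the widely published Audio EQ Cookbook coefficient formulas, written in pure Python. The sample rate, target frequency, pass-through tone, and Q value are assumed for illustration, and all function names are hypothetical.

```python
import math


def notch_coefficients(f0, fs, q):
    """Biquad notch coefficients (normalized so a[0] == 1)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(b, a, x):
    """Filter x with the difference equation of a second-order section."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def tone_power(x, freq, fs):
    """Power of the sinusoidal component of x at freq (single-bin DFT)."""
    n = len(x)
    c = sum(v * math.cos(2.0 * math.pi * freq * i / fs) for i, v in enumerate(x))
    s = sum(v * math.sin(2.0 * math.pi * freq * i / fs) for i, v in enumerate(x))
    amplitude = 2.0 * math.hypot(c, s) / n
    return amplitude ** 2 / 2.0


FS, F_TARGET, F_KEEP, Q = 1000.0, 50.0, 120.0, 30.0  # assumed, not from the study
signal = [math.sin(2 * math.pi * F_TARGET * i / FS)
          + math.sin(2 * math.pi * F_KEEP * i / FS) for i in range(2000)]
b, a = notch_coefficients(F_TARGET, FS, Q)
filtered = apply_biquad(b, a, signal)

# Skip the first second so the filter's start-up transient has decayed.
steady_in, steady_out = signal[1000:], filtered[1000:]
reduction = 1.0 - tone_power(steady_out, F_TARGET, FS) / tone_power(steady_in, F_TARGET, FS)
retained = tone_power(steady_out, F_KEEP, FS) / tone_power(steady_in, F_KEEP, FS)
print(f"target-tone power reduction: {reduction:.1%}, kept tone retained: {retained:.1%}")
```

The power comparison mirrors the kind of check behind a "90% power reduction" claim: measure the tone's power before and after filtering, while confirming that a tone away from the notch passes through largely unchanged.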

🔍 Research Focus

This project addresses three primary Research Questions (RQs):

  • RQ1: Can an AI agentic framework achieve code quality comparable to human developers in Python?

  • RQ2: What are the relative strengths and weaknesses of AI-generated code compared to human-written code across different task types?

  • RQ3: What is the correlation between framework components (e.g., web search) and the functional correctness of the generated code?

📁 Repository Structure

.
├── README.md
├── assets/                  # Images and diagrams
├── AIWritten/               # Scripts generated by the AI agent
│   ├── ai.py
│   ├── analyze_quality.sh   # Static analysis automation
│   ├── basic_cli.py
│   ├── code_quality_report.csv
│   ├── csv_tool.py
│   ├── server.py
│   └── todo_cli.py
└── HumanWritten/            # Human-authored benchmark scripts
    ├── ai.py
    ├── analyze_quality.sh
    ├── code_quality_report.csv
    ├── csv_parser.py
    ├── server.py
    ├── size.py
    └── todo.py

🛠️ Requirements

  • Python 3.x
  • LLM: Google Gemini API key (set as GOOGLE_API_KEY in environment).
  • Static Analysis: Pylint, Radon, Bandit.

Installation

  1. Clone the repository: git clone https://github.com/Alpsource/SQM_Test.git

  2. Install dependencies: pip install pylint radon bandit google-genai

  3. Set your API Key: export GOOGLE_API_KEY='your_key_here'
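After these steps, a quick check can confirm that the API key is set and the static-analysis tools are on the PATH. This is a hypothetical helper, not a script shipped in the repository; `check_environment` and `REQUIRED_TOOLS` are names chosen for illustration.

```python
import os
import shutil

REQUIRED_TOOLS = ("pylint", "radon", "bandit")  # CLIs named under Requirements


def check_environment(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("missing:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.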

About

An autonomous agentic framework that generates, tests, and refines Python code, benchmarked against human developers using industry-standard quality metrics like Pylint and Maintainability Index.
