AI Code Agents for Python: A Quality-Driven Approach 🤖🐍

This repository contains the framework, benchmarking data, and comparative analysis for a research project investigating the efficacy of autonomous AI agents in software development. The study evaluates whether an agentic framework can match or exceed the code quality of junior-level developers.

🏗️ System Architecture

The core of this project is a self-improving AI agent that writes, executes, and refines Python code autonomously based on terminal outputs and quality feedback.

Figure 1: Architecture of the autonomous agentic framework.

Core Components:

Code Designer: Generates technical requirements and function compliance standards.
Code Generator (Gemini): Produces the initial implementation using the Gemini 2.0 Flash model.
Script Evaluator: Runs the script, analyzes outcomes, and triggers refinement loops if requirements aren't met.
Web Search Tool: Enables the agent to retrieve up-to-date API information and fill training gaps.
Library Installer: Automatically resolves and installs dependencies in isolated environments.
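The interaction between the Code Generator and the Script Evaluator can be pictured as a generate, execute, refine loop. The following is a minimal sketch under stated assumptions, not the project's actual implementation: `run_script`, `refine_until_passing`, and `stub_generate` are hypothetical names, the LLM call is replaced by a stub, and "requirements met" is reduced to "the script exits with status 0".

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_script(source: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Write the candidate source to a temp file and run it in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source, encoding="utf-8")
        # subprocess.run raises TimeoutExpired if the script hangs past `timeout`.
        return subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )


def refine_until_passing(
    generate: Callable[[str, str], str],
    task: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask `generate` for code, run it, and feed stderr back until it succeeds."""
    feedback = ""
    for _ in range(max_rounds):
        source = generate(task, feedback)
        result = run_script(source)
        if result.returncode == 0:  # Assumption: exit status 0 means "accepted".
            return source
        feedback = result.stderr  # Terminal output drives the next attempt.
    return None


def stub_generate(task: str, feedback: str) -> str:
    """Hypothetical stand-in for the LLM: fails once, then 'fixes' the bug."""
    if not feedback:
        return "print(undefined_name)"  # Raises NameError on the first round.
    return "print('hello')"
```

Calling `refine_until_passing(stub_generate, "print a greeting")` runs two rounds: the first attempt fails with a `NameError`, and the traceback is passed back as feedback for the second attempt. A real evaluator would presumably also check quality metrics and functional requirements rather than the exit status alone.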

📊 Research Methodology & Results

The study benchmarked AI-generated scripts against human-authored counterparts across five real-world scenarios in four categories: CLI utilities (two scripts), data parsers, HTTP servers, and AI interfaces.

Comparative Quality Metrics:

The AI agent's mean scores were more favorable than the human baseline on each of the following industry-standard metrics:

| Metric | AI Agent Mean | Human Mean | Difference ($\Delta$) |
|---|---|---|---|
| Pylint Score | 7.74 | 7.27 | +0.47 |
| Maintainability Index | 76.59 | 64.39 | +12.20 |
| Bug Density | 0.30 | 0.43 | -0.13 |
| Lines of Code (LLOC) | 59.8 | 95.6 | -35.8 |

Key Findings:

Structural Excellence: AI-generated code achieved a higher Maintainability Index, indicating more modular and self-documenting designs.
Complexity Trade-off: AI code showed higher Cyclomatic Complexity (3.91 vs 1.63), which reflects a deliberate choice to use discrete functions rather than monolithic, nested loops.
Cost Efficiency: Once developed, the AI agent's operating cost is less than 5% of a junior developer's salary.
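To make the Cyclomatic Complexity figures above more concrete, here is a rough sketch of how a per-function score can be counted: start at 1 and add one for each decision point. This is a simplified approximation built on the standard library `ast` module, not Radon's exact algorithm, and `approx_complexity` is a hypothetical helper name. Nested functions are also counted toward their enclosing function in this sketch.

```python
import ast

# Node types treated as one decision point each (a simplifying assumption).
_BRANCH_NODES = (
    ast.If,
    ast.For,
    ast.AsyncFor,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
    ast.comprehension,
)


def approx_complexity(source: str) -> dict:
    """Return an approximate cyclomatic complexity for each function in `source`."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1  # One path through a function with no branches.
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # `a and b and c` adds two extra short-circuit paths.
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores


SAMPLE = '''
def flat(x):
    return x + 1

def branchy(x):
    if x > 0 and x < 10:
        for i in range(x):
            print(i)
    return x
'''

print(approx_complexity(SAMPLE))  # {'flat': 1, 'branchy': 4}
```

In the sample, `branchy` scores 4: the base path, the `if`, the `and`, and the `for` loop. The reported means (3.91 vs 1.63) come from the project's own static analysis, so this sketch will not necessarily reproduce them.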

📈 Functional Validation: Notch Filter Case Study

A critical test of the framework involved designing a digital notch filter to attenuate specific frequencies.

Figure 2: Magnitude Spectrum and Time Series of AI-generated filter results.

Through the Web Search component, the agent autonomously corrected its coefficient calculations and achieved a 90% power reduction at the target frequency (roughly 10 dB of attenuation), demonstrating its utility in technical and academic settings.
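For readers unfamiliar with notch filters, the sketch below shows one common way to compute the coefficients and check the power reduction at the target frequency. It is illustrative only: the source does not state which design the agent used, so this assumes a standard second-order (biquad) notch, and the 50 Hz target, 1 kHz sample rate, and quality factor of 5 are made-up values. All function names are hypothetical.

```python
import math


def notch_coefficients(f0: float, fs: float, q: float):
    """Biquad notch coefficients (b, a) for target frequency f0 at sample rate fs."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = (1.0, -2.0 * cos_w0, 1.0)  # Zeros on the unit circle at +/- w0.
    a = (1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha)  # Poles just inside the circle.
    return b, a


def apply_biquad(b, a, x):
    """Filter the sequence x with a direct-form I second-order section."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)


def power_reduction(f_signal, f0=50.0, fs=1000.0, q=5.0, seconds=2.0):
    """Fraction of power removed from a unit sine at f_signal by a notch at f0."""
    n = int(fs * seconds)
    x = [math.sin(2.0 * math.pi * f_signal * i / fs) for i in range(n)]
    b, a = notch_coefficients(f0, fs, q)
    y = apply_biquad(b, a, x)
    tail = n // 2  # Skip the first half so the start-up transient has decayed.
    return 1.0 - mean_power(y[tail:]) / mean_power(x[tail:])


print(power_reduction(50.0) > 0.9)   # Tone at the notch frequency is removed.
print(power_reduction(200.0) < 0.1)  # Tone far from the notch passes through.
```

Measuring power only after the transient has settled matters here: a narrow notch rings for a while after the signal starts, and including that start-up period would understate the steady-state attenuation.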

🔍 Research Focus

This project addresses three primary Research Questions (RQs):

RQ1: Can an AI agentic framework achieve code quality comparable to human developers in Python?
RQ2: What are the relative strengths and weaknesses of AI-generated code compared to human-written code across different task types?
RQ3: What is the correlation between framework components (e.g., web search) and the functional correctness of the generated code?

📁 Repository Structure

.
├── README.md
├── assets/                  # Images and diagrams
├── AIWritten/               # Scripts generated by the AI agent
│   ├── ai.py
│   ├── analyze_quality.sh   # Static analysis automation
│   ├── basic_cli.py
│   ├── code_quality_report.csv
│   ├── csv_tool.py
│   ├── server.py
│   └── todo_cli.py
└── HumanWritten/            # Human-authored benchmark scripts
    ├── ai.py
    ├── analyze_quality.sh
    ├── code_quality_report.csv
    ├── csv_parser.py
    ├── server.py
    ├── size.py
    └── todo.py

🛠️ Requirements

Python 3.x
LLM: Google Gemini API key (set as GOOGLE_API_KEY in environment).
Static Analysis: Pylint, Radon, Bandit.

Installation

Clone the repository: git clone https://github.com/Alpsource/SQM_Test.git
Install dependencies: pip install pylint radon bandit google-genai
Set your API Key: export GOOGLE_API_KEY='your_key_here'
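Before running anything, it can help to confirm that the setup above took effect. The snippet below is a small, optional sketch (not part of the repository): `missing_prerequisites` is a hypothetical helper that checks for the `GOOGLE_API_KEY` variable and for the three static analysis tools on `PATH`.

```python
import os
import shutil

REQUIRED_TOOLS = ("pylint", "radon", "bandit")


def missing_prerequisites(env=None, which=shutil.which):
    """Return the names of any prerequisites that are not available."""
    env = os.environ if env is None else env
    missing = []
    if not env.get("GOOGLE_API_KEY"):
        missing.append("GOOGLE_API_KEY")
    # shutil.which returns None when an executable is not found on PATH.
    missing.extend(tool for tool in REQUIRED_TOOLS if which(tool) is None)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All set." if not problems else "Missing: " + ", ".join(problems))
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.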
