This project automates the generation of data analysis reports using a multi-agent system built with crewai. It takes a dataset as input and produces a Jupyter Notebook with a complete analysis, including data cleaning, exploratory data analysis (EDA), insights, and predictive modeling suggestions.
The core of this project is a two-agent system that collaborates to perform a comprehensive data analysis:
- Data Analyst Expert: This agent creates a detailed, step-by-step plan for analyzing the data.
- Expert Python Developer: This agent takes the plan and writes the corresponding Python code, assembling both the plan and the code into a final Jupyter Notebook.
This approach allows for a clear separation of concerns, where one agent focuses on the analytical strategy and the other on the implementation, resulting in a well-structured and easy-to-understand report.
- Install dependencies:

      python -m venv .venv
      source .venv/bin/activate
      pip install -r requirements.txt

- Set up your environment:
  - Create a .env file in the root of the project.
  - Add your OPENROUTER_API_KEY to the .env file:

        OPENROUTER_API_KEY="your_api_key"

- Configure your data:
  - Open main.py and update the following variables:
    - company_name: The name of the company for the analysis.
    - dataset_description: A brief description of the dataset.
    - dataset_path: The path to your dataset.
- Run the agent:

      python main.py

- View the output:
  - analysis_plan.md: The analysis plan generated by the Data Analyst Expert.
  - analysis_report.ipynb: The final Jupyter Notebook with the complete analysis.
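Before running main.py, it can help to confirm that the API key and the dataset path are actually in place. The following is a minimal sketch using only the standard library; the helper names `load_env_file` and `check_setup` are hypothetical and not part of this project, and the project itself may load the .env file differently (for example with a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_setup(env_path=".env", dataset_path=None):
    """Return a list of problems; an empty list means the setup looks fine."""
    problems = []
    env = load_env_file(env_path)
    # Accept the key from the .env file or from the process environment.
    api_key = env.get("OPENROUTER_API_KEY") or os.environ.get("OPENROUTER_API_KEY")
    if not api_key:
        problems.append("OPENROUTER_API_KEY is missing")
    if dataset_path is not None and not Path(dataset_path).is_file():
        problems.append(f"dataset not found: {dataset_path}")
    return problems
```

Calling `check_setup(".env", dataset_path)` with the same path you set in main.py returns an empty list when both the key and the dataset file are found.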
The project uses the crewai library to create and manage the two agents.
- analyze_planer_agent: This agent is responsible for creating the data analysis plan. It takes the dataset and a description as input and outputs a Markdown file with a step-by-step guide for the analysis.
- analyze_coder_agent: This agent takes the analysis plan from the first agent and generates the Python code to execute it. It then assembles the plan (as Markdown cells) and the code (as code cells) into a Jupyter Notebook.
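To make the notebook assembly step concrete, here is a rough sketch of how plan text and code could be interleaved into a notebook file. It is an illustration only, not the project's actual implementation: in this project the agent produces the notebook, whereas this sketch builds the nbformat 4 JSON structure by hand with the standard library. The function names and the rule of starting a new step at each Markdown heading are assumptions.

```python
import json


def split_plan_into_steps(plan_markdown):
    """Split a Markdown plan into steps, starting a new step at each heading."""
    steps, current = [], []
    for line in plan_markdown.splitlines():
        if line.startswith("#") and current:
            steps.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        steps.append("\n".join(current).strip())
    return [step for step in steps if step]


def build_notebook(steps, code_by_step):
    """Interleave plan steps (Markdown cells) with their code (code cells).

    code_by_step maps a step index to the source code for that step;
    steps without an entry get a Markdown cell only.
    """
    cells = []
    for index, step in enumerate(steps):
        cells.append({"cell_type": "markdown", "metadata": {}, "source": step})
        code = code_by_step.get(index)
        if code:
            cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code,
            })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(notebook, path):
    """Write the notebook dict to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)
```

A file written this way should open in Jupyter like any other notebook, with each plan step followed by the code that implements it.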
The two agents work in a sequential process, with the output of the first agent being the input for the second. This ensures that the final report is based on a well-defined and structured plan.
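The sequential handoff described above could look roughly like the following in crewai. This is a hedged sketch, not the code in main.py: the goals, backstories, task descriptions, and the `build_crew` and `EXAMPLE_INPUTS` names are placeholders invented for illustration, and LLM configuration (for example, pointing the agents at OpenRouter) is left out.

```python
# Placeholder inputs; the keys mirror the variables configured in main.py.
EXAMPLE_INPUTS = {
    "company_name": "Example Co",
    "dataset_description": "Product listings with brand, type, and price.",
    "dataset_path": "data/listings.csv",
}


def build_crew():
    """Build a two-agent crew in which the coder consumes the planner's output."""
    # Imported here so the sketch can be read without crewai installed.
    from crewai import Agent, Crew, Process, Task

    planner = Agent(
        role="Data Analyst Expert",
        goal="Create a step-by-step plan for analyzing the dataset.",
        backstory="Placeholder backstory for illustration.",
    )
    coder = Agent(
        role="Expert Python Developer",
        goal="Turn the analysis plan into Python code for a Jupyter Notebook.",
        backstory="Placeholder backstory for illustration.",
    )

    # {placeholders} in descriptions are filled from the kickoff inputs.
    plan_task = Task(
        description=(
            "Plan an analysis for {company_name}. "
            "Dataset: {dataset_description} (located at {dataset_path})."
        ),
        expected_output="A Markdown document with numbered analysis steps.",
        agent=planner,
        output_file="analysis_plan.md",
    )
    code_task = Task(
        description="Write Python code for each step of the analysis plan.",
        expected_output="Notebook content with Markdown and code cells.",
        agent=coder,
        context=[plan_task],  # the plan is passed to the coder as context
    )

    return Crew(
        agents=[planner, coder],
        tasks=[plan_task, code_task],
        process=Process.sequential,  # tasks run in the order listed
    )
```

With crewai installed and an LLM configured, `build_crew().kickoff(inputs=EXAMPLE_INPUTS)` would run the planning task first and then the coding task. Writing the final analysis_report.ipynb file is omitted from this sketch.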
Second Hand Luxury Fashion Data
https://www.kaggle.com/datasets/justinpakzad/vestiaire-fashion-dataset
This dataset contains product listings from Vestiaire, an online marketplace for buying and selling pre-owned luxury fashion items. It was scraped using Python and the Hrequests Library. The CSV file contains approximately 900k rows and 36 columns.
- Trend Analysis: Investigate current trends in second-hand luxury fashion, such as brands, product types, and item pricing, to gain a deeper understanding of the current market trends.
- Geographical Analysis: Analyze which countries are the most active in terms of both buyers and sellers on Vestiaire Collective. Look for trends in user demographics, such as regions with a high concentration of second-hand luxury fashion.
- Item Price Prediction: Utilize machine learning algorithms to predict the price of listed items based on various available features.
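As a small illustration of the trend analysis idea, the sketch below ranks brands by median listing price using only the standard library. The column names `brand_name` and `price_usd` are assumptions; check them against the actual CSV header before use. For the full dataset of roughly 900k rows, a dataframe library would likely be more convenient.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_price_by_brand(rows, brand_col="brand_name", price_col="price_usd"):
    """Return (brand, median price, listing count) tuples, highest median first.

    rows is any iterable of dicts, such as a csv.DictReader.
    Rows with a missing brand or a non-numeric price are skipped.
    """
    prices = defaultdict(list)
    for row in rows:
        brand = (row.get(brand_col) or "").strip()
        try:
            price = float(row.get(price_col) or "")
        except ValueError:
            continue
        if brand:
            prices[brand].append(price)
    return sorted(
        ((brand, median(values), len(values)) for brand, values in prices.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Tiny made-up sample, only to show the expected input shape.
SAMPLE_CSV = "brand_name,price_usd\nGucci,300\nGucci,500\nPrada,250\nPrada,\n"
sample_result = median_price_by_brand(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

To run this on the real file, open it with `open(dataset_path, newline="", encoding="utf-8")` and pass a `csv.DictReader` over the handle to the same function.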