This repository contains the solution for the Hackapizza 2025 coding challenge, organized by DataPizza. The challenge involved developing a Generative AI-powered assistant to help intergalactic travelers navigate a complex culinary landscape.
As detailed in `challenge_description.md`, the core task was to build an AI system capable of:
- Interpreting natural language queries about food preferences and restrictions.
- Processing information from diverse sources like restaurant menus, blog posts, galactic laws, and cooking manuals.
- Suggesting appropriate dishes based on user requests.
- Verifying dish compliance with (simulated) galactic regulations.
- Utilizing Generative AI techniques such as Retrieval Augmented Generation (RAG) and AI Agents.
Our solution leverages a knowledge graph and language models to address the challenge. The system is built around three main Python scripts:
- `parsing.py`:
  - Responsible for extracting structured information from various unstructured data sources, primarily PDF documents (e.g., restaurant menus, culinary manuals).
  - Utilizes an OpenAI language model (e.g., `gpt-4o-mini`) to understand and parse the content of these documents.
  - Outputs the extracted data into JSON files, preparing it for ingestion into the knowledge graph.
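The extraction step can be sketched roughly as follows. This is a minimal illustration of the pattern (raw document text in, strict JSON out), not the actual code in `parsing.py`; the prompt wording, JSON schema, and the `extract_menu` helper are assumptions.

```python
import json

# Illustrative extraction prompt: the real prompt in parsing.py will differ,
# but the pattern (raw menu text in, strict JSON out) is the same.
EXTRACTION_PROMPT = """Extract every dish from the menu text below.
Return ONLY a JSON object shaped like:
{{"restaurant": str, "dishes": [{{"name": str, "ingredients": [str], "techniques": [str]}}]}}

Menu text:
{menu_text}"""

def extract_menu(menu_text: str, model: str = "gpt-4o-mini") -> dict:
    """Send one menu's raw text to the LLM and parse its JSON reply."""
    from openai import OpenAI  # imported lazily; requires OPENAI_API_KEY

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(menu_text=menu_text)}],
        response_format={"type": "json_object"},  # force the reply to be valid JSON
    )
    return json.loads(response.choices[0].message.content)
```

The `response_format={"type": "json_object"}` option asks the API to emit syntactically valid JSON, which keeps the downstream `json.loads` from failing on free-form prose.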
- `graph_construction.py`:
  - Builds a Neo4j graph database from the structured data generated by `parsing.py` and other provided data files (e.g., CSVs for planet distances).
  - Creates nodes representing entities like Restaurants, Dishes, Ingredients, Chefs, Planets, Culinary Techniques, and Licenses.
  - Establishes relationships between these nodes to capture the complex connections within the culinary universe (e.g., a `Chef` `WORKS_AT` a `Restaurant`, a `Dish` `CONTAINS` an `Ingredient`, a `Restaurant` `IS_LOCATED_ON` a `Planet`).
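A sketch of what this loading step can look like. `CONTAINS` and `IS_LOCATED_ON` are the relationship types named above; the `SERVES` relationship, the property keys, and the `load_menu` helper are illustrative assumptions, not the script's actual code.

```python
# Parameterized Cypher statements. MERGE makes loading idempotent:
# re-running the script matches existing nodes instead of duplicating them.
UPSERT_RESTAURANT = """
MERGE (r:Restaurant {name: $restaurant})
MERGE (p:Planet {name: $planet})
MERGE (r)-[:IS_LOCATED_ON]->(p)
"""

UPSERT_DISH = """
MATCH (r:Restaurant {name: $restaurant})
MERGE (d:Dish {name: $dish})
MERGE (r)-[:SERVES]->(d)
"""

ADD_INGREDIENT = """
MATCH (d:Dish {name: $dish})
MERGE (i:Ingredient {name: $ingredient})
MERGE (d)-[:CONTAINS]->(i)
"""

def load_menu(driver, restaurant: str, planet: str, dishes: list[dict]) -> None:
    """Write one parsed menu into Neo4j via an already-open neo4j.Driver."""
    with driver.session() as session:
        session.run(UPSERT_RESTAURANT, restaurant=restaurant, planet=planet)
        for dish in dishes:
            session.run(UPSERT_DISH, restaurant=restaurant, dish=dish["name"])
            for ingredient in dish.get("ingredients", []):
                session.run(ADD_INGREDIENT, dish=dish["name"], ingredient=ingredient)
```

Using `MERGE` rather than `CREATE` is what allows the construction script's flags to re-run individual node/relationship passes without duplicating existing data.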
- `graph_retrieval.py`:
  - Implements the core RAG (Retrieval Augmented Generation) pipeline for answering user queries.
  - Takes a natural language question as input (from `data/domande.csv` for evaluation).
  - Uses a large language model (e.g., DeepSeek or OpenAI models) to translate the natural language question into a Cypher query tailored to the Neo4j graph schema.
  - Executes the generated Cypher query against the Neo4j database to retrieve relevant dishes.
  - Maps the retrieved dish names to their corresponding IDs using `data/Misc/dish_mapping.json`.
  - Includes an evaluation component that calculates the Jaccard similarity between the system's output and a ground truth dataset (`solution/ground_truth_mapped.csv`).
  - Saves a detailed report of the queries, generated Cypher, results, and performance metrics to `report/report.json`.
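The name-to-ID mapping step can be sketched like this. It assumes `dish_mapping.json` is a flat `{dish name -> id}` dictionary; the helper name and the case-insensitive matching are illustrative choices, not necessarily what the script does.

```python
def map_dishes_to_ids(dish_names: list[str], mapping: dict[str, int]) -> list[int]:
    """Translate dish names retrieved from the graph into submission IDs.

    `mapping` is assumed to be the contents of data/Misc/dish_mapping.json
    as a flat {dish name -> id} dict. Matching is case-insensitive to be
    robust to small differences between graph values and the official names.
    Unknown dishes are silently skipped.
    """
    lookup = {name.lower(): dish_id for name, dish_id in mapping.items()}
    return [lookup[name.lower()] for name in dish_names if name.lower() in lookup]
```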
```
.
├── data/                    # Input data files (CSVs, PDFs, JSONs)
│   ├── Blogpost/
│   ├── Codice Galattico/
│   ├── Manuale/
│   ├── Misc/
│   ├── Ristoranti/
│   ├── domande.csv
│   └── ...
├── images/                  # Images used in documentation
├── manual_licenses/         # Parsed license information from manuals
├── report/                  # Output reports (e.g., report.json)
├── restaurant_licenses/     # Parsed license information for restaurants
├── restaurant_menu/         # Parsed menu information
├── restaurant_planet/       # Parsed planet information for restaurants
├── solution/                # Ground truth solution files
│   └── ground_truth_mapped.csv
├── .gitignore
├── challenge_description.md # Description of the Hackapizza challenge
├── graph_construction.py    # Script to build the Neo4j graph
├── graph_retrieval.py       # Script for querying the graph and RAG pipeline
├── parsing.py               # Script for parsing input documents
├── requirements.txt         # Python dependencies (ensure this is created/updated)
└── README.md                # This file
```
- **Prerequisites**:
  - Python 3.x
  - A running Neo4j instance.
  - API keys for OpenAI and/or DeepSeek (depending on the LLM used in `graph_retrieval.py` and `parsing.py`).
- **Setup**:
  - Clone the repository.
  - Install Python dependencies: `pip install -r requirements.txt`
  - Set up the following environment variables:
    - `NEO4J_URI`: The URI for your Neo4j instance (e.g., `bolt://localhost:7687`)
    - `NEO4J_USERNAME`: Your Neo4j username
    - `NEO4J_PASSWORD`: Your Neo4j password
    - `OPENAI_API_KEY`: Your OpenAI API key (used in `parsing.py` and potentially `graph_retrieval.py`)
    - `DEEPSEEK_API_KEY`: Your DeepSeek API key (used in `graph_retrieval.py`)
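For example, on Linux/macOS the variables can be exported in your shell before running the scripts (the values below are placeholders):

```shell
# Example values only; substitute your own credentials and keys.
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD="your-password"
export OPENAI_API_KEY="sk-..."
export DEEPSEEK_API_KEY="sk-..."
```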
- **Data Parsing**:
  - Configure the flags in `parsing.py` to select which documents to parse.
  - Run the parsing script: `python parsing.py`
  - This will generate JSON files in directories like `restaurant_menu/`, `restaurant_planet/`, `restaurant_licenses/`, and `manual_licenses/`.
- **Graph Construction**:
  - Configure the flags in `graph_construction.py` to control which nodes and relationships are created/updated.
  - Run the graph construction script: `python graph_construction.py`
  - This will populate your Neo4j database.
- **Querying and Evaluation**:
  - Ensure `data/domande.csv` contains the questions to be processed.
  - Run the graph retrieval and evaluation script: `python graph_retrieval.py`
  - This will process the questions, query the graph, and generate `report/report.json` with the results and Jaccard similarity scores. The script will also produce a CSV file ready for submission to the Kaggle competition.
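The Jaccard similarity used for scoring compares the set of predicted dish IDs against the ground-truth set for each question. A minimal version (the empty-vs-empty convention here is an assumption; the script's exact handling may differ):

```python
def jaccard_similarity(predicted: set[int], expected: set[int]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| between predicted and ground-truth IDs.

    Two empty sets are treated as a perfect match here, a common convention
    when a question legitimately has no matching dishes.
    """
    if not predicted and not expected:
        return 1.0
    return len(predicted & expected) / len(predicted | expected)
```

For example, predicting `{1, 2, 3}` against a ground truth of `{2, 3, 4}` scores 2/4 = 0.5.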
- **Retrieval Augmented Generation (RAG)**: The core of the `graph_retrieval.py` script involves using an LLM to generate Cypher queries (generation) based on the user's question and the graph schema, then retrieving data from the Neo4j knowledge graph (retrieval) to answer the question.
- **LLM-based Data Extraction**: `parsing.py` uses an LLM to understand and extract structured information from unstructured PDF documents.
- **Knowledge Graphs**: A Neo4j graph is used to store and relate complex information about the culinary universe, enabling sophisticated querying.
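The text-to-Cypher step can be sketched as below. The prompt wording and the `build_cypher_prompt` helper are illustrative, not the script's actual prompt; the essential structure is that both the graph schema and the question are handed to the model, which returns only Cypher.

```python
# Illustrative text-to-Cypher prompt: schema + question in, bare Cypher out.
CYPHER_PROMPT = """You are a Neo4j expert. Given the graph schema below,
write a single Cypher query that answers the user's question.
Return only the Cypher query, with no explanation.

Schema:
{schema}

Question:
{question}"""

def build_cypher_prompt(schema: str, question: str) -> str:
    """Assemble the prompt sent to the LLM for Cypher generation."""
    return CYPHER_PROMPT.format(schema=schema, question=question)
```

Grounding the model in the actual schema is what keeps the generated queries aligned with the node labels and relationship types that exist in the graph.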
This project demonstrates an approach to building a sophisticated AI assistant by combining the strengths of large language models and knowledge graphs.
