
Hackapizza 2025 - GenAI Culinary Navigator

Galactic Pizza

This repository contains the solution for the Hackapizza 2025 coding challenge, organized by DataPizza. The challenge involved developing a Generative AI-powered assistant to help intergalactic travelers navigate a complex culinary landscape.

The Challenge: Intergalactic Gastronomy

As detailed in challenge_description.md, the core task was to build an AI system capable of:

  • Interpreting natural language queries about food preferences and restrictions.
  • Processing information from diverse sources like restaurant menus, blog posts, galactic laws, and cooking manuals.
  • Suggesting appropriate dishes based on user requests.
  • Verifying dish compliance with (simulated) galactic regulations.
  • Utilizing Generative AI techniques such as Retrieval Augmented Generation (RAG) and AI Agents.

Implemented Solution

Our solution combines a Neo4j knowledge graph with large language models to address the challenge. The system is built around three main Python scripts:

  1. parsing.py:

    • Responsible for extracting structured information from various unstructured data sources, primarily PDF documents (e.g., restaurant menus, culinary manuals).
    • Utilizes an OpenAI language model (e.g., gpt-4o-mini) to understand and parse the content of these documents.
    • Outputs the extracted data into JSON files, preparing it for ingestion into the knowledge graph.
  2. graph_construction.py:

    • Builds a Neo4j graph database from the structured data generated by parsing.py and other provided data files (e.g., CSVs for planet distances).
    • Creates nodes representing entities like Restaurants, Dishes, Ingredients, Chefs, Planets, Culinary Techniques, and Licenses.
    • Establishes relationships between these nodes to capture the complex connections within the culinary universe (e.g., a Chef WORKS_AT a Restaurant, a Dish CONTAINS an Ingredient, a Restaurant IS_LOCATED_ON a Planet).
  3. graph_retrieval.py:

    • Implements the core RAG (Retrieval Augmented Generation) pipeline for answering user queries.
    • Takes a natural language question as input (from data/domande.csv for evaluation).
    • Uses a large language model (e.g., DeepSeek or OpenAI models) to translate the natural language question into a Cypher query, tailored to the Neo4j graph schema.
    • Executes the generated Cypher query against the Neo4j database to retrieve relevant dishes.
    • Maps the retrieved dish names to their corresponding IDs using data/Misc/dish_mapping.json.
    • Includes an evaluation component that calculates the Jaccard similarity between the system's output and a ground truth dataset (solution/ground_truth_mapped.csv).
    • Saves a detailed report of the queries, generated Cypher, results, and performance metrics to report/report.json.
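The node and relationship construction described above can be sketched as follows. This is an illustrative outline, not the repository's actual code: the labels and relationship types mirror the ones named in this README (Chef WORKS_AT Restaurant, Dish CONTAINS Ingredient, Restaurant IS_LOCATED_ON Planet), while the SERVES relationship, the helper name, and the record layout are assumptions for the example.

```python
def build_restaurant_statements(record):
    """Return (cypher, params) pairs for one parsed restaurant record.

    Using MERGE instead of CREATE keeps the script idempotent: re-running
    graph construction updates the graph rather than duplicating nodes.
    NOTE: the SERVES relationship and the record schema are hypothetical.
    """
    stmts = [
        ("MERGE (r:Restaurant {name: $restaurant})",
         {"restaurant": record["restaurant"]}),
        ("MERGE (p:Planet {name: $planet})",
         {"planet": record["planet"]}),
        ("MATCH (r:Restaurant {name: $restaurant}), (p:Planet {name: $planet}) "
         "MERGE (r)-[:IS_LOCATED_ON]->(p)",
         {"restaurant": record["restaurant"], "planet": record["planet"]}),
    ]
    for dish in record.get("dishes", []):
        stmts.append(("MERGE (d:Dish {name: $dish})", {"dish": dish["name"]}))
        stmts.append((
            "MATCH (r:Restaurant {name: $restaurant}), (d:Dish {name: $dish}) "
            "MERGE (r)-[:SERVES]->(d)",
            {"restaurant": record["restaurant"], "dish": dish["name"]}))
        for ing in dish.get("ingredients", []):
            stmts.append(("MERGE (i:Ingredient {name: $ing})", {"ing": ing}))
            stmts.append((
                "MATCH (d:Dish {name: $dish}), (i:Ingredient {name: $ing}) "
                "MERGE (d)-[:CONTAINS]->(i)",
                {"dish": dish["name"], "ing": ing}))
    return stmts
```

In the real script each (cypher, params) pair would be executed through the Neo4j Python driver inside a session; separating statement construction from execution makes the Cypher easy to inspect and test.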
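The evaluation step lends itself to a short sketch. The Jaccard similarity and name-to-ID mapping below follow their standard definitions; in the repository the mapping comes from data/Misc/dish_mapping.json, whereas here it is illustrated with an in-memory dict, and the function names are our own.

```python
def jaccard(predicted, truth):
    """Jaccard similarity between two collections of dish IDs.

    Defined as |intersection| / |union|; returns 1.0 when both sets are
    empty (a correct empty answer).
    """
    p, t = set(predicted), set(truth)
    if not p and not t:
        return 1.0
    return len(p & t) / len(p | t)

def map_dishes(dish_names, mapping):
    """Map retrieved dish names to IDs, skipping names absent from the mapping."""
    return [mapping[name] for name in dish_names if name in mapping]
```

A per-question Jaccard score computed this way, then averaged over all questions in data/domande.csv, gives the aggregate metric reported in report/report.json.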

Project Structure

.
├── data/                     # Input data files (CSVs, PDFs, JSONs)
│   ├── Blogpost/
│   ├── Codice Galattico/
│   ├── Manuale/
│   ├── Misc/
│   ├── Ristoranti/
│   ├── domande.csv
│   └── ...
├── images/                   # Images used in documentation
├── manual_licenses/          # Parsed license information from manuals
├── report/                   # Output reports (e.g., report.json)
├── restaurant_licenses/      # Parsed license information for restaurants
├── restaurant_menu/          # Parsed menu information
├── restaurant_planet/        # Parsed planet information for restaurants
├── solution/                 # Ground truth solution files
│   └── ground_truth_mapped.csv
├── .gitignore
├── challenge_description.md  # Description of the Hackapizza challenge
├── graph_construction.py     # Script to build the Neo4j graph
├── graph_retrieval.py        # Script for querying the graph and RAG pipeline
├── parsing.py                # Script for parsing input documents
├── requirements.txt          # Python dependencies
└── README.md                 # This file

How to Run

  1. Prerequisites:

    • Python 3.x
    • A running Neo4j instance.
    • API keys for OpenAI and/or DeepSeek (depending on the LLM used in graph_retrieval.py and parsing.py).
  2. Setup:

    • Clone the repository.
    • Install Python dependencies:
      pip install -r requirements.txt
    • Set up the following environment variables:
      • NEO4J_URI: The URI for your Neo4j instance (e.g., bolt://localhost:7687)
      • NEO4J_USERNAME: Your Neo4j username
      • NEO4J_PASSWORD: Your Neo4j password
      • OPENAI_API_KEY: Your OpenAI API key (used in parsing.py and potentially graph_retrieval.py)
      • DEEPSEEK_API_KEY: Your DeepSeek API key (used in graph_retrieval.py)
  3. Data Parsing:

    • Configure the flags in parsing.py to select which documents to parse.
    • Run the parsing script:
      python parsing.py
    • This will generate JSON files in directories like restaurant_menu/, restaurant_planet/, restaurant_licenses/, and manual_licenses/.
  4. Graph Construction:

    • Configure the flags in graph_construction.py to control which nodes and relationships are created/updated.
    • Run the graph construction script:
      python graph_construction.py
    • This will populate your Neo4j database.
  5. Querying and Evaluation:

    • Ensure data/domande.csv contains the questions to be processed.
    • Run the graph retrieval and evaluation script:
      python graph_retrieval.py
    • This will process the questions, query the graph, and generate report/report.json with the results and Jaccard similarity scores. The script will also produce a CSV file ready for submission to the Kaggle competition.
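The environment variables listed in the setup step might be read along these lines. This is a minimal sketch, not the repository's actual code: the variable names match this README, while the fallback URI is just the common local Neo4j default and the function name is illustrative.

```python
import os

def load_config():
    """Collect connection settings from the environment, failing fast on
    missing credentials so errors surface before any parsing or querying."""
    config = {
        "neo4j_uri": os.environ.get("NEO4J_URI", "bolt://localhost:7687"),
        "neo4j_username": os.environ.get("NEO4J_USERNAME"),
        "neo4j_password": os.environ.get("NEO4J_PASSWORD"),
        "openai_api_key": os.environ.get("OPENAI_API_KEY"),
        "deepseek_api_key": os.environ.get("DEEPSEEK_API_KEY"),
    }
    missing = [k for k, v in config.items() if v is None]
    if missing:
        raise RuntimeError(f"Missing environment variables for: {missing}")
    return config
```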

GenAI Techniques Used

  • Retrieval Augmented Generation (RAG): The core of the graph_retrieval.py script involves using an LLM to generate Cypher queries (generation) based on the user's question and the graph schema, then retrieving data from the Neo4j knowledge graph (retrieval) to answer the question.
  • LLM-based Data Extraction: parsing.py uses an LLM to understand and extract structured information from unstructured PDF documents.
  • Knowledge Graphs: A Neo4j graph is used to store and relate complex information about the culinary universe, enabling sophisticated querying.
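The text-to-Cypher step at the heart of the RAG pipeline can be sketched as a prompt-construction function. The schema summary and prompt wording below are illustrative, not the repository's actual prompt; the resulting messages list follows the chat-completion format accepted by both OpenAI- and DeepSeek-style APIs.

```python
# Hypothetical condensed schema summary; the real pipeline derives this
# from the Neo4j graph built by graph_construction.py.
GRAPH_SCHEMA = """\
Nodes: Restaurant(name), Dish(name), Ingredient(name), Chef(name),
       Planet(name), Technique(name), License(name)
Relationships: (Chef)-[:WORKS_AT]->(Restaurant),
               (Dish)-[:CONTAINS]->(Ingredient),
               (Restaurant)-[:IS_LOCATED_ON]->(Planet)"""

def build_cypher_prompt(question):
    """Build chat messages asking an LLM to translate a natural language
    question into a single Cypher query constrained to the graph schema."""
    system = (
        "You translate questions about an intergalactic culinary knowledge "
        "graph into Cypher. Use only this schema:\n" + GRAPH_SCHEMA + "\n"
        "Return a single Cypher query and nothing else."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]
```

Grounding the LLM in an explicit schema is what makes the generated Cypher executable: without it, the model would hallucinate labels and relationship types that do not exist in the graph.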

This project demonstrates an approach to building a sophisticated AI assistant by combining the strengths of large language models and knowledge graphs.
