Raw2DataBase loads raw CSV data into a PostgreSQL database, uses Docker for deployment, and integrates with Metabase for data visualization. The project provides a framework for managing database connections, processing CSV data, and exposing the loaded data to Metabase for analysis and reporting.
- Database Connection Handler 🛰️
  - Generic and extensible to support multiple database types (PostgreSQL, MySQL, MongoDB etc.).
  - Handles CSV processing using pandas, converting files into dataframes for database insertion.
- Main Application Logic ⚙️
  - Script to receive configuration and raw data paths.
  - Manages database connection setup, data processing, and data insertion.
- Tests 🩹
  - Tests for each feature to ensure correct functionality and reliability.
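As a rough illustration of how the handler and the main script described above could fit together, the sketch below uses a small registry of handler classes selected by `--db_type`. The flag names follow the run command shown later in this README; everything else (`DatabaseHandler`, `SQLiteHandler`, `HANDLERS`, `build_handler`, `load_config`) is hypothetical and not taken from the project's source. SQLite stands in for PostgreSQL so the example runs without a database server.

```python
import argparse
import json
import sqlite3
from abc import ABC, abstractmethod


class DatabaseHandler(ABC):
    """Hypothetical base class: one subclass per supported backend."""

    def __init__(self, config: dict):
        self.config = config
        self.connection = None

    @abstractmethod
    def connect(self):
        """Open a connection and store it on self.connection."""


class SQLiteHandler(DatabaseHandler):
    """Stand-in backend (assumption) so the sketch needs no server."""

    def connect(self):
        self.connection = sqlite3.connect(self.config.get("DB_NAME", ":memory:"))
        return self.connection


# Registry keyed by the --db_type value; a PostgreSQL or MySQL handler
# would be registered here in the same way.
HANDLERS = {"sqlite": SQLiteHandler}


def build_handler(db_type: str, config: dict) -> DatabaseHandler:
    """Return a handler instance for db_type, or raise ValueError."""
    try:
        return HANDLERS[db_type](config)
    except KeyError:
        raise ValueError(f"Unsupported db_type: {db_type!r}") from None


def parse_args(argv=None):
    """Parse the three flags used in this README's run command."""
    parser = argparse.ArgumentParser(description="Load raw CSV data into a database.")
    parser.add_argument("--raw_files_path", required=True)
    parser.add_argument("--config_file", required=True)
    parser.add_argument("--db_type", default="postgres")
    return parser.parse_args(argv)


def load_config(path: str) -> dict:
    """Read the JSON configuration file into a dictionary."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Supporting another backend in this layout would mean adding one subclass and one registry entry, leaving the main script unchanged.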
- ✅ Database Connection Handler: A flexible, extensible handler for connecting to various databases.
- ✅ CSV Processing: Efficient CSV data processing using pandas, converting raw data into SQL-like objects for database insertion.
- ✅ Dockerized Environment: Easy setup and deployment using Docker and Docker Compose.
- ✅ Data Visualization: Integration with Metabase for creating and sharing interactive dashboards and reports.
- ❌ Test Coverage: Comprehensive tests using Pytest to ensure the reliability and correctness of each component.
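To make the CSV processing feature more concrete, here is a minimal sketch of reading CSV data into a pandas dataframe and inserting it into a table. It is not the project's `data_processor.py`: the helper names `load_csv` and `insert_dataframe`, the sample data, and the table name are assumptions, and an in-memory SQLite database stands in for PostgreSQL. With PostgreSQL, `DataFrame.to_sql` expects a SQLAlchemy engine or connection instead of a `sqlite3` connection.

```python
import io
import sqlite3

import pandas as pd

# Inline sample data (assumption) in place of a file under data/raw/.
RAW_CSV = io.StringIO("ID,Name,Score\n1,Ada,9.5\n2,Linus,8.0\n3,Grace,\n")


def load_csv(source) -> pd.DataFrame:
    """Read CSV data and normalise column names to lowercase."""
    df = pd.read_csv(source)
    df.columns = [str(col).strip().lower() for col in df.columns]
    return df


def insert_dataframe(df: pd.DataFrame, connection, table: str) -> int:
    """Append the dataframe rows to `table` and return the row count."""
    df.to_sql(table, connection, if_exists="append", index=False)
    return len(df)


connection = sqlite3.connect(":memory:")
frame = load_csv(RAW_CSV)
inserted = insert_dataframe(frame, connection, "scores")
row_count = connection.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
print(f"Inserted {inserted} rows; table now holds {row_count}.")
```

Using `if_exists="append"` creates the table on the first load and adds rows on later loads, which suits repeated imports of raw files.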
raw2database/
├── docker/
│   ├── .env
│   └── docker-compose.yml
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── __init__.py
│   │   └── data_processor.py
│   └── database/
│       ├── __init__.py
│       ├── database.py
│       └── database_loader.py
├── tests/
│   ├── __init__.py
│   ├── test_database.py
│   ├── test_data_loader.py
│   └── test_data_processor.py
├── config/
│   └── your_db_config.json
├── data/
│   ├── raw/
│   ├── processed/
│   ├── interim/
│   └── external/
├── requirements.txt
├── README.md
└── .gitignore
- Clone the Repository
git clone https://github.com/JMasr/raw2database.git
- Navigate to the Project Directory
cd raw2database
- Create and Activate the Conda Environment
conda create -n raw2database python=3.9
conda activate raw2database
- Install Requirements
pip install -r requirements.txt
- Navigate to the Docker Folder
cd docker
- Configure the .env File with your Credentials
cat <<EOL > .env
POSTGRES_USER=<user_postgres>
POSTGRES_PASSWORD=<pass_postgres>
POSTGRES_DB=<ps_db_name>
PGADMIN_DEFAULT_EMAIL=<root@admin.demo>
PGADMIN_DEFAULT_PASSWORD=<pass_ui-admin_tool>
EOL
- Run Docker Compose
docker-compose up -d
- Return to the Project Root and Create a Configuration Folder
cd ..
mkdir config
- Configure the Database: create the config/postgres_config.json file with the database connection details:
cat <<EOL > config/postgres_config.json
{
  "db_type": "postgres",
  "DB_NAME": "<ps_db_name>",
  "DB_HOST": "<host>",
  "DB_PORT": <port>,
  "DB_USER": "<user_postgres>",
  "DB_PASSWORD": "<pass_postgres>"
}
EOL
- Running the Application: to load data from a CSV file into the database, run from the project root:
python src/main.py --raw_files_path <path/to/your_data.csv> --config_file config/postgres_config.json --db_type postgres
Contributions are welcome! Please fork the repository and create a pull request with your improvements.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or issues, please open an issue in the repository or contact the maintainer at jmramirez@gts.uvigo.es