This project compares MongoDB and Redis for analytical workloads on a large e-commerce dataset (~2.0M records after cleaning). It measures how long each system takes to run equivalent aggregation queries over transactional data.

- Data ingestion pipelines for MongoDB and Redis
- Data cleaning and normalization process
- Analytical queries on:
  - Top-selling category
  - Brand with highest revenue
  - Month with most sales
- Execution time benchmarking

Technologies used:

- Python
- MongoDB
- Redis
- Docker

E-commerce purchase history dataset with ~2.6M records, reduced to ~2.0M after cleaning.
- Load cleaned data into MongoDB (document model)
- Load transformed data into Redis (key-value model)
- Execute equivalent queries in both systems
- Measure execution time for each query
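
The measurement step can be sketched as a small timing helper. This is a minimal sketch under assumptions, not the project's actual benchmarking code: `time_query` and the stand-in workload are hypothetical names, and it assumes that wall-clock time around a single call is the metric of interest.

```python
import time
from typing import Any, Callable, Tuple


def time_query(run_query: Callable[[], Any]) -> Tuple[Any, float]:
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = run_query()
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload (hypothetical); in practice the callable would
    # wrap a MongoDB or Redis query function.
    result, seconds = time_query(lambda: sum(range(1_000_000)))
    print(f"result={result} elapsed={seconds:.3f}s")
```

Wrapping both systems' queries in the same helper keeps the comparison consistent, since the timer covers the same start and end points for each.
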
| Query | MongoDB | Redis |
|---|---|---|
| Top Category | ~5.9 sec | ~11 min |
| Top Brand (Revenue) | ~5.8 sec | ~12 min |
| Top Month | ~3.7 sec | ~12 min |
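
The speedup implied by the table can be worked out directly. The sketch below only uses the approximate timings reported above, converted to seconds, so the resulting ratios are approximate as well; the `speedup` helper is a name introduced here for illustration.

```python
# Timings copied from the results table, as (MongoDB seconds, Redis seconds).
timings = {
    "Top Category": (5.9, 11 * 60),
    "Top Brand (Revenue)": (5.8, 12 * 60),
    "Top Month": (3.7, 12 * 60),
}


def speedup(mongo_seconds: float, redis_seconds: float) -> float:
    """How many times faster MongoDB was than Redis for one query."""
    return redis_seconds / mongo_seconds


speedups = {query: speedup(m, r) for query, (m, r) in timings.items()}

for query, ratio in speedups.items():
    print(f"{query}: ~{ratio:.0f}x")
# Top Category: ~112x
# Top Brand (Revenue): ~124x
# Top Month: ~195x
```
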
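- MongoDB outperforms Redis by more than 100x on these analytical queries (roughly 110x to 195x, based on the timings above)
- Redis is inefficient for aggregation-heavy workloads because it has no built-in query engine for aggregation, so the aggregation ran client-side in Python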
- MongoDB’s aggregation pipeline and indexing enable scalable analytics
- Data cleaning was critical to obtain valid results
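
To illustrate the aggregation pipeline mentioned in the findings, here is a minimal sketch of what a "brand with highest revenue" query could look like. It is not taken from the project's query scripts: the field names `brand` and `price`, and both function names, are assumptions. The `$group`, `$sum`, `$sort`, and `$limit` stages are standard MongoDB aggregation operators.

```python
from typing import Any, Dict, List


def build_brand_revenue_pipeline(limit: int = 1) -> List[Dict[str, Any]]:
    """Build a pipeline that sums revenue per brand and keeps the top brands.

    Assumes each document has `brand` and `price` fields (hypothetical schema).
    """
    return [
        # Group documents by brand and add up their prices.
        {"$group": {"_id": "$brand", "revenue": {"$sum": "$price"}}},
        # Highest revenue first.
        {"$sort": {"revenue": -1}},
        # Keep only the top `limit` brands.
        {"$limit": limit},
    ]


def top_brand_by_revenue(collection: Any) -> List[Dict[str, Any]]:
    """Run the pipeline on a pymongo collection (any object with .aggregate)."""
    return list(collection.aggregate(build_brand_revenue_pipeline()))


# Usage with pymongo (database and collection names are hypothetical):
#   from pymongo import MongoClient
#   purchases = MongoClient("mongodb://localhost:27017")["ecommerce"]["purchases"]
#   print(top_brand_by_revenue(purchases))
```

The grouping and sorting happen inside the database server, so only the final result travels back to the Python client.
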

    docker-compose up
    pip install -r requirements.txt

Load data:

    python src/ingestion/load_mongo.py
    python src/ingestion/load_redis.py

Run queries:

    python src/queries/mongo/query_mongo_brand.py
    python src/queries/redis/query_redis_brand.py

MongoDB is significantly more suitable for large-scale analytical queries, while Redis is better suited for caching and fast key-value access.

- Redis queries are executed client-side (Python), not natively
- No parallelization implemented
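
The first limitation can be pictured with a sketch of client-side aggregation. This is an illustration under assumptions, not the project's actual Redis query code: it assumes each purchase is stored as a Redis hash under a hypothetical `purchase:*` key with `brand` and `price` fields. `scan_iter` and `hmget` are real redis-py client methods; the function name is introduced here.

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Tuple


def top_brand_client_side(
    client: Any, pattern: str = "purchase:*"
) -> Optional[Tuple[str, float]]:
    """Sum revenue per brand in Python by reading one hash at a time.

    `client` is expected to behave like a redis-py client. The key pattern
    and the `brand` / `price` fields are a hypothetical layout.
    """
    revenue: Dict[str, float] = defaultdict(float)
    for key in client.scan_iter(match=pattern):
        # One request per record: all grouping work happens in the client.
        brand, price = client.hmget(key, ["brand", "price"])
        if brand is None or price is None:
            continue  # skip records missing either field
        if isinstance(brand, bytes):
            brand = brand.decode()  # redis-py returns bytes by default
        revenue[brand] += float(price)
    if not revenue:
        return None
    # Brand with the largest summed revenue.
    return max(revenue.items(), key=lambda item: item[1])
```

With a layout like this, every record costs a network round trip and a Python loop iteration, which may help explain why an unparallelized scan over ~2.0M records takes minutes rather than seconds.
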
Edgar Antonio Zeledón Pérez