This project analyzes a ride-booking dataset using PostgreSQL to understand operational performance, revenue patterns, customer behavior, and service quality.
The work follows a structured data pipeline, progressing from raw data through cleaning, transformation, validation, and analysis to generate actionable insights. The main objectives are to:
- Evaluate ride completion and failure patterns
- Analyze revenue distribution across payment methods and vehicle types
- Identify peak demand periods
- Understand customer contribution to revenue
- Assess service quality using ratings
- Identify operational inefficiencies
The dataset contains ride-level transactional data with the following attributes:
- `booking_id`: Unique identifier for each ride
- `customer_id`: Customer identifier
- `booking_status`: Ride outcome (completed, cancelled, etc.)
- `ride_timestamp`: Date and time of the ride
- `vehicle_type`: Type of vehicle selected
- `pickup_location`, `drop_location`: Pickup and drop locations
- `booking_value`: Revenue generated per ride
- `ride_distance`: Distance of the ride
- `driver_rating`, `customer_rating`: Driver and customer ratings
- `payment_method`: Mode of payment
Raw Data → Cleaning → Transformation → Final Table → Validation → Feature Engineering → Views → Analysis
The diagram below represents the complete data pipeline, including data ingestion, transformation, modeling, and analysis layers:
- Standardized text fields (lowercase, trimming, formatting)
- Converted invalid values (empty strings, 'null') into SQL NULL
- Created a unified timestamp column (`ride_timestamp`)
- Converted numeric fields from TEXT to numeric types
- Created a structured table (`rides`) for analysis
- Removed duplicate bookings using window functions
- Created `rides_final`, ensuring one record per booking
- Checked for duplicates, null values, and invalid entries
- Verified completeness of key fields for completed rides
- Handled missing values (e.g., payment method)
- Added indexes for performance optimization
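The cleaning, deduplication, validation, and indexing steps above can be sketched in PostgreSQL as follows. This is a minimal illustration and does not reproduce the project's SQL files. Several details are assumptions: the staging table `rides_raw` (all columns TEXT), its separate `ride_date` and `ride_time` columns, the helper function `clean_text`, and the "latest timestamp wins" deduplication rule.

```sql
-- Hypothetical sketch: rides_raw, ride_date, ride_time and clean_text are assumed
-- names, not necessarily those used in the project's SQL files.

-- Treat '', whitespace-only and 'null' (any case) as SQL NULL; trim everything else.
CREATE OR REPLACE FUNCTION clean_text(t TEXT) RETURNS TEXT
LANGUAGE SQL IMMUTABLE AS $$
    SELECT CASE WHEN LOWER(TRIM(t)) IN ('', 'null') THEN NULL ELSE TRIM(t) END
$$;

-- Cleaning + transformation: standardize text, build one timestamp, cast numerics.
CREATE TABLE rides AS
SELECT
    clean_text(booking_id)                                             AS booking_id,
    clean_text(customer_id)                                            AS customer_id,
    LOWER(clean_text(booking_status))                                  AS booking_status,
    (clean_text(ride_date) || ' ' || clean_text(ride_time))::TIMESTAMP AS ride_timestamp,
    LOWER(clean_text(vehicle_type))                                    AS vehicle_type,
    LOWER(clean_text(pickup_location))                                 AS pickup_location,
    LOWER(clean_text(drop_location))                                   AS drop_location,
    clean_text(booking_value)::NUMERIC                                 AS booking_value,
    clean_text(ride_distance)::NUMERIC                                 AS ride_distance,
    clean_text(driver_rating)::NUMERIC                                 AS driver_rating,
    clean_text(customer_rating)::NUMERIC                               AS customer_rating,
    LOWER(clean_text(payment_method))                                  AS payment_method
FROM rides_raw;

-- Deduplication with a window function: keep one row per booking_id
-- (the row with the latest timestamp wins; this tie-break rule is an assumption).
CREATE TABLE rides_final AS
SELECT *
FROM (
    SELECT r.*,
           ROW_NUMBER() OVER (
               PARTITION BY booking_id
               ORDER BY ride_timestamp DESC NULLS LAST
           ) AS rn
    FROM rides r
    WHERE booking_id IS NOT NULL
) ranked
WHERE rn = 1;

ALTER TABLE rides_final DROP COLUMN rn;

-- Validation: a non-zero count flags rows to investigate.
SELECT
    COUNT(*) - COUNT(DISTINCT booking_id) AS duplicate_bookings,
    COUNT(*) FILTER (
        WHERE booking_status = 'completed'
          AND (booking_value IS NULL OR ride_timestamp IS NULL)
    ) AS incomplete_completed_rides
FROM rides_final;

-- Performance: primary key on booking_id plus indexes on common filter columns.
ALTER TABLE rides_final ADD PRIMARY KEY (booking_id);
CREATE INDEX idx_rides_final_ride_timestamp ON rides_final (ride_timestamp);
CREATE INDEX idx_rides_final_booking_status ON rides_final (booking_status);
```

The `::NUMERIC` and `::TIMESTAMP` casts raise an error on any malformed value, so such rows may need to be caught during cleaning before the casts run.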
Created reusable views for:
- KPI metrics
- Time-based analysis
- Payment distribution
- Customer statistics
- Cancellation patterns
- Rating vs revenue analysis
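As an example of a KPI view, the sketch below computes completion rate and revenue from `rides_final`. The view name `vw_kpi_summary` is hypothetical. It assumes `booking_status` was lowercased during cleaning, so completed rides carry the value `'completed'`. Counting revenue over completed rides only is also an assumption.

```sql
-- Hypothetical KPI view; the project's own view may define these metrics differently.
CREATE OR REPLACE VIEW vw_kpi_summary AS
SELECT
    COUNT(*)                                             AS total_bookings,
    COUNT(*) FILTER (WHERE booking_status = 'completed') AS completed_rides,
    ROUND(100.0 * COUNT(*) FILTER (WHERE booking_status = 'completed')
          / NULLIF(COUNT(*), 0), 2)                      AS completion_rate_pct,
    -- Revenue is assumed to come from completed rides only.
    SUM(booking_value) FILTER (WHERE booking_status = 'completed')           AS total_revenue,
    ROUND(AVG(booking_value) FILTER (WHERE booking_status = 'completed'), 2) AS avg_booking_value
FROM rides_final;

SELECT * FROM vw_kpi_summary;
```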
- Completion rate is approximately 62%
- Around 38% of rides fail
- Driver cancellations are the largest contributor
Interpretation: Failures are primarily driven by supply-side issues rather than customer behavior.
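A status breakdown like the one behind these figures can be computed with a windowed grand total, which reports each category's share alongside its count. This is a sketch against `rides_final` as described above. No status labels are hard-coded, so driver cancellations appear under whatever value the dataset uses.

```sql
-- Share of all bookings per outcome; SUM(COUNT(*)) OVER () is the grand total.
SELECT
    booking_status,
    COUNT(*)                                           AS bookings,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_bookings
FROM rides_final
GROUP BY booking_status
ORDER BY bookings DESC;
```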
- UPI contributes the largest share of revenue (~45%)
- Cash remains significant (~25%)
Interpretation: Users show a strong preference for digital payment methods.
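Revenue share by payment method can be computed with a windowed grand total over the grouped sums. This sketch makes two assumptions. First, revenue means `booking_value` summed over completed rides (`booking_status = 'completed'`). Second, missing payment methods go into an `'unknown'` bucket, which may differ from how the project handled them.

```sql
-- Revenue and revenue share per payment method (completed rides only; assumption).
SELECT
    COALESCE(payment_method, 'unknown') AS payment_method,
    SUM(booking_value)                  AS revenue,
    ROUND(100.0 * SUM(booking_value) / SUM(SUM(booking_value)) OVER (), 2) AS revenue_share_pct
FROM rides_final
WHERE booking_status = 'completed'
GROUP BY 1
ORDER BY revenue DESC;
```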
- Peak demand occurs between 5 PM and 8 PM
- Lowest activity occurs between 1 AM and 5 AM
Interpretation: Demand aligns with daily commuting behavior.
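Hourly demand can be derived from `ride_timestamp` with `EXTRACT(HOUR FROM ...)`. The sketch below counts all bookings, not just completed ones, as demand; that choice is an assumption.

```sql
-- Bookings per hour of day (0-23), all outcomes included.
SELECT
    EXTRACT(HOUR FROM ride_timestamp)::INT AS hour_of_day,
    COUNT(*)                               AS bookings
FROM rides_final
WHERE ride_timestamp IS NOT NULL
GROUP BY 1
ORDER BY 1;
```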
- Revenue closely follows ride volume
- No evidence of significantly higher-value time periods
Interpretation: Revenue is driven by volume rather than pricing differences.
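One way to test the volume-versus-pricing interpretation is to aggregate completed rides by hour. You can then compare the correlation between hourly volume and revenue with the spread of average value per ride. This is an illustrative check, not necessarily the project's method. A correlation near 1 together with a narrow min/max range of average value would support the interpretation above.

```sql
-- Illustrative check: does hourly revenue track hourly volume, with a stable value per ride?
WITH hourly AS (
    SELECT
        EXTRACT(HOUR FROM ride_timestamp)::INT AS hour_of_day,
        COUNT(*)                               AS completed_rides,
        SUM(booking_value)                     AS revenue
    FROM rides_final
    WHERE booking_status = 'completed'
      AND ride_timestamp IS NOT NULL
    GROUP BY 1
)
SELECT
    ROUND(CORR(completed_rides::FLOAT8, revenue::FLOAT8)::NUMERIC, 3) AS volume_revenue_corr,
    ROUND(MIN(revenue / completed_rides), 2)                          AS min_avg_value_per_ride,
    ROUND(MAX(revenue / completed_rides), 2)                          AS max_avg_value_per_ride
FROM hourly;
```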
- Affordable options (Auto, Go Mini) dominate usage and revenue
- Premium services contribute minimally
Interpretation: The platform is primarily driven by cost-sensitive users.
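Below is a sketch of the vehicle-type breakdown. It assumes revenue is `booking_value` over completed rides in `rides_final`, and that vehicle names were lowercased during cleaning (so "Go Mini" appears as `go mini`).

```sql
-- Completed rides, revenue and revenue share per vehicle type.
SELECT
    vehicle_type,
    COUNT(*)           AS completed_rides,
    SUM(booking_value) AS revenue,
    ROUND(100.0 * SUM(booking_value) / SUM(SUM(booking_value)) OVER (), 2) AS revenue_share_pct
FROM rides_final
WHERE booking_status = 'completed'
GROUP BY vehicle_type
ORDER BY revenue DESC;
```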
- Top 10 customers contribute approximately 9% of total revenue
Interpretation: Revenue is spread across a broad customer base rather than concentrated in a few high-value customers.
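The top-10 share can be computed by ranking customers by revenue with `ROW_NUMBER()`. The sketch assumes revenue is `booking_value` over completed rides. It also returns the number of customers with completed rides, because a 9% share means something different for a few hundred customers than for tens of thousands.

```sql
-- Per-customer revenue (completed rides only; assumption), then the top-10 share.
WITH customer_revenue AS (
    SELECT customer_id, SUM(booking_value) AS revenue
    FROM rides_final
    WHERE booking_status = 'completed'
      AND customer_id IS NOT NULL
    GROUP BY customer_id
)
SELECT
    ROUND(100.0 * SUM(revenue) FILTER (WHERE rnk <= 10) / SUM(revenue), 2) AS top10_share_pct,
    COUNT(*)                                                               AS customers_with_completed_rides
FROM (
    SELECT revenue,
           ROW_NUMBER() OVER (ORDER BY revenue DESC NULLS LAST) AS rnk
    FROM customer_revenue
) ranked;
```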
- No single pickup location dominates ride volume
Interpretation: Demand is geographically distributed rather than concentrated.
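To check that no pickup location dominates, list the top locations with their share of all bookings. This is a sketch against `rides_final`. The window total is evaluated before `LIMIT`, so each share is relative to all bookings, not just the ten rows shown.

```sql
-- Ten busiest pickup locations and their share of all bookings.
SELECT
    pickup_location,
    COUNT(*)                                           AS bookings,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_bookings
FROM rides_final
GROUP BY pickup_location
ORDER BY bookings DESC
LIMIT 10;
```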
- Driver cancellations are the largest category
- A significant portion of failures is due to lack of driver availability
Interpretation: Improving driver allocation and availability could significantly increase completion rates.
- Ratings are concentrated between 4.0 and 4.6
- Low ratings are rare
Interpretation: Service quality is consistently high.
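The rating distribution can be bucketed with `TRUNC(rating, 1)`. This sketch uses `driver_rating`. The findings do not say which rating they refer to, so substitute `customer_rating` as needed.

```sql
-- Distribution of driver ratings in 0.1-wide buckets (e.g. 4.57 falls in 4.5).
SELECT
    TRUNC(driver_rating, 1)                            AS rating_bucket,
    COUNT(*)                                           AS rides,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_rated_rides
FROM rides_final
WHERE driver_rating IS NOT NULL
GROUP BY 1
ORDER BY 1;
```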
Despite consistently high ratings, the platform shows a relatively low completion rate (approximately 62%).
This indicates a gap between:
- Service quality (post-ride experience)
- Operational efficiency (ride fulfillment and driver availability)
Improving supply-side operations would likely have the highest impact on overall performance.
- PostgreSQL
- SQL
- pgAdmin
- Git and GitHub
```
ride-booking-analytics/
│
├── data/
│   └── rides_raw_data.csv
│
├── sql/
│   ├── 01_table_creation.sql
│   ├── 02_data_cleaning.sql
│   ├── 03_transformation.sql
│   ├── 04_final_tables.sql
│   ├── 05_validation.sql
│   ├── 06_feature_engineering.sql
│   ├── 07_views.sql
│   └── 08_analysis_and_insights.sql
│
├── architecture.png
```
- Import the dataset into PostgreSQL using pgAdmin
- Execute SQL files sequentially:
  - Table creation → Cleaning → Transformation → Final tables
  - Validation → Feature engineering → Views → Analysis
- Review query outputs and insights
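The run order above can also be scripted for `psql` instead of pgAdmin. Below is a sketch of a hypothetical `run_all.sql`, which is not part of the repository, using psql's `\i` and `\copy` meta-commands. It assumes `01_table_creation.sql` creates a staging table named `rides_raw` that matches the CSV columns. For that reason the import runs after that file rather than before it.

```sql
-- run_all.sql: hypothetical psql script, not part of the repository.
-- Run from the repository root:  psql -d your_database -f run_all.sql
\set ON_ERROR_STOP on

\i sql/01_table_creation.sql

-- Alternative to the pgAdmin import. Assumes a staging table named rides_raw
-- whose columns match the CSV header order.
\copy rides_raw FROM 'data/rides_raw_data.csv' WITH (FORMAT csv, HEADER true)

\i sql/02_data_cleaning.sql
\i sql/03_transformation.sql
\i sql/04_final_tables.sql
\i sql/05_validation.sql
\i sql/06_feature_engineering.sql
\i sql/07_views.sql
\i sql/08_analysis_and_insights.sql
```

`\i` resolves paths relative to psql's working directory, which is why the script is run from the repository root.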
- Add a visualization layer (Power BI / Tableau)
- Implement demand forecasting models
- Optimize driver allocation strategies
- Build a real-time analytics pipeline
Rohit Kumar
