An event-driven streaming data pipeline that ingests real-time flight data from the AviationStack API into BigQuery using Google Cloud Pub/Sub and Cloud Functions.
┌──────────────────┐ ┌──────────────┐ ┌─────────────────┐ ┌──────────────┐
│ AviationStack │ ───▶ │ Publisher │ ───▶ │ Pub/Sub │ ───▶ │ Cloud │
│ API │ │ (Notebook) │ │ Topic │ │ Function │
└──────────────────┘ └──────────────┘ └─────────────────┘ └──────┬───────┘
│
▼
┌──────────────┐
│ BigQuery │
│ Table │
└──────────────┘
Flow:
- A publisher script (
aviationstack_extract_data.ipynb) fetches real-time flight data from the AviationStack API. - Each flight record is published as a message to a Pub/Sub topic.
- A Cloud Function (
main.py) subscribes to the topic, decodes the message, and streams it into BigQuery.
- Cloud Provider: Google Cloud Platform (GCP)
- Messaging: Cloud Pub/Sub
- Compute: Cloud Functions (Python, Pub/Sub-triggered)
- Data Warehouse: BigQuery
- Data Source: AviationStack API
- Language: Python 3
gcp_aviation/
├── aviationstack_extract_data.ipynb # Publisher: fetches API data & publishes to Pub/Sub
├── main.py # Cloud Function: subscribes & writes to BigQuery
├── requirements.txt # Cloud Function dependencies
└── README.md
- A Google Cloud project with billing enabled
- The following GCP APIs enabled:
- Cloud Pub/Sub API
- Cloud Functions API
- BigQuery API
- An AviationStack API key (free tier available)
gcloudCLI installed and authenticated
gcloud pubsub topics create flights-topicbq mk --dataset your_project_id:aviation_data
bq mk --table your_project_id:aviation_data.flights \
flight_date:STRING,flight_status:STRING,departure_airport:STRING,arrival_airport:STRING,airline_name:STRING,flight_number:STRINGAdjust the schema to match the fields you extract from the AviationStack response.
gcloud functions deploy flights-ingest \
--runtime python39 \
--trigger-topic flights-topic \
--entry-point hello_pubsub \
--source .Open aviationstack_extract_data.ipynb in Jupyter or Colab, set your API key and GCP project ID, and execute the cells to start publishing flight data.
Publisher (aviationstack_extract_data.ipynb):
- Calls the AviationStack API endpoint to retrieve real-time flight data
- Parses and structures each flight record
- Publishes messages to the Pub/Sub topic using the
google-cloud-pubsubclient
Subscriber (main.py):
- Triggered automatically when a new message arrives on the Pub/Sub topic
- Decodes the base64-encoded message payload
- Parses the JSON and streams the row into BigQuery using
insert_rows_json
This architectural pattern generalizes well beyond flight data to any use case that requires:
- Real-time ingestion from a third-party REST API
- Decoupled, event-driven processing
- Auto-scaling compute without managing servers
- Analytics-ready storage in a data warehouse
- The BigQuery write logic in
main.pyis currently commented out for demo purposes. Uncomment and updatetable_idbefore deploying to production. - Be mindful of AviationStack free-tier rate limits (100 requests/month on the free plan).
- Consider adding dead-letter topic configuration for failed message handling in production.
- Add Cloud Scheduler to trigger the publisher on a fixed interval (e.g., every 5 minutes)
- Build a Looker Studio dashboard on top of the BigQuery table for live flight monitoring
- Add dbt transformations to clean and model the raw flight data
- Implement schema validation with Pydantic before publishing
- Migrate the publisher to a Cloud Run service for full serverless orchestration
Maintained by Javier Pachas.
MIT