Releases: ibitec7/migration

v1.0.0 - Foundation Release

28 Mar 20:22

We are pleased to announce the first official release of the Migration Prediction Analysis project.

This release establishes our foundational, end-to-end multi-modal pipeline for predicting global migration surges. It unifies real-time economic indicators, search trends, border-encounter ground truth, and over 100,000 processed geopolitical news events to model the push and pull factors that influence human migration on a global scale.

Major Features & Pipeline Highlights

  • Automated Data Aggregation: Deployed scripts to aggregate data from US Customs and Border Protection (CBP), Travel.State.Gov, IMF, Google Trends, and Google News.
  • Multi-Modal NLP Pipeline:
    • Processed an initial collection of 170,784 raw news articles, filtering them to 104,333 valid entries (a ~61.1% retention rate).
    • Implemented Jina v5 (TensorRT) to generate high-dimensional embeddings for semantic clustering.
    • Integrated Flan-T5 (TensorRT) to autonomously sample and generate descriptive cluster labels for push/pull socioeconomic conditions in origin countries.
  • Hugging Face Bootstrap Syncing: Built-in .sh scripts and a uv-backed environment to ensure consistent and reproducible repository bootstrapping across environments.
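
As an illustration of the sample-then-label step in the NLP pipeline, here is a minimal sketch of drawing a few headlines per cluster and assembling a labeling prompt. It is not the project's code: the function names, the prompt wording, and the `per_cluster` default are assumptions, and the call to the labeling model itself is left out.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def sample_cluster_headlines(
    headlines: Sequence[str],
    cluster_ids: Sequence[Hashable],
    per_cluster: int = 5,  # hypothetical default, not taken from the release
    seed: int = 0,
) -> Dict[Hashable, List[str]]:
    """Draw up to `per_cluster` headlines from each cluster, reproducibly."""
    if len(headlines) != len(cluster_ids):
        raise ValueError("headlines and cluster_ids must have the same length")
    rng = random.Random(seed)
    by_cluster: Dict[Hashable, List[str]] = defaultdict(list)
    for text, cid in zip(headlines, cluster_ids):
        by_cluster[cid].append(text)
    # Small clusters contribute all of their members.
    return {
        cid: rng.sample(texts, min(per_cluster, len(texts)))
        for cid, texts in by_cluster.items()
    }


def build_label_prompt(samples: Sequence[str]) -> str:
    """Format sampled headlines into a prompt (wording is hypothetical)."""
    bullet_list = "\n".join(f"- {s}" for s in samples)
    return (
        "Summarize the common theme of these headlines in a short label:\n"
        + bullet_list
    )
```

The resulting prompt string would then be passed to whichever text-generation model produces the label; fixing the seed keeps the sampled headlines stable between runs.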

Predictive Modeling Architectures

All models in this release are evaluated under a strict walk-forward, out-of-time (OOT) framework at lead times of 1 to 6 months:

  • cuML Random Forest: Tree-ensemble baseline with strong short-term performance, achieving a 0.97 F1 score on surges at a 1-month lead.
  • PyTorch Transformer: Attention-driven long-sequence model maintaining robust long-term predictive capabilities (up to 6 months out).
  • PyTorch LSTM with SurgeJointLoss: An LSTM trained with a novel joint loss that combines Huber and binary cross-entropy (BCE) terms to penalize errors on extreme, threshold-crossing surges, operating as our highest-precision early warning system.
  • Horizon-Aware Ensemble (Meta-Model): An ensemble that reweights the underlying models according to the forecast horizon, so that each lead time draws on the components that perform best at it.
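
The walk-forward, out-of-time evaluation used for these models can be sketched as an expanding-window split. This is a generic illustration under stated assumptions, not the project's implementation: the function name, the `min_train` parameter, and the monthly integer indexing are all hypothetical. The key property is that a training row is only used once its target month has already been observed at the forecast origin.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_months: int, min_train: int, lead: int
) -> Iterator[Tuple[List[int], int, int]]:
    """Yield (train_rows, origin, target_month) for an expanding window.

    Row i holds features observed in month i and is paired with the target
    in month i + lead. At forecast origin t, only rows whose targets are
    already known (i + lead <= t) may be used for training.
    """
    if lead < 1 or min_train < 1:
        raise ValueError("lead and min_train must be positive")
    first_origin = min_train + lead - 1  # earliest origin with enough history
    last_origin = n_months - lead - 1    # target month must exist in the data
    for origin in range(first_origin, last_origin + 1):
        train_rows = list(range(0, origin - lead + 1))
        yield train_rows, origin, origin + lead


# Example: 12 months of data, at least 6 training rows, 2-month lead.
for train_rows, origin, target_month in walk_forward_splits(12, 6, 2):
    print(f"train on rows 0..{train_rows[-1]}, "
          f"forecast from month {origin} for month {target_month}")
```

Running the same loop once per lead from 1 to 6 gives a separate out-of-time score for each horizon, which is the kind of per-horizon comparison a horizon-aware ensemble would need.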

Key Analytical Findings

  • Zipfian Distribution Dynamics: Migration volumes and surges follow a sharp power-law concentration. The top 20% of countries account for 88.3% of visas, with much of the excess volume variance concentrated in countries such as Cuba, Mexico, and Afghanistan.
  • Exchange Rate Pre-Indicators: Identified precursor signals in exchange rates, such as a localized indicator for the Dominican Republic that shows a 0.498 correlation at a 2-month lag.
  • Intent via Search Queries: Captured localized search queries (e.g., cbp_one, us_asylum) yielding ~55% structurally significant leading indicators of eventual physical border encounters.
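
A lagged correlation of the kind reported for the exchange-rate and search-query signals can be computed by pairing the indicator in month t with the target in month t + lag. The sketch below is a plain Pearson correlation on shifted series; the function names and the toy data are hypothetical, and it does not reproduce the figures above.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(
    indicator: Sequence[float], target: Sequence[float], lag: int
) -> float:
    """Correlate indicator[t] with target[t + lag] (lag in months, >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(indicator, target)
    # Drop the last `lag` indicator values and the first `lag` target values
    # so that each remaining pair is exactly `lag` months apart.
    return pearson(indicator[:-lag], target[lag:])


# Toy series (hypothetical): the target echoes the indicator two months later.
indicator = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
target = [0.0, 0.0] + [2 * v + 1 for v in indicator[:-2]]
best_lag = max(range(0, 4), key=lambda k: lagged_correlation(indicator, target, k))
```

Scanning a small range of lags and keeping the strongest one, as in the last line, is one simple way to pick out a lead time such as the 2-month lag noted for the Dominican Republic.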

Installation & Usage: Please refer to the updated README.md for Dev Containers and uv environment setup instructions.