Releases: ibitec7/migration
v1.0.0 - Foundation Release
We are pleased to announce the first official release of the Migration Prediction Analysis project.
This release establishes our foundational, end-to-end multi-modal pipeline for predicting global migration surges. By unifying real-time economic indicators, search trends, border encounter ground-truths, and over 100,000 processed geopolitical news events, we have successfully modeled the push and pull factors that influence human migration on a global scale.
Major Features & Pipeline Highlights
- Automated Data Aggregation: Deployed scripts to aggregate data from US Customs and Border Protection (CBP), Travel.State.Gov, IMF, Google Trends, and Google News.
- Multi-Modal NLP Pipeline:
  - Processed an initial collection of 170,784 raw news articles, filtering them to 104,333 valid entries (a ~61.1% retention rate).
  - Implemented Jina v5 (TensorRT) to generate high-dimensional embeddings for semantic clustering.
  - Integrated Flan-T5 (TensorRT) to sample from each cluster and generate descriptive labels for push/pull socioeconomic conditions in origin countries.
- Hugging Face Bootstrap Syncing: Built-in .sh scripts and a uv-backed environment to ensure consistent and reproducible repository bootstrapping across environments.
Predictive Modeling Architectures
This release incorporates a strict Walk-Forward Out-of-Time (OOT) evaluation framework that evaluates each model at lead times of 1 to 6 months:
- cuML Random Forest: Tree-ensemble baseline with high short-term precision, achieving a 0.97 F1-score on Lead 1 (1-month-ahead) surges.
- PyTorch Transformer: Attention-driven long-sequence model maintaining robust long-term predictive capabilities (up to 6 months out).
- PyTorch LSTM with SurgeJointLoss: A novel architecture whose loss combines Huber and binary cross-entropy (BCE) terms to penalize errors on extreme threshold alarms; it operates as our highest-precision early warning system.
- Horizon-Aware Ensemble (Meta-Model): A dynamically scaled ensemble that weights the underlying models according to their strengths at the requested forecast horizon.
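To illustrate what a walk-forward out-of-time split with a forecast lead can look like, here is a minimal sketch in plain Python. It is not the project's evaluation code: the function name `walk_forward_splits`, the `min_train` parameter, and the month-index representation are all assumptions made for illustration. The idea shown is that features from month t are paired with the target at month t + lead, and a training pair is only used once its target month has actually been observed at forecast time.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_months: int, lead: int, min_train: int
) -> Iterator[Tuple[List[int], int, int]]:
    """Yield (train_feature_months, test_feature_month, test_target_month).

    Hypothetical helper: months are integer indices 0..n_months-1.
    Features at month t are paired with the target at month t + lead.
    """
    if lead < 1 or min_train < 1:
        raise ValueError("lead and min_train must be positive")
    # First test month is chosen so that exactly `min_train` training
    # pairs have targets already observed by then.
    first_test = min_train + lead - 1
    # The test target (test_t + lead) must still fall inside the series.
    for test_t in range(first_test, n_months - lead):
        # Keep only pairs whose target month (t + lead) is <= test_t,
        # so no future information leaks into training.
        train = list(range(0, test_t - lead + 1))
        yield train, test_t, test_t + lead


# Usage: count the folds available for each lead from 1 to 6 months
# on a hypothetical 36-month series.
for lead in range(1, 7):
    folds = list(walk_forward_splits(n_months=36, lead=lead, min_train=12))
    print(f"lead={lead}: {len(folds)} folds")
```

The training window here expands with each fold; a fixed-length rolling window would be an equally valid choice, and the release notes do not say which variant the project uses.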
Key Analytical Findings
- Zipfian Distribution Dynamics: Migration volumes and surges follow a sharp power-law concentration. The top 20% of countries account for 88.3% of visas, and the excess variance in volume is concentrated in countries such as Cuba, Mexico, and Afghanistan.
- Exchange Rate Pre-Indicators: Identified strong precursor signals in exchange rates, such as a country-specific indicator for the Dominican Republic (a correlation of 0.498 at a 2-month lag).
- Intent via Search Queries: Captured localized search queries (e.g., cbp_one, us_asylum), of which ~55% act as structurally significant leading indicators of eventual physical border encounters.
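A lagged correlation like the one reported for exchange rates can be computed as in the following minimal sketch. It is an illustration only, not the project's analysis code: the function names `pearson` and `lagged_correlation` and the toy series are assumptions, and the project may use a different correlation measure or preprocessing.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(
    indicator: Sequence[float], target: Sequence[float], lag: int
) -> float:
    """Correlate indicator[t] with target[t + lag] (lag in months)."""
    if len(indicator) != len(target):
        raise ValueError("series must have equal length")
    if lag < 0 or lag >= len(indicator) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    if lag == 0:
        return pearson(indicator, target)
    # Drop the last `lag` indicator values and the first `lag` targets
    # so that position t in one series lines up with t + lag in the other.
    return pearson(indicator[:-lag], target[lag:])


# Toy data (made up): the target simply repeats the indicator two
# months later, so the lag-2 correlation is close to 1.0.
indicator = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
target = [0.0, 0.0] + indicator[:-2]
best_lag = max(range(0, 4), key=lambda k: lagged_correlation(indicator, target, k))
print("best lag:", best_lag)
```

Scanning several lags and keeping the strongest one, as the last lines do, is one simple way to pick a lead time; with many candidate indicators it should be paired with an out-of-time check to avoid selecting spurious lags.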
Installation & Usage: Please refer to the updated README.md for Dev Containers and uv environment setup instructions.