Immigration Data Exploration Project Brief
Objective: Explore visa and encounter data to inform external data collection, model building, and hypothesis testing.
Project Goals
- Guide External Data Scraping: Use the analysis to decide which countries and topics to target when collecting Google Trends, Google News, and economic/financial indicator data.
- Feature Prioritization: Identify which variables contribute most to surge prediction for modeling.
- Statistical Validation: Generate testable hypotheses for rigorous statistical testing.
Data Availability
| Dataset | File Path | Description | Status |
|---|---|---|---|
| Legal Immigrants | data/processed/visa_master.parquet | Visa issuance records | ✅ Available |
| Illegal Immigrants | data/processed/encounter_master.parquet | Border encounter records | 🔄 Coming Soon |
Deliverables
| # | Deliverable | Purpose |
|---|---|---|
| 1 | Ranked Country/Region List | Prioritize external data collection (Google Trends, news, economic indicators) |
| 2 | Surge Pattern Analysis | Document timing, frequency, and characteristics of migration surges |
| 3 | Testable Hypotheses | Provide statistically testable statements about surge drivers |
Methodology: 3-Phase Approach
Phase 1: Explore Individual Datasets
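Profile Visa Data (visa_master.parquet)
- Visa categories: Family, Employment, Humanitarian, etc.
Profile Encounter Data (encounter_master.parquet)
- Encounter types: Apprehensions, Expulsions, Inadmissibles
- Demographic groups: FMUA, Single Adults, UC/Minors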
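A first profiling pass can be sketched in plain Python. This is a minimal illustration, not the project's pipeline. It assumes each dataset has already been reduced to records with the hypothetical keys `country`, `month` (formatted `YYYY-MM`), `category`, and `count`; the real column names in the parquet files may differ. The sketch totals counts by month and by category, and it lists countries with missing months, which may point to reporting gaps. The records in the example are toy data.

```python
from collections import Counter, defaultdict


def profile_records(records):
    """Summarize monthly volume, category mix, and coverage gaps.

    Assumes each record is a dict with the hypothetical keys
    'country', 'month' ('YYYY-MM'), 'category', and 'count'.
    """
    total_by_month = Counter()
    total_by_category = Counter()
    months_by_country = defaultdict(set)
    for rec in records:
        total_by_month[rec["month"]] += rec["count"]
        total_by_category[rec["category"]] += rec["count"]
        months_by_country[rec["country"]].add(rec["month"])

    # Months that appear somewhere in the data but not for this country.
    all_months = set(total_by_month)
    gaps = {
        country: sorted(all_months - months)
        for country, months in months_by_country.items()
        if all_months - months
    }
    return {
        "total_by_month": dict(sorted(total_by_month.items())),
        "total_by_category": dict(total_by_category),
        "gaps": gaps,
    }


# Toy records for illustration only.
records = [
    {"country": "A", "month": "2024-01", "category": "Employment", "count": 10},
    {"country": "A", "month": "2024-02", "category": "Family", "count": 5},
    {"country": "B", "month": "2024-01", "category": "Humanitarian", "count": 7},
]
summary = profile_records(records)
print(summary["gaps"])  # {'B': ['2024-02']}
```

With pandas, records in this shape could come from `pd.read_parquet(path).to_dict(orient="records")` after renaming columns. For large files, a `groupby` aggregation would be the more idiomatic route.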
Phase 2: Compare & Correlate Datasets
Align Timelines & Visualize Together
Key Questions: Do visa issuances precede encounters? Do the two series move together? Do policy shifts affect both?
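The first question can be approached with a lagged correlation between the two monthly series. The sketch below assumes both series are already aligned to the same calendar months for one country and uses toy numbers. The function names are illustrative. Raw counts that share a trend can correlate spuriously, so in practice the series would usually be differenced or detrended first.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a constant series
    return cov / (sd_x * sd_y)


def lagged_correlations(visa, encounters, max_lag=3):
    """Correlate visa[t] with encounters[t + lag] for lag = 0..max_lag months.

    A peak at lag k > 0 is consistent with visas leading encounters by k months.
    """
    if len(visa) != len(encounters):
        raise ValueError("series must be aligned to the same months")
    return {
        lag: pearson(visa[: len(visa) - lag], encounters[lag:])
        for lag in range(max_lag + 1)
    }


visa = [1, 3, 2, 5, 4, 6, 8, 7]  # toy series
encounters = [0, 1, 3, 2, 5, 4, 6, 8]  # the visa series shifted one month later
corrs = lagged_correlations(visa, encounters)
best_lag = max(corrs, key=corrs.get)
print(best_lag)  # 1
```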
Identify Surge Events
| Country | Date | Visa Surge? | Encounter Surge? | Notes |
|---|---|---|---|---|
| Example | 2024-06 | ✅ Yes | ❌ No | Employment visa spike |
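Surge events could be flagged with a simple rolling z-score rule before the table above is filled in. This is a minimal sketch under assumed settings: a month counts as a surge when it exceeds the mean of the preceding six months by more than two standard deviations. The window, the threshold, the function names, and the toy series are illustrative choices, not definitions from this brief.

```python
import statistics


def flag_surges(values, window=6, z_threshold=2.0):
    """Return indices of months whose value exceeds the trailing-window mean
    by more than z_threshold trailing (sample) standard deviations.

    window and z_threshold are illustrative defaults, not project settings.
    """
    surges = []
    for i in range(window, len(values)):
        trailing = values[i - window:i]
        mean = statistics.mean(trailing)
        sd = statistics.stdev(trailing)
        if sd == 0:
            continue  # flat history: a z-score is undefined
        if (values[i] - mean) / sd > z_threshold:
            surges.append(i)
    return surges


def surge_table(months, visa, encounters, **kwargs):
    """Rows of (month, visa_surge, encounter_surge) for months where either series surges."""
    visa_flags = set(flag_surges(visa, **kwargs))
    encounter_flags = set(flag_surges(encounters, **kwargs))
    return [
        (months[i], i in visa_flags, i in encounter_flags)
        for i in sorted(visa_flags | encounter_flags)
    ]


months = [f"2024-{m:02d}" for m in range(1, 9)]
visa = [100, 102, 98, 101, 99, 100, 160, 101]  # toy series with one spike
encounters = [50, 52, 49, 51, 50, 48, 50, 51]
table = surge_table(months, visa, encounters)
print(table)  # [('2024-07', True, False)]
```

A trailing window, as opposed to a centered one, uses only past months. This keeps the flag usable in near real time, at the cost of reacting slowly to level shifts.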
Phase 3: Distill Insights for Cross-Functional Teams
Rank Countries for External Data Collection
Prioritize top 5–10 countries by:
- Highest visa volume (steady pull factor)
- Highest visa growth rate (rapid change signal)
- Highest encounter volume (illegal migration pressure)
Regional Mapping: Group by region (North America, Central America, South America, Asia, Africa, Middle East)
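The three ranking criteria can be combined in several ways. The sketch below assumes an equal-weight average of per-criterion ranks, which is one simple choice rather than a method this brief prescribes. It also computes year-over-year visa growth as (current − prior) / prior, the quantity shown as "+12% YoY" in the output format. The input layout (`visa_prev`, `visa_curr`, `encounters`, `region`) and the toy countries are hypothetical.

```python
from collections import defaultdict


def yoy_growth(prev, curr):
    """Year-over-year growth as a fraction (0.12 means +12% YoY)."""
    if prev <= 0:
        raise ValueError("prior-year volume must be positive")
    return (curr - prev) / prev


def rank_desc(values):
    """Map each key to a 1-based rank, largest value first (ties keep input order)."""
    order = sorted(values, key=values.get, reverse=True)
    return {key: i + 1 for i, key in enumerate(order)}


def prioritize(stats, top_n=10):
    """Order countries by the mean of their ranks on volume, growth, and encounters."""
    criteria = [
        {c: s["visa_curr"] for c, s in stats.items()},  # visa volume
        {c: yoy_growth(s["visa_prev"], s["visa_curr"]) for c, s in stats.items()},  # growth
        {c: s["encounters"] for c, s in stats.items()},  # encounter volume
    ]
    ranks = [rank_desc(crit) for crit in criteria]
    score = {c: sum(r[c] for r in ranks) / len(ranks) for c in stats}
    return sorted(stats, key=score.get)[:top_n]


def group_by_region(countries, stats):
    """Group the selected countries under their (hypothetical) region labels."""
    groups = defaultdict(list)
    for c in countries:
        groups[stats[c]["region"]].append(c)
    return dict(groups)


# Toy inputs for illustration only.
stats = {
    "Country A": {"visa_prev": 100, "visa_curr": 120, "encounters": 50, "region": "Region 1"},
    "Country B": {"visa_prev": 10, "visa_curr": 20, "encounters": 500, "region": "Region 2"},
    "Country C": {"visa_prev": 1000, "visa_curr": 1000, "encounters": 10, "region": "Region 1"},
}
top = prioritize(stats)
print(top)  # ['Country B', 'Country A', 'Country C']
print(group_by_region(top, stats))
```

Volume and growth can pull in opposite directions: large, stable senders rank high on one, while small, fast-growing ones rank high on the other. Averaging ranks rather than raw values keeps either scale from dominating. The weights could be tuned once priorities are agreed.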
Output Format:
| Country | Region | Visa Growth | Encounter Rate | Recommended External Data Focus |
|---|---|---|---|---|
| Mexico | North America | +12% YoY | High | Google Trends: "US work visa"; Economic: unemployment, MXN/USD |
| Honduras | Central America | +45% YoY | Medium | News sentiment May–Aug; Remittance flows |
| ... | ... | ... | ... | ... |
Document Surge Patterns
For each priority country, document:
| Country | Surge Timing (Visa) | Surge Timing (Encounters) | Alignment |
|---|---|---|---|
| Honduras | May–July | June–August | ⏱️ Encounters lag visas by ~1 month |
| Venezuela | Q1 (Jan–Mar) | Q2 (Apr–Jun) | ⏱️ Encounters lag visas by ~1 quarter |
| ... | ... | ... | ... |
Key Questions:
- Are surges seasonal? (e.g., summer peaks)
- Are visa and encounter surges aligned or offset?
- Do specific years show anomalous behavior?
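The seasonality and alignment questions can be given a first quantitative answer by averaging each series by calendar month and comparing the peak months. The sketch assumes monthly series keyed by `YYYY-MM` strings, matching the date format used in the surge table. The synthetic values and function names are illustrative. A single peak month is a coarse summary, so a cross-correlation or a formal seasonality test would still be needed to confirm an offset such as "encounters lag visas by ~1 month".

```python
from collections import defaultdict


def seasonal_profile(series):
    """Average value per calendar month (1-12) from a {'YYYY-MM': value} mapping."""
    buckets = defaultdict(list)
    for key, value in series.items():
        buckets[int(key[5:7])].append(value)
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}


def peak_month(series):
    """Calendar month with the highest average value."""
    profile = seasonal_profile(series)
    return max(profile, key=profile.get)


def peak_offset(visa_series, encounter_series):
    """Signed months from the visa peak to the encounter peak, wrapped to -5..6.

    Positive values mean encounters peak after visas (wrapping handles Dec -> Jan).
    """
    diff = (peak_month(encounter_series) - peak_month(visa_series)) % 12
    return diff - 12 if diff > 6 else diff


# Synthetic series: visas peak in May, encounters in June, for two years.
visa = {f"{y}-{m:02d}": 100 + (50 if m == 5 else 0) for y in (2022, 2023) for m in range(1, 13)}
enc = {f"{y}-{m:02d}": 100 + (80 if m == 6 else 0) for y in (2022, 2023) for m in range(1, 13)}
print(peak_offset(visa, enc))  # 1
```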
Generate Testable Hypotheses
Provide 5–8 statistically testable statements for the Statistical Testing team:
1. "Visa issuances to Mexican nationals are positively correlated with southwest border encounters
(lag of 1–3 months; exact lag TBD via lag analysis)."
2. "Employment visa growth outpaces humanitarian visa growth for Central American countries,
suggesting economic pull factors dominate over protection needs (p < 0.05)."
3. "Encounter surges peak in summer months (Jun–Aug), while visa issuances peak in spring
(Mar–May), indicating seasonal migration patterns (χ² test for seasonality)."
4. "Countries with high encounter volumes also show rising visa issuances, suggesting that legal
and irregular pathways complement rather than substitute for each other (correlation analysis)."
5. "Policy announcements (e.g., Title 42 changes) coincide with structural breaks in both visa
and encounter time series (Chow test at the announcement date)."
6. "FMUA (family unit) encounters correlate more strongly with humanitarian visa trends than with
employment visa trends (multivariate regression)."
7. "Exchange rate depreciation in origin countries predicts visa application increases
with a 2-month lag (Granger causality test)."
8. "Spikes in negative news sentiment in origin countries precede encounter surges by
4–6 weeks (cross-correlation analysis)."
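Hypothesis 5 names the Chow test. The sketch below is minimal and rests on three assumptions: a linear time trend as the model in each regime, a break at a known month index (for example, the month of a policy announcement), and k = 2 parameters per regime. It compares the residual sum of squares (RSS) of one pooled fit with that of separate pre- and post-break fits:

F = [(RSS_pooled − (RSS_pre + RSS_post)) / k] / [(RSS_pre + RSS_post) / (n − 2k)]

The trend-only specification, the function names, and the toy series are assumptions made for illustration. Real visa and encounter series would likely need seasonal terms and attention to autocorrelation. For a p-value, the statistic can be referred to an F distribution with (k, n − 2k) degrees of freedom, for example via `scipy.stats.f.sf`.

```python
def trend_rss(y):
    """RSS from OLS of y on an intercept and a linear time index 0..n-1.

    Each segment restarts its index at 0; with an intercept in the model,
    shifting the time index does not change the RSS.
    """
    n = len(y)
    t = range(n)
    t_mean, y_mean = (n - 1) / 2, sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / s_tt
    intercept = y_mean - slope * t_mean
    return sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))


def chow_f(y, break_idx, k=2):
    """Chow F statistic for a break just before index break_idx (k parameters per regime)."""
    pre, post = y[:break_idx], y[break_idx:]
    if min(len(pre), len(post)) <= k:
        raise ValueError("each regime needs more than k observations")
    rss_pooled = trend_rss(y)
    rss_split = trend_rss(pre) + trend_rss(post)
    if rss_split == 0:
        return float("inf")  # both regimes fit perfectly
    return ((rss_pooled - rss_split) / k) / (rss_split / (len(y) - 2 * k))


F_CRIT_2_6 = 5.14  # approximate 5% critical value of F(2, 6) for n = 10, k = 2

# Toy monthly series: one with a level jump at index 5, one without.
jump = [1, 2.1, 2.9, 4, 5.1, 20, 21.2, 21.9, 23, 24.1]
no_jump = [1, 2.1, 2.9, 4, 5.1, 6, 7.1, 7.9, 9, 10.1]
print(chow_f(jump, 5) > F_CRIT_2_6)  # True
print(chow_f(no_jump, 5) > F_CRIT_2_6)  # False
```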