Exploratory Data Analysis Plan #1

@ibitec7

Immigration Data Exploration Project Brief

Objective: Explore visa and encounter data to inform external data collection, model building, and hypothesis testing.


Project Goals

  1. Guide External Data Scraping: Use analysis to target Google Trends, Google News, and Economic/Financial Indicators.
  2. Feature Prioritization: Identify which variables contribute most to surge prediction for modeling.
  3. Statistical Validation: Generate testable hypotheses for rigorous statistical testing.

Data Availability

| Dataset | File Path | Description | Status |
|---|---|---|---|
| Legal Immigrants | data/processed/visa_master.parquet | Visa issuance records | ✅ Available |
| Illegal Immigrants | data/processed/encounter_master.parquet | Border encounter records | 🔄 Coming Soon |

Deliverables

| # | Deliverable | Purpose |
|---|---|---|
| 1 | Ranked Country/Region List | Prioritize external data collection (Google Trends, news, economic indicators) |
| 2 | Surge Pattern Analysis | Document timing, frequency, and characteristics of migration surges |
| 3 | Testable Hypotheses | Provide statistically testable statements about surge drivers |

Methodology: 3-Phase Approach

Phase 1: Explore Individual Datasets

Profile Visa Data (visa_master.parquet)

  • Load & Summarize
    • Date range coverage
    • Countries represented
    • Visa type categories
    • Monthly issuance volumes
  • Visualize Trends
    • Plot visa issuances over time by visa type: Family, Employment, Humanitarian, etc.
  • Rank Countries
    • Identify top 10 countries by total visa issuances
  • Anomaly Detection
    • Flag gaps, outliers, or seasonal patterns (e.g., monthly spikes)
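The visa profiling steps above could be sketched roughly as follows. The column names `date`, `country`, `visa_type`, and `issuances` are assumptions for illustration only; the actual schema of visa_master.parquet is not specified in this brief and should be checked first.

```python
import pandas as pd


def profile_visas(df: pd.DataFrame) -> dict:
    """Summarize coverage and volumes for the visa data.

    Assumed (hypothetical) columns: date, country, visa_type, issuances.
    """
    dates = pd.to_datetime(df["date"])
    # Total issuances per calendar month.
    monthly = df.groupby(dates.dt.to_period("M"))["issuances"].sum()
    # Top 10 countries by total issuances over the whole period.
    top = df.groupby("country")["issuances"].sum().nlargest(10)
    return {
        "date_min": dates.min(),
        "date_max": dates.max(),
        "n_countries": df["country"].nunique(),
        "visa_types": sorted(df["visa_type"].unique()),
        "monthly_volume": monthly,
        "top_countries": top,
    }


# Usage (path taken from the Data Availability table):
# df = pd.read_parquet("data/processed/visa_master.parquet")
# summary = profile_visas(df)
```

Plotting `summary["monthly_volume"]` per visa type would cover the trend step, and gaps show up as missing periods in its index.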

Profile Encounter Data (encounter_master.parquet)

  • Load & Summarize
    • Date range coverage
    • Citizenship countries
    • Encounter types: Apprehensions, Expulsions, Inadmissibles
  • Visualize Trends
    • Plot total monthly encounters over time
  • Rank Countries
    • Identify top countries by encounter volume
  • Demographic Breakdown
    • Segment by: FMUA, Single Adults, UC/Minors
    • Segment by encounter type
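Once the encounter file is available, the demographic and encounter-type segmentation could be produced with a single pivot. This is a minimal sketch; the column names `demographic`, `encounter_type`, and `encounters` are hypothetical placeholders, since the encounter schema is not yet known.

```python
import pandas as pd


def encounter_breakdown(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate encounter counts by demographic group and encounter type.

    Assumed (hypothetical) columns: demographic, encounter_type, encounters.
    """
    return df.pivot_table(
        index="demographic",        # e.g. FMUA, Single Adults, UC/Minors
        columns="encounter_type",   # e.g. Apprehensions, Expulsions, Inadmissibles
        values="encounters",
        aggfunc="sum",
        fill_value=0,               # combinations with no records count as zero
    )
```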

Phase 2: Compare & Correlate Datasets

Align Timelines & Visualize Together

  • Convert encounter data to monthly granularity (if needed) to match visa data
  • Plot visa and encounter trends on same timeline to detect:
    • Lead/lag relationships
    • Co-movement patterns
    • Policy change impacts (visible as structural breaks)

Key Questions: Do visa grants precede encounters? Do the two series move together? Do policy shifts affect both?
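One simple way to look for lead/lag relationships is to correlate the visa series with the encounter series shifted by 0 to N months. The sketch below assumes both series are already aligned monthly lists of equal length; the function names and the default `max_lag` of 3 are illustrative choices, not part of the plan.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def lagged_correlations(visas, encounters, max_lag=3):
    """Correlate visas[t] with encounters[t + lag] for lag = 0..max_lag.

    A peak at lag k suggests visas lead encounters by about k months.
    """
    if len(visas) != len(encounters):
        raise ValueError("series must be aligned to the same months")
    result = {}
    for lag in range(max_lag + 1):
        x = visas[: len(visas) - lag]  # drop the last `lag` months of visas
        y = encounters[lag:]           # drop the first `lag` months of encounters
        result[lag] = pearson(x, y)
    return result
```

Strongly trending series tend to show high correlation at every lag, so it may be worth repeating this on month-over-month changes rather than raw levels.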

Identify Surge Events

  • Define "Surge" (example criteria):
    • ≥30% month-over-month increase, OR
    • Values above 75th percentile of historical distribution
  • Detect Surges in both datasets:
    • When do visa surges occur vs. encounter surges?
  • Document Surge Events per country:

| Country | Date | Visa Surge? | Encounter Surge? | Notes |
|---|---|---|---|---|
| Example | 2024-06 | ✅ Yes | ❌ No | Employment visa spike |
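The example surge criteria above could be applied to a monthly series along these lines. This is a sketch under the stated example thresholds (30% month-over-month, 75th percentile); the function name and parameters are illustrative.

```python
from statistics import quantiles


def detect_surges(values, mom_threshold=0.30, percentile=75):
    """Flag each month as a surge if EITHER example criterion holds:

    - month-over-month growth >= mom_threshold, or
    - value strictly above the given percentile of the full series.
    """
    # quantiles(..., n=100) returns the 1st..99th percentile cut points.
    cutoff = quantiles(values, n=100, method="inclusive")[percentile - 1]
    flags = []
    for i, value in enumerate(values):
        prev = values[i - 1] if i > 0 else None
        # Growth is undefined for the first month or after a zero month.
        mom = (value - prev) / prev if prev else None
        flags.append((mom is not None and mom >= mom_threshold) or value > cutoff)
    return flags
```

Note that the percentile here is computed over the whole history, so early months are judged using later data; a rolling or expanding window would be needed if the flags are later used as model features.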

Phase 3: Distill Insights for Cross-Functional Teams

Rank Countries for External Data Collection

Prioritize top 5–10 countries by:

  • Highest visa volume (steady pull factor)
  • Highest visa growth rate (rapid change signal)
  • Highest encounter volume (illegal migration pressure)
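The brief does not say how the three criteria should be combined. One possible approach, shown here purely as an assumption, is to rank countries on each criterion separately and sum the rank positions with equal weight.

```python
def rank_countries(stats, top_n=5):
    """Rank countries by the sum of their positions on three criteria.

    `stats` maps country -> dict with keys visa_volume, visa_growth,
    encounter_volume (hypothetical field names). Lower total = higher priority.
    Equal weighting is an assumption, not something the plan specifies.
    """
    criteria = ("visa_volume", "visa_growth", "encounter_volume")
    scores = {country: 0 for country in stats}
    for key in criteria:
        ordered = sorted(stats, key=lambda c: stats[c][key], reverse=True)
        for position, country in enumerate(ordered, start=1):
            scores[country] += position
    # Break ties alphabetically so the output is deterministic.
    return sorted(scores, key=lambda c: (scores[c], c))[:top_n]
```

Calling `rank_countries(stats, top_n=10)` would give the upper end of the 5–10 range; weights could be added per criterion if one signal matters more.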

Regional Mapping: Group countries by region (North America, Central America, South America, Asia, Africa, Middle East)

Output Format:

| Country | Region | Visa Growth | Encounter Rate | Recommended External Data Focus |
|---|---|---|---|---|
| Mexico | North America | +12% YoY | High | Google Trends: "US work visa"; Economic: unemployment, MXN/USD |
| Honduras | Central America | +45% YoY | Medium | News sentiment May–Aug; Remittance flows |
| ... | ... | ... | ... | ... |

Document Surge Patterns

For each priority country, document:

| Country | Surge Timing (Visa) | Surge Timing (Encounters) | Alignment |
|---|---|---|---|
| Honduras | May–July | June–August | ⏱️ Encounters lag visas by ~1 month |
| Venezuela | Q1 (Jan–Mar) | Q2 (Apr–Jun) | ⏱️ Seasonal offset observed |
| ... | ... | ... | ... |

Key Questions:

  • Are surges seasonal? (e.g., summer peaks)
  • Are visa and encounter surges aligned or offset?
  • Do specific years show anomalous behavior?
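A quick first check on the seasonality question is to average each calendar month across years and see where the profile peaks. The sketch below assumes the series is a list of `("YYYY-MM", value)` pairs, which is an illustrative input format rather than the actual file layout.

```python
from collections import defaultdict


def monthly_profile(series):
    """Average value per calendar month (1-12) across all years.

    `series` is a list of ("YYYY-MM", value) pairs (assumed format).
    """
    buckets = defaultdict(list)
    for year_month, value in series:
        month = int(year_month[5:7])  # characters after "YYYY-"
        buckets[month].append(value)
    return {m: sum(vs) / len(vs) for m, vs in sorted(buckets.items())}


def peak_month(series):
    """Calendar month with the highest average value."""
    profile = monthly_profile(series)
    return max(profile, key=profile.get)
```

Comparing `peak_month` for the visa and encounter series of the same country gives a rough read on whether their surges are aligned or offset.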

Generate Testable Hypotheses

Provide 5–8 statistically testable statements for the Statistical Testing team:

1. "Visa issuances for Mexico are positively correlated with southwest border encounters 
   (lag analysis: 1–3 months TBD)."

2. "Employment visa growth outpaces humanitarian visa growth for Central American countries, 
   suggesting economic pull factors dominate over protection needs (difference in growth rates, tested at α = 0.05)."

3. "Encounter surges peak in summer months (Jun–Aug), while visa issuances peak in spring 
   (Mar–May), indicating seasonal migration patterns (χ² test for seasonality)."

4. "Countries with high encounter volumes also show rising visa issuances, which would 
   indicate complementarity rather than substitution between migration pathways (correlation analysis)."

5. "Policy announcements (e.g., Title 42 changes) cause structural breaks in both visa 
   and encounter time series (Chow test)."

6. "FMUA encounters correlate more strongly with humanitarian visa trends than with 
   employment visas (multivariate regression)."

7. "Exchange rate depreciation in origin countries predicts visa application increases 
   with a 2-month lag (Granger causality test)."

8. "News sentiment spikes (negative) in origin countries precede encounter surges by 
   4–6 weeks (cross-correlation analysis)."
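As an illustration of how one of these statements could be made testable, hypothesis 3 might be checked by counting surge events per calendar month and comparing the counts against a uniform expectation with a χ² goodness-of-fit statistic. The sketch below is an assumption about how the Statistical Testing team might set it up; it hard-codes the standard 5% critical value for 11 degrees of freedom instead of computing a p-value.

```python
# Standard chi-square critical value for df = 11 (12 months - 1), alpha = 0.05.
CHI2_CRITICAL_DF11_ALPHA05 = 19.675


def chi_square_uniform(counts):
    """Chi-square statistic for observed counts against a uniform expectation."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one observed event")
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def is_seasonal(surges_per_month):
    """True if the 12 monthly surge counts deviate from uniform at the 5% level."""
    if len(surges_per_month) != 12:
        raise ValueError("expected one count per calendar month")
    return chi_square_uniform(surges_per_month) > CHI2_CRITICAL_DF11_ALPHA05
```

The χ² approximation is unreliable when expected counts per month are small (a common rule of thumb is at least 5), so with only a few years of data the months may need to be pooled into seasons first.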

Metadata

Labels: enhancement
