v3.1 Pipeline Update: Vault Fully Populated, Vote Pipeline 100% #9

@OpenSourcePatents

Following the v3 architecture announcement, this update closes the loop on the GovTrack integration and documents the fixes required to get there.
Vote Pipeline: 538/538, 100% success rate
The GovTrack pivot is complete. What the v3 announcement described as “real-time votes” is now delivering: every vault file contains 20 real votes per member. Getting there required several non-obvious fixes:
The bioguide_id filter on GovTrack’s /api/v2/person endpoint returns HTTP 400 with a Django FieldError. No deprecation notice was published. Fix: replaced 538 individual crosswalk API calls with a bulk pull from unitedstates/congress-legislators JSON (2 HTTP calls total, cached to data/crosswalk.json with 7-day TTL).
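The bulk-pull-plus-cache approach can be sketched roughly as below. This is a minimal illustration, not the project's code: `build_crosswalk`, `load_crosswalk`, and the `fetch_all` callable are hypothetical names, and the sketch assumes each legislator record carries an `id` object with `bioguide` and `govtrack` keys.

```python
import json
import time
from pathlib import Path

TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day TTL, as described above


def build_crosswalk(legislator_lists):
    """Map bioguide ID -> GovTrack ID from lists of legislator records.

    Assumes each record looks like {"id": {"bioguide": ..., "govtrack": ...}}.
    Records missing either ID are skipped.
    """
    crosswalk = {}
    for legislators in legislator_lists:
        for person in legislators:
            ids = person.get("id", {})
            if "bioguide" in ids and "govtrack" in ids:
                crosswalk[ids["bioguide"]] = ids["govtrack"]
    return crosswalk


def load_crosswalk(cache_path, fetch_all, now=None):
    """Return the cached crosswalk if it is fresh, else rebuild and cache it.

    fetch_all is a hypothetical callable returning one parsed JSON list per
    source file; it is only invoked when the cache is missing or stale.
    """
    now = time.time() if now is None else now
    path = Path(cache_path)
    if path.exists() and now - path.stat().st_mtime < TTL_SECONDS:
        return json.loads(path.read_text())
    crosswalk = build_crosswalk(fetch_all())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(crosswalk))
    return crosswalk
```

Passing the fetcher in as a callable keeps the network layer out of the cache logic, so the TTL behavior can be exercised without any HTTP calls.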
The vote['id'] field referenced in format_vote() does not exist in GovTrack’s response schema. The correct field is vote['link'], which returns the full URL directly. The bad lookup was silently swallowing every vote entry and producing 0-vote saves despite successful API responses. Fixed by reading vote['link'], with fallback URL construction from the congress, session, chamber, and number fields.
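A rough sketch of the link-with-fallback lookup follows. The helper name `vote_url` is hypothetical, and the fallback URL pattern (congress-session path plus a one-letter chamber prefix) is an assumption about GovTrack's vote URLs rather than something confirmed in this issue.

```python
def vote_url(vote):
    """Return a URL for a vote record, preferring the 'link' field.

    Falls back to building a URL from congress, session, chamber, and number.
    The fallback pattern below is an assumption, not a documented contract.
    Returns None when neither path is possible, so callers can log the miss
    instead of silently dropping the vote.
    """
    link = vote.get("link")
    if link:
        return link

    required = ("congress", "session", "chamber", "number")
    if any(vote.get(key) in (None, "") for key in required):
        return None

    # Assumed convention: "house" -> "h", "senate" -> "s"
    chamber_letter = str(vote["chamber"])[0].lower()
    return (
        "https://www.govtrack.us/congress/votes/"
        f"{vote['congress']}-{vote['session']}/{chamber_letter}{vote['number']}"
    )
```

Using `.get()` rather than direct indexing means a missing key yields an explicit `None` rather than an exception that a broad `except` could hide.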
Finance Pipeline v3
Merged FEC + EDGAR scoring into vault files via split logic. Detail files written by fetch_votes.py are preserved: detail_data.update(m) merges finance data without overwriting votes. A division-by-zero bug in update_flags() is patched for members with $0 raised.
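The merge and the zero-guard might look something like the sketch below. Only `detail_data.update(m)` comes from the description above; `merge_finance`, `pac_share`, and the field names are hypothetical, and the ratio shown is just one plausible place a $0 total could cause a division by zero.

```python
import json
from pathlib import Path


def merge_finance(detail_path, m):
    """Merge finance fields from m into an existing detail file.

    dict.update() only replaces keys present in m, so a "votes" key written
    earlier survives as long as m does not itself contain "votes".
    """
    path = Path(detail_path)
    detail_data = json.loads(path.read_text()) if path.exists() else {}
    detail_data.update(m)
    path.write_text(json.dumps(detail_data, indent=2))
    return detail_data


def pac_share(pac_total, total_raised):
    """Fraction of receipts from PACs, guarded against $0 raised."""
    if not total_raised:
        return 0.0  # hypothetical default: no receipts means no PAC share
    return pac_total / total_raised
```

Returning 0.0 for members with no receipts is a design choice for this sketch; a pipeline could equally return `None` to distinguish "no data" from "no PAC money".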
Known issues
The EDGAR ~30% miss rate and false positives on common names remain open. A CIK map is the fix and is tracked in a separate issue. PAC contributions are returning 0 early in the 2026 cycle; a 2024 fallback is planned for the next sprint.
