
feat(ingestion): Evolve alerts schema #763

Closed
runkelcorey wants to merge 9 commits into main from feature-ingestion-alerts-schema



@runkelcorey

Collaborator

🔔 handle new & old schemas when ingesting alerts

What changes does this PR propose?

  1. Process columns in a message even when they aren't in the schema. This can load more columns into our tables, including a column that is null for every record except one. The motivation is that the table schema then more accurately reflects the messages coming from our upstream systems, which is what we want springboard to do.
  2. To make that happen, I changed from setting the schema for incoming messages to overriding the inferred schema only for fields that overlap with our declared schema. Ideally, we'd know when new fields are added, but this approach allows the schema to evolve (so long as it follows the structure of a GTFS-RT FeedMessage).
  3. To make that work without admitting the possibility that the schema changes with each new message, I wrote dataframely schemas for all GTFS-RT ConfigTypes.
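The override step in item 2 can be sketched as a recursive merge: walk the schema inferred from the message, and wherever a field also appears in the declared schema, take the declared type; fields that appear only in the message keep their inferred type, so new upstream fields survive. This is a minimal illustration, not the PR's implementation: schemas are modeled as plain dicts of field name to type name (nested structs as nested dicts), and `override_schema` plus the field names and type strings below are hypothetical stand-ins for the real dataframely/polars types.

```python
from typing import Any

# Hypothetical, simplified representation: a schema maps field names to
# type names, and nested structs are themselves dicts.
Schema = dict[str, Any]


def override_schema(inferred: Schema, declared: Schema) -> Schema:
    """Return `inferred`, with types replaced by `declared` wherever fields overlap.

    Fields present only in `inferred` (new upstream fields) are kept as-is,
    so the resulting schema can grow. Fields present only in `declared`
    are not added, because they did not appear in the message.
    """
    merged: Schema = {}
    for name, inferred_type in inferred.items():
        if name not in declared:
            merged[name] = inferred_type  # new field: keep the inferred type
        elif isinstance(inferred_type, dict) and isinstance(declared[name], dict):
            merged[name] = override_schema(inferred_type, declared[name])  # recurse into structs
        else:
            merged[name] = declared[name]  # overlapping field: declared type wins
    return merged


# A blank "cause" was inferred as null, and the message carries an unknown field.
inferred = {"id": "str", "timestamp": "int64", "alert": {"cause": "null", "new_field": "str"}}
declared = {"id": "str", "timestamp": "uint64", "alert": {"cause": "str", "effect": "str"}}
merged = override_schema(inferred, declared)
```

Here `merged` keeps `new_field`, pins `timestamp` and `alert.cause` to their declared types, and omits `effect` because the message never contained it. Pinning the overlapping fields is what keeps the known part of the schema stable from one message to the next, while only the new fields are inferred.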

How were these changes validated?

  • Added test case for specific Alerts schema changes to test_convert
  • Added other new test cases for test_file_conversion
  • Removed now-obsolete test files
  • Ran ingestion locally until it completed without errors
  • Ran on dev successfully

What questions should reviewers consider?

  1. Should I even handle empty/blank records?
  2. What impact do you expect this to have on performance manager?
  3. Is there a better approach than just changing the partition column? Changing it to a column whose values can't be null prevents us from silently dropping records where, e.g., route_id is null, but it also increases the number of iterations in write_local_pq.
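The trade-off in question 3 can be illustrated with a toy partitioned write. This sketch assumes, hypothetically, that the write loop makes one filter pass per distinct partition value and compares values with SQL/Arrow-style null-propagating equality. Under that assumption, a null partition value never matches anything, so those rows vanish without an error. `partition_rows`, `sql_eq`, and the `feed_timestamp` column are illustrative names, not code from this PR.

```python
from typing import Any, Optional

Row = dict[str, Any]


def sql_eq(a: Any, b: Any) -> Optional[bool]:
    """Null-propagating equality, as in SQL or Arrow: comparing with null yields null."""
    if a is None or b is None:
        return None
    return a == b


def partition_rows(rows: list[Row], column: str) -> dict[Any, list[Row]]:
    """Hypothetical write loop: one filter pass per distinct value of `column`.

    A row is kept only where the comparison is True, so rows whose
    partition value is null land in no partition and are silently lost.
    """
    partitions: dict[Any, list[Row]] = {}
    for value in {row[column] for row in rows}:
        partitions[value] = [r for r in rows if sql_eq(r[column], value) is True]
    return partitions


rows = [
    {"route_id": "Red", "feed_timestamp": 1},
    {"route_id": None, "feed_timestamp": 2},
    {"route_id": "Red", "feed_timestamp": 3},
]

by_route = partition_rows(rows, "route_id")     # 2 passes, but the null row is dropped
by_ts = partition_rows(rows, "feed_timestamp")  # 3 passes, every row kept
```

Partitioning on a column with no nulls keeps all three rows but needs one more pass, which mirrors the cost described in the question. Equality that treats null as equal to null (for example, polars `Expr.eq_missing`) would avoid the data loss without changing the partition column.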

@runkelcorey runkelcorey self-assigned this Jun 11, 2026
@runkelcorey runkelcorey added the enhancement New feature or request label Jun 11, 2026
@github-actions


LCOV of commit f4a34a6 during Continuous Integration (Python) #1977

Summary coverage rate:
  lines......: 65.4% (3684 of 5632 lines)
  functions..: 30.5% (284 of 932 functions)
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/ingestion/config_busloc_trip.py                                        |93.3%     30|33.3%    12|    -      0
  src/lamp_py/ingestion/config_busloc_vehicle.py                                     |96.9%     65|30.0%    10|    -      0
  src/lamp_py/ingestion/config_rt_alerts.py                                          |97.9%     47|40.0%    10|    -      0
  src/lamp_py/ingestion/config_rt_trip.py                                            |96.4%     28|41.7%    12|    -      0
  src/lamp_py/ingestion/config_rt_vehicle.py                                         |97.9%     47|40.0%    10|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt.py                                           |92.0%    238|50.0%    26|    -      0
  src/lamp_py/ingestion/gtfs_rt_detail.py                                            |98.1%     54| 8.3%    12|    -      0
  src/lamp_py/ingestion/gtfs_rt_structs.py                                           | 100%      6|    -     0|    -      0
  src/lamp_py/ingestion/utils.py                                                     |65.3%    118|36.4%    22|    -      0

@github-actions


LCOV of commit 6795cfb during Continuous Integration (Python) #1983

Summary coverage rate:
  lines......: 65.4% (3686 of 5632 lines)
  functions..: 30.7% (286 of 932 functions)
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/ingestion/config_busloc_trip.py                                        |96.7%     30|41.7%    12|    -      0
  src/lamp_py/ingestion/config_busloc_vehicle.py                                     |98.5%     65|40.0%    10|    -      0
  src/lamp_py/ingestion/config_rt_alerts.py                                          |97.9%     47|40.0%    10|    -      0
  src/lamp_py/ingestion/config_rt_trip.py                                            |96.4%     28|41.7%    12|    -      0
  src/lamp_py/ingestion/config_rt_vehicle.py                                         |97.9%     47|40.0%    10|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt.py                                           |92.0%    238|50.0%    26|    -      0
  src/lamp_py/ingestion/gtfs_rt_detail.py                                            |98.1%     54| 8.3%    12|    -      0
  src/lamp_py/ingestion/gtfs_rt_structs.py                                           | 100%      6|    -     0|    -      0
  src/lamp_py/ingestion/utils.py                                                     |65.3%    118|36.4%    22|    -      0


Labels

enhancement New feature or request


1 participant