Hh backfill adhoc new #755

Draft
huangh wants to merge 16 commits into main from hh-backfill-adhoc-new

@huangh huangh commented May 26, 2026

Asana Task: <asana_ticket_url>

What changes does this PR propose?

Describe the changes, details you want to call attention to, and any context you want to add.

How were these changes validated?

  1. New tests?
  2. Used a runner?
  3. Doesn't need validation? Maybe it's just documentation.

What questions should reviewers consider?

  1. Focus on this thing, even though it seems minor!

@huangh huangh force-pushed the hh-backfill-adhoc-new branch from 8a49f67 to 8f628ac Compare May 26, 2026 19:25
@github-actions


LCOV of commit 6649de6 during Continuous Integration (Python) #1951

Summary coverage rate:
  lines......: 65.0% (3629 of 5580 lines)
  functions..: 29.9% (275 of 920 functions)
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/ad_hoc/backfill_runner_terminal_predictions.py                         | 0.0%     18|    -     0|    -      0
  src/lamp_py/ad_hoc/pipeline.py                                                     | 0.0%     14| 0.0%     2|    -      0
  src/lamp_py/aws/s3.py                                                              |52.2%    320|23.8%    42|    -      0
  src/lamp_py/ingestion/backfill/delta_reingestion.py                                |60.7%     61|25.0%     4|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_schema_map.py                             |97.7%     44|50.0%     4|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_to_parquet.py                             |69.0%    100|40.0%    10|    -      0
  src/lamp_py/ingestion/compress_gtfs/pq_to_sqlite.py                                |87.8%     49|50.0%     6|    -      0
  src/lamp_py/ingestion/compress_gtfs/schedule_details.py                            |81.2%     96|50.0%     8|    -      0
  src/lamp_py/ingestion/config_busloc_trip.py                                        |82.4%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/config_rt_trip.py                                            |82.4%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/convert_gtfs.py                                              |79.2%     48|37.5%     8|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt.py                                           |88.6%    237|50.0%    26|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt_fullset.py                                   |45.6%    114|16.7%    18|    -      0
  src/lamp_py/ingestion/converter.py                                                 |95.9%     49|35.0%    20|    -      0
  src/lamp_py/ingestion/daily/config.py                                              | 100%      2|    -     0|    -      0
  src/lamp_py/ingestion/daily/trip_updates.py                                        |75.0%     12|25.0%     4|    -      0
  src/lamp_py/ingestion/glides.py                                                    |95.6%    182|45.5%    44|    -      0
  src/lamp_py/ingestion/gtfs_rt_detail.py                                            |94.1%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/ingest_gtfs.py                                               | 0.0%     61| 0.0%    10|    -      0
  src/lamp_py/ingestion/pipeline.py                                                  | 0.0%     38| 0.0%     4|    -      0
  src/lamp_py/ingestion/utils.py                                                     |63.1%    111|35.0%    20|    -      0

@huangh (author) commented:

ignore

@github-actions


LCOV of commit 36c6804 during Continuous Integration (Python) #1952

Summary coverage rate:
  lines......: 65.2% (3637 of 5580 lines)
  functions..: 29.9% (275 of 920 functions)
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/ad_hoc/backfill_runner_terminal_predictions.py                         | 0.0%     18|    -     0|    -      0
  src/lamp_py/ad_hoc/pipeline.py                                                     | 0.0%     14| 0.0%     2|    -      0
  src/lamp_py/aws/s3.py                                                              |52.2%    320|23.8%    42|    -      0
  src/lamp_py/ingestion/backfill/delta_reingestion.py                                |60.7%     61|25.0%     4|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_schema_map.py                             |97.7%     44|50.0%     4|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_to_parquet.py                             |77.0%    100|40.0%    10|    -      0
  src/lamp_py/ingestion/compress_gtfs/pq_to_sqlite.py                                |87.8%     49|50.0%     6|    -      0
  src/lamp_py/ingestion/compress_gtfs/schedule_details.py                            |81.2%     96|50.0%     8|    -      0
  src/lamp_py/ingestion/config_busloc_trip.py                                        |82.4%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/config_rt_trip.py                                            |82.4%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/convert_gtfs.py                                              |79.2%     48|37.5%     8|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt.py                                           |88.6%    237|50.0%    26|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt_fullset.py                                   |45.6%    114|16.7%    18|    -      0
  src/lamp_py/ingestion/converter.py                                                 |95.9%     49|35.0%    20|    -      0
  src/lamp_py/ingestion/daily/config.py                                              | 100%      2|    -     0|    -      0
  src/lamp_py/ingestion/daily/trip_updates.py                                        |75.0%     12|25.0%     4|    -      0
  src/lamp_py/ingestion/glides.py                                                    |95.6%    182|45.5%    44|    -      0
  src/lamp_py/ingestion/gtfs_rt_detail.py                                            |94.1%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/ingest_gtfs.py                                               | 0.0%     61| 0.0%    10|    -      0
  src/lamp_py/ingestion/pipeline.py                                                  | 0.0%     38| 0.0%     4|    -      0
  src/lamp_py/ingestion/utils.py                                                     |63.1%    111|35.0%    20|    -      0

self,
config_type: ConfigType,
metadata_queue: Queue[Optional[str]],
max_workers: int = 8,

@huangh (author) commented:

max_workers of 8 now matches the CPU count. We can lower it again if we decide to shrink the ingestion ECS back down.
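
A minimal sketch of one way to keep the default in step with the host instead of hard-coding it, so that shrinking the instance would not require touching the signature. `default_max_workers` and its `cap` parameter are hypothetical names for illustration and are not part of lamp_py; the only library call used is `os.cpu_count()`.

```python
import os


def default_max_workers(cap: int = 8) -> int:
    """Pick a worker count from the CPU count, never above `cap` and never below 1.

    Hypothetical helper for illustration only; not part of lamp_py.
    """
    # os.cpu_count() can return None when the count cannot be determined.
    cpus = os.cpu_count() or 1
    return max(1, min(cap, cpus))
```

On an 8-CPU host this gives 8, and on a smaller host it drops to that host's CPU count without a code change. Whether that is preferable to an explicit constant is a judgment call for this PR.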


  try:
-     files = file_list_from_s3(bucket_name=S3_INCOMING, file_prefix=bucket_filter, max_list_size=10000)
+     files = file_list_from_s3(bucket_name=S3_INCOMING, file_prefix=bucket_filter, max_list_size=50000)

@huangh (author) commented:

Grab more files so we get more than 15 minutes of data each time. Processing still stays under the memory threshold, with about a 50% margin.
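
For readers unfamiliar with the cap, here is a rough sketch of what a `max_list_size` limit over a paginated listing typically does. This is not the actual `file_list_from_s3` implementation: `capped_file_list` is a hypothetical stand-in, and `pages` is assumed to be any iterable of key lists (for example, the pages of an S3 listing).

```python
from typing import Iterable, List


def capped_file_list(pages: Iterable[List[str]], max_list_size: int) -> List[str]:
    """Collect keys page by page and stop once `max_list_size` keys are gathered.

    Hypothetical stand-in for illustration; not the lamp_py implementation.
    """
    files: List[str] = []
    for page in pages:
        remaining = max_list_size - len(files)
        if remaining <= 0:
            # Cap reached: stop before reading any further pages.
            break
        # Take only as many keys from this page as the cap still allows.
        files.extend(page[:remaining])
    return files
```

Under this model, raising the cap from 10000 to 50000 lets each run pick up to five times as many files, which is why the memory margin is the thing to watch.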

@github-actions


LCOV of commit 9027712 during Continuous Integration (Python) #1954

Summary coverage rate:
  lines......: 65.2% (3637 of 5580 lines)
  functions..: 29.9% (275 of 920 functions)
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/ad_hoc/backfill_runner_terminal_predictions.py                         | 0.0%     18|    -     0|    -      0
  src/lamp_py/ingestion/backfill/delta_reingestion.py                                |60.7%     61|25.0%     4|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_schema_map.py                             |97.7%     44|50.0%     4|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_to_parquet.py                             |77.0%    100|40.0%    10|    -      0
  src/lamp_py/ingestion/compress_gtfs/pq_to_sqlite.py                                |87.8%     49|50.0%     6|    -      0
  src/lamp_py/ingestion/compress_gtfs/schedule_details.py                            |81.2%     96|50.0%     8|    -      0
  src/lamp_py/ingestion/config_busloc_trip.py                                        |82.4%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/config_rt_trip.py                                            |82.4%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/convert_gtfs.py                                              |79.2%     48|37.5%     8|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt.py                                           |88.6%    237|50.0%    26|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt_fullset.py                                   |45.6%    114|16.7%    18|    -      0
  src/lamp_py/ingestion/converter.py                                                 |95.9%     49|35.0%    20|    -      0
  src/lamp_py/ingestion/daily/config.py                                              | 100%      2|    -     0|    -      0
  src/lamp_py/ingestion/daily/trip_updates.py                                        |75.0%     12|25.0%     4|    -      0
  src/lamp_py/ingestion/glides.py                                                    |95.6%    182|45.5%    44|    -      0
  src/lamp_py/ingestion/gtfs_rt_detail.py                                            |94.1%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/ingest_gtfs.py                                               | 0.0%     61| 0.0%    10|    -      0
  src/lamp_py/ingestion/pipeline.py                                                  | 0.0%     38| 0.0%     4|    -      0
  src/lamp_py/ingestion/utils.py                                                     |63.1%    111|35.0%    20|    -      0

@github-actions


LCOV of commit 84da10e during Continuous Integration (Python) #1961

Summary coverage rate:
  lines......: 65.2% (3637 of 5582 lines)
  functions..: 29.9% (275 of 920 functions)
  branches...: no data found

Files changed coverage rate:
                                                                                     |Lines       |Functions  |Branches    
  Filename                                                                           |Rate     Num|Rate    Num|Rate     Num
  =========================================================================================================================
  src/lamp_py/ad_hoc/backfill_runner_terminal_predictions.py                         | 0.0%     18|    -     0|    -      0
  src/lamp_py/ingestion/backfill/delta_reingestion.py                                |60.7%     61|25.0%     4|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_schema_map.py                             |97.7%     44|50.0%     4|    -      0
  src/lamp_py/ingestion/compress_gtfs/gtfs_to_parquet.py                             |77.0%    100|40.0%    10|    -      0
  src/lamp_py/ingestion/compress_gtfs/pq_to_sqlite.py                                |87.8%     49|50.0%     6|    -      0
  src/lamp_py/ingestion/compress_gtfs/schedule_details.py                            |81.2%     96|50.0%     8|    -      0
  src/lamp_py/ingestion/config_busloc_trip.py                                        |82.4%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/config_rt_trip.py                                            |82.4%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/convert_gtfs.py                                              |79.2%     48|37.5%     8|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt.py                                           |88.6%    237|50.0%    26|    -      0
  src/lamp_py/ingestion/convert_gtfs_rt_fullset.py                                   |45.6%    114|16.7%    18|    -      0
  src/lamp_py/ingestion/converter.py                                                 |95.9%     49|35.0%    20|    -      0
  src/lamp_py/ingestion/daily/config.py                                              | 100%      2|    -     0|    -      0
  src/lamp_py/ingestion/daily/trip_updates.py                                        |75.0%     12|25.0%     4|    -      0
  src/lamp_py/ingestion/glides.py                                                    |95.6%    182|45.5%    44|    -      0
  src/lamp_py/ingestion/gtfs_rt_detail.py                                            |94.1%     17|12.5%     8|    -      0
  src/lamp_py/ingestion/ingest_gtfs.py                                               | 0.0%     63| 0.0%    10|    -      0
  src/lamp_py/ingestion/pipeline.py                                                  | 0.0%     38| 0.0%     4|    -      0
  src/lamp_py/ingestion/utils.py                                                     |63.1%    111|35.0%    20|    -      0
