Add 2024 agency, tax code and TIF reports #65
kyrasturgill wants to merge 19 commits into 2024-data-update
Merge branch '2024-data-update' into kyrasturgill/2024-agency-rate-tif-reports
# Conflicts:
# data-raw/tif/tif.R
Thanks for this! Very helpful work. A few clarifying questions:
I thought that |
Here are my thoughts on your questions:
That makes sense to me! If I understand the data model correctly, it strikes me that |
…between consolidated agencies and their new parent agencies
@jeancochrane, this is ready for your review!
|
jeancochrane left a comment
This is getting close to done! I still need to QC the changes to tif.R, but I want to send this feedback over to you first so you can get a head start on it.
agency_name_short,
agency_name_original,
[Question, non-blocking] Do you think it's a problem that we don't have short names for agencies that changed in 2024?
Hmm, I don't think I understand. Do you mean that we're missing a field along the lines of agency_name_short_24?
0,
cty_cook_eav
),
across(starts_with("cty_"), ~ replace_na(.x, 0)),
[Question, non-blocking] No action necessary right now, but I'm wondering about this choice to coerce these nulls to 0 in the context of 2024 data, where these County fields are always null. If we were to keep them null, we would hew closer to the actual contents of the input data, where these fields are totally missing. However, that would have the downside of requiring users to handle nulls whenever using these columns. I don't have a good enough grasp of the context to be able to make a decision, but I wanted to raise it as a quirk of the data that I noticed while QCing.
I agree - this was bothering me because it is not technically correct to call those 0. I think my preference would be to not replace those NAs with 0, but happy to discuss more.
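To make the trade-off concrete, here is a minimal sketch of how the two choices behave downstream. The `agency_demo` tibble and its values are made up for illustration and are not taken from the Clerk's reports; only the `replace_na()` call mirrors the line under review.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row table: the County field is populated in one year
# and missing in the other (values are invented for illustration).
agency_demo <- tibble(
  year = c(2023, 2024),
  cty_cook_eav = c(1000, NA)
)

# Option 1: coerce missing County fields to 0, as the ingest script does now
zero_filled <- agency_demo %>%
  mutate(across(starts_with("cty_"), ~ replace_na(.x, 0)))

# Option 2: keep NA, so users must handle missing values explicitly
mean_zero_filled <- mean(zero_filled$cty_cook_eav)            # 0 counts as a real value
mean_na_default <- mean(agency_demo$cty_cook_eav)             # NA unless na.rm is set
mean_na_removed <- mean(agency_demo$cty_cook_eav, na.rm = TRUE)
```

With zeros, summaries run without extra arguments but treat the placeholder as a real observation. With NAs, the missingness stays visible and each caller has to opt in with `na.rm = TRUE` or an explicit filter.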
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
…ty_overall_eav from final data output
@@ -342,6 +343,7 @@ agency_2013 <- agency %>%
agency <- agency %>%
  filter(year != 2013) %>%
  bind_rows(agency_2013) %>%
  select(-cty_overall_eav) %>%
  arrange(year, agency_num)
@jeancochrane - I added this code to address cty_overall_eav. This field is no longer included in the 2024 reports, but it was previously used to define cty_total_eav. cty_total_eav is itself a field in the source data, but for some reason it was replaced with cty_overall_eav. I believe cty_total_eav from the source data is the correct field to use. The only issue is that the 2013 data does not have cty_total_eav, but it does have cty_overall_eav. Since we already have this code to account for the oddities of 2013, I tweaked it to also carry over cty_overall_eav to replace cty_total_eav. As mentioned above, there are a few agencies with a slight discrepancy between the two values, so this isn't perfect, but I believe it is a suitable workaround.
idx <- agency_legacy_cw$agency_name == "TIF VIL OF OLYMPIA FIELDS-GOV HWY/VOLL"
agency_legacy_cw[idx, c("agency_num", "agency_num_24")] <- "030930502"
The Clerk's report gives this TIF the wrong agency number, the same one as TIF VIL OF OLYMPIA FIELDS-GOV HWY/VOLL II, which is why duplicate rows were generated in agency_info.
This PR adds the raw data for the 2024 tax code rate, agency rate, and TIF reports. It also adjusts the data-raw ingest scripts to account for changes to the 2024 data structure.

Most changes to the ingest scripts are slight adjustments to the code for renaming fields. This code was originally written on the assumption that the fields selected from the report files (each year is contained in a separate file) exist across all years. The introduction of new fields in the 2024 files led to errors within `rename_with()`: the renaming function returned a vector of length 1 rather than 0, even when the field did not exist. The workaround was to use `rep()` with `rename_with()`: `rename_with(~rep("agency_name", length(.x)), any_of(c("authority_name")))`, which returns an empty vector if no field named `authority_name` is present.

Other changes remove certain fields that are no longer present in the 2024 reports and create the field `fund_type_num` to account for the 2024 change to fund number structure: fund numbers are now 6 digits rather than 3. `fund_type_num` is the first 3 digits of `fund_num`, which should be consistent across all years. Pre-2024 `fund_num` values are also now padded with trailing zeros. Because 2024 reports funds at a more detailed level than prior years, any time trend analysis of funds should use `fund_type_num`.

Something I have not added to the data yet is `agency_num_legacy` or `authority_num`. I realized that we would need to alter prior years' `agency_num` and `agency_name` to align with the revised 2024 `agency_num`. To avoid altering source data, we could simply add an agency crosswalk to the db that connects the new agency number, the legacy number, and the authority number. It would be available to users who want to analyze agency extensions or agency rates over time.

Lastly, this PR also brings in a new TIF data source: `pin_tif_distribution`, which is derived from the Clerk's TIF PIN list report.
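
For reviewers who want to see the renaming workaround and the `fund_type_num` derivation in isolation, here is a minimal sketch. It is not the code from the ingest scripts: `report_a`, `report_b`, `clean_report()`, and all values are hypothetical, and the use of `stringr` for padding is an assumption about one reasonable way to do it.

```r
library(dplyr)
library(stringr)

# Made-up stand-ins for two yearly report files. Only the first one has
# an `authority_name` column, and the two use different fund number widths.
report_a <- tibble(authority_name = "TOWN A", fund_num = "003")
report_b <- tibble(agency_name = "TOWN A", fund_num = "003001")

# Hypothetical helper combining both adjustments
clean_report <- function(df) {
  df %>%
    rename_with(
      # rep() yields character(0) when any_of() selects no columns, so the
      # function's output length always matches the number of selected columns
      ~ rep("agency_name", length(.x)),
      any_of(c("authority_name"))
    ) %>%
    mutate(
      # Pad 3-digit fund numbers with trailing zeros to the 6-digit width
      fund_num = str_pad(fund_num, width = 6, side = "right", pad = "0"),
      # The first 3 digits identify the fund type in both formats
      fund_type_num = str_sub(fund_num, 1, 3)
    )
}

combined <- bind_rows(clean_report(report_a), clean_report(report_b))
```

In this sketch both rows end up with the same `fund_type_num`, which is what makes that field the safer key for comparisons across years.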