fix: disambiguate two distinct "Jake Anderson" lifters in US data #516
jakemanderson wants to merge 1 commit into
The lifter 'Jake Anderson' currently merges two distinct people: a Minnesota high-school-era lifter (2013-12 to 2015-03, 7 meets, body weight 96-108 kg) and a separate +109 kg lifter (2022-04 onward, 12 meets, body weight 122-137 kg). This renames the older lifter's 7 historical entries to 'Jake Anderson #1', following the OpenPowerlifting convention. The newer lifter's records are unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Excellent! Yes, I started writing up something since I do lots of name matching and disambiguation for my research, but I didn't want to overstep. I figured you already had something in progress after I plotted the time series and saw how many lifters have unrealistic body-weight variance. YOB will probably get 99%+.
OPL do a lot of manual data correction, which is something I simply don't have time to do at scale. I'm slowly going through ways of collating data at the right time without manual correction. Happy to hear any recommendations you have as well. The data layer is staying open source; only the main UI is closed source, so there isn't an additional attack vector on the infosec side.
Summary
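The "Jake Anderson" record in US event data conflates two distinct lifters:
- A Minnesota high-school-era lifter: 2013-12 to 2015-03, 7 meets, body weight 96-108 kg
- A separate +109 kg lifter: 2022-04 onward, 12 meets, body weight 122-137 kg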
7-year gap, ~25 kg body-weight delta, no overlap. Almost certainly different people.
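The reasoning above (a long gap between meets plus non-overlapping body-weight ranges) could be turned into an automated check along these lines. This is only a rough sketch and not part of the PR or the repo's scripts: the `Entry` type, the function names, and both thresholds are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    """Hypothetical minimal record for one meet result."""
    meet_date: date
    body_weight_kg: float


def split_on_gap(entries, max_gap_years=5.0):
    """Group entries into runs separated by more than max_gap_years."""
    ordered = sorted(entries, key=lambda e: e.meet_date)
    groups = []
    for entry in ordered:
        if groups:
            gap_days = (entry.meet_date - groups[-1][-1].meet_date).days
            if gap_days <= max_gap_years * 365.25:
                groups[-1].append(entry)
                continue
        # First entry, or the gap was too long: start a new run.
        groups.append([entry])
    return groups


def looks_like_two_lifters(entries, max_gap_years=5.0, min_delta_kg=10.0):
    """Flag a name when two adjacent runs have body-weight ranges that
    do not overlap and are at least min_delta_kg apart."""
    groups = split_on_gap(entries, max_gap_years)
    for earlier, later in zip(groups, groups[1:]):
        lo_a = min(e.body_weight_kg for e in earlier)
        hi_a = max(e.body_weight_kg for e in earlier)
        lo_b = min(e.body_weight_kg for e in later)
        hi_b = max(e.body_weight_kg for e in later)
        # Positive only when the two ranges are disjoint.
        separation = max(lo_b - hi_a, lo_a - hi_b)
        if separation >= min_delta_kg:
            return True
    return False
```

A flag from a check like this would still need a manual look before renaming anything, since a real lifter can return after a long break at a different body weight.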
This PR renames the older lifter's 7 historical entries to Jake Anderson #1 (OpenPowerlifting convention). The newer lifter's records are unchanged.
Notes
make check_db passes (run via python3 scripts/check_db.py).
Test plan
python3 scripts/check_db.py reports TEST PASSED
Jake Anderson #1
🤖 Generated with Claude Code