Fix Unix timestamp normalisation #558
Known limitations of Unix timestamp normalisation
This ticket improves handling of Unix timestamps in
How timestamp length detection works
The
Unhandled ranges
The following date ranges cannot currently be expressed as Unix timestamps and have them normalised correctly:
The millisecond gap is negligible. The seconds gap is more notable but, as discussed below, positive values in that range cannot be safely extended without risking collisions with other date patterns.
Why we haven't extended below 9 digits
Extending positive timestamp handling below 9 digits risks collisions with other date patterns:
Query
The seconds gap (1966-10-31 to 1973-03-03) is the most potentially significant, as this window overlaps with plausible listed building designation dates from the early statutory lists compiled from 1947 onwards. Do we have evidence of timestamps in this range appearing in source data? If so, we should consider:
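The seconds-gap boundaries quoted in the query can be checked with a few lines of Python. This is a minimal sketch, assuming the gap is bounded by the smallest 9-digit magnitude (10^8 seconds) on either side of the epoch. The names `EPOCH`, `from_seconds`, `gap_start` and `gap_end` are illustrative and not taken from the codebase.

```python
from datetime import datetime, timedelta

# Epoch arithmetic with timedelta works for negative values on every platform.
EPOCH = datetime(1970, 1, 1)


def from_seconds(ts: int) -> datetime:
    """Interpret ts as seconds relative to the Unix epoch (UTC, naive)."""
    return EPOCH + timedelta(seconds=ts)


# Assumption: values with fewer than 9 digits are not treated as timestamps,
# so the unhandled window is |ts| < 10**8 seconds.
smallest_9_digit = 10 ** 8
gap_start = from_seconds(-smallest_9_digit)
gap_end = from_seconds(smallest_9_digit)

print(gap_start.date(), gap_end.date())  # 1966-10-31 1973-03-03
```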
What type of PR is this? (check all applicable)
Description
The `DateDataType.normalise()` method handles Unix timestamps via a `%s` pattern branch. Over time, support has been extended to handle timestamps of varying digit lengths, with 12- and 13-digit values treated as milliseconds (divided by 1000) and 9- and 10-digit values treated as seconds.
A previous change extended this to include 11-digit millisecond timestamps, which appear in listed building designation data sourced from Historic England.
Two related issues have been found with the current logic:
- `13392000000` and `-49507200000` have 11 digits and represent dates in the late 1960s and early 1970s when treated as milliseconds. These were previously rejected as invalid dates because `11` was not included in the permitted digit-length tuple.
- `-6048000000` has a 10-digit absolute value and was being processed as seconds, placing it in 1778 and triggering a `far-past-date` issue. It should be treated as milliseconds, giving a date of 1969-10-23.

The root cause of the second issue is that the original length-based division logic assumes positive timestamps, where a 10-digit value in seconds lands in a plausible range (2001-2286). For negative values, a 10-digit second timestamp goes far into the past, outside any realistic designation date range.
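To make the difference between the two interpretations concrete, the sketch below converts the three example values both ways. It is illustrative only: `as_seconds` and `as_millis` are hypothetical helpers, not functions from the codebase, and dates are computed as naive UTC offsets from the epoch.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def as_seconds(ts: int) -> datetime:
    """Interpret ts as seconds since the Unix epoch."""
    return EPOCH + timedelta(seconds=ts)


def as_millis(ts: int) -> datetime:
    """Interpret ts as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=ts)


examples = [13392000000, -49507200000, -6048000000]

for ts in examples:
    digits = len(str(abs(ts)))  # digit count ignores the sign
    print(f"{ts:>13}  digits={digits}  as ms: {as_millis(ts).date()}")

# The same 10-digit negative value read as seconds lands centuries earlier.
print("as seconds:", as_seconds(-6048000000).date())
```

Running it shows that the millisecond reading keeps all three values within a few years of 1970, while the seconds reading of the 10-digit negative value falls in the eighteenth century.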
Fix
Two changes to the `%s` branch in `DateDataType.normalise()`:
- Add `11` to the tuple of digit lengths that are divided by 1000 before conversion.

This approach avoids ambiguity - there is no realistic overlap between a plausible second timestamp and a plausible millisecond timestamp across any digit length within the 1800-2100 window.
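A rough sketch of the length-based rule is given below. It is not the code from this PR: `normalise_unix_timestamp`, `MILLISECOND_LENGTHS` and `SECOND_LENGTHS` are hypothetical names, and the treatment of negative 10-digit values as milliseconds is an assumption inferred from the Description rather than taken from the diff.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Digit lengths (sign excluded) divided by 1000 before conversion.
MILLISECOND_LENGTHS = (11, 12, 13)  # 11 is the newly permitted length
SECOND_LENGTHS = (9, 10)


def normalise_unix_timestamp(value: str) -> str:
    """Hypothetical helper: return an ISO date for a Unix timestamp string."""
    ts = int(value)
    digits = len(str(abs(ts)))

    # Assumption: a negative 10-digit value is read as milliseconds, because
    # reading it as seconds would place it far outside the plausible window.
    is_millis = digits in MILLISECOND_LENGTHS or (ts < 0 and digits == 10)

    if is_millis:
        delta = timedelta(milliseconds=ts)
    elif digits in SECOND_LENGTHS:
        delta = timedelta(seconds=ts)
    else:
        raise ValueError(f"unsupported timestamp length: {value!r}")

    return (EPOCH + delta).date().isoformat()


print(normalise_unix_timestamp("13392000000"))   # 11 digits, milliseconds
print(normalise_unix_timestamp("-6048000000"))   # negative, 10 digits
print(normalise_unix_timestamp("1700000000"))    # positive, 10 digits, seconds
```

Using `timedelta` arithmetic from a fixed epoch, rather than a platform timestamp call, keeps negative values portable and avoids floating-point division.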
Related Tickets & Documents
QA Instructions, Screenshots, Recordings
Please replace this line with instructions on how to test your changes, a note
on the devices and browsers this has been tested on, as well as any relevant
images for UI changes.
Added/updated tests?
We encourage you to keep the code coverage percentage at 80% and above. Please refer to the Digital Land Testing Guidance for more information.
have not been included
[optional] Are there any post deployment tasks we need to perform?
[optional] Are there any dependencies on other PRs or Work?