
Data file 'exists' check needs to be updated as duplicates are still being saved to 'json_files' #50

@stuchalk

Description


Currently, if the same data file is ingested a second time, there are situations where the 'exists' check fails because the incoming file is compared only to the most recent version (df_functions.py, def updatedatafile, lines 81-89 of v0.2.1).

Therefore, the code needs to be updated to check the new data file against all previously ingested versions. This should be done using the new 'jhash' field already added to the 'json_files' table, which stores an md5 hash of the 'file' field. Although 'jhash' values may well be unique across the whole table, searching on both 'file_lookup_id' and 'jhash' verifies whether that specific file has already been uploaded.

Note: in the code, the 'generatedAt' field must be emptied (set to '') before the md5 hash is generated, so that otherwise identical files exported at different times hash to the same value.
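The check described above could be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names (make_jhash, already_ingested) are hypothetical, and the list of rows stands in for a real query of the 'json_files' table filtered by 'file_lookup_id'.

```python
import hashlib
import json

def make_jhash(file_json: dict) -> str:
    """Hash a data file's JSON with 'generatedAt' blanked (hypothetical helper).

    Blanking 'generatedAt' before hashing means re-exports of identical
    data produce identical hashes, as the issue requires.
    """
    normalized = dict(file_json)
    normalized["generatedAt"] = ""
    # Serialize deterministically so equivalent dicts hash identically.
    payload = json.dumps(normalized, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()

def already_ingested(file_lookup_id: int, jhash: str, existing_rows: list) -> bool:
    """Check the new file against ALL prior versions, not just the latest.

    `existing_rows` is a stand-in for the 'json_files' table; matching on
    both 'file_lookup_id' and 'jhash' confirms a duplicate upload.
    """
    return any(
        row["file_lookup_id"] == file_lookup_id and row["jhash"] == jhash
        for row in existing_rows
    )
```

A file re-ingested with a new 'generatedAt' timestamp but unchanged data would then hash identically to its earlier version and be flagged as a duplicate.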

Metadata

Labels: bug (Something isn't working)
