This repository was archived by the owner on Sep 24, 2024. It is now read-only.

Comparing hashes of data to prevent saving the same data (1): hashing data file #7

@mjia8

Team members: Jayesh, Kajoyrie, Jason
Sprint 4: 6/19-6/26

Overall Goal:
When we download data within the archiver, we key and hash the data, then check the registry to see whether the new hash differs from the hash of the last version of that data.

What does success look like?

  • We first want a function in archiver.js that takes a data file and a registry id, applies an MD5 hash to the data file, looks up the registry entry for that registry id, finds the stored data hash (which we will later implement to be stored in the registry too), and compares the two hashes, returning true if they are the same.
  • If the data hash or registry id does not exist in the registry, return false.
  • We will then also want to store the data hash in the registry when we archive data.
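
A minimal sketch of what the comparison function might look like, offered as a starting point rather than the final design. The registry is assumed here to be a `Map` from registry id to an entry object with an optional `dataHash` field. That shape, and the names `md5Hex`, `isSameAsLastVersion`, and `recordHash`, are all hypothetical and the real registry in archiver.js may look different. The data is passed in as a `Buffer` already read from the file.

```javascript
const crypto = require('node:crypto');

// Compute the MD5 hash of a Buffer (or string) as a hex string.
function md5Hex(data) {
  return crypto.createHash('md5').update(data).digest('hex');
}

// ASSUMPTION: `registry` is a Map of registryId -> { dataHash, ... }.
// Returns true only when a stored hash exists and matches the new data.
function isSameAsLastVersion(data, registryId, registry) {
  const entry = registry.get(registryId);
  if (!entry || !entry.dataHash) {
    return false; // unknown registry id, or no hash stored yet
  }
  return md5Hex(data) === entry.dataHash;
}

// ASSUMPTION: archiving stores the hash alongside the existing entry fields.
function recordHash(data, registryId, registry) {
  const entry = registry.get(registryId) || {};
  registry.set(registryId, { ...entry, dataHash: md5Hex(data) });
}

// Example usage with an in-memory registry.
const registry = new Map();
const firstDownload = Buffer.from('id,value\n1,10\n');

console.log(isSameAsLastVersion(firstDownload, 'dataset-1', registry)); // false
recordHash(firstDownload, 'dataset-1', registry);
console.log(isSameAsLastVersion(firstDownload, 'dataset-1', registry)); // true
```

Returning false for a missing id or missing hash means the caller would go ahead and archive the data, which seems like the safer default.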

Comments:

  • Ideally, we will compute the hash when we download the data initially so that we do not have to read the data twice.
  • Hashing: we can use MD5 to hash the data incrementally while reading the contents of the file, instead of loading the whole thing into memory.
    • Good resource to look at when starting: archiving the file name in Ethan's archive demo
    • It will be interesting to see whether the hash is the same depending on whether we read the file as bytes vs. text.
  • We want to only do this for data files because the about info files would change very frequently without much benefit for us.
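
The incremental hashing idea from the comments above could be sketched roughly as follows, using Node's built-in `crypto` and `fs` modules. The function names `md5OfChunks` and `md5OfFile` are hypothetical. The same `hash.update(chunk)` call could be attached to the download stream so the data is only read once, but how that hooks into the archiver is not shown here.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Feed the hash one chunk at a time, so only one chunk is held in memory.
function md5OfChunks(chunks) {
  const hash = crypto.createHash('md5');
  for (const chunk of chunks) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}

// Stream a file from disk and hash it as it is read.
// Resolves with the hex digest once the whole file has been consumed.
function md5OfFile(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('md5');
    fs.createReadStream(filePath)
      .on('data', (chunk) => hash.update(chunk))
      .on('error', reject)
      .on('end', () => resolve(hash.digest('hex')));
  });
}

// Bytes vs. text: valid UTF-8 hashes the same either way, because
// hash.update() encodes strings as UTF-8 by default.
const utf8Bytes = Buffer.from('café', 'utf8');
console.log(md5OfChunks([utf8Bytes]) === md5OfChunks([utf8Bytes.toString('utf8')])); // true

// Bytes that are not valid UTF-8 (here Latin-1 "café") are altered when
// decoded as text, so the text hash no longer matches the byte hash.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);
console.log(md5OfChunks([latin1Bytes]) === md5OfChunks([latin1Bytes.toString('utf8')])); // false
```

Based on this, hashing the raw bytes looks like the safer choice, since the result does not depend on which text encoding the file happens to use.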
