LPhex9/preservation-store-catalog

preservation-store-catalog

A catalog-only index of a personal digital preservation store of 278 reference works (~854 MiB, 20 collections) covering computer architecture, operating systems, exploitation, cryptography, malware analysis, and Unix/Linux history.

The underlying files are not included in this repository. What is published here is the descriptive and technical metadata that a digital archivist would produce for a held collection: the artifacts an inheriting institution, an auditor, or a transfer partner needs in order to verify a copy, without the repository owner redistributing potentially copyrighted material.

This is the catalog and fixity counterpart to:

What's in here

catalog/
  fixity/
    manifest-sha256.tsv     # one SHA-256 + size per file, 278 rows
  formats/
    siegfried.csv           # raw Siegfried/DROID format identification output
    format-profile.md       # PUID and MIME summary
  finding-aids/
    summary.json            # machine-readable collection index
    <category>/<collection>.md   # human-readable per-collection finding aids
scripts/
  build_catalog.py          # regenerates everything from the local store

Profile at a glance

  • 20 collections organized into 7 categories: architecture, culture, exploitation, foundations, intelligence, malware, unix-linux
  • 278 files, 854.4 MiB
  • 180 PDFs, 28 JPEGs, 18 plain-text, 12 XML, 6 markdown, plus bzip2/gzip/zip/rar/7z/sqlite/epub/json/png
  • PDF version spread: 1.2 → 1.7, plus PDF/A and PDF/X variants — useful surface for testing normalization workflows

The PUID breakdown is the interesting bit for preservation planning: it tells you immediately which formats have stable long-term support (PDF/A, PNG, plain text, markdown) and which would need migration policies (older PDF versions, proprietary archives).
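As a rough illustration of how such a PUID breakdown can be derived, the sketch below tallies identifications from Siegfried CSV text. It assumes the PUID sits in the `id` column, as in Siegfried's default CSV output; the function name `puid_counts` and the sample rows are made up for illustration and are not taken from the real catalog or from `build_catalog.py`.

```python
import csv
import io
from collections import Counter


def puid_counts(csv_text: str) -> Counter:
    """Tally how many files were identified under each PUID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the PUID is stored in the "id" column of the CSV.
    return Counter(row["id"] for row in reader)


# Illustrative rows only; these are not entries from the published catalog.
sample = (
    "filename,filesize,modified,errors,namespace,id,format,version,mime,basis,warning\n"
    "a.pdf,1024,2024-01-01T00:00:00Z,,pronom,fmt/18,Acrobat PDF 1.4,1.4,application/pdf,,\n"
    "b.pdf,2048,2024-01-01T00:00:00Z,,pronom,fmt/18,Acrobat PDF 1.4,1.4,application/pdf,,\n"
    "c.txt,12,2024-01-01T00:00:00Z,,pronom,x-fmt/111,Plain Text File,,text/plain,,\n"
)

for puid, count in puid_counts(sample).most_common():
    print(puid, count)
```

Against the real file, the same function could be fed the contents of catalog/formats/siegfried.csv, provided its header matches the assumed layout.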

Why catalog-only?

A preservation portfolio has to demonstrate two things:

  1. You can describe a collection — provenance, scope, file types, extent, fixity.
  2. You respect rights — most of the source material here is copyrighted to its original authors and publishers.

Publishing the catalog satisfies (1) without violating (2). It also mirrors how real preservation institutions handle dark or restricted collections: the finding aid is public, the bits are not.

Reproducing the catalog

python scripts/build_catalog.py \
    --store <path-to-local-store> \
    --out   <path-to-this-repo>

Requirements:

  • Python 3.10+
  • Siegfried 1.x with the DROID signature file installed (pass the path to the sf executable with --sf <path>; the default points to a Windows-local install)

The script:

  1. Computes a SHA-256 digest for every file in the store and writes catalog/fixity/manifest-sha256.tsv
  2. Runs Siegfried with -csv over the entire tree → catalog/formats/siegfried.csv
  3. Summarizes PUIDs and MIME types → catalog/formats/format-profile.md
  4. Writes one finding aid per collection with file inventory and truncated SHA-256s
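The first step could look roughly like the sketch below. This is not the code in `build_catalog.py`; it is a minimal version under stated assumptions: the manifest has no header row, and each row holds digest, size in bytes, and store-relative POSIX path, separated by tabs. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_rows(store: Path):
    """Yield (digest, size, relative POSIX path) for every file, in sorted order."""
    for path in sorted(p for p in store.rglob("*") if p.is_file()):
        yield sha256_file(path), path.stat().st_size, path.relative_to(store).as_posix()


def write_manifest(store: Path, out: Path) -> None:
    """Write one tab-separated row per file (assumed column order, no header)."""
    with out.open("w", encoding="utf-8", newline="\n") as fh:
        for digest, size, rel in manifest_rows(store):
            fh.write(f"{digest}\t{size}\t{rel}\n")
```

Sorting the paths keeps the output stable between runs, which makes diffs of the manifest meaningful after the store changes.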

Re-running the script after store changes regenerates a fresh catalog, so the manifest stays the single source of truth for fixity.
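A transfer partner holding a copy of the files could check it against the manifest along these lines. The sketch assumes the same hypothetical layout as above (no header row; digest, size, and relative path separated by tabs), so the column order would need adjusting if the published TSV differs. `verify_copy` is an illustrative name, not part of this repository.

```python
import hashlib
from pathlib import Path


def verify_copy(manifest: Path, root: Path) -> list[str]:
    """Return relative paths that are missing or whose size or digest differs."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Assumed column order: digest, size in bytes, store-relative path.
        expected, size, rel = line.split("\t", 2)
        target = root / rel
        if not target.is_file():
            problems.append(rel)
            continue
        # Reads the whole file at once; acceptable for a sketch, not for huge files.
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected.lower() or target.stat().st_size != int(size):
            problems.append(rel)
    return problems
```

An empty return value means every listed file is present and matches; files in the copy that the manifest does not list are not reported by this sketch.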

Standards & conventions

  • Fixity: SHA-256 (BagIt manifest-compatible)
  • Format identification: PRONOM PUIDs via Siegfried 1.11.4 (DROID v122 signature file)
  • Timestamps: ISO 8601 UTC
  • Paths: POSIX-style, relative to the store root, in every artifact
  • OAIS mapping: the catalog corresponds to the OAIS Descriptive Information and to the Fixity Information component of the AIP's Preservation Description Information; the held bits constitute the Content Information

License

The catalog itself (manifests, finding aids, profile, scripts) is CC0. The underlying preserved works retain their original copyright; this repository makes no claim on them and does not distribute them.

About

Catalog-only index of a 278-file / 854 MiB personal digital preservation store. SHA-256 fixity, Siegfried/PRONOM format identification, per-collection finding aids. Bits stay local, the catalog goes public.
