Structured archival research dossiers produced using a repeatable methodology against the Internet Archive (and adjacent platforms). Each deep dive treats a topic the way an archivist would treat an incoming collection: discovery, appraisal, description, evaluation, triangulation, and a preservation plan.
This repo is the research counterpart to warc-portfolio (the acquisition + packaging side).
| Topic | Sources cataloged | Status |
|---|---|---|
| WW2 German cipher machines | 25 of 42 candidates | complete |
| Vintage electronics schematics | top items from a broad sweep | complete |
| Archiving & setting up archival systems | 15 of ~340 candidates | complete |
| Job market & interests 2026 | scoped survey | complete |
Every dossier follows the same 10-file structure:
| File | Purpose |
|---|---|
| 00-index.md | Scope, methodology, key findings, file guide |
| 01-discovery-ledger.md | Every search query and raw candidate hit |
| 02-item-metadata.md | Structured metadata records for selected items |
| 03-source-evaluation.md | Provenance, authority, completeness, rights |
| 04-layer-a-observational.md | Factual content extracted from sources |
| 05-layer-b-interpretive.md | Analysis, context, historiographic critique |
| 06-triangulation-matrix.md | Cross-source verification of key claims |
| 07-related-collections.md | Adjacent collections, finding aids, Open Library |
| 08-questions-to-investigate.md | Follow-up agenda |
| 09-preservation-notes.md | Download commands, format notes, rights |
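A minimal Python sketch for stubbing out a new dossier with the ten files listed above. The function names `scaffold_dossier` and `stub_body`, the layout (one folder per dossier slug), and the stub contents (a title heading, plus `[[wikilinks]]` to the other nine files in the index) are assumptions for illustration, not the repo's actual tooling.

```python
import tempfile
from pathlib import Path

# The ten file names, in order, from the dossier structure table.
DOSSIER_FILES = [
    "00-index.md",
    "01-discovery-ledger.md",
    "02-item-metadata.md",
    "03-source-evaluation.md",
    "04-layer-a-observational.md",
    "05-layer-b-interpretive.md",
    "06-triangulation-matrix.md",
    "07-related-collections.md",
    "08-questions-to-investigate.md",
    "09-preservation-notes.md",
]


def stub_body(name: str) -> str:
    """Return placeholder Markdown for one dossier file (hypothetical stub format)."""
    stem = name.removesuffix(".md")
    # "04-layer-a-observational" -> "Layer A Observational"
    title = stem.split("-", 1)[1].replace("-", " ").title()
    body = f"# {title}\n"
    if stem == "00-index":
        # The index links to the other nine files with Obsidian-style wikilinks.
        links = "\n".join(f"- [[{n.removesuffix('.md')}]]" for n in DOSSIER_FILES[1:])
        body += "\n" + links + "\n"
    return body


def scaffold_dossier(root: Path, slug: str) -> list[Path]:
    """Create root/slug/ with a stub for each missing dossier file; return the files created."""
    folder = root / slug
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for name in DOSSIER_FILES:
        path = folder / name
        if not path.exists():  # never overwrite existing research notes
            path.write_text(stub_body(name), encoding="utf-8")
            created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory.
    demo_root = Path(tempfile.mkdtemp())
    made = scaffold_dossier(demo_root, "ww2-cipher-machines")
    print(f"created {len(made)} files in {demo_root / 'ww2-cipher-machines'}")
```

Running it twice is safe: files that already exist are skipped, so the function only fills gaps in a partially written dossier.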
The two-layer (observational / interpretive) split is borrowed from intelligence-analysis tradecraft and adapted for archival appraisal — it keeps raw evidence separate from synthesis so the chain of reasoning stays auditable.
- Dates: ISO 8601 (YYYY-MM-DD)
- Source discovery: Internet Archive Advanced Search API (Lucene field queries, sorted by downloads as a popularity proxy), plus `/metadata/<identifier>` for record retrieval
- Source ranking: weighted on provenance, institutional authority, completeness, primary-vs-secondary status, and rights clarity
- Linking: Obsidian-style `[[wikilinks]]` between dossier files
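The discovery and ranking conventions above can be sketched in Python roughly as follows. The endpoint URLs are the Internet Archive's public Advanced Search and Metadata API addresses; the helper names, the default field list, and especially the ranking weights are hypothetical, since the dossiers do not publish a numeric weighting.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"
METADATA_API = "https://archive.org/metadata/"

# Hypothetical weights (summing to 1.0) for the five ranking criteria.
WEIGHTS = {
    "provenance": 0.25,
    "authority": 0.25,
    "completeness": 0.20,
    "primary_source": 0.15,
    "rights_clarity": 0.15,
}


def advanced_search_url(query, fields=("identifier", "title", "downloads"), rows=50, page=1):
    """Build an Advanced Search URL for a Lucene-style query, most-downloaded first."""
    params = [("q", query)]
    params += [("fl[]", f) for f in fields]
    params += [("sort[]", "downloads desc"), ("rows", rows), ("page", page), ("output", "json")]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


def metadata_url(identifier):
    """Metadata API URL for a single item identifier."""
    return METADATA_API + quote(identifier, safe="")


def fetch_json(url, timeout=30):
    """GET a URL and decode its JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def rank_score(scores):
    """Weighted sum of per-criterion scores in [0, 1]; missing criteria count as 0."""
    return round(sum(w * scores.get(k, 0.0) for k, w in WEIGHTS.items()), 3)


def rank_candidates(candidates):
    """Sort (identifier, scores) pairs from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: rank_score(c[1]), reverse=True)
```

A discovery pass would then read candidate hits from `fetch_json(advanced_search_url(...))["response"]["docs"]` for the discovery ledger, and pull full item records with `fetch_json(metadata_url(identifier))` for the metadata file. Keeping URL construction separate from fetching makes each query easy to log verbatim.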
Digital preservation isn't only about bytes — it's about appraisal (deciding what's worth keeping) and description (making it findable later). These dossiers exercise that side of the discipline. Pair them with warc-portfolio for the packaging-and-fixity side.
CC-BY-4.0 for the original research and writing. Underlying sources retain their original rights — see each dossier's 03-source-evaluation.md and 09-preservation-notes.md.