Dead Software Rescue

A digital-preservation project targeting abandoned open-source communications software — encrypted messengers, IM clients, IRC/XMPP tools, and mesh-networking apps that were left to die when their maintainers walked away or their backers shut them down.

38 projects identified, 33 rescued and packaged, 5 confirmed non-recoverable (proprietary, source never released, or vaporware). Total of ~685 MB of source code preserved across reproducible tarball packages.

This repo holds the framework — dataset, rescue scripts, preservation log, per-project exhibits, and master index. The bulk packages and git mirrors (~1.3 GB) are excluded from version control; they are regenerated by re-running the scripts against dataset.json.

Companion repos:

warc-portfolio — web/social acquisition + BagIt packaging
archival-dives — archival research methodology
preservation-store-catalog — fixity + format-ID for a held reference collection

Why this matters

Communications software has the worst long-term survival of any software category. It depends on running infrastructure (servers, networks, peers) and on user populations that move on. When the maintainer quits, the network dies; when the network dies, no one keeps the source mirrored; when GitHub eventually deletes the org or the dev account, the only remaining copies are on the Wayback Machine and SourceForge — often partial, sometimes already gone.

For an investigator, historian, or security researcher trying to understand the history of encrypted messaging, the design tradeoffs of pre-Signal projects, or the actual code behind a documented vulnerability, this material is critical primary evidence. Rescuing it before it disappears is a textbook digital-preservation problem.

What's in `dataset.json`

38 records keyed on project name with 16 fields each:

project_name, type, category, peak_period, death_date,
last_known_maintainer, function, significance, death_reason,
source_url, license, build_dependencies, successor_forks,
preservation_status, tier, preservation_notes
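
A quick schema check can confirm that every record carries exactly these 16 fields. The sketch below is illustrative rather than part of the repo: the function names are hypothetical, and because the on-disk layout of `dataset.json` is not spelled out here, it accepts either a list of records or a dict keyed on project name.

```python
import json
from pathlib import Path

# The 16 field names listed above.
EXPECTED_FIELDS = {
    "project_name", "type", "category", "peak_period", "death_date",
    "last_known_maintainer", "function", "significance", "death_reason",
    "source_url", "license", "build_dependencies", "successor_forks",
    "preservation_status", "tier", "preservation_notes",
}


def iter_records(data):
    """Yield records from a list or from a name-keyed dict (layout is an assumption)."""
    if isinstance(data, dict):
        yield from data.values()
    else:
        yield from data


def find_schema_problems(data):
    """Map project name -> (missing fields, unexpected fields) for deviating records."""
    problems = {}
    for record in iter_records(data):
        keys = set(record)
        missing = sorted(EXPECTED_FIELDS - keys)
        extra = sorted(keys - EXPECTED_FIELDS)
        if missing or extra:
            problems[record.get("project_name", "<unnamed>")] = (missing, extra)
    return problems


def load_dataset(path="dataset.json"):
    """Read the dataset from disk; the default path assumes the repo root."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `find_schema_problems(load_dataset())` from the repo root should return an empty dict if every record matches the field list.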

Tiers reflect rescue priority:

Tier	Definition	Examples
1	Confirmed dead, officially discontinued	Cryptocat, TorChat, Tor Messenger, Surespot, Ricochet (Original)
2	Effectively dead, zombie state (last commit >2y, no maintainer)	PyBitmessage, qTox, QKSMS, Kontalk, Let's Chat
3	Abandoned IRC/XMPP clients	Subway, CloudBot, Blackened, Jabble, Jeti, JAJIM, AmoebaChat, others
7	RSS/SourceForge tail discovered late in the pass	see `add_tier7_rss.py`

The INDEX.md table is the human-readable summary and INDEX.json is its machine-readable counterpart; generate_index.py rebuilds both.

Rescue scripts

Script	Purpose
`preserve.py`	Mirrors all rescuable repos via `git clone --mirror` into `mirrors/`
`package.py`	Wraps mirrors into dated `.tar.gz` packages in `packages/`
`package_github.py`	GitHub-specific packager (uses API for repo metadata)
`package_rss.py`	Wraps RSS-discovered SourceForge tarballs
`pull_rss_sf.py`	Discovers SourceForge projects via category RSS feeds
`sf_direct_pull.py`	Direct SourceForge file-release downloader
`sf_wayback_pull.py`	Falls back to Wayback Machine for delisted/gone SF projects
`add_tier7_rss.py`	Appends Tier-7 RSS-discovered projects to the dataset
`generate_index.py`	Rebuilds `INDEX.md` + `INDEX.json` from `dataset.json` and `packages/`
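
The mirroring step in the table comes down to one `git clone --mirror` call per source URL. The following is a minimal sketch of that step, not the actual contents of `preserve.py`: the function names and the rule for deriving the target directory from the URL are assumptions.

```python
import subprocess
from pathlib import Path


def mirror_command(source_url, mirrors_dir="mirrors"):
    """Build the git command for a bare mirror clone of source_url."""
    # Derive a directory name from the last URL segment (naming rule is an assumption).
    name = source_url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[:-4]
    target = Path(mirrors_dir) / f"{name}.git"
    return ["git", "clone", "--mirror", source_url, str(target)]


def mirror_repo(source_url, mirrors_dir="mirrors"):
    """Run the clone and return (succeeded, stderr text) without raising on failure."""
    cmd = mirror_command(source_url, mirrors_dir)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stderr.strip()
```

Returning the stderr text instead of raising keeps a single dead URL from aborting a whole pass, and gives the caller something to record. Two projects whose URLs end in the same name would collide under this naming rule, so a real pass would need a disambiguating prefix.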

All paths are currently hardcoded to the author's machine (C:\Users\Tyrone\Documents\dead-software-rescue\). A future pass should parameterize them — opening that as an issue is a good first contribution.
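
One possible shape for that parameterization is sketched below. It is a suggestion only: the `--root` flag, the `RESCUE_ROOT` environment variable, and both function names are hypothetical and do not exist in the current scripts.

```python
import argparse
import os
from pathlib import Path


def resolve_root(argv=None, environ=None):
    """Pick the project root: --root flag first, then RESCUE_ROOT, then the current directory."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Dead Software Rescue paths")
    parser.add_argument("--root", type=Path, default=None,
                        help="directory holding dataset.json, mirrors/ and packages/")
    args = parser.parse_args(argv)
    if args.root is not None:
        return args.root
    if "RESCUE_ROOT" in environ:
        return Path(environ["RESCUE_ROOT"])
    return Path.cwd()


def project_paths(root):
    """Derive the working paths from the root instead of hardcoding them."""
    return {
        "dataset": root / "dataset.json",
        "mirrors": root / "mirrors",
        "packages": root / "packages",
    }
```

Each script could then call `project_paths(resolve_root())` once at startup, which keeps the directory layout described in this README while removing the machine-specific prefix.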

Preservation log

preservation_log.json records every clone attempt: timestamp, repo URL, outcome, error if any, and resulting mirror path. The sf_*_log.json files do the same for the SourceForge passes. Together they form the audit trail: when a rescued copy is later challenged ("where did this code come from?"), the log is the answer.
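
To make the log structure concrete, here is a sketch of how one attempt could be recorded. The key names mirror the five items listed above but are guesses at the real spelling, and the sketch assumes the log file is a single JSON array; neither is confirmed by the repo.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_log_entry(repo_url, outcome, error=None, mirror_path=None):
    """Build one audit record; key names are assumptions based on the description above."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "repo_url": repo_url,
        "outcome": outcome,
        "error": error,
        "mirror_path": mirror_path,
    }


def append_log_entry(log_path, entry):
    """Append an entry to a JSON-array log file and return the new entry count."""
    path = Path(log_path)
    entries = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    entries.append(entry)
    path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return len(entries)
```

Logging failed attempts alongside successful ones matters for provenance: a record that a URL was already dead on a given date is itself evidence.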

Exhibits

exhibits/<project>/ contains the per-project descriptive metadata an archivist would write for a finding aid:

README_rescue.md — provenance, significance, death context
BUILD.md (where applicable) — what it took to make the source compile/run

These are the human-readable counterparts to the dataset records and are the most directly portable artifacts of the project: they explain why each piece of dead software is historically interesting, not just that it exists.

Reproducing the rescue

Roughly:

Read dataset.json and identify Tier 1–3 projects with source_url != null.
Run preserve.py to mirror everything.
Run package.py to produce dated tarballs in packages/.
Run generate_index.py to regenerate INDEX.md and INDEX.json.
For projects without a GitHub presence, run the sf_*_pull.py scripts to pull from SourceForge or, failing that, the Wayback Machine; if this adds new packages, re-run generate_index.py afterwards.
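
The selection in the first step can be sketched as a simple filter. This is an illustration, not code from the repo: `select_rescuable` is a hypothetical name, and because it is not stated whether `tier` is stored as a number or a string, the sketch accepts both.

```python
def select_rescuable(records, tiers=(1, 2, 3)):
    """Return records in the given tiers that have a non-empty source_url."""
    selected = []
    for record in records:
        try:
            tier = int(record.get("tier"))
        except (TypeError, ValueError):
            # Missing or non-numeric tier: skip rather than guess.
            continue
        if tier in tiers and record.get("source_url"):
            selected.append(record)
    return selected
```

Passing `tiers=(7,)` would select the late RSS/SourceForge additions instead.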

Expected end state: ~685 MB in packages/, ~660 MB in mirrors/, matched 1:1 against the preservation_status: "Packaged" entries in the dataset.
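
A rough check of that end state is to compare the number of "Packaged" records with the number of tarballs on disk and to total their size. The sketch below makes two assumptions: packages sit directly in packages/ with a `.tar.gz` suffix, and a count comparison is acceptable as a first pass. A true 1:1 match would need the package naming convention, which is not described here.

```python
from pathlib import Path


def packaged_count(records):
    """Count dataset records whose preservation_status is "Packaged"."""
    return sum(1 for r in records if r.get("preservation_status") == "Packaged")


def tarball_stats(packages_dir):
    """Return (number of .tar.gz files, total size in bytes) for a packages directory."""
    files = sorted(Path(packages_dir).glob("*.tar.gz"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes
```

If the two counts differ, the preservation log is the place to look for the attempt that failed or the package that was never built.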

License & rights

The framework in this repo (dataset, scripts, index generators, exhibits) is CC-BY-4.0.
The rescued software remains under each project's own license; see the license field of each record in dataset.json. Most are MIT, GPL-2/3, Apache-2.0, or BSD, which permit redistribution with attribution; a few are unclear and were preserved on a fair-use / scholarly-record basis. Anyone re-publishing the rescued payloads is responsible for honoring the original licenses.
