A digital-preservation project targeting abandoned open-source communications software — encrypted messengers, IM clients, IRC/XMPP tools, and mesh-networking apps that were left to die when their maintainers walked away or their backers shut them down.
38 projects identified, 33 rescued and packaged, 5 confirmed non-recoverable (proprietary, source never released, or vaporware). Total of ~685 MB of source code preserved across reproducible tarball packages.
This repo holds the framework — dataset, rescue scripts, preservation log, per-project exhibits, and master index. The bulk packages and git mirrors (~1.3 GB) are excluded from version control; they are regenerated by re-running the scripts against dataset.json.
Companion repos:
- `warc-portfolio`: web/social acquisition + BagIt packaging
- `archival-dives`: archival research methodology
- `preservation-store-catalog`: fixity + format-ID for a held reference collection
Communications software arguably has some of the worst long-term survival odds of any software category. It depends on running infrastructure (servers, networks, peers) and on user populations that move on. When the maintainer quits, the network dies; when the network dies, no one keeps the source mirrored; when the GitHub org or the dev account is eventually deleted, the only remaining copies are on the Wayback Machine and SourceForge, often partial and sometimes already gone.
For an investigator, historian, or security researcher trying to understand the history of encrypted messaging, the design tradeoffs of pre-Signal projects, or the actual code behind a documented vulnerability, this material is critical primary evidence. Rescuing it before it disappears is a textbook digital-preservation problem.
38 records keyed on project name with 16 fields each:
project_name, type, category, peak_period, death_date,
last_known_maintainer, function, significance, death_reason,
source_url, license, build_dependencies, successor_forks,
preservation_status, tier, preservation_notes
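As a minimal sketch of how the 16-field schema above could be checked, the helper below reports missing or unexpected keys for a single record. It assumes each record is loaded as a plain Python dict; `check_record` and `EXPECTED_FIELDS` are illustrative names, not part of the repo's scripts.

```python
# Field names copied from the schema listed above.
EXPECTED_FIELDS = (
    "project_name", "type", "category", "peak_period", "death_date",
    "last_known_maintainer", "function", "significance", "death_reason",
    "source_url", "license", "build_dependencies", "successor_forks",
    "preservation_status", "tier", "preservation_notes",
)


def check_record(record: dict) -> dict:
    """Return the missing and unexpected field names for one dataset record."""
    present = set(record)
    expected = set(EXPECTED_FIELDS)
    return {
        "missing": sorted(expected - present),
        "unexpected": sorted(present - expected),
    }
```

A record that passes returns two empty lists, which makes the check easy to run across the whole dataset before a rescue pass.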
Tiers reflect rescue priority:
| Tier | Definition | Examples |
|---|---|---|
| 1 | Confirmed dead, officially discontinued | Cryptocat, TorChat, Tor Messenger, Surespot, Ricochet (Original) |
| 2 | Effectively dead, zombie state (last commit >2y, no maintainer) | PyBitmessage, qTox, QKSMS, Kontalk, Let's Chat |
| 3 | Abandoned IRC/XMPP clients | Subway, CloudBot, Blackened, Jabble, Jeti, JAJIM, AmoebaChat, others |
| 7 | RSS/SourceForge tail discovered late in the pass | see add_tier7_rss.py |
The INDEX.md table is the human-readable summary; INDEX.json is its machine-readable counterpart. Both are rebuilt by generate_index.py.
| Script | Purpose |
|---|---|
| `preserve.py` | Mirrors all rescuable repos via `git clone --mirror` into `mirrors/` |
| `package.py` | Wraps mirrors into dated `.tar.gz` packages in `packages/` |
| `package_github.py` | GitHub-specific packager (uses API for repo metadata) |
| `package_rss.py` | Wraps RSS-discovered SourceForge tarballs |
| `pull_rss_sf.py` | Discovers SourceForge projects via category RSS feeds |
| `sf_direct_pull.py` | Direct SourceForge file-release downloader |
| `sf_wayback_pull.py` | Falls back to Wayback Machine for delisted/gone SF projects |
| `add_tier7_rss.py` | Appends Tier-7 RSS-discovered projects to the dataset |
| `generate_index.py` | Rebuilds `INDEX.md` + `INDEX.json` from `dataset.json` and `packages/` |
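To illustrate the packaging step, here is a rough sketch of wrapping one mirror directory into a dated `.tar.gz`. It is not the code in `package.py`: the function name `package_mirror` and the `<name>_<YYYY-MM-DD>.tar.gz` naming scheme are assumptions made for the example. Members are added in sorted order so the archive layout is stable between runs; this alone does not guarantee byte-identical output.

```python
import tarfile
from datetime import date
from pathlib import Path
from typing import Optional


def package_mirror(mirror_dir: Path, out_dir: Path, stamp: Optional[date] = None) -> Path:
    """Write <mirror name>_<YYYY-MM-DD>.tar.gz into out_dir and return its path."""
    stamp = stamp or date.today()
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme; the real scripts may name packages differently.
    target = out_dir / f"{mirror_dir.name}_{stamp.isoformat()}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # Sorted traversal keeps member order independent of filesystem order.
        for path in sorted(mirror_dir.rglob("*")):
            arcname = path.relative_to(mirror_dir.parent).as_posix()
            tar.add(path, arcname=arcname, recursive=False)
    return target
```

Calling `package_mirror(Path("mirrors/demo.git"), Path("packages"))` would produce a file such as `packages/demo.git_<today>.tar.gz`.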
All paths are currently hardcoded to the author's machine (C:\Users\Tyrone\Documents\dead-software-rescue\). A future pass should parameterize them — opening that as an issue is a good first contribution.
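One possible shape for that parameterization, offered only as a sketch: resolve the project root from a `--root` flag, then an environment variable, then the current directory, and derive every other path from it. The flag name, the `RESCUE_ROOT` variable, and both function names are hypothetical and do not exist in the current scripts.

```python
import argparse
import os
from pathlib import Path


def resolve_root(argv=None, env=None) -> Path:
    """Pick the project root: --root flag, then RESCUE_ROOT, then the current directory."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Locate the rescue project root")
    parser.add_argument("--root", type=Path, default=None)
    # parse_known_args lets each script keep its own extra flags.
    args, _unknown = parser.parse_known_args(argv)
    if args.root is not None:
        return args.root
    if env.get("RESCUE_ROOT"):
        return Path(env["RESCUE_ROOT"])
    return Path.cwd()


def project_paths(root: Path) -> dict:
    """Derive the working paths named in this README from a single root."""
    return {
        "dataset": root / "dataset.json",
        "mirrors": root / "mirrors",
        "packages": root / "packages",
        "log": root / "preservation_log.json",
    }
```

Each script could then call `project_paths(resolve_root())` once at startup instead of embedding an absolute Windows path.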
preservation_log.json records every clone attempt: timestamp, repo URL, outcome, error if any, resulting mirror path. sf_*_log.json does the same for the SourceForge passes. This is the audit trail — when a rescued copy is later challenged ("where did this code come from?"), the log is the answer.
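For illustration, a log entry with those five pieces of information could be appended as shown below. This is a sketch under two assumptions: that the log file is a JSON array of objects, and that the key names (`timestamp`, `repo_url`, `outcome`, `error`, `mirror_path`) match the description above. The actual keys in `preservation_log.json` may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_attempt(log_path: Path, repo_url: str, outcome: str,
                   error: Optional[str] = None,
                   mirror_path: Optional[str] = None) -> dict:
    """Append one clone-attempt entry to a JSON-array log and return the entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "repo_url": repo_url,
        "outcome": outcome,
        "error": error,
        "mirror_path": mirror_path,
    }
    # Load the existing entries, or start a new log if the file is absent.
    if log_path.exists():
        entries = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append(entry)
    log_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entry
```

Recording failed attempts alongside successful ones is what lets the log answer provenance questions later, including why a given project has no mirror.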
exhibits/<project>/ contains the per-project descriptive metadata an archivist would write for a finding aid:
- `README_rescue.md`: provenance, significance, death context
- `BUILD.md` (where applicable): what it took to make the source compile/run
These are the human-readable counterparts to the dataset records and are the most directly portable artifacts of the project: they explain why each piece of dead software is historically interesting, not just that it exists.
Roughly:
1. Read `dataset.json` and identify Tier 1–3 projects with `source_url != null`.
2. Run `preserve.py` to mirror everything.
3. Run `package.py` to produce dated tarballs in `packages/`.
4. Run `generate_index.py` to regenerate `INDEX.md` and `INDEX.json`.
5. For projects without GitHub presence, run the `sf_*_pull.py` scripts to scrape SourceForge / Wayback.
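The selection in the first step, and the mirror command it feeds, can be sketched roughly as follows. The sketch assumes records are dicts with an integer `tier` and a `source_url` that is `None` when absent; `rescuable`, `mirror_command`, and the `<project_name>.git` destination naming are illustrative, not taken from `preserve.py`.

```python
def rescuable(records, tiers=(1, 2, 3)):
    """Yield records in the given tiers that have a non-null source_url."""
    for record in records:
        if record.get("tier") in tiers and record.get("source_url"):
            yield record


def mirror_command(record, mirrors_dir="mirrors"):
    """Build the `git clone --mirror` argument list for one record (not executed here)."""
    # Hypothetical destination naming: mirrors/<project_name>.git
    dest = f"{mirrors_dir}/{record['project_name']}.git"
    return ["git", "clone", "--mirror", record["source_url"], dest]
```

The returned list can be passed to `subprocess.run(cmd, check=False)` so that a failed clone is recorded as an outcome rather than aborting the whole pass.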
Expected end state: ~685 MB in packages/, ~660 MB in mirrors/, matched 1:1 against the preservation_status: "Packaged" entries in the dataset.
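A quick way to test that 1:1 match is to compare the set of records marked `"Packaged"` against the package files on disk. The sketch below assumes package files are named `<project_name>_<date>.tar.gz`, which is a guess at the naming scheme; `reconcile` is an illustrative helper, not an existing script.

```python
from pathlib import Path


def reconcile(records, packages_dir: Path) -> dict:
    """Compare 'Packaged' dataset records against *.tar.gz files in packages_dir."""
    packaged = {
        r["project_name"]
        for r in records
        if r.get("preservation_status") == "Packaged"
    }
    # Assumed file naming: <project_name>_<date>.tar.gz
    on_disk = {
        p.name[: -len(".tar.gz")].rsplit("_", 1)[0]
        for p in packages_dir.glob("*.tar.gz")
    }
    return {
        "missing_package": sorted(packaged - on_disk),
        "unlisted_package": sorted(on_disk - packaged),
    }
```

Two empty lists indicate the dataset and `packages/` agree; anything else points at a record or file that needs a second look.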
- The framework in this repo (dataset, scripts, index generators, exhibits) is CC-BY-4.0.
- The rescued software remains under each project's own license; see the license field of each record in `dataset.json`. Most are MIT, GPL-2/3, Apache-2.0, or BSD, which permit redistribution with attribution; a few are unclear and were preserved on a fair-use / scholarly-record basis. Anyone re-publishing the rescued payloads is responsible for honoring the original licenses.