LPhex9/dead-software-rescue

Dead Software Rescue

A digital-preservation project targeting abandoned open-source communications software — encrypted messengers, IM clients, IRC/XMPP tools, and mesh-networking apps that were left to die when their maintainers walked away or their backers shut them down.

38 projects were identified: 33 were rescued and packaged, and 5 were confirmed non-recoverable (proprietary, source never released, or vaporware). In total, about 685 MB of source code is preserved across reproducible tarball packages.

This repo holds the framework — dataset, rescue scripts, preservation log, per-project exhibits, and master index. The bulk packages and git mirrors (~1.3 GB) are excluded from version control; they are regenerated by re-running the scripts against dataset.json.

Why this matters

Communications software has the worst long-term survival of any software category. It depends on running infrastructure (servers, networks, peers) and on user populations that move on. When the maintainer quits, the network dies; when the network dies, no one keeps the source mirrored; when GitHub eventually deletes the org or the dev account, the only remaining copies are on the Wayback Machine and SourceForge — often partial, sometimes already gone.

For an investigator, historian, or security researcher trying to understand the history of encrypted messaging, the design tradeoffs of pre-Signal projects, or the actual code behind a documented vulnerability, this material is critical primary evidence. Rescuing it before it disappears is a textbook digital-preservation problem.

What's in dataset.json

38 records keyed on project name with 16 fields each:

project_name, type, category, peak_period, death_date,
last_known_maintainer, function, significance, death_reason,
source_url, license, build_dependencies, successor_forks,
preservation_status, tier, preservation_notes
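
A quick way to confirm that each record carries exactly these 16 fields is sketched below. It assumes the top level of dataset.json is a JSON array of record objects; the README does not show the exact layout, so adjust `load_records` if the file is an object keyed by project name. The function names are illustrative and are not part of the repo's scripts.

```python
import json

# The 16 field names listed above, in the same order.
EXPECTED_FIELDS = (
    "project_name", "type", "category", "peak_period", "death_date",
    "last_known_maintainer", "function", "significance", "death_reason",
    "source_url", "license", "build_dependencies", "successor_forks",
    "preservation_status", "tier", "preservation_notes",
)


def load_records(path):
    """Load dataset.json. ASSUMPTION: the top level is a list of records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def check_record(record):
    """Return (missing, unexpected) field names for one record."""
    keys = set(record)
    expected = set(EXPECTED_FIELDS)
    return sorted(expected - keys), sorted(keys - expected)
```

Calling `check_record` on every item returned by `load_records("dataset.json")` and printing any non-empty result gives a cheap schema check before running the rescue scripts.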

Tiers reflect rescue priority:

| Tier | Definition | Examples |
| --- | --- | --- |
| 1 | Confirmed dead, officially discontinued | Cryptocat, TorChat, Tor Messenger, Surespot, Ricochet (Original) |
| 2 | Effectively dead, zombie state (last commit more than 2 years ago, no maintainer) | PyBitmessage, qTox, QKSMS, Kontalk, Let's Chat |
| 3 | Abandoned IRC/XMPP clients | Subway, CloudBot, Blackened, Jabble, Jeti, JAJIM, AmoebaChat, others |
| 7 | RSS/SourceForge tail discovered late in the pass | see add_tier7_rss.py |

INDEX.md is the human-readable summary table; INDEX.json is its machine-readable counterpart. Both are rebuilt by generate_index.py.

Rescue scripts

| Script | Purpose |
| --- | --- |
| preserve.py | Mirrors all rescuable repos via git clone --mirror into mirrors/ |
| package.py | Wraps mirrors into dated .tar.gz packages in packages/ |
| package_github.py | GitHub-specific packager (uses API for repo metadata) |
| package_rss.py | Wraps RSS-discovered SourceForge tarballs |
| pull_rss_sf.py | Discovers SourceForge projects via category RSS feeds |
| sf_direct_pull.py | Direct SourceForge file-release downloader |
| sf_wayback_pull.py | Falls back to Wayback Machine for delisted/gone SF projects |
| add_tier7_rss.py | Appends Tier-7 RSS-discovered projects to the dataset |
| generate_index.py | Rebuilds INDEX.md + INDEX.json from dataset.json and packages/ |
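
As a rough illustration of what the mirror and package steps involve, the sketch below builds a `git clone --mirror` command and wraps a mirror directory in a dated `.tar.gz`. The function names and the `<name>_<date>.tar.gz` naming scheme are assumptions made for this example; they are not taken from preserve.py or package.py.

```python
import tarfile
from datetime import date
from pathlib import Path


def mirror_command(repo_url, dest):
    """Argument list for a bare mirror clone, suitable for subprocess.run."""
    return ["git", "clone", "--mirror", repo_url, str(dest)]


def package_mirror(mirror_dir, packages_dir, stamp=None):
    """Wrap one mirror directory into <name>_<YYYY-MM-DD>.tar.gz.

    ASSUMPTION: the naming scheme is hypothetical; the real scripts may differ.
    """
    mirror_dir = Path(mirror_dir)
    packages_dir = Path(packages_dir)
    packages_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or date.today().isoformat()
    out = packages_dir / f"{mirror_dir.name}_{stamp}.tar.gz"
    with tarfile.open(out, "w:gz") as tar:
        # arcname keeps archive paths relative to the mirror's own folder name
        tar.add(mirror_dir, arcname=mirror_dir.name)
    return out
```

Passing the result of `mirror_command(...)` to `subprocess.run(..., check=True)` performs the clone; `package_mirror` can then be called on the resulting directory.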

All paths are currently hardcoded to the author's machine (C:\Users\Tyrone\Documents\dead-software-rescue\). A future pass should parameterize them — opening that as an issue is a good first contribution.
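
One possible shape for that parameterization is sketched here: a `--root` command-line flag with an environment-variable fallback, and every other path derived from the root. The flag name, the `DSR_ROOT` variable, and both function names are hypothetical; nothing in the current scripts defines them.

```python
import argparse
import os
from pathlib import Path


def resolve_root(argv=None, environ=None):
    """Pick the project root from --root, then DSR_ROOT, then the current dir.

    ASSUMPTION: --root and DSR_ROOT are illustrative names, not existing options.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="dead-software-rescue paths")
    parser.add_argument(
        "--root",
        default=environ.get("DSR_ROOT", "."),
        help="directory containing dataset.json, mirrors/ and packages/",
    )
    args = parser.parse_args(argv)
    return Path(args.root).expanduser().resolve()


def project_paths(root):
    """Derive the working paths named in this README from a single root."""
    root = Path(root)
    return {
        "dataset": root / "dataset.json",
        "mirrors": root / "mirrors",
        "packages": root / "packages",
        "log": root / "preservation_log.json",
    }
```

Each script could call `project_paths(resolve_root())` once at startup instead of embedding an absolute Windows path.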

Preservation log

preservation_log.json records every clone attempt: timestamp, repo URL, outcome, error if any, resulting mirror path. sf_*_log.json does the same for the SourceForge passes. This is the audit trail — when a rescued copy is later challenged ("where did this code come from?"), the log is the answer.
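
For orientation, a log entry with those five pieces of information could be built and appended as in the sketch below. The key names and the assumption that the log file is a single JSON array are guesses for illustration; check them against the real preservation_log.json before relying on them.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_entry(repo_url, outcome, mirror_path=None, error=None):
    """Build one clone-attempt record. ASSUMPTION: key names are hypothetical."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "repo_url": repo_url,
        "outcome": outcome,
        "error": error,
        "mirror_path": mirror_path,
    }


def append_entry(log_path, entry):
    """Append an entry to a JSON-array log file and return the new entry count."""
    log_path = Path(log_path)
    if log_path.exists():
        entries = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append(entry)
    log_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return len(entries)
```

Recording failures as well as successes is what makes the log usable as an audit trail: a missing project can be traced to a specific failed attempt rather than to an omission.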

Exhibits

exhibits/<project>/ contains the per-project descriptive metadata an archivist would write for a finding aid:

  • README_rescue.md — provenance, significance, death context
  • BUILD.md (where applicable) — what it took to make the source compile/run

These are the human-readable counterparts to the dataset records and are the most directly portable artifacts of the project: they explain why each piece of dead software is historically interesting, not just that it exists.

Reproducing the rescue

Roughly:

  1. Read dataset.json and identify Tier 1–3 projects with source_url != null.
  2. Run preserve.py to mirror everything.
  3. Run package.py to produce dated tarballs in packages/.
  4. Run generate_index.py to regenerate INDEX.md and INDEX.json.
  5. For projects without GitHub presence, run the sf_*_pull.py scripts to scrape SourceForge / Wayback.

Expected end state: ~685 MB in packages/, ~660 MB in mirrors/, matched 1:1 against the preservation_status: "Packaged" entries in the dataset.
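
The selection in step 1 and the 1:1 check on the end state can be sketched as follows. This assumes `tier` is stored as an integer and that records are plain dicts using the field names from dataset.json; both helper names are illustrative and do not exist in the repo.

```python
def rescuable(records, tiers=(1, 2, 3)):
    """Step 1: records in the given tiers whose source_url is not null."""
    return [
        r for r in records
        if r.get("tier") in tiers and r.get("source_url") is not None
    ]


def reconcile(records, packaged_names):
    """Compare dataset entries marked "Packaged" with names found in packages/.

    Returns (marked_but_missing, present_but_unmarked); both lists are empty
    when the match is 1:1.
    """
    expected = {
        r["project_name"] for r in records
        if r.get("preservation_status") == "Packaged"
    }
    found = set(packaged_names)
    return sorted(expected - found), sorted(found - expected)
```

How a package filename maps back to a project name depends on the packaging scripts' naming scheme, so `packaged_names` is left for the caller to derive.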

License & rights

  • The framework in this repo (dataset, scripts, index generators, exhibits) is CC-BY-4.0.
  • The rescued software remains under each project's own license, recorded per record in dataset.json. Most are MIT, GPL-2/3, Apache-2.0, or BSD, which permit redistribution with attribution; a few have unclear licensing and were preserved on a fair-use / scholarly-record basis. Anyone re-publishing the rescued payloads is responsible for honoring the original licenses.
