Convert a Medium export ZIP into clean Markdown with localized images, optimized for Hugo and compatible with Obsidian knowledge bases.
medium2md is a CLI tool that transforms Medium's HTML export into properly structured Markdown with localized assets. Today, output is optimized for Hugo page bundles and is also readable in Obsidian vaults; stronger Obsidian-specific formatting conventions are planned on the roadmap.
- Why This Exists
- Features
- Installation
- Usage
- Output Structure
- Project Structure
- Development Roadmap
- Contributing
- License
Medium allows you to export your account data as a ZIP archive, but the raw export:
- Contains unstructured HTML
- Includes inconsistent metadata
- References remote image URLs
medium2md solves this by providing:
| Feature | Description |
|---|---|
| HTML → Markdown | Converts Medium HTML posts to clean Markdown |
| Hugo front matter | Generates YAML front matter from post metadata |
| Image localization | Downloads remote images into each bundle; copies local images when present in the export |
| Canonical URL | Preserves the original Medium URL |
| Conversion reports | Prints a terminal summary of what was converted and what was skipped; a machine-readable report is planned |
| Incremental re-runs | (planned) Re-run only changed posts |
| Obsidian compatibility | Current output is Obsidian-readable; dedicated Obsidian formatting profile is planned |
This tool is designed to be deterministic, reproducible, and CI-friendly.
Generate correctly formatted Markdown files from Medium posts, with images localized into each post bundle, so the output can be used as a durable personal/team knowledge base in Hugo (or other Markdown-first workflows).
- Convert Medium export ZIP (posts under `posts/` in the export)
- Extract title and canonical URL; generate slug
- Convert HTML to Markdown
- Create Hugo page bundles with `index.md` and optional `images/`
- Image localization: download remote images into the bundle; copy local images when present in the export
- Basic slug collision handling (`slug-2`, `slug-3`, …)
- Terminal progress and summary; per-post image count; prompt to create missing output dir
- (planned) Extract date and optional metadata (tags, etc.) into front matter
- (planned) Incremental runs via state file
- (planned) Embed detection and shortcode conversion (YouTube, Twitter, Gist)
- (planned) Pandoc backend option
- (planned) Verification command
- (planned) Theme-specific front matter mapping
- (planned) Conversion report (e.g. JSON/file)
- (planned) Obsidian-friendly output profile (e.g., front matter + file layout conventions for vault workflows)
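
The slug collision handling listed above (`slug-2`, `slug-3`, …) can be illustrated with a minimal sketch. The function names `slugify` and `unique_slug` are hypothetical and the slug rules shown here (ASCII-only, lowercase, hyphen-separated) are an assumption; the tool's actual implementation may differ.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated ASCII slug (assumed rules)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug or "untitled"


def unique_slug(slug: str, taken: set[str]) -> str:
    """Return slug, or slug-2, slug-3, ... if already taken; record the result in taken."""
    candidate = slug
    suffix = 2
    while candidate in taken:
        candidate = f"{slug}-{suffix}"
        suffix += 1
    taken.add(candidate)
    return candidate
```

Keeping the set of used slugs outside the function means a single conversion run can share it across all posts, so two posts with the same title end up in separate bundles.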
- Front matter currently includes `title`, `slug`, `draft`, and optional `medium.canonical`; date/tags are not extracted yet.
- Embedded content is not converted to Hugo shortcodes yet.
- Incremental conversion/state tracking is not implemented yet.
- Output structure is Hugo-first (`content/posts/<slug>/index.md`); a dedicated Obsidian output mode is not implemented yet.
This project uses uv for dependency management.
git clone https://github.com/edgarbc/medium2md.git
cd medium2md
uv sync
Once published to PyPI, install with:
pip install medium2md-cli
# or with uv:
uv tool install medium2md-cli
The CLI command is still `medium2md`.
Copy your Medium export ZIP into the `input/` directory (already set up; its contents are git-ignored), then run the converter:
cp ~/Downloads/medium-export.zip input/
uv run medium2md input/medium-export.zip --out ../blog/content/posts
Note: The `input/` directory is tracked by git (via `.gitkeep`) so it exists after a fresh clone, but its contents are ignored, so your ZIP files will never be accidentally committed.
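
Before converting, it can help to check which posts the export actually contains. The following is a minimal sketch using only the standard library; `list_post_files` is a hypothetical helper, and it assumes that posts are `.html` files inside a `posts/` directory of the ZIP (possibly nested under a top-level folder).

```python
import zipfile
from pathlib import PurePosixPath


def list_post_files(zip_path: str) -> list[str]:
    """Return the names of HTML files stored under a posts/ directory in the ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # ZIP member names always use forward slashes, hence PurePosixPath.
            if "posts" in PurePosixPath(name).parts[:-1]
            and name.lower().endswith(".html")
        )
```

For example, `list_post_files("input/medium-export.zip")` would return the post files that a conversion run is expected to pick up.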
Each converted post produces an index.md with Hugo-compatible YAML front matter. Current output:
---
title: "My Post Title"
draft: true
slug: "my-post-slug"
medium:
  canonical: "https://medium.com/@you/post-slug"
---
Additional keys (e.g. `date`, `lastmod`, `tags`) are planned.
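
As an illustration of how front matter with these keys could be produced, here is a minimal sketch. `render_front_matter` is a hypothetical function, not the tool's actual code. It relies on the fact that a JSON string is also a valid YAML double-quoted scalar, so `json.dumps` is used to quote and escape values.

```python
import json
from typing import Optional


def render_front_matter(
    title: str, slug: str, canonical: Optional[str] = None, draft: bool = True
) -> str:
    """Render YAML front matter with title, draft, slug, and optional medium.canonical."""
    lines = [
        "---",
        # json.dumps escapes quotes and backslashes in a YAML-compatible way.
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"draft: {'true' if draft else 'false'}",
        f"slug: {json.dumps(slug, ensure_ascii=False)}",
    ]
    if canonical:
        # Nested key: medium.canonical, indented by two spaces.
        lines += ["medium:", f"  canonical: {json.dumps(canonical, ensure_ascii=False)}"]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Writing the block by hand keeps the key order stable between runs, which suits the deterministic output the tool aims for.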
Each Medium post becomes a Hugo page bundle. Image links in the Markdown point into the bundle’s images/ folder (remote images are downloaded; local images from the export are copied):
content/posts/
└── my-post-slug/
    ├── index.md
    └── images/
        ├── 1.png
        ├── 2.jpg
        └── …
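
The link rewriting implied by this layout can be sketched as follows. This is an assumption-laden illustration, not the tool's actual code: `plan_image_rewrites` is a hypothetical function, the sequential naming (`images/1.png`, `images/2.jpg`) is inferred from the tree above, the `.img` fallback extension is made up, and the download/copy step itself is omitted. A regular expression is used for brevity; a real HTML parser would be more robust.

```python
import os
import re
from urllib.parse import urlparse

# Matches an <img ...> tag up to and including its double-quoted src value.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def plan_image_rewrites(html: str) -> tuple[str, dict[str, str]]:
    """Point every <img src> at images/<n><ext>; return (new HTML, {original src: local path})."""
    mapping: dict[str, str] = {}

    def replace(match: re.Match) -> str:
        src = match.group(2)
        if src not in mapping:
            # Take the extension from the URL path, ignoring any query string.
            ext = os.path.splitext(urlparse(src).path)[1].lower() or ".img"
            mapping[src] = f"images/{len(mapping) + 1}{ext}"
        return f"{match.group(1)}{mapping[src]}{match.group(3)}"

    return IMG_SRC.sub(replace, html), mapping
```

The returned mapping tells a later step which source to download or copy into which file, and repeated images are stored only once.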
medium2md/
├── medium2md/
│   ├── __init__.py
│   ├── cli.py
│   ├── pipeline.py
│   └── main.py
├── pyproject.toml
├── README.md
├── project-plan.md
└── input/
    └── medium-export.zip
medium2md follows a layered pipeline:
ZIP → extract → find posts → parse HTML → localize images (copy/download) → Markdown conversion → front matter + Hugo bundle write
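
To make one stage of this pipeline concrete, the "parse HTML" step could look roughly like the sketch below, which pulls the title and canonical URL out of a post using only the standard library. The names `PostMetaParser` and `parse_post_meta` are hypothetical, and the assumption that the canonical link is an `<a>` element with class `p-canonical` should be verified against your own export; the code in `pipeline.py` may work differently.

```python
from html.parser import HTMLParser
from typing import Optional


class PostMetaParser(HTMLParser):
    """Collects the <title> text and the href of the first <a class="p-canonical">."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.canonical: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and self.canonical is None:
            # The class attribute may hold several space-separated names.
            if "p-canonical" in (attr.get("class") or "").split():
                self.canonical = attr.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_post_meta(html: str) -> tuple[str, Optional[str]]:
    """Return (title, canonical URL or None) for one exported post."""
    parser = PostMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.canonical
```

Each stage taking plain values and returning plain values, as here, keeps the stages easy to test in isolation.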
Philosophy: Correctness first, cleverness later.
| Milestone | Focus | Status |
|---|---|---|
| 1 — Core conversion | ZIP ingestion, post discovery, HTML→Markdown conversion, Hugo bundle writing, local/remote image localization, slug collision handling | ✅ Implemented |
| 2 — Content fidelity + verification | Better metadata extraction (date, tags), machine-readable conversion report, verify command, clearer failure reporting, Obsidian formatting compatibility review | 📋 Planned |
| 3 — Incremental + extensibility | Incremental state tracking, embed conversion, output-profile mapping (Hugo/Obsidian), optional Pandoc backend, internal link rewriting | 📋 Planned |
- The repository has implemented the core `convert` flow end-to-end.
- Milestone 2 is the highest-impact next step for knowledge-base quality (`date`/tags extraction, verification/reporting, Obsidian compatibility conventions).
- Milestone 3 remains optional/polish after fidelity and verification are stable.
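
Incremental state tracking is not implemented yet, so the following is only one possible approach, sketched under stated assumptions: a JSON state file that maps each post name to a SHA-256 hash of its content. The function name `changed_posts` and the state file format are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def changed_posts(posts: dict[str, bytes], state_file: Path) -> list[str]:
    """Return names of posts that are new or changed since the last run, then save state."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {name: hashlib.sha256(data).hexdigest() for name, data in posts.items()}
    # sort_keys keeps the state file byte-for-byte stable for identical input.
    state_file.write_text(json.dumps(current, indent=2, sort_keys=True))
    return sorted(name for name, digest in current.items() if previous.get(name) != digest)
```

Hashing content rather than comparing timestamps keeps re-runs reproducible across machines and CI, since extraction rewrites file modification times.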
Contributions are welcome! To get started:
- Fork the repository
- Create a feature branch (`git checkout -b feat/my-feature`)
- Make your changes
- Open a pull request (run `uv run medium2md --help` to confirm the CLI works)
To publish a release to PyPI:
- Bump `version` in `pyproject.toml`.
- Build: `uv build` (creates `dist/`).
- Install dev deps and upload: `uv sync --extra dev` then `uv run twine upload dist/*` (requires a PyPI API token; use `__token__` as username).
- Optionally tag the release: `git tag v0.1.0 && git push --tags`.
This project is licensed under the MIT License.
Built by Edgar Bermudez and GitHub Copilot with 💖 to enable long-term content ownership and reproducible publishing workflows.
Not affiliated with Medium or any of its subsidiaries.