cloudy-liu · cloudy-liu · May 20, 2026 · May 20, 2026 · May 20, 2026
diff --git a/.gitignore b/.gitignore
@@ -20,6 +20,7 @@ logs
 debug
 design
 docs
+research/
 samples/**/output
 
 # Local env files

diff --git a/README.md b/README.md
@@ -2,107 +2,77 @@
   <img src="./assets/logo.svg" alt="Logo">
 </p>
 
-# wechat-article-exporter
+# wechat-article-exporter fork
 
-![GitHub stars]
-![GitHub forks]
-![GitHub License]
-![Package Version]
+这是 `cloudy-liu` 维护的 fork，来源仓库是
+[`wechat-article/wechat-article-exporter`](https://github.com/wechat-article/wechat-article-exporter)。
 
+原仓库完整说明请看：
+[`wechat-article/wechat-article-exporter#readme`](https://github.com/wechat-article/wechat-article-exporter/blob/master/README.md)。
 
-一款在线的 **微信公众号文章批量下载** 工具，支持导出阅读量与评论数据，无需搭建任何环境，可通过 [在线网站] 使用，同时也支持 docker 私有化部署和 Cloudflare 部署。
+这个 fork 不重复搬运原仓库的大段介绍，重点记录我们在本地部署、批量归档、Markdown 导出和自动化采集场景中做过的优化。当前工作区的开发策略是：同步参考 `upstream`，但分支、提交、推送和 PR 都只走我们自己的 fork，也就是 `origin`。
 
-支持下载各种文件格式，其中 HTML 格式可100%还原文章排版与样式。
+## 我们优化了什么
 
-交流群(QQ):
-- `991482155` 1群已满，不再加人
-- `775909845` 2群
+### 本地部署默认走 Nitro 下载代理
 
-## :bell: 重要告知：项目域名调整
-项目域名调整如下：
+问题背景是，本地部署后浏览器端下载文章内容和资源时，部分场景仍会沿用项目内置的公开代理列表。一旦公开代理节点失效，用户虽然已经把应用跑在本地，文章内容和资源下载仍可能失败。更麻烦的是，旧版本可能已经把这些公开代理地址持久化进浏览器本地存储，例如 `preferences.privateProxyList` 或 legacy `wechat-proxy`，导致升级后仍然误判为“用户配置了私有代理”，从而继续绕过本地代理。
 
-|     | 下载站                            | 文档站                        |
-|-----|--------------------------------|----------------------------|
-| 调整后 | https://down.mptext.top        | https://docs.mptext.top    |
-| 调整前 | https://exporter.wxdown.online | https://docs.wxdown.online |
+我们在 fork 中把默认策略改为：没有真实私有代理时，浏览器下载默认使用同源 Nitro 代理。实现上会识别历史内置公开代理地址，并把它们当作旧默认值处理；如果代理列表中混有内置公开代理和用户自定义代理，则过滤掉内置项，只保留用户真正配置的私有代理。
 
-具体细节可以查看 [这里](https://docs.mptext.top/misc/domain.html)。
+这样本地 `yarn dev`、Docker 私有化部署和 Web 部署都不再默认依赖公共代理池，同时不破坏已有的自建代理配置。
 
+### Markdown 导出结果更适合归档
 
-## :books: 如何使用？
+原始微信公众号 HTML 里混有大量运行时内容：样式块、内联 CSS、底部操作栏、SVG/data URL 图标、头像、脚本和只服务网页渲染的节点。如果直接丢给 Markdown 转换器，生成的 `.md` 很容易在开头带一大段 CSS，正文夹杂无关 UI 文本，表格和正文结构也不稳定。
 
-该工具的使用教程已移至 [文档站点](https://docs.mptext.top)。
+当前 fork 的处理思路是尽量避开完整运行时页面，优先解析 `window.cgiDataNew`，再用统一 renderer 还原文章正文结构，最后交给 Turndown 转 Markdown。已有的单篇 Markdown 修复还会在转换前清理 `<style>`、`<script>`、`<link>`、底部栏、图标、头像、空节点和样式属性，并补上评论渲染。
 
+目标不是复刻网页视觉效果，而是得到更干净的长期归档内容，方便后续进入脚本、笔记、全文搜索、知识库或其他自动化流程。
 
-## :dart: 特性
+### 合集批量下载支持 Markdown / HTML 格式选择
 
-- [x] 搜索公众号，支持关键字搜索
-- [x] 支持导出 html/json/excel/txt/md/docx 格式(html 格式打包了图片和样式文件，能够保证100%还原文章样式)
-- [x] 缓存文章列表数据，减少接口请求次数
-- [x] 支持文章过滤，包括作者、标题、发布时间、原创标识、所属合集等
-- [x] 支持合集下载
-- [x] 支持图片分享消息
-- [x] 支持视频分享消息
-- [x] 支持导出评论、评论回复、阅读量、转发量等数据 (需要抓包获取 credentials 信息，[查看操作步骤](https://docs.mptext.top/advanced/wxdown-service.html))
-- [x] 支持 Docker 部署
-- [x] 支持 Cloudflare 部署
-- [x] 开放 API 接口
+原仓库的合集页批量下载主要按 HTML 离线归档来处理，这对完整保留页面样式很有用，但对文本归档、批量阅读、AI/知识库导入和自动化采集来说偏重。另一个实际问题是，合集文章导出时需要稳定、可排序、带发布日期的文件名。
 
+这个 fork 在合集详情页增加了导出格式选择：
 
-## :heart: 感谢
+- 默认导出 `Markdown`，适合文本归档和后续处理。
+- 保留 `HTML`，继续支持带资源的离线页面归档。
+- Markdown 文件名统一为 `yyyy-MM-dd title.md`。
+- HTML 归档目录也复用同样的日期前缀命名。
+- 标题里的 `/` 会被替换，避免 zip 内部意外生成多层目录。
 
-- 感谢 [Deno Deploy]、[Cloudflare Workers] 提供免费托管服务
-- 感谢 [WeChat_Article] 项目提供原理思路
+具体实现集中在 `utils/download/album-format.ts`：这里定义合集下载格式类型、默认值、下拉选项和文件名构造函数。`pages/dashboard/album.vue` 负责在 UI 上绑定格式选择，`composables/useBatchDownload.ts` 根据选择走两条路径：HTML 继续调用现有资源打包逻辑，Markdown 则解析文章 HTML、渲染正文结构，再用 Turndown 写入 `.md` 文件。
 
+### 合集导出格式增加轻量回归测试
 
-## :star: 支持
+新增 `test/album_download_format.ts`，锁定目前支持的合集导出格式只有 `markdown` 和 `html`，并确认默认值保持为 `markdown`。这个测试很小，但能防止后续改 UI 选项或默认导出行为时无意改变归档策略。
 
-如果你觉得本项目帮助到了你，请给作者一个免费的 Star，感谢你的支持！
+## 开发与验证
 
+安装和启动方式沿用原仓库：
 
-## :bulb: 原理
+```bash
+corepack enable
+corepack prepare yarn@1.22.22 --activate
+yarn
+yarn dev
+```
 
-在公众号后台写文章时支持搜索其他公众号的文章功能，以此来实现抓取指定公众号所有文章的目的。
+本 fork 当前常用验证命令：
 
+```bash
+node_modules/.bin/jiti.cmd test/album_download_format.ts
+yarn build
+```
 
-## :memo: 许可
+## Remote 策略
 
-MIT
+- `origin` 是我们的 fork：`https://github.com/cloudy-liu/wechat-article-exporter.git`
+- `upstream` 是原仓库：`https://github.com/wechat-article/wechat-article-exporter.git`
+- 后续所有功能分支、提交、推送和 PR 都只面向 `origin` / fork。
+- 不向 `upstream` 直接推送任何修改。
 
-## :red_circle: 声明
+## License
 
-本程序承诺，不会利用您扫码登录的公众号进行任何形式的私有爬虫，也就是说不存在把你的账号作为公共账号为别人爬取文章的行为，也不存在类似账号池的东西。
-
-您的公众号只会服务于您自己的抓取文章的目的。
-
-通过本程序获取的公众号文章内容，版权归文章原作者所有，请合理使用。若发现侵权行为，请联系我们处理。
-
-
-## :chart_with_upwards_trend: Star 历史
-
-[![Star History Chart]][Star History Chart Link]
-
-
-
-<!-- Definitions -->
-
-[GitHub stars]: https://img.shields.io/github/stars/wechat-article/wechat-article-exporter?style=social&label=Star&style=plastic
-
-[GitHub forks]: https://img.shields.io/github/forks/wechat-article/wechat-article-exporter?style=social&label=Fork&style=plastic
-
-[GitHub License]: https://img.shields.io/github/license/wechat-article/wechat-article-exporter?label=License
-
-[Package Version]: https://img.shields.io/github/package-json/v/wechat-article/wechat-article-exporter
-
-
-[Deno Deploy]: https://deno.com/deploy
-
-[Cloudflare Workers]: https://workers.cloudflare.com
-
-[Wechat_Article]: https://github.com/1061700625/WeChat_Article
-
-[Star History Chart]: https://api.star-history.com/svg?repos=wechat-article/wechat-article-exporter&type=Timeline
-
-[Star History Chart Link]: https://star-history.com/#wechat-article/wechat-article-exporter&Timeline
-
-[在线网站]: https://down.mptext.top
+MIT。许可证继承自原仓库，详见本仓库的 `LICENSE`。
diff --git a/composables/useBatchDownload.ts b/composables/useBatchDownload.ts
@@ -1,8 +1,24 @@
-import { format } from 'date-fns';
 import { saveAs } from 'file-saver';
 import JSZip from 'jszip';
+import TurndownService from 'turndown';
+import { parseCgiDataNew } from '#shared/utils/html';
+import { renderHTMLFromCgiDataNew } from '#shared/utils/renderer';
 import type { DownloadableArticle } from '~/types/types';
 import { downloadArticleHTMLs, packHTMLAssets } from '~/utils';
+import {
+  type AlbumDownloadFormat,
+  buildAlbumArticleArchiveName,
+  buildAlbumMarkdownFilename,
+  DEFAULT_ALBUM_DOWNLOAD_FORMAT,
+} from '~/utils/download/album-format';
+
+async function renderMarkdown(article: DownloadableArticle, turndownService: TurndownService): Promise<string> {
+  const html = article.html!;
+  const cgiData = await parseCgiDataNew(html);
+  const renderedHTML = cgiData ? await renderHTMLFromCgiDataNew(cgiData, false) : html;
+
+  return turndownService.turndown(renderedHTML);
+}
 
 /**
  * 批量下载合集文章
@@ -13,8 +29,14 @@ export function useDownloadAlbum() {
   const downloadedCount = ref(0);
   const packedCount = ref(0);
 
-  async function download(articles: DownloadableArticle[], filename: string) {
+  async function download(
+    articles: DownloadableArticle[],
+    filename: string,
+    format: AlbumDownloadFormat = DEFAULT_ALBUM_DOWNLOAD_FORMAT
+  ) {
     loading.value = true;
+    downloadedCount.value = 0;
+    packedCount.value = 0;
 
     try {
       phase.value = '下载文章内容';
@@ -24,14 +46,24 @@ export function useDownloadAlbum() {
 
       phase.value = '打包';
       const zip = new JSZip();
-      for (const article of results) {
-        await packHTMLAssets(
-          article.fakeid,
-          article.html!,
-          article.title.replaceAll('.', '_'),
-          zip.folder(format(new Date(+article.date * 1000), 'yyyy-MM-dd') + ' ' + article.title.replace(/\//g, '_'))!
-        );
-        packedCount.value++;
+
+      if (format === 'html') {
+        for (const article of results) {
+          await packHTMLAssets(
+            article.fakeid,
+            article.html!,
+            article.title.replaceAll('.', '_'),
+            zip.folder(buildAlbumArticleArchiveName(article))!
+          );
+          packedCount.value++;
+        }
+      } else {
+        const turndownService = new TurndownService();
+        for (const article of results) {
+          const markdown = await renderMarkdown(article, turndownService);
+          zip.file(buildAlbumMarkdownFilename(article), markdown);
+          packedCount.value++;
+        }
       }
 
       const blob = await zip.generateAsync({ type: 'blob' });

diff --git a/pages/dashboard/album.vue b/pages/dashboard/album.vue
@@ -47,6 +47,17 @@
             @click="gotoLink(originalAlbumURL)"
             >跳转到原始链接</UButton
           >
+          <USelectMenu
+            v-model="selectedDownloadFormat"
+            :options="ALBUM_DOWNLOAD_FORMAT_OPTIONS"
+            value-attribute="value"
+            option-attribute="label"
+            color="gray"
+            size="md"
+            class="w-36"
+            :disabled="batchDownloadLoading"
+            aria-label="导出格式"
+          />
           <UButton
             color="black"
             variant="solid"
@@ -63,7 +74,7 @@
               >
               <span v-if="batchDownloadPhase === '打包'">{{ batchPackedCount }}/{{ batchDownloadedCount }}</span>
             </span>
-            <span v-else>批量下载</span>
+            <span v-else>批量下载 {{ selectedDownloadFormatLabel }}</span>
           </UButton>
         </div>
       </header>
@@ -133,6 +144,11 @@ import type { AppMsgAlbumResult, ArticleItem, BaseInfo } from '~/types/album';
 import type { AppMsgAlbumInfo, DownloadableArticle } from '~/types/types';
 import { gotoLink } from '~/utils';
 import { formatAlbumTime } from '~/utils/album';
+import {
+  ALBUM_DOWNLOAD_FORMAT_OPTIONS,
+  type AlbumDownloadFormat,
+  DEFAULT_ALBUM_DOWNLOAD_FORMAT,
+} from '~/utils/download/album-format';
 
 useHead({
   title: `合集下载 | ${websiteName}`,
@@ -291,6 +307,10 @@ const {
   download: batchDownload,
 } = useDownloadAlbum();
 const selectedArticleCount = ref(0);
+const selectedDownloadFormat = ref<AlbumDownloadFormat>(DEFAULT_ALBUM_DOWNLOAD_FORMAT);
+const selectedDownloadFormatLabel = computed(() => {
+  return ALBUM_DOWNLOAD_FORMAT_OPTIONS.find(option => option.value === selectedDownloadFormat.value)?.label || '';
+});
 
 function doBatchDownload() {
   const articles: DownloadableArticle[] = albumArticles.map(article => ({
@@ -301,7 +321,7 @@ function doBatchDownload() {
   }));
   selectedArticleCount.value = articles.length;
   const filename = downloadFileName.value;
-  batchDownload(articles, filename);
+  batchDownload(articles, filename, selectedDownloadFormat.value);
 }
 
 // 抓取全部文章链接

diff --git a/test/album_download_format.ts b/test/album_download_format.ts
@@ -0,0 +1,21 @@
+import assert from 'node:assert/strict';
+import {
+  ALBUM_DOWNLOAD_FORMAT_OPTIONS,
+  DEFAULT_ALBUM_DOWNLOAD_FORMAT,
+  isAlbumDownloadFormat,
+} from '../utils/download/album-format';
+
+function run() {
+  assert.equal(DEFAULT_ALBUM_DOWNLOAD_FORMAT, 'markdown');
+  assert.deepEqual(ALBUM_DOWNLOAD_FORMAT_OPTIONS, [
+    { label: 'Markdown', value: 'markdown' },
+    { label: 'HTML', value: 'html' },
+  ]);
+
+  assert.equal(isAlbumDownloadFormat('markdown'), true);
+  assert.equal(isAlbumDownloadFormat('html'), true);
+  assert.equal(isAlbumDownloadFormat('pdf'), false);
+  assert.equal(isAlbumDownloadFormat(undefined), false);
+}
+
+run();
diff --git a/utils/download/album-format.ts b/utils/download/album-format.ts
@@ -0,0 +1,23 @@
+import { format } from 'date-fns';
+import type { DownloadableArticle } from '~/types/types';
+
+export type AlbumDownloadFormat = 'markdown' | 'html';
+
+export const DEFAULT_ALBUM_DOWNLOAD_FORMAT: AlbumDownloadFormat = 'markdown';
+
+export const ALBUM_DOWNLOAD_FORMAT_OPTIONS: { label: string; value: AlbumDownloadFormat }[] = [
+  { label: 'Markdown', value: 'markdown' },
+  { label: 'HTML', value: 'html' },
+];
+
+export function isAlbumDownloadFormat(value: unknown): value is AlbumDownloadFormat {
+  return value === 'markdown' || value === 'html';
+}
+
+export function buildAlbumArticleArchiveName(article: DownloadableArticle): string {
+  return `${format(new Date(+article.date * 1000), 'yyyy-MM-dd')} ${article.title.replace(/\//g, '_')}`;
+}
+
+export function buildAlbumMarkdownFilename(article: DownloadableArticle): string {
+  return `${buildAlbumArticleArchiveName(article)}.md`;
+}
-Original file line number
+Diff line change
@@ Expand Up / @@ -20,6 +20,7 @@ logs @@
     debug
     design
     docs
+    research/
     samples/**/output
     # Local env files
@@ Expand Down @@