@cp/pdf

PDF parsing Clip: downloads a PDF from a URL, parses it, and outputs structured Markdown.

Three modes

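| Mode | Description | LLM call | Speed | Use case |
| --- | --- | --- | --- | --- |
| `text` | Plain-text extraction | No | ~0.2s | Agent consumption, text-only PDFs |
| `markdown` | Text + LLM formatting | Yes (text) | ~30s | Human reading, PDFs without images |
| `vision` | Multi-pass Vision extraction | Yes (images) | ~2min | Charts, formulas, scanned documents |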
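The table can be read as a small decision rule. The sketch below is illustrative only: `chooseMode` and `ModeQuery` are hypothetical names and are not part of @cp/pdf; only the three mode names come from the table.

```typescript
// Hypothetical helper, not part of @cp/pdf. Mode names are taken from the table.
type Mode = "text" | "markdown" | "vision";

interface ModeQuery {
  hasVisualContent: boolean; // charts, formulas, or scanned pages
  forHumanReader: boolean;   // false when an agent consumes the output
}

function chooseMode(q: ModeQuery): Mode {
  // Visual content needs the slow vision pass regardless of the reader.
  if (q.hasVisualContent) return "vision";
  // Otherwise pay for LLM formatting only when a person will read the result.
  return q.forHumanReader ? "markdown" : "text";
}

console.log(chooseMode({ hasVisualContent: false, forHumanReader: false })); // text
```

The rule favors the cheapest mode that still covers the content, since `vision` is roughly two orders of magnitude slower than `text` according to the table.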

Usage

# Default: markdown mode
bun run index.ts parse --url https://example.com/doc.pdf

# Fast text extraction
bun run index.ts parse --url https://example.com/doc.pdf --mode text

# Vision extraction (for PDFs with charts or figures)
bun run index.ts parse --url https://example.com/doc.pdf --mode vision

# Supply background context to improve formatting quality
bun run index.ts parse --url https://example.com/doc.pdf --context "This is an API design document"
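When the commands above are driven from a script, assembling the argument list in one place can help avoid quoting mistakes. This is a minimal sketch: `buildParseArgs` and `ParseOptions` are hypothetical names, and only the `--url`, `--mode`, and `--context` flags come from the examples above.

```typescript
// Hypothetical helper, not part of @cp/pdf. It only assembles the flags
// shown in the usage examples (--url, --mode, --context).
type ParseMode = "text" | "markdown" | "vision";

interface ParseOptions {
  url: string;
  mode?: ParseMode;  // omitted: the CLI falls back to its default (markdown)
  context?: string;  // optional background information for formatting
}

function buildParseArgs(opts: ParseOptions): string[] {
  const args = ["run", "index.ts", "parse", "--url", opts.url];
  if (opts.mode) args.push("--mode", opts.mode);
  if (opts.context) args.push("--context", opts.context);
  return args;
}

const args = buildParseArgs({ url: "https://example.com/doc.pdf", mode: "vision" });
console.log(args.join(" "));
// run index.ts parse --url https://example.com/doc.pdf --mode vision
```

Because each argument is a separate array element, the array can be handed to a process-spawning API (for example `Bun.spawn(["bun", ...args])`) without shell quoting, even when the context string contains spaces.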

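Features

Content detection: automatically identifies pages that contain images or charts and suggests a mode in hints.recommendation
Caching: the same URL is not downloaded twice; after the first request, responses return within seconds
Zero system dependencies: pure npm packages (pdfjs-dist + @napi-rs/canvas); no poppler or similar tools required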
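To illustrate the content-detection feature, the sketch below turns per-page image flags into a hints object. It is an assumption-laden illustration, not the detection logic in @cp/pdf: `buildHints` and `ImageHints` are hypothetical names, the wording of the recommendation is invented, and page numbers are assumed to be 1-based.

```typescript
// Hypothetical sketch, not the actual @cp/pdf detection code.
interface ImageHints {
  hasImages: boolean;
  imagePages: number[];   // 1-based page numbers (assumed)
  recommendation: string;
}

// pageHasImage[i] is true when page i + 1 contains at least one image.
function buildHints(pageHasImage: boolean[]): ImageHints {
  const imagePages = pageHasImage
    .map((has, i) => (has ? i + 1 : 0))
    .filter((n) => n > 0);
  const hasImages = imagePages.length > 0;
  const recommendation = hasImages
    ? `Pages ${imagePages.join(",")} contain images; vision mode can extract visual content`
    : "No images detected; text or markdown mode is sufficient";
  return { hasImages, imagePages, recommendation };
}

const hints = buildHints([false, false, true, false, false, false, true]);
console.log(hints.imagePages); // [ 3, 7 ]
```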

Installation

bun install

As a Pinix Clip

# Register with the local Hub
pinix hub add @cp/pdf

# Invoke through the Hub
pinix pdf-xxxx parse --url https://example.com/doc.pdf

# Run as an MCP Server
bun run index.ts --mcp

Output structure

{
  "metadata": { "title": "...", "type": "...", "pageCount": 25 },
  "pages": [
    { "index": 1, "text": "raw text", "markdown": "# formatted output" }
  ],
  "markdown": "full Markdown (all pages merged)",
  "hints": {
    "hasImages": true,
    "imagePages": [3, 7],
    "isScanned": false,
    "recommendation": "Pages 3,7 contain images; vision mode can extract visual content"
  },
  "cached": false
}
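For consumers written in TypeScript, the output can be described with a few interfaces. The types below are inferred from the sample alone and are assumptions (for instance, whether `markdown` is present on every page in `text` mode is not stated); `needsVisionPass` is a hypothetical helper, not part of @cp/pdf.

```typescript
// Types inferred from the sample output; field optionality is an assumption.
interface PdfPage {
  index: number;
  text: string;
  markdown: string;
}

interface PdfHints {
  hasImages: boolean;
  imagePages: number[];
  isScanned: boolean;
  recommendation: string;
}

interface PdfResult {
  metadata: { title: string; type: string; pageCount: number };
  pages: PdfPage[];
  markdown: string;
  hints: PdfHints;
  cached: boolean;
}

// Hypothetical helper: decide whether a second run with --mode vision is worthwhile.
function needsVisionPass(result: PdfResult): boolean {
  return result.hints.isScanned || result.hints.imagePages.length > 0;
}

const sample: PdfResult = {
  metadata: { title: "...", type: "...", pageCount: 25 },
  pages: [{ index: 1, text: "raw text", markdown: "# formatted output" }],
  markdown: "full Markdown",
  hints: { hasImages: true, imagePages: [3, 7], isScanned: false, recommendation: "" },
  cached: false,
};
console.log(needsVisionPass(sample)); // true
```

A typical flow would be to run `text` mode first, inspect `hints`, and rerun in `vision` mode only when this check returns true.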

Dependencies

pdfjs-dist: PDF text extraction and page rendering
@napi-rs/canvas: canvas implementation for Node/Bun (Rust, no system dependencies)
@pinixai/core: Clip SDK
OpenRouter API (google/gemini-3-flash-preview): used by the markdown and vision modes
