baoxiao · 报销单

License: MIT · Claude Code Skill

A messy folder of invoices → a print-ready, audit-grade China expense reimbursement packet.

一个杂乱的发票文件夹 → 一份打印即合规、便于做账和税务备查的中国企业费用报销单包。

baoxiao is a Claude Code skill. Point it at a folder of VAT invoice PDFs and/or photos of paper invoices; it first renames every file to one convention and files it by issue month (photos included), then asks you a few questions (reimbursing entity, applicant, department, expense period, purpose) and assembles a compliant reimbursement PDF packet — a cover summary, a compliance-analysis page, one detail page per invoice with the invoice image embedded, a compact rebuild of Didi ride itineraries, and a deduplication ledger CSV. Electronic VAT invoices (PDFs with a text layer) are parsed directly; paper-invoice photos are read by Claude's vision.

baoxiao 是一个 Claude Code 技能。给它一个装着增值税发票 PDF 和/或发票照片的文件夹,它先把每个文件按统一规范改名、按开票月归档(含照片),再向你提几个问题(报销主体、报销人、部门、费用所属期、用途),组装出一份合规的报销单 PDF 包 —— 封面汇总单、合规分析页、逐票明细页(嵌入发票图)、滴滴行程单的紧凑重制版,以及一份发票查重台账 CSV。增值税电子发票(带文本层的 PDF)直接解析;纸质发票照片交给 Claude 的视觉识别。

Why baoxiao · 为什么用 baoxiao

Reimbursement is death by a thousand small chores: invoices arrive named by rounded amounts that don't match the printed total, the same summary invoice gets reported twice, a Didi ride bill bundles seven pages of commutes and weekend trips into one tax-deductible line, and the form still needs the amount spelled out in Chinese characters. baoxiao does the tedious, error-prone parts deterministically — it reads the real tax-inclusive total off each invoice, dedupes by invoice number, splits private rides from business ones, and lays the whole thing out so it prints exactly the way a finance reviewer expects.

报销是被无数琐碎小事拖垮的过程:发票按四舍五入后的金额命名,和票面真实合计对不上;同一张汇总发票被报销两次;一张滴滴行程单把七页通勤和周末出行打包进一条"可税前扣除"的金额里;报销单还要把金额写成中文大写。baoxiao 把这些繁琐又易错的环节确定性地做掉 —— 逐张读出真实价税合计、按发票号查重、把私人行程从商务行程里拆出来,再把整份单据排成财务复核时正好期待的样子,打印即用。

Most "AI does my paperwork" tools hand you a chat answer. baoxiao hands you the actual artifact: a PDF you can print, staple, sign, and file, plus a CSV ledger you can keep for audit. The amounts are grounded in the invoices themselves, not guessed — the one place where guessing is unacceptable.

多数"AI 帮我处理文书"的工具给你的是一段对话回答。baoxiao 给你的是真正的成品:一份可以打印、装订、签字、归档的 PDF,外加一份可留存备查的 CSV 台账。金额全部锚定在发票本身,而不是猜出来的 —— 而这恰恰是最不能猜的地方。

Features · 功能特性

  • Filename normalization & monthly filing — every invoice is renamed to date_in-out_category_counterparty_amountCNY_last-8-of-invoice-no.pdf and filed into a per-issue-month folder; photos come along. This gives dedup, cross-checking, and archiving a stable anchor before anything else runs. 文件名规范化 + 分月归档 —— 每张发票统一改名为 日期_收支_科目_对方_金额CNY_票号尾8位.pdf,归入按开票月建的子文件夹,照片一并归档;在后续步骤之前,先给查重、核对、归档一个稳定锚点。

  • Authoritative amount parsing — the tax-inclusive total is read from the invoice's Chinese capital amount (壹仟叁佰贰拾柒圆伍角整), never max(¥) on the page. Invoices carry negative discount lines, so the capital amount is the only reliable source. 权威金额解析 —— 价税合计取自发票上的中文大写金额(壹仟叁佰贰拾柒圆伍角整),而不是页面上的 max(¥)。发票带有负数折扣行,大写金额是唯一可靠来源。

  • Photos via vision — paper invoices with no text layer are read by Claude's vision (invoice number, date, seller, tax-inclusive total, rate, category) and folded into the same record structure as the PDF invoices; amount and invoice number get a human double-check. 照片走视觉识别 —— 无文本层的纸质发票交给 Claude 视觉识别(发票号、开票日期、销售方、价税合计、税率、类别),补成与 PDF 发票相同的记录结构;金额与发票号经人工复核。

  • Didi itinerary reconciliation — for summary ride invoices, the itinerary total is checked against the invoice's tax-inclusive total, then rebuilt into a compact 2–3 page table (originals are often 7+ pages), and scanned for private-vs-business signals (residential/commute, malls, restaurants, weekends, late nights). 滴滴行程单勾稽 —— 对汇总开票的网约车发票,先校验行程单合计 = 发票价税合计,再重制成紧凑的 2–3 页表格(原版常 7+ 页),并扫描公私特征(住宅/通勤、商场、餐厅、周末、深夜)。

  • Deduplication ledger — a 发票查重台账.csv keyed on invoice number flags any number that appears more than once, so the same invoice can't be reimbursed twice. 发票查重台账 —— 一份以发票号为键的 发票查重台账.csv,标出任何出现多于一次的发票号,杜绝同一张发票被重复报销。

  • Compliance built in — the packet carries the hard-rule fields a China reimbursement form needs (real per-line purpose, amount in figures and Chinese capital, signature lanes, attachment count) and surfaces the compliance reminders that matter most. 内置合规 —— 报销单包带齐中国报销单需要的硬核字段(逐笔真实用途、小写金额与中文大写、签字栏、附件张数),并把最关键的合规提示直接呈现出来。

  • Restrained, print-first layout — A4 pages rendered with headless Chrome, a muted grayscale design, tabular-aligned numbers, and nowrap on every date and amount so nothing breaks across lines. 克制、为打印而设的版式 —— A4 页面经 headless Chrome 渲染,灰阶克制设计,数字表格对齐,每个日期和金额都 nowrap,绝不跨行折断。
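The naming convention in the first bullet can be sketched as follows. This is a minimal illustration, not baoxiao's own code: the `Invoice` dataclass, the `normalized_path` helper, the ISO date format, the `in`/`out` tokens, and the character-cleaning rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from pathlib import PurePosixPath
import re


@dataclass
class Invoice:
    """Hypothetical record; the field names are illustrative only."""
    issue_date: date
    direction: str      # assumed tokens: "in" or "out"
    category: str
    counterparty: str
    total: Decimal      # tax-inclusive total read off the invoice
    invoice_no: str


def _safe(part: str) -> str:
    # Underscores separate the fields, so they and filesystem-hostile
    # characters are collapsed to a single hyphen inside a field.
    return re.sub(r'[\\/:*?"<>|_\s]+', "-", part).strip("-")


def normalized_path(inv: Invoice, suffix: str = ".pdf") -> PurePosixPath:
    """Build <YYYY-MM>/<date>_<in-out>_<category>_<counterparty>_<amount>CNY_<last 8>."""
    month_dir = inv.issue_date.strftime("%Y-%m")   # filed by issue month
    name = "_".join([
        inv.issue_date.isoformat(),                # assumed date format
        inv.direction,
        _safe(inv.category),
        _safe(inv.counterparty),
        f"{inv.total:.2f}CNY",
        inv.invoice_no[-8:],                       # last 8 characters of the invoice number
    ])
    return PurePosixPath(month_dir) / (name + suffix)
```

Keeping the exact amount and the invoice-number tail in the filename is what lets later steps pair and deduplicate files by name alone.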

When to use · 适用场景

Reach for baoxiao when you have a batch of China VAT invoices — electronic PDFs, paper photos, or both — that need to become one compliant, printable reimbursement packet: month-end expense claims, a quarter's worth of receipts to file, ride/meal/cloud-service invoices to collate. It is built for the China fiscal-and-tax reimbursement context.

当你有一批中国增值税发票 —— 电子 PDF、纸质照片,或两者兼有 —— 需要整理成一份合规、可打印的报销单包时,就用 baoxiao:月末报销、一个季度待归档的票据、网约车/餐饮/云服务发票的归集。它面向中国财税报销场景而设计。

It is not a bookkeeping ledger, an accounting system, or a tax filing tool — and it does not give tax or legal advice. It packages and lays out; your accountant decides booking and tax treatment.

它不是记账总账、会计系统,也不是报税工具 —— 它不构成税务或法律意见。它负责整理与排版;具体如何入账与税务处理,由你的会计师决定。

Install · 安装

Clone this repository into your Claude Code skills directory:

把本仓库克隆到你的 Claude Code 技能目录:

git clone https://github.com/xntj-ai/baoxiao.git ~/.claude/skills/baoxiao

Dependencies · 依赖:

  • Python 3.10+
  • pymupdf (fitz) — PDF text extraction and page rendering. PDF 文本提取与页面渲染。
  • Chrome or Edge — headless rendering of the final PDF; rmblib.find_chrome auto-detects it. headless 渲染最终 PDF;rmblib.find_chrome 自动探测。

pip install pymupdf

Claude invokes the skill automatically — per the description in SKILL.md — when you ask anything in the "make a reimbursement form / file these invoices" family.

Claude 会按 SKILL.md 里的 description,在你提出"做报销单/把这些发票整理成报销单/费用报销"这类请求时自动调用本技能。

Usage · 用法

Tell Claude what you have and what you want, for example: "use baoxiao to turn this folder of invoices into a reimbursement packet." Claude reads each invoice for its real total, normalizes filenames, asks the few questions it needs, and produces the PDF packet plus the dedup ledger.

对 Claude 说你有什么、要什么,比如:"用 baoxiao 把这个发票文件夹做成报销单包。" Claude 会逐张读出真实金额、规范化文件名、问清它需要的几个问题,然后产出 PDF 包和查重台账。

The flow is interactive and runs in order · 流程是交互式的,按顺序进行:

  1. Normalize & file — read each invoice's true tax-inclusive total + invoice number, rename to the convention, file by issue month (photos via vision). 整理与归档 —— 读出每张发票真实价税合计 + 票号,按规范改名,按开票月归档(照片走视觉)。
  2. Collect info — reimbursing entity + unified social credit code, applicant, department, expense period, payment method. 收集信息 —— 报销主体 + 统一社会信用代码、报销人、部门、费用所属期、支付方式。
  3. Extract fields — extract.py parses PDFs (capital-amount method); photos are read by vision and merged. 提取字段 —— extract.py 解析 PDF(大写金额法);照片经视觉识别后合并。
  4. Cross-check — the invoices are listed as a table for you to verify date / seller / category / total / rate; any failed amount (⚠) must be filled by hand. 核对 —— 发票列成表供你校验日期/销售方/类别/合计/税率;金额解析失败的行(⚠)必须人工补。
  5. Purpose + Didi itinerary — purpose (not on the invoice) is inferred and marked draft until you confirm; ride itineraries are reconciled, rebuilt compact, and private/business-split. 用途 + 滴滴行程单 —— 用途(发票上没有)先推断、标为草稿待你确认;行程单做勾稽、重制、公私拆分。
  6. Build the packet — build_packet.py assembles 报销单.pdf + 发票查重台账.csv. 生成报销单包 —— build_packet.py 组装 报销单.pdf + 发票查重台账.csv。

Output · 产物结构

A normalized invoice folder, one PDF packet, and one CSV ledger:

一个规范化的发票文件夹、一份 PDF 包、一份 CSV 台账:

| Output 产物 | What it is 它是什么 |
| --- | --- |
| Normalized invoice folder 规范化发票文件夹 | Every file renamed to date_in-out_category_counterparty_amountCNY_last-8.pdf and filed under a per-month folder (e.g. 2025-06/); Didi itineraries pair with their invoice by exact amount. 每个文件改名为 日期_收支_科目_对方_金额CNY_票号尾8位.pdf,归入按月子文件夹(如 2025-06/);滴滴行程单按金额精确配对到对应发票。 |
| 报销单.pdf → Cover summary 封面汇总单 | Header (entity, bill no., department, applicant, dates, tax code, payment method, attachment count), the full invoice table, per-category subtotals, the grand total in figures and Chinese capital, four signature lanes. 抬头(单位、单号、部门、报销人、日期、税号、支付方式、附件张数)、完整发票表、分类小计、总额小写与中文大写、四栏签字。 |
| 报销单.pdf → Compliance-analysis page 合规分析页 | KPI cards (total / count / categories / period), a category-distribution bar chart, a tax-rate breakdown, a four-item compliance checklist (dedup / e-original archiving / authenticity check / purpose completeness), and a Top-3 large-items table. KPI 卡(总额/张数/类别/所属期)、费用类型分布条形图、税率分布、四项合规检查清单(查重/电子原件归档/真伪查验/用途完整性)、大额项 Top 3 表。 |
| 报销单.pdf → Per-invoice detail pages 逐票明细页 | One page per invoice — a metadata band (bill no., sequence, seller, date, invoice no., rate, total, purpose) above the embedded invoice image, with a finance-verification line at the foot. 一票一页 —— 元数据条(单号、序号、销售方、日期、发票号、税率、合计、用途)在上,嵌入的发票图在下,页脚是财务核验行。 |
| 报销单.pdf → Compact Didi itinerary 滴滴行程单(紧凑版) | Spliced in right after the matching ride-invoice page: every trip preserved (sequence, date, time, weekday, start, end, km, amount) in a 2–3 page table instead of the 7+ page original. 拼接在对应网约车发票页之后:逐笔保真(序号、日期、时间、周、起点、终点、里程、金额),用 2–3 页表格替代 7+ 页原版。 |
| 发票查重台账.csv 发票查重台账 | One row per invoice (sequence, date, invoice no., seller, category, total, rate, plus blank "booked" / "verified" columns), keyed on invoice number; any number appearing twice is reported back. 一票一行(序号、日期、发票号、销售方、类别、合计、税率,外加空白"已入账"/"查验状态"列),以发票号为键;出现两次的发票号会被报回。 |

The Didi original PDF is kept as the archival source; only the compact rebuild goes into the print packet.

滴滴原版 PDF 保留作原件存档;只有紧凑重制版进打印包。
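The dedup ledger described above amounts to one CSV row per invoice plus a count of how often each invoice number occurs. The sketch below shows one way to do that with the standard library; the column names, the `write_ledger` function, and the record shape are hypothetical and not taken from build_packet.py.

```python
import csv
import io
from collections import Counter

# Assumed column names, mirroring the fields the README lists for the ledger.
COLUMNS = ["seq", "date", "invoice_no", "seller", "category",
           "total", "rate", "booked", "verified"]


def write_ledger(records, out):
    """Write one row per invoice; return invoice numbers that occur more than once."""
    counts = Counter(r["invoice_no"] for r in records)
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for seq, rec in enumerate(records, start=1):
        # "booked" and "verified" stay blank for finance to fill in by hand.
        writer.writerow({**rec, "seq": seq, "booked": "", "verified": ""})
    return sorted(no for no, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = [
        {"date": "2025-06-03", "invoice_no": "A001", "seller": "S1",
         "category": "transport", "total": "32.50", "rate": "3%"},
        {"date": "2025-06-03", "invoice_no": "A001", "seller": "S1",
         "category": "transport", "total": "32.50", "rate": "3%"},
    ]
    buffer = io.StringIO()
    print(write_ledger(sample, buffer))  # duplicated invoice numbers
```

Duplicate rows are still written rather than dropped, so the ledger remains a faithful record of what was submitted and the reviewer decides which copy to reject.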

Examples · 示例

Ask Claude · 对 Claude 说:

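用 baoxiao 把 ~/Downloads/2025年6月发票 这个文件夹做成报销单,
报销主体是 <公司全称>,报销人 <姓名>,部门 <部门>,费用所属期 2025-06。

(In English: "Use baoxiao to turn the folder ~/Downloads/2025年6月发票 into a reimbursement packet. The reimbursing entity is <full company name>, the applicant is <name>, the department is <department>, and the expense period is 2025-06.")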

baoxiao normalizes the folder, lists every invoice for you to confirm, reconciles any Didi itinerary, and writes the packet:

baoxiao 会规范化文件夹、列出每张发票供你确认、勾稽滴滴行程单,然后写出报销单包:

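报销单.pdf          # cover + compliance analysis + per-invoice pages (compact itinerary spliced in after each Didi page)
发票查重台账.csv     # dedup ledger keyed on invoice number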

(All values above are placeholders — substitute your own folder path, company, and names.) (以上均为占位符,请替换成你自己的文件夹路径、公司与姓名。)

How it works · 技术原理

extract.py walks the source folder, uses pymupdf to pull text from each PDF, and parses structured fields with rmblib.parse_invoice_text — invoice number, date, buyer/seller (a --buyer-hint disambiguates the two parties), tax rate, project, and the tax-inclusive total via rmblib.cn_capital_to_num (the Chinese-capital method). It renders each invoice's first page to PNG; photos with no text layer are registered for Claude's vision to fill in.

extract.py 遍历源文件夹,用 pymupdf 提取每个 PDF 的文本,再用 rmblib.parse_invoice_text 解析结构化字段 —— 发票号、日期、买/卖方(--buyer-hint 区分买卖双方)、税率、项目,以及通过 rmblib.cn_capital_to_num(中文大写法)得到的价税合计。它把每张发票首页渲染成 PNG;无文本层的照片登记下来,交给 Claude 视觉识别回填。
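To make the Chinese-capital method concrete, here is a small standalone sketch of the idea. It is not the source of rmblib.cn_capital_to_num, whose implementation the README does not show. The names `capital_to_decimal` and `find_total` and the regular expression are assumptions, and the sketch does not handle amounts of 万亿 and above.

```python
import re
from decimal import Decimal

DIGITS = {ch: i for i, ch in enumerate("零壹贰叁肆伍陆柒捌玖")}
SMALL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}

# Integer part, the 圆/元 marker, then optional 角/分 digits and a closing 整/正.
CAPITAL_RE = re.compile(
    r"[零壹贰叁肆伍陆柒捌玖拾佰仟万亿]+[圆元][零壹贰叁肆伍陆柒捌玖角分整正]*"
)


def capital_to_decimal(text: str) -> Decimal:
    """Convert a capital amount such as 壹仟叁佰贰拾柒圆伍角整 to a Decimal."""
    total = 0               # value of completed 万/亿 sections and the integer part
    section = 0             # value accumulated inside the current section
    digit = 0               # last digit read, waiting for its unit
    fraction = Decimal(0)   # 角 and 分
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            total += (section + digit) * BIG_UNITS[ch]
            section, digit = 0, 0
        elif ch in "圆元":
            total += section + digit
            section, digit = 0, 0
        elif ch == "角":
            fraction += Decimal(digit) / 10
            digit = 0
        elif ch == "分":
            fraction += Decimal(digit) / 100
            digit = 0
        elif ch not in "整正":
            raise ValueError(f"unexpected character in capital amount: {ch!r}")
    return Decimal(total + section + digit) + fraction


def find_total(page_text: str):
    """Return the first capital amount found in extracted page text, or None."""
    match = CAPITAL_RE.search(page_text)
    return capital_to_decimal(match.group()) if match else None
```

On an invented line such as `项目A ¥1500.00 折扣 ¥-172.50 价税合计(大写) 壹仟叁佰贰拾柒圆伍角整 (小写) ¥1327.50`, the largest ¥ figure is 1500.00, while `find_total` returns 1327.5, the actual tax-inclusive total after the discount line.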

itinerary.py extracts every trip row from a Didi itinerary with pymupdf's table parser, reconciles the sum against the invoice total, rebuilds a compact HTML table, and scans each trip's start/end against generic keyword sets (residential, leisure, dining, business) plus weekend/late-night timing to estimate a business / commute / personal split.

itinerary.py 用 pymupdf 的表格解析逐笔提取行程,把合计与发票金额勾稽,重制成紧凑 HTML 表格,并把每笔行程的起讫点比对通用关键词集(住宅、休闲、餐饮、商务)及周末/深夜时段,估算商务/通勤/私人的拆分。
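The reconciliation and the private-versus-business scan can be pictured with the sketch below. It assumes the trip rows have already been extracted; the `Trip` dataclass, the keyword tuples, the late-night window, and the rule order are all hypothetical and stand in for whatever itinerary.py actually uses.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

# Hypothetical keyword sets; the real lists are not given in the README.
RESIDENTIAL = ("小区", "公寓", "花园")
LEISURE_DINING = ("商场", "购物中心", "影院", "餐厅", "饭店")


@dataclass
class Trip:
    start_time: datetime
    origin: str
    destination: str
    amount: Decimal


def reconciles(trips, invoice_total: Decimal) -> bool:
    """The itinerary sum must equal the invoice's tax-inclusive total exactly."""
    return sum((t.amount for t in trips), Decimal(0)) == invoice_total


def classify(trip: Trip):
    """Return (label, signals); label is 'personal', 'commute', or 'business'."""
    places = trip.origin + trip.destination
    signals = []
    if any(k in places for k in LEISURE_DINING):
        signals.append("leisure/dining")
    if any(k in places for k in RESIDENTIAL):
        signals.append("residential")
    if trip.start_time.weekday() >= 5:
        signals.append("weekend")
    if trip.start_time.hour >= 22 or trip.start_time.hour < 6:
        signals.append("late-night")  # assumed window; recorded for review only
    if "leisure/dining" in signals or "weekend" in signals:
        return "personal", signals
    if "residential" in signals:
        return "commute", signals
    return "business", signals


def split_by_label(trips):
    """Sum trip amounts per label to estimate the split."""
    totals = {}
    for trip in trips:
        label, _ = classify(trip)
        totals[label] = totals.get(label, Decimal(0)) + trip.amount
    return totals
```

The labels are an estimate from place names and timing only, so the result is best treated as a prompt for human review rather than a final determination.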

build_packet.py composes the cover, analysis page, and per-invoice pages into one HTML document, renders it to A4 PDF with headless Chrome (rmblib.render_pdf / find_chrome auto-detect Chrome or Edge), then uses pymupdf to splice each compact itinerary in right after its ride-invoice page and writes the dedup ledger CSV. rmblib.num_to_cn_capital produces the grand total in Chinese capital for the cover.

build_packet.py 把封面、分析页、逐票页组装成一个 HTML 文档,用 headless Chrome 渲染成 A4 PDF(rmblib.render_pdf / find_chrome 自动探测 Chrome 或 Edge),再用 pymupdf 把每份紧凑行程单拼接到对应网约车发票页之后,并写出查重台账 CSV。封面上的中文大写总额由 rmblib.num_to_cn_capital 生成。
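For readers unfamiliar with the headless-Chrome step, the sketch below shows the general shape of browser detection and a print-to-PDF command line. It is not the code behind rmblib.find_chrome or rmblib.render_pdf: the executable names searched, the helper names, and the flag set are assumptions, and the way baoxiao sets the A4 page size is not shown in this README.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed executable names; real installs (especially on Windows and macOS)
# often live at fixed paths that are not on PATH.
CANDIDATES = ("google-chrome", "chromium", "chromium-browser",
              "chrome", "microsoft-edge", "msedge")


def find_browser():
    """Return the first Chrome or Edge executable found on PATH, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def print_to_pdf_cmd(browser: str, html: Path, pdf: Path):
    """Build the command line that renders an HTML file to PDF headlessly."""
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf}",
        html.resolve().as_uri(),   # Chrome expects a URL, so use file://...
    ]


def render(html: Path, pdf: Path) -> None:
    """Run the browser; raises if none is found or the render fails."""
    browser = find_browser()
    if browser is None:
        raise RuntimeError("no Chrome or Edge executable found on PATH")
    subprocess.run(print_to_pdf_cmd(browser, html, pdf), check=True)
```

Building the command as a list and passing it to `subprocess.run` avoids shell quoting problems with paths that contain spaces or non-ASCII characters, which are common in invoice folders.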

FAQ · 常见问题

Why parse the Chinese capital amount instead of reading the digits? VAT invoices carry negative discount lines, so the largest ¥ figure on the page isn't the total, and the printed digit total can disagree with the line items. The Chinese capital amount (...圆...角...分) is the single authoritative figure on the invoice, so baoxiao parses that.

为什么解析中文大写金额,而不直接读阿拉伯数字? 因为增值税发票带有负数折扣行,页面上最大的 ¥ 并不是合计 —— 而打印的数字合计也可能和明细行对不上。中文大写金额(...圆...角...分)是发票上唯一权威的数字,所以 baoxiao 解析它。

Can it handle photos of paper invoices? Yes. Photos have no text layer, so Claude reads them visually — invoice number, date, seller, total, rate, category — and merges them into the same record structure as the PDF invoices. Following audit discipline, amounts and invoice numbers get a human double-check.

能处理纸质发票照片吗? 能。照片没有文本层,所以由 Claude 视觉识别 —— 发票号、开票日期、销售方、合计、税率、类别 —— 再合并成与 PDF 发票相同的记录结构。遵循审查纪律,金额与发票号经人工复核。

What about a Didi invoice that bundles commutes and weekend trips? baoxiao reconciles the itinerary total against the invoice, rebuilds it compact, and reports the private-vs-business split. A single summary invoice mixing private trips is "contaminated" — it can't be cleanly reported as all-business; commuting and private rides reimbursed by the company are treated as salary in kind and carry individual-income-tax exposure. See references/compliance.md and references/vehicle-policy-template.md.

滴滴发票把通勤和周末出行打包了怎么办? baoxiao 把行程单合计与发票勾稽、重制成紧凑版,并报出公私拆分。一张混了私人行程的汇总发票是被"污染"的 —— 无法干净地整张作商务报销;由公司报销的通勤和私人用车视为变相工资薪金,有个税风险。详见 references/compliance.md 和 references/vehicle-policy-template.md。

Does it verify whether an invoice is genuine? No — authenticity is checked by a human at the national VAT verification platform inv-veri.chinatax.gov.cn. baoxiao dedupes by invoice number and lays out the packet; it does not call any verification API.

它会验证发票真伪吗? 不会 —— 真伪由人工在全国增值税发票查验平台 inv-veri.chinatax.gov.cn 查验。baoxiao 按发票号查重并排版,不调用任何查验接口。

Is this tax or accounting advice? No. baoxiao provides organizing and layout convenience only and does not constitute tax or legal advice. Booking and tax treatment are for your accountant.

这算税务或会计意见吗? 不算。baoxiao 只提供整理与排版的便利,不构成税务或法律意见。入账与税务处理请咨询你的会计师。

Related · 相关

  • onepage-pdf — turn a long HTML report or proposal into a single continuous-page PDF; the same headless-Chrome-to-PDF lineage baoxiao builds on. 把长 HTML 报告或提案变成不分页的单页 PDF;与 baoxiao 同源的 headless-Chrome 转 PDF 路线。
  • ppvi — the restrained light-mode visual identity these docs pages are built on. 这两个文档页所沿用的克制浅色视觉体系。
  • xntj.tv — more Claude Code workflows and skills from 张拼拼 · XNTJ. 更多来自张拼拼·XNTJ 的 Claude Code 工作流与技能。

License · 许可证

MIT © 张拼拼 · XNTJ
