diff --git a/.DS_Store b/.DS_Store deleted file mode 100644 index 2ea918d..0000000 Binary files a/.DS_Store and /dev/null differ diff --git a/CLAUDE.md b/CLAUDE.md index d29d7c2..91356b5 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,556 +1,92 @@ -# ArgusBot - CLAUDE.md +# CLAUDE.md -## 环境配置 -- python 版本: 3.12.3 +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. -## 项目概述 +## What is ArgusBot -ArgusBot 是一个 Python supervisor 插件,用于 Codex CLI 的自动循环执行器。它解决了"agent 过早停止并请求下一步指令"的问题。 +ArgusBot is a Python supervisor plugin that wraps Codex CLI and Claude Code CLI with an automatic loop. A main agent executes tasks, a reviewer sub-agent gates completion (`done`/`continue`/`blocked`), and a planner sub-agent maintains a live framework view. The loop only stops when the reviewer says `done` and all acceptance checks pass. -**核心机制:** -- **Main Agent**: 执行实际任务 (`codex exec` 或 `codex exec resume`) -- **Reviewer Sub-agent**: 评估完成情况 (`done` / `continue` / `blocked`) -- **Planner Sub-agent**: 维护实时计划视图并提出下一 session 目标 -- **循环机制**: 只有当 reviewer 说 `done` 且所有验收检查通过时才停止 +## Build & Run -## 架构 - -``` -┌─────────────────────────────────────────────────────────────┐ -│ ArgusBot │ -├─────────────────────────────────────────────────────────────┤ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ Main │───▶│ Reviewer │───▶│ Planner │ │ -│ │ Agent │ │ (done?) │ │ (next?) 
│ │ -│ └──────────────┘ └──────────────┘ └──────────────┘ │ -│ │ │ │ │ -│ ▼ ▼ ▼ │ -│ ┌──────────────────────────────────────────────────────┐ │ -│ │ LoopEngine / Orchestrator │ │ -│ └──────────────────────────────────────────────────────┘ │ -│ │ │ │ │ -│ ▼ ▼ ▼ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ CodexRunner │ │ Checks │ │ State Store │ │ -│ └──────────────┘ └──────────────┘ └──────────────┘ │ -├─────────────────────────────────────────────────────────────┤ -│ Control Channels: Telegram | Feishu | Terminal (CLI) │ -└─────────────────────────────────────────────────────────────┘ -``` - -## 核心组件 - -### 核心引擎 - -| 文件 | 职责 | -|------|------| -| `codex_autoloop/core/engine.py` | **LoopEngine** - 核心循环引擎:管理主 agent→检查→reviewer→planner 循环 | -| `codex_autoloop/orchestrator.py` | **AutoLoopOrchestrator** - 编排器:协调 runner、reviewer、planner 的执行流程 | -| `codex_autoloop/codexloop.py` | **主循环入口** - 单字命令 `argusbot` 的实现 | -| `codex_autoloop/core/ports.py` | 事件端口接口定义 | -| `codex_autoloop/core/state_store.py` | 状态存储和事件处理 | - -### Agent 组件 - -| 文件 | 职责 | -|------|------| -| `codex_autoloop/reviewer.py` | **Reviewer** - 评审器:评估任务是否完成,返回 done/continue/blocked | -| `codex_autoloop/planner.py` | **Planner** - 规划器:维护工作流视图,提出后续目标 | -| `codex_autoloop/stall_subagent.py` | **停滞检测** - 检测 agent 停滞并自动诊断/重启 | -| `codex_autoloop/btw_agent.py` | **BTW 侧边代理** - 只读项目问答代理 | - -### 执行器 - -| 文件 | 职责 | -|------|------| -| `codex_autoloop/codex_runner.py` | **CodexRunner** - Codex CLI 执行器:调用 `codex exec` | -| `codex_autoloop/checks.py` | **验收检查** - 运行并验证检查命令 | - -### 控制通道 - -| 文件 | 职责 | -|------|------| -| `codex_autoloop/telegram_control.py` | **Telegram 控制** - Telegram inbound 控制通道 | -| `codex_autoloop/telegram_notifier.py` | **Telegram 通知** - Telegram 事件推送 | -| `codex_autoloop/telegram_daemon.py` | **Telegram 守护进程** - 24/7 后台运行 | -| `codex_autoloop/feishu_adapter.py` | **飞书适配** - 飞书通知和控制通道 | -| `codex_autoloop/daemon_bus.py` | **命令总线** - JSONL 格式的守护进程命令通道 | -| 
`codex_autoloop/daemon_ctl.py` | **守护进程控制** - 终端控制台命令 | -| `codex_autoloop/local_control.py` | **本地终端控制** - 本地终端交互 | - -### 数据模型 - -| 文件 | 职责 | -|------|------| -| `codex_autoloop/models.py` | **核心数据结构** - ReviewDecision, PlanDecision, PlanSnapshot, RoundSummary | -| `codex_autoloop/planner_modes.py` | **Planner 模式** - off/auto/record 模式定义 | - -### 工具和辅助 - -| 文件 | 职责 | -|------|------| -| `codex_autoloop/model_catalog.py` | **模型目录** - 常用模型预设查询 | -| `codex_autoloop/setup_wizard.py` | **安装向导** - 交互式首次配置 | -| `codex_autoloop/token_lock.py` | **Token 独占锁** - 一 Telegram token 一守护进程 | -| `codex_autoloop/copilot_proxy.py` | **Copilot 代理** - GitHub Copilot 本地代理 | -| `codex_autoloop/dashboard.py` | **本地 Web 仪表板** - 实时运行状态可视化 | -| `codex_autoloop/live_updates.py` | **实时更新推送** - 实时 agent 消息推送 | -| `codex_autoloop/attachment_policy.py` | **附件策略** - BTW 附件上传策略 | - -## 入口点命令 (pyproject.toml) - -``` -argusbot - 单字入口 (自动附加监控) -argusbot-run - 运行循环 -argusbot-daemon - Telegram/Feishu 守护进程 -argusbot-daemon-ctl - 守护进程控制 -argusbot-setup - 交互式安装向导 -argusbot-models - 模型目录查询 -``` - -## 数据模型 - -### ReviewDecision (models.py:41-48) -```python -status: Literal["done", "continue", "blocked"] -confidence: float -reason: str -next_action: str -round_summary_markdown: str -completion_summary_markdown: str -``` - -### PlanDecision (models.py:51-57) -```python -follow_up_required: bool -next_explore: str -main_instruction: str -review_instruction: str -overview_markdown: str -``` - -### PlanSnapshot (models.py:68-82) -```python -plan_id: str -generated_at: str -trigger: str -terminal: bool -summary: str -workstreams: list[PlanWorkstream] -done_items: list[str] -remaining_items: list[str] -risks: list[str] -next_steps: list[str] -exploration_items: list[str] -suggested_next_objective: str -should_propose_follow_up: bool -report_markdown: str -``` - -## 控制通道 - -### Telegram -- Bot token 和 chat_id 配置 -- 命令:`/run`, `/inject`, `/status`, `/stop`, `/plan`, `/review`, `/btw` -- 支持语音/音频转录 (Whisper) -- 
支持附件上传 (图片/视频/文件) - -### Feishu (飞书) -- App ID / App Secret / Chat ID 配置 -- 适合中国网络环境 -- 群聊命令支持 (@bot /command) - -### 本地终端 -- `argusbot` - 附加监控控制台 -- `argusbot-daemon-ctl --bus-dir ` - 直接控制守护进程 - -## 开发指南 - -### 测试 ```bash -pytest tests/ +python -m venv .venv && source .venv/bin/activate +pip install -e . # editable install +pip install pytest # for running tests ``` -测试文件覆盖: -- `tests/test_codexloop.py` - 主循环测试 -- `tests/test_orchestrator.py` - 编排器测试 -- `tests/test_reviewer.py` - Reviewer 测试 -- `tests/test_planner.py` - Planner 测试 -- `tests/test_engine.py` - LoopEngine 测试 -- 各组件单元测试... - -### 调试 +## Testing -**查看详细事件流:** ```bash -argusbot-run --verbose-events "objective" +pytest -q # run all tests +pytest tests/test_loop_engine.py # run a single test file +pytest -k test_loop_engine_stops # run a specific test by name ``` -**日志文件位置:** -- 守护进程日志:`.argusbot/daemon.out` -- 事件流:`.argusbot/logs/daemon-events.jsonl` -- 运行存档:`.argusbot/logs/argusbot-run-archive.jsonl` - -### 扩展 - -**添加新的控制通道:** -1. 在 `codex_autoloop/adapters/` 下创建新的适配器 -2. 实现 `EventSink` 接口 (`core/ports.py`) -3. 在 `setup_wizard.py` 中添加配置选项 - -**添加新的 agent 类型:** -1. 参考 `reviewer.py` 或 `planner.py` 的实现模式 -2. 使用 `CodexRunner` 执行子代理调用 -3. 定义结构化输出 schema (JSON) - - -### 关键文件引用 - -**核心引擎:** -- `codex_autoloop/core/engine.py` - LoopEngine -- `codex_autoloop/orchestrator.py` - AutoLoopOrchestrator -- `codex_autoloop/codexloop.py` - 主循环入口 - -**Agent 组件:** -- `codex_autoloop/reviewer.py` - Reviewer 评审器 -- `codex_autoloop/planner.py` - Planner 规划器 -- `codex_autoloop/stall_subagent.py` - 停滞检测 - -### 测试检查(cli) - -source .venv/bin/activate -在claude_test目录下执行测试检查 - -#### 1. 简单任务 - 创建文件 -claude-autoloop-run "创建一个 README.md 文件,包含项目介绍" --yolo --max-rounds 2 --skip-git-repo-check - -#### 2. 数学计算任务 -claude-autoloop-run "计算 1 到 100 的和,将结果写入 result.txt" --yolo --max-rounds 2 --skip-git-repo-check - -#### 3. 
代码修改任务 -claude-autoloop-run "在当前目录创建一个 Python 计算器模块,支持加减乘除" --yolo --max-rounds 3 --skip-git-repo-check - -Reviewer 和 Planner 测试 - -source .venv/bin/activate - -#### 4. 测试 planner 自动模式(默认) -claude-autoloop-run "分析当前目录结构,创建一个项目分析报告" \ - --yolo --max-rounds 3 --skip-git-repo-check \ - --planner - -#### 5. 关闭 planner 测试 -claude-autoloop-run "打印 Hello World" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --no-planner - -#### 6. 测试 reviewer 决策 -claude-autoloop-run "创建一个空的 package.json 文件" \ - --yolo --max-rounds 2 --skip-git-repo-check - -验收检查测试 - -source .venv/bin/activate - -#### 7. 带验收检查的任务 -claude-autoloop-run "创建一个 greet.py 脚本,接受名字参数并打印问候语" \ - --yolo --max-rounds 3 --skip-git-repo-check \ - --check "python3 greet.py World | grep -q 'Hello World'" \ - --check "test -f greet.py" +Tests live in `tests/` and use lightweight stub classes (no external services needed). CI runs pytest across Python 3.10–3.13. -#### 8. 多检查项测试 -claude-autoloop-run "创建一个包含 main 函数的 Python 模块" \ - --yolo --max-rounds 3 --skip-git-repo-check \ - --check "python3 -m py_compile module.py" \ - --check "grep -q 'if __name__' module.py" +## CLI Entry Points (defined in pyproject.toml) -模型和配置测试 +| Command | Module | Purpose | +|---|---|---| +| `argusbot` | `codex_autoloop.codexloop:main` | One-word entrypoint: first-run setup, later attach monitor | +| `argusbot-run` | `codex_autoloop.cli:main` | Direct run with full CLI flags | +| `argusbot-daemon` | `codex_autoloop.telegram_daemon:main` | Always-on daemon for Telegram/Feishu idle control | +| `argusbot-daemon-ctl` | `codex_autoloop.daemon_ctl:main` | Terminal control for a running daemon | +| `argusbot-setup` | `codex_autoloop.setup_wizard:main` | Interactive first-run wizard | +| `argusbot-models` | `codex_autoloop.model_catalog:main` | List model presets | -source .venv/bin/activate +## Architecture (three layers) -#### 9. 
指定模型 -claude-autoloop-run "简单任务" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --main-model qwen3.5-plus - -#### 10. 指定 reasoning effort -claude-autoloop-run "分析这个文件的代码结构" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --main-reasoning-effort high \ - --reviewer-reasoning-effort medium - -#### 11. 不同 agent 使用不同模型 -claude-autoloop-run "复杂任务" \ - --yolo --max-rounds 3 --skip-git-repo-check \ - --main-model qwen3.5-plus \ - --reviewer-model qwen3.5-plus \ - --plan-model qwen3.5-plus - -状态和输出文件测试 - -source .venv/bin/activate - -#### 12. 输出状态文件 -claude-autoloop-run "创建测试文件" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --state-file /tmp/test-state.json - -#### 13. 输出操作员消息文件 -claude-autoloop-run "多轮对话任务" \ - --yolo --max-rounds 3 --skip-git-repo-check \ - --operator-messages-file /tmp/operator.md \ - --plan-overview-file /tmp/plan.md - -#### 14. 完整输出文件测试 -claude-autoloop-run "完整项目任务" \ - --yolo --max-rounds 3 --skip-git-repo-check \ - --state-file /tmp/state.json \ - --operator-messages-file /tmp/messages.md \ - --plan-overview-file /tmp/plan.md \ - --plan-todo-file /tmp/todo.md \ - --review-summaries-dir /tmp/reviews - -停滞检测测试 - -source .venv/bin/activate - -#### 15. 短停滞检测(快速测试) -claude-autoloop-run "长时间任务" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --stall-soft-idle-seconds 60 \ - --stall-hard-idle-seconds 120 - -#### 16. 禁用停滞检测 -claude-autoloop-run "任务" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --stall-soft-idle-seconds 0 \ - --stall-hard-idle-seconds 0 - -控制通道测试 - -source .venv/bin/activate - -#### 17. 本地控制文件 -claude-autoloop-run "长时间运行的任务" \ - --yolo --max-rounds 5 --skip-git-repo-check \ - --control-file /tmp/control.jsonl \ - --control-poll-interval-seconds 1 - -后台注入控制命令示例: -echo '{"type": "inject", "message": "请改为打印 10 次 Hello"}' >> /tmp/control.jsonl -echo '{"type": "stop"}' >> /tmp/control.jsonl -echo '{"type": "status"}' >> /tmp/control.jsonl - -详细日志测试 - -source .venv/bin/activate - -#### 18. 
详细事件输出 -claude-autoloop-run "调试任务" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --verbose-events - -#### 19. 禁用实时终端输出 -claude-autoloop-run "安静模式任务" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --no-live-terminal - -Copilot Proxy 测试(如果配置了) - -source .venv/bin/activate - -#### 20. 使用 Copilot Proxy -claude-autoloop-run "任务" \ - --copilot-proxy \ - --copilot-proxy-port 18080 \ - --yolo --max-rounds 2 --skip-git-repo-check - -压力/边界测试 - -source .venv/bin/activate - -#### 21. 最大轮次限制测试 -claude-autoloop-run "不可能完成的任务" \ - --yolo --max-rounds 1 --skip-git-repo-check - -#### 22. 无进展检测测试 -claude-autoloop-run "重复性任务" \ - --yolo --max-rounds 5 --max-no-progress-rounds 2 --skip-git-repo-check - -#### 23. 空目标测试(应该报错) -claude-autoloop-run "" --yolo --max-rounds 1 --skip-git-repo-check - -组合测试 - -source .venv/bin/activate - -#### 24. 完整功能测试 -claude-autoloop-run "创建一个完整的 Python 项目,包含 setup.py、README.md 和示例模块" \ - --yolo --max-rounds 5 --skip-git-repo-check \ - --main-model qwen3.5-plus \ - --reviewer-model qwen3.5-plus \ - --plan-model qwen3.5-plus \ - --state-file /tmp/full-state.json \ - --operator-messages-file /tmp/messages.md \ - --plan-overview-file /tmp/plan.md \ - --review-summaries-dir /tmp/reviews \ - --check "test -f setup.py" \ - --check "test -f README.md" \ - --verbose-events - -#### 25. 额外目录访问 (--add-dir) -claude-autoloop-run "在 /tmp 目录创建文件" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --add-dir /tmp - -#### 26. 插件目录 (--plugin-dir) -claude-autoloop-run "使用自定义插件" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --plugin-dir /path/to/plugins - -#### 27. 文件资源下载 (--file) -claude-autoloop-run "使用下载的文件资源" \ - --yolo --max-rounds 2 --skip-git-repo-check \ - --file "file_abc:doc.txt" - -#### 28. 
Git Worktree (--worktree) -claude-autoloop-run "在隔离的 worktree 中开发" \ - --yolo --max-rounds 3 --skip-git-repo-check \ - --worktree feature-branch - -快速验证命令 - -#### 查看生成的文件 -cat /tmp/state.json | python3 -m json.tool -cat /tmp/plan.md -ls -la /tmp/reviews/ - ---- - -## 附录:Claude Code CLI 结构化输出参考 - -### 命令行参数 - -| 参数 | 说明 | -|------|------| -| `--json-schema ` | JSON Schema 用于结构化输出验证(内联 JSON 字符串) | -| `--output-format ` | 输出格式:`text`(默认)/ `json` / `stream-json` | -| `--print` | 非交互模式(管道友好),使用结构化输出时必需 | -| `--add-dir ` | 允许工具访问的额外目录(可重复) | -| `--plugin-dir ` | 从指定目录加载插件(可重复) | -| `--file ` | 下载文件资源,格式:`file_id:relative_path`(可重复) | -| `--worktree [name]` | 创建 git worktree 会话,可选名称 | - -### 基础示例 - -```bash -# 简单对象 -claude --print --json-schema '{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}' "创建一个用户" - -# 复杂对象 -claude --print --json-schema '{ - "type":"object", - "properties":{ - "name":{"type":"string"}, - "age":{"type":"number"}, - "city":{"type":"string"} - }, - "required":["name","age","city"] -}' "创建一个用户,名字叫张三,25 岁,来自北京" - -# 纯 JSON 输出(无额外文本) -claude --print --output-format json --json-schema '{"type":"object","properties":{"result":{"type":"string"}},"required":["result"]}' "说 hello" +``` +codex_autoloop/ + core/ # Pure loop runtime, no I/O integration + engine.py # LoopEngine: run main → checks → reviewer → planner → repeat + ports.py # Protocol contracts: EventSink, ControlChannel, NotificationSink + state_store.py # Mutable runtime state, operator messages, injections, stop requests + + adapters/ # Turn external sources into core abstractions + control_channels.py # Telegram/Feishu/bus → ControlCommand + event_sinks.py # Core events → terminal/dashboard/Telegram/Feishu output + + apps/ # Executable shells that wire layers together + cli_app.py # argusbot-run wiring + daemon_app.py # argusbot-daemon wiring + shell_utils.py # Shared shell-facing helpers ``` -### Reviewer Schema 测试示例 +Top-level modules (`orchestrator.py`, 
`control_state.py`, `codexloop.py`, `cli.py`) are compatibility wrappers that delegate to the three-layer internals. -```bash -claude --print --json-schema '{ - "type":"object", - "required":["status","confidence","reason","next_action"], - "properties":{ - "status":{"type":"string","enum":["done","continue","blocked"]}, - "confidence":{"type":"number","minimum":0,"maximum":1}, - "reason":{"type":"string"}, - "next_action":{"type":"string"} - } -}' "评估这个任务是否完成:已经创建了 README.md 文件" -``` +### Key domain types (`models.py`) -### Planner Schema 测试示例 +- `CodexRunResult` — output from a single runner invocation +- `ReviewDecision` — structured reviewer verdict (status/confidence/reason/next_action) +- `PlanDecision` — planner output (follow_up_required, instructions, overview markdown) +- `RoundSummary` — per-round aggregate of main result + checks + review + plan +- `PlanMode` — `Literal["off", "auto", "record"]` -```bash -claude --print --json-schema '{ - "type":"object", - "required":["summary","workstreams","next_steps"], - "properties":{ - "summary":{"type":"string"}, - "workstreams":{ - "type":"array", - "items":{ - "type":"object", - "required":["area","status"], - "properties":{ - "area":{"type":"string"}, - "status":{"type":"string","enum":["done","in_progress","todo"]} - } - } - }, - "next_steps":{"type":"array","items":{"type":"string"}} - } -}' "规划一个 Python 项目开发计划" -``` +### Runner backends (`runner_backend.py`, `codex_runner.py`) -### 与 Codex CLI 对比 +Two backends: `codex` (Codex CLI) and `claude` (Claude Code CLI). `CodexRunner` handles subprocess management, JSONL event parsing, stall watchdog, and session resume for both. Claude maps `xhigh` effort to `high`. 
-| 特性 | Codex CLI | Claude Code CLI | -|------|-----------|-----------------| -| Schema 参数 | `--output-schema ` | `--json-schema ` | -| JSON 事件流 | `--json` | `--output-format stream-json` | -| 非交互模式 | 默认 | `--print` | -| Schema 来源 | 仅文件路径 | 内联 JSON 字符串 或 文件 | +### Loop lifecycle (`core/engine.py`) +Each round: run main agent → run `--check` commands → run reviewer → optionally run planner → decide stop or continue. Stop conditions: reviewer `done` + checks pass, reviewer `blocked`, max rounds, or repeated no-progress. +### Event flow +Structured events (`loop.started`, `round.started`, `round.main.completed`, `round.checks.completed`, `round.review.completed`, `plan.completed`, `loop.completed`) flow through `EventSink` to terminal, dashboard, Telegram, and Feishu adapters. +### Daemon architecture -feishu robot webhook address: https://open.feishu.cn/open-apis/bot/v2/hook/4d2b8fc7-e50f-4174-b953-427761e74295 +The daemon (`apps/daemon_app.py`) polls for commands via `JsonlCommandBus` (`daemon_bus.py`) and Telegram/Feishu adapters, spawns child `argusbot-run` processes, and manages lifecycle. `token_lock.py` enforces one active daemon per Telegram token. 
+## Per-project runtime state +When ArgusBot operates on a target project, it creates `.argusbot/` in that project with: +- `daemon_config.json` — persisted setup config +- `bus/` — JSONL command bus for daemon↔terminal IPC +- `logs/` — operator messages, run archive JSONL, daemon events -## feishu 启动命令 -```bash -argusbot-daemon \ - --run-cd /Users/halllo/projects/local/ArgusBot \ - --run-max-rounds 500 \ - --bus-dir /Users/halllo/projects/local/ArgusBot/.argusbot/bus \ - --logs-dir /Users/halllo/projects/local/ArgusBot/.argusbot/logs \ - --run-planner-mode auto \ - --run-plan-mode fully-plan \ - --feishu-app-id cli_a93393044b395cb5 \ - --feishu-app-secret MdzD11wewnU7wD4ncuPrVfSYiSmE2tex \ - --feishu-chat-id oc_8517d59f85936c21772d9e2cd8e2e0e1 \ - --feishu-receive-id-type chat_id \ - --run-runner-backend claude \ - --run-runner-bin /opt/homebrew/bin/claude \ - --run-yolo \ - --run-resume-last-session -``` +## Copilot proxy -## 服务器启动命令 -```bash -argusbot-daemon \ - --run-cd /home/ubuntu/projects/OmniSafeBench-MM \ - --run-max-rounds 500 \ - --bus-dir /home/ubuntu/projects/OmniSafeBench-MM/.argusbot/bus \ - --logs-dir /home/ubuntu/projects/OmniSafeBench-MM/.argusbot/logs \ - --run-planner-mode auto \ - --run-plan-mode fully-plan \ - --feishu-app-id cli_a933f2899df89cc4 \ - --feishu-app-secret 9MYP6nf3h5hLYrkmzgvUifuYkx7YtA7g \ - --feishu-chat-id oc_b8e9226c1a47753eee14291c627dc109 \ - --feishu-receive-id-type chat_id \ - --run-runner-backend claude \ - --run-runner-bin /home/ubuntu/node-v24.14.0-linux-x64/bin/claude \ - --run-yolo \ -``` \ No newline at end of file +Optional for Codex backend only. Routes through a local `copilot-proxy` checkout for GitHub Copilot-backed quota. Auto-detected from `~/copilot-proxy`, `~/copilot-codex-proxy`, or `~/.argusbot/tools/copilot-proxy`. diff --git a/README.md b/README.md index f3facbd..df37d2a 100644 --- a/README.md +++ b/README.md @@ -95,7 +95,9 @@ During `argusbot init`, first choose control channel (`1. Telegram`, `2. 
Feishu - Single-word operator entrypoint: `argusbot` (first run setup, later auto-attach monitor). - Token-exclusive daemon lock: one active daemon per Telegram token. - Operator message history persisted to markdown and fed to reviewer decisions. -- Final task report generated after reviewer `done`, with optional `--final-report-file` output path and notifier delivery when ready. +- PPTX auto-generation for run handoff: builds a presentation-ready slide deck summarizing the completed work. +- Interactive PPTX opt-in: when running `argusbot-run` interactively, the CLI asks whether to generate a PPTX report before starting. Answer `Y` (default) to enable or `n` to skip. Daemon-launched runs (Telegram/Feishu) also ask via the control channel before each `/run` launch — reply `Y` or `N` to confirm. Use `--pptx-report` / `--no-pptx-report` to bypass the prompt. +- Final handoff artifacts generated after reviewer `done`: Markdown via `--final-report-file` and PPTX via `--pptx-report-file`, with notifier delivery when ready. - Run archive persisted as JSONL with date/workspace/session metadata for resume continuity. - Utility scripts: start/kill/watch daemon logs, plus sanitized cross-project setup examples. @@ -121,6 +123,16 @@ source .venv/bin/activate pip install -e . ``` +### PPTX Report Dependencies (optional) + +The PPTX run report generator uses a Node.js script. To enable it: + +```bash +npm install # installs pptxgenjs and other JS dependencies +``` + +If `node` is not available, PPTX generation is silently skipped and does not block the loop. + ## GitHub Copilot via `copilot-proxy` ArgusBot can route Codex backend calls through a local `copilot-proxy` checkout, so main/reviewer/planner/BTW runs can use GitHub Copilot-backed quota instead of OpenAI API billing. 
@@ -250,6 +262,20 @@ argusbot-run \ "Implement feature X and keep iterating until tests pass" ``` +Report artifact example: + +```bash +argusbot-run \ + --state-file .argusbot/state.json \ + --final-report-file .argusbot/review_summaries/final-task-report.md \ + --pptx-report-file .argusbot/run-report.pptx \ + "实现功能并同时产出 Markdown 与 PPTX 汇报" +``` + +This keeps both handoff artifacts in predictable paths. The PPTX report is also the file pushed by Telegram/Feishu when the run emits `pptx.report.ready`. + +Release note: if `--pptx-report-file` is not passed, ArgusBot still resolves a default PPTX artifact path under the run artifact directory using the standard file name `run-report.pptx`. + Common options: - `--runner-backend {codex,claude,copilot}`: select the execution backend @@ -263,6 +289,7 @@ Common options: - `--full-auto`: request automatic tool approval mode when supported by the selected backend - `--state-file `: write round-by-round state JSON - `--final-report-file `: write the final handoff Markdown report after reviewer `done` +- `--pptx-report-file `: write the auto-generated PPTX run report (default artifact name: `run-report.pptx`) - `--plan-report-file `: write the latest planner markdown snapshot - `--plan-todo-file `: write the latest planner TODO board markdown - `--plan-update-interval-seconds 1800`: run background planning sweeps every 30 minutes diff --git a/codex_autoloop/.DS_Store b/codex_autoloop/.DS_Store deleted file mode 100644 index 18941b6..0000000 Binary files a/codex_autoloop/.DS_Store and /dev/null differ diff --git a/codex_autoloop/adapters/event_sinks.py b/codex_autoloop/adapters/event_sinks.py index aa2949a..0f43eff 100644 --- a/codex_autoloop/adapters/event_sinks.py +++ b/codex_autoloop/adapters/event_sinks.py @@ -99,6 +99,8 @@ def handle_event(self, event: dict[str, object]) -> None: if event_type == "final.report.ready": self._stop_stream_reporter(flush=False) _send_final_report_via_notifier(self.notifier, event) + elif 
event_type == "pptx.report.ready": + _send_pptx_report_via_notifier(self.notifier, event) elif event_type == "loop.completed": self._stop_stream_reporter(flush=False) self.notifier.notify_event(event) @@ -151,6 +153,8 @@ def handle_event(self, event: dict[str, object]) -> None: if event_type == "final.report.ready": self._stop_stream_reporter(flush=False) _send_final_report_via_notifier(self.notifier, event) + elif event_type == "pptx.report.ready": + _send_pptx_report_via_notifier(self.notifier, event) elif event_type == "loop.completed": self._stop_stream_reporter(flush=False) self.notifier.notify_event(event) @@ -190,6 +194,16 @@ def _send_final_report_via_notifier(notifier, event: dict[str, object]) -> None: notifier.send_local_file(path, caption="ArgusBot final task report") +def _send_pptx_report_via_notifier(notifier, event: dict[str, object]) -> None: + raw_path = str(event.get("path") or "").strip() + if not raw_path: + return + path = Path(raw_path) + if not path.exists(): + return + notifier.send_local_file(path, caption="ArgusBot run report (PPTX)") + + def _render_final_report_message(event: dict[str, object]) -> str: raw_path = str(event.get("path") or "").strip() if not raw_path: diff --git a/codex_autoloop/apps/cli_app.py b/codex_autoloop/apps/cli_app.py index 2c00a57..eb34d8e 100644 --- a/codex_autoloop/apps/cli_app.py +++ b/codex_autoloop/apps/cli_app.py @@ -36,6 +36,7 @@ resolve_btw_messages_file, resolve_final_report_file, resolve_plan_overview_file, + resolve_pptx_report_file, resolve_review_summaries_dir, resolve_operator_messages_file, ) @@ -72,6 +73,15 @@ def run_cli(args: Namespace) -> tuple[dict[str, Any], int]: control_file=args.control_file, state_file=args.state_file, ) + if getattr(args, "pptx_report", None) is False: + pptx_report_file: str | None = None + else: + pptx_report_file = resolve_pptx_report_file( + explicit_path=getattr(args, "pptx_report_file", None), + operator_messages_file=operator_messages_file, + 
control_file=args.control_file, + state_file=args.state_file, + ) btw_messages_file = resolve_btw_messages_file( explicit_path=None, operator_messages_file=operator_messages_file, @@ -85,6 +95,7 @@ def run_cli(args: Namespace) -> tuple[dict[str, Any], int]: plan_overview_file=plan_overview_file, review_summaries_dir=review_summaries_dir, final_report_file=final_report_file, + pptx_report_file=pptx_report_file, main_prompt_file=args.main_prompt_file, check_commands=args.check or [], plan_mode=args.plan_mode, @@ -487,6 +498,8 @@ def on_control_command(command) -> None: "review_summaries_dir": state_store.review_summaries_dir(), "final_report_file": state_store.final_report_path(), "final_report_ready": state_store.has_final_report(), + "pptx_report_file": state_store.pptx_report_path(), + "pptx_report_ready": state_store.has_pptx_report(), "rounds": [ { "round": item.round_index, diff --git a/codex_autoloop/apps/daemon_app.py b/codex_autoloop/apps/daemon_app.py index 5f2e3bd..556fa9c 100644 --- a/codex_autoloop/apps/daemon_app.py +++ b/codex_autoloop/apps/daemon_app.py @@ -84,6 +84,8 @@ def __init__(self, args: argparse.Namespace) -> None: self.child_started_at: dt.datetime | None = None self.child_control_bus: JsonlCommandBus | None = None self.pending_attachment_batches: dict[str, list[Any]] = {} + self.pending_pptx_run_objective: str | None = None + self.pending_pptx_run_source: str | None = None self.btw_agent = BtwAgent( runner=self.daemon_runner, config=BtwConfig( @@ -203,6 +205,22 @@ def _build_control_channels(self) -> list[object]: def _on_command(self, command) -> None: self._log_event("command.received", source=command.source, kind=command.kind, text=command.text[:700]) + + # Handle pending PPTX confirmation reply + if self.pending_pptx_run_objective is not None: + reply = command.text.strip().lower() + if reply in ("y", "yes", "n", "no"): + objective = self.pending_pptx_run_objective + pptx_enabled = reply in ("y", "yes") + self.pending_pptx_run_objective 
= None + self.pending_pptx_run_source = None + self._start_child(objective, pptx_report=pptx_enabled) + return + self._send_reply(command.source, "[daemon] PPTX confirmation cancelled. Send /run again to start.") + self.pending_pptx_run_objective = None + self.pending_pptx_run_source = None + # fall through to handle the command normally + if command.kind == "help": self._send_reply(command.source, help_text()) return @@ -342,7 +360,10 @@ def _on_command(self, command) -> None: else: self._send_reply(command.source, "[daemon] active run exists but child control bus unavailable.") return - self._start_child(self._maybe_rewrite_run_objective(objective, source=command.source)) + # Ask about PPTX report before launching + self.pending_pptx_run_objective = objective + self.pending_pptx_run_source = command.source + self._send_reply(command.source, "Generate a PPTX run report at the end? Reply Y or N") return if command.kind in {"plan", "review"}: if not self._child_running(): @@ -401,7 +422,7 @@ def on_complete(result) -> None: self._send_reply(command.source, "[daemon] stopping daemon.") self._stopping = True - def _start_child(self, objective: str) -> None: + def _start_child(self, objective: str, *, pptx_report: bool = True) -> None: assert self.notifier is not None timestamp = dt.datetime.utcnow().strftime("%Y%m%d-%H%M%S") log_path = self.logs_dir / f"run-{timestamp}.log" @@ -428,6 +449,7 @@ def _start_child(self, objective: str) -> None: review_summaries_dir=str(review_summaries_dir), resume_session_id=resume_session_id, force_new_session=force_new_session, + pptx_report=pptx_report, ) log_file = log_path.open("w", encoding="utf-8") self.child = subprocess.Popen(cmd, stdout=log_file, stderr=log_file, text=True, cwd=self.run_cwd) @@ -613,6 +635,7 @@ def build_child_command( review_summaries_dir: str, resume_session_id: str | None, force_new_session: bool = False, + pptx_report: bool = True, ) -> list[str]: preset = get_preset(args.run_model_preset) if 
args.run_model_preset else None main_model = preset.main_model if preset is not None else args.run_main_model @@ -712,6 +735,10 @@ def build_child_command( cmd.extend(["--state-file", args.run_state_file]) if args.run_no_dashboard: cmd.append("--no-dashboard") + if pptx_report: + cmd.append("--pptx-report") + else: + cmd.append("--no-pptx-report") for add_dir in args.run_add_dir: cmd.extend(["--add-dir", add_dir]) for plugin_dir in args.run_plugin_dir: diff --git a/codex_autoloop/apps/shell_utils.py b/codex_autoloop/apps/shell_utils.py index 6a2fb4b..1cce161 100644 --- a/codex_autoloop/apps/shell_utils.py +++ b/codex_autoloop/apps/shell_utils.py @@ -112,6 +112,25 @@ def resolve_final_report_file( ) +def resolve_pptx_report_file( + *, + explicit_path: str | None, + operator_messages_file: str | None, + control_file: str | None, + state_file: str | None, + default_root: str | None = None, +) -> str: + if explicit_path: + return explicit_path + base = _resolve_artifact_dir( + operator_messages_file=operator_messages_file, + control_file=control_file, + state_file=state_file, + default_root=default_root, + ) + return str(base / "run-report.pptx") + + def format_control_status(state: dict[str, Any]) -> str: status = state.get("status", "unknown") round_index = state.get("round", 0) @@ -125,6 +144,8 @@ def format_control_status(state: dict[str, Any]) -> str: review_summaries_dir = state.get("review_summaries_dir") final_report_file = state.get("final_report_file") final_report_ready = state.get("final_report_ready") + pptx_report_file = state.get("pptx_report_file") + pptx_report_ready = state.get("pptx_report_ready") lines = [ "[autoloop] status", f"status={status}", @@ -149,6 +170,10 @@ def format_control_status(state: dict[str, Any]) -> str: lines.append(f"final_report_file={final_report_file}") if final_report_ready is not None: lines.append(f"final_report_ready={final_report_ready}") + if pptx_report_file: + lines.append(f"pptx_report_file={pptx_report_file}") + if 
pptx_report_ready is not None: + lines.append(f"pptx_report_ready={pptx_report_ready}") return "\n".join(lines) diff --git a/codex_autoloop/cli.py b/codex_autoloop/cli.py index 2bf540d..65144f7 100644 --- a/codex_autoloop/cli.py +++ b/codex_autoloop/cli.py @@ -15,6 +15,7 @@ resolve_final_report_file, resolve_operator_messages_file, resolve_plan_overview_file, + resolve_pptx_report_file, resolve_review_summaries_dir, ) @@ -28,6 +29,7 @@ "resolve_operator_messages_file", "resolve_plan_report_file", "resolve_plan_todo_file", + "resolve_pptx_report_file", "resolve_review_summaries_dir", ] @@ -57,6 +59,14 @@ def main() -> None: if args.main_prompt_file is None: args.main_prompt_file = resolve_main_prompt_file(state_file=args.state_file, control_file=args.control_file) + # Interactive PPTX report prompt: when --pptx-report is not explicitly set + # and stdin is a terminal, ask the user before starting. + if args.pptx_report is None and _should_prompt_pptx(): + args.pptx_report = _ask_pptx_report() + # --no-pptx-report explicitly disables PPTX generation. + if args.pptx_report is False: + args.pptx_report_file = None + try: payload, exit_code = run_cli(args) except ValueError as exc: @@ -244,6 +254,17 @@ def build_parser() -> argparse.ArgumentParser: default=None, help="Markdown file path for the latest main prompt sent to Codex.", ) + parser.add_argument( + "--pptx-report", + action=argparse.BooleanOptionalAction, + default=None, + help="Enable/disable PPTX run report generation. 
When omitted, interactive runs prompt the user.", + ) + parser.add_argument( + "--pptx-report-file", + default=None, + help="Output path for the auto-generated PPTX run report.", + ) parser.add_argument( "--control-file", default=None, @@ -497,6 +518,23 @@ def resolve_main_prompt_file(*, state_file: str | None, control_file: str | None return None +def _should_prompt_pptx() -> bool: + """Return True when we should interactively ask about PPTX generation.""" + import sys + return sys.stdin.isatty() + + +def _ask_pptx_report() -> bool: + """Prompt the user to decide whether to generate a PPTX run report.""" + import sys + try: + answer = input("Generate a PPTX run report at the end? [Y/n] ").strip().lower() + except (EOFError, KeyboardInterrupt): + print("", file=sys.stderr) + return False + return answer in ("", "y", "yes") + + def _mirror_plan_report_to_todo(*, report_path: object, todo_path: str) -> None: if not todo_path: return diff --git a/codex_autoloop/core/engine.py b/codex_autoloop/core/engine.py index 2160220..c3df6cb 100644 --- a/codex_autoloop/core/engine.py +++ b/codex_autoloop/core/engine.py @@ -8,9 +8,16 @@ from ..checks import all_checks_passed, run_checks from ..codex_runner import CodexRunner, InactivitySnapshot, RunnerOptions from ..failure_modes import build_progress_signature, build_quota_exhaustion_stop_reason, looks_like_quota_exhaustion -from ..final_report import FinalReportRequest, build_final_report_prompt, write_fallback_final_report +from ..final_report import ( + FinalReportRequest, + PptxReportRequest, + build_final_report_prompt, + build_pptx_report_prompt, + write_fallback_final_report, +) from ..models import PlanDecision, PlanMode, ReviewDecision, RoundSummary from ..planner import Planner, PlannerConfig +from ..pptx_report import generate_pptx_report as generate_pptx_report_fallback from ..reviewer import Reviewer, ReviewerConfig from ..stall_subagent import analyze_stall from .ports import EventSink @@ -479,6 +486,12 @@ def 
_complete( ) -> LoopResult: if success and not self.state_store.has_final_report(): self._finalize_success_report(session_id=session_id, rounds=rounds) + self._finalize_pptx_report( + success=success, + session_id=session_id, + rounds=rounds, + stop_reason=stop_reason, + ) self.state_store.record_completion(success=success, stop_reason=stop_reason, session_id=session_id) self._emit( { @@ -494,6 +507,104 @@ def _complete( stop_reason=stop_reason, ) + def _finalize_pptx_report( + self, + *, + success: bool, + session_id: str | None, + rounds: list[RoundSummary], + stop_reason: str, + ) -> None: + pptx_path = self.state_store.pptx_report_path() + if not pptx_path: + return + # Read the already-generated Markdown final report as content source + final_report_md = self.state_store.read_final_report_markdown() or "" + request = PptxReportRequest( + objective=self.config.objective, + pptx_path=pptx_path, + session_id=session_id, + success=success, + stop_reason=stop_reason, + operator_messages=self.state_store.list_messages_for_role("all"), + rounds=rounds, + final_report_markdown=final_report_md, + plan_mode=self._current_plan_mode(), + ) + # Try main agent first (uses pptx skill) + agent_failure = self._try_generate_pptx_with_main_agent(request=request, session_id=session_id) + pptx_file = Path(pptx_path) + # Fallback to Node.js script if agent didn't produce the file + if agent_failure is not None or not pptx_file.exists(): + try: + fallback_result = generate_pptx_report_fallback( + objective=self.config.objective, + rounds=rounds, + session_id=session_id, + success=success, + stop_reason=stop_reason, + output_path=pptx_path, + operator_messages=self.state_store.list_messages(), + plan_mode=self._current_plan_mode(), + ) + if not fallback_result: + return + except Exception: + import logging + logging.getLogger(__name__).warning("PPTX fallback generation failed", exc_info=True) + return + self.state_store.record_pptx_report(str(pptx_file)) + self._emit({ + "type": 
"pptx.report.ready", + "path": str(pptx_file), + "generated_by": "main-agent" if agent_failure is None and pptx_file.exists() else "fallback", + }) + + def _try_generate_pptx_with_main_agent( + self, + *, + request: PptxReportRequest, + session_id: str | None, + ) -> str | None: + """Try to have the main agent generate the PPTX. Returns None on success, error string on failure.""" + prompt = build_pptx_report_prompt(request) + pptx_path = Path(request.pptx_path) + before_bytes: bytes | None = None + if pptx_path.exists(): + try: + before_bytes = pptx_path.read_bytes() + except OSError: + before_bytes = b"" + last_round_index = request.rounds[-1].round_index if request.rounds else 0 + self.state_store.record_main_prompt( + round_index=last_round_index, + phase="pptx-report", + prompt=prompt, + ) + result = self.runner.run_exec( + prompt=prompt, + resume_thread_id=session_id, + options=RunnerOptions( + model=self.config.main_model, + reasoning_effort=self.config.main_reasoning_effort, + dangerous_yolo=self.config.dangerous_yolo, + full_auto=self.config.full_auto, + skip_git_repo_check=self.config.skip_git_repo_check, + extra_args=self.config.main_extra_args, + ), + run_label="main-pptx-report", + ) + if pptx_path.exists(): + try: + after_bytes = pptx_path.read_bytes() + except OSError: + after_bytes = None + if before_bytes is None or after_bytes != before_bytes: + return None # Success: file was created/changed + if result.fatal_error: + return result.fatal_error + return "main agent did not create the PPTX report file" + def _finalize_success_report(self, *, session_id: str | None, rounds: list[RoundSummary]) -> None: if not rounds: return diff --git a/codex_autoloop/core/state_store.py b/codex_autoloop/core/state_store.py index b6d1341..c29bc71 100644 --- a/codex_autoloop/core/state_store.py +++ b/codex_autoloop/core/state_store.py @@ -28,6 +28,7 @@ def __init__( plan_overview_file: str | None = None, review_summaries_dir: str | None = None, final_report_file: str 
| None = None, + pptx_report_file: str | None = None, main_prompt_file: str | None = None, check_commands: list[str] | None = None, plan_mode: PlanMode = "off", @@ -39,6 +40,7 @@ def __init__( self._plan_overview_file = plan_overview_file self._review_summaries_dir = review_summaries_dir self._final_report_file = final_report_file + self._pptx_report_file = pptx_report_file self._main_prompt_file = main_prompt_file self._check_commands = list(check_commands or []) self._plan_mode = plan_mode @@ -62,6 +64,8 @@ def __init__( "review_summaries_dir": review_summaries_dir, "final_report_file": final_report_file, "final_report_ready": False, + "pptx_report_file": pptx_report_file, + "pptx_report_ready": False, "main_prompt_file": main_prompt_file, "latest_plan_next_explore": None, } @@ -316,6 +320,22 @@ def record_final_report(self, path: str) -> None: self._runtime["final_report_ready"] = True self._write_state_locked() + def pptx_report_path(self) -> str | None: + return self._pptx_report_file + + def has_pptx_report(self) -> bool: + with self._lock: + return bool(self._runtime.get("pptx_report_ready")) + + def record_pptx_report(self, path: str) -> None: + if not path: + return + with self._lock: + self._pptx_report_file = path + self._runtime["pptx_report_file"] = path + self._runtime["pptx_report_ready"] = True + self._write_state_locked() + def latest_plan_overview(self) -> str: if self._latest_plan is not None and self._latest_plan.overview_markdown.strip(): return self._latest_plan.overview_markdown @@ -430,6 +450,8 @@ def _write_state_locked(self) -> None: "review_summaries_dir": self._review_summaries_dir, "final_report_file": self._final_report_file, "final_report_ready": bool(self._runtime.get("final_report_ready")), + "pptx_report_file": self._pptx_report_file, + "pptx_report_ready": bool(self._runtime.get("pptx_report_ready")), "latest_plan": asdict(self._latest_plan) if self._latest_plan else None, "rounds": [self._serialize_round(item) for item in 
self._rounds], } diff --git a/codex_autoloop/final_report.py b/codex_autoloop/final_report.py index 4aebdf0..461498f 100644 --- a/codex_autoloop/final_report.py +++ b/codex_autoloop/final_report.py @@ -3,6 +3,8 @@ from dataclasses import dataclass from datetime import datetime, timezone from pathlib import Path +from typing import Any +import json import re from .checks import summarize_checks @@ -18,6 +20,19 @@ class FinalReportRequest: round_summary: RoundSummary +@dataclass(frozen=True) +class PptxReportRequest: + objective: str + pptx_path: str + session_id: str | None + success: bool + stop_reason: str + operator_messages: list[str] + rounds: list[RoundSummary] + final_report_markdown: str = "" + plan_mode: str = "off" + + def resolve_final_report_file( *, explicit_path: str | None, @@ -367,3 +382,106 @@ def _resolve_artifact_dir( if default_root: return Path(default_root).resolve() return Path(".").resolve() / ".argusbot" + + +def build_pptx_report_prompt(request: PptxReportRequest) -> str: + """Build the prompt for the main agent to generate a PPTX work presentation.""" + last_round = request.rounds[-1] if request.rounds else None + checks_passed = 0 + checks_total = 0 + if last_round: + for c in last_round.checks: + checks_total += 1 + if c.passed: + checks_passed += 1 + + meta_lines = [ + f"- Date: {datetime.now(timezone.utc).strftime('%Y-%m-%d')}", + f"- Session ID: {request.session_id or 'N/A'}", + f"- Rounds executed: {len(request.rounds)}", + f"- Outcome: {'Success' if request.success else 'Incomplete'}", + f"- Checks: {checks_passed}/{checks_total} passed", + ] + meta_block = "\n".join(meta_lines) + + report_block = request.final_report_markdown.strip() if request.final_report_markdown else "No final report available." + + return ( + "You are the main execution agent, now in the PPTX presentation phase.\n" + "The Markdown final report is already written. 
Now create a PPTX slide deck.\n" + "Use the local $pptx-run-report skill and the $pptx skill (PptxGenJS).\n\n" + "IMPORTANT: You are NOT making a 'loop run report' about rounds and checks.\n" + "You are making a **work presentation** — something you'd show to a mentor, colleague, or classmate.\n" + "Focus on WHAT was accomplished, WHY it matters, HOW it was done, and WHAT the results are.\n\n" + f"Write the PPTX to this exact absolute path:\n{request.pptx_path}\n\n" + "== ORIGINAL OBJECTIVE ==\n" + f"{request.objective}\n\n" + "== METADATA ==\n" + f"{meta_block}\n\n" + "== FINAL TASK REPORT (Markdown) ==\n" + "Use this as your primary content source. Extract the key points for the slides.\n\n" + f"{report_block}\n\n" + "== INSTRUCTIONS ==\n" + "1. Read the $pptx-run-report skill for slide structure and style guidance.\n" + "2. Choose a color palette that fits the topic (don't always use the same one).\n" + "3. Adapt the slide structure to the type of work (research/feature/bugfix/etc).\n" + "4. Generate 6-10 slides. Quality over quantity.\n" + "5. Match the language of the objective and report.\n" + "6. 
Use PptxGenJS to write the .pptx file.\n\n" + "After writing the PPTX file, reply with only these two lines:\n" + f"PPTX_REPORT_PATH: {request.pptx_path}\n" + "PPTX_REPORT_STATUS: written\n" + ) + + +def _build_pptx_data_payload(request: PptxReportRequest) -> dict[str, Any]: + """Build the structured data dict for the PPTX report.""" + last_round = request.rounds[-1] if request.rounds else None + + final_checks: list[dict[str, Any]] = [] + checks_passed = 0 + checks_failed = 0 + if last_round: + for check in last_round.checks: + final_checks.append({ + "command": check.command, + "exit_code": check.exit_code, + "passed": check.passed, + }) + if check.passed: + checks_passed += 1 + else: + checks_failed += 1 + + round_data = [] + for r in request.rounds: + round_data.append({ + "round_index": r.round_index, + "review_status": r.review.status, + "review_confidence": r.review.confidence, + "checks_passed": all(c.passed for c in r.checks) if r.checks else True, + }) + + obj_short = request.objective[:80] + "..." 
if len(request.objective) > 80 else request.objective + return { + "objective": request.objective, + "objective_short": obj_short, + "session_id": request.session_id, + "date": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"), + "success": request.success, + "stop_reason": request.stop_reason, + "total_rounds": len(request.rounds), + "checks_passed": checks_passed, + "checks_failed": checks_failed, + "checks_total": checks_passed + checks_failed, + "reviewer_verdict": last_round.review.status if last_round else None, + "reviewer_reason": last_round.review.reason if last_round else None, + "reviewer_next_action": last_round.review.next_action if last_round else None, + "planner_follow_up_required": last_round.plan.follow_up_required if last_round and last_round.plan else None, + "planner_next_explore": last_round.plan.next_explore if last_round and last_round.plan else None, + "planner_main_instruction": last_round.plan.main_instruction if last_round and last_round.plan else None, + "plan_mode": request.plan_mode, + "final_checks": final_checks, + "rounds": round_data, + "operator_messages": request.operator_messages or [], + } diff --git a/codex_autoloop/pptx_report.py b/codex_autoloop/pptx_report.py new file mode 100644 index 0000000..334a0a6 --- /dev/null +++ b/codex_autoloop/pptx_report.py @@ -0,0 +1,189 @@ +"""PPTX run report generation. + +Builds a structured JSON payload from run data and invokes a Node.js +script (``skills/pptx-run-report/generate_run_report.js``) to produce a styled PPTX +presentation. Designed to be called from ``LoopEngine._complete`` +after the Markdown report. Failures are always non-fatal so that the +loop never blocks on a missing ``node`` binary or a broken template. 
+""" + +from __future__ import annotations + +import json +import logging +import subprocess +import tempfile +from dataclasses import asdict +from datetime import datetime, timezone +from pathlib import Path +from typing import Any + +from .models import RoundSummary + +logger = logging.getLogger(__name__) + +# Resolve the JS script path relative to *this* file's package root. +_PROJECT_ROOT = Path(__file__).resolve().parent.parent +_SCRIPT_DIR = _PROJECT_ROOT / "skills" / "pptx-run-report" +_JS_SCRIPT = _SCRIPT_DIR / "generate_run_report.js" + + +def build_report_data( + *, + objective: str, + rounds: list[RoundSummary], + session_id: str | None, + success: bool, + stop_reason: str, + operator_messages: list[str] | None = None, + plan_mode: str = "off", +) -> dict[str, Any]: + """Extract structured data from run results into a dict for the JS template.""" + total_rounds = len(rounds) + last_round = rounds[-1] if rounds else None + + # Checks from last round + final_checks: list[dict[str, Any]] = [] + checks_passed = 0 + checks_failed = 0 + if last_round: + for check in last_round.checks: + final_checks.append({ + "command": check.command, + "exit_code": check.exit_code, + "passed": check.passed, + }) + if check.passed: + checks_passed += 1 + else: + checks_failed += 1 + + # Reviewer info from last round + reviewer_verdict = last_round.review.status if last_round else None + reviewer_reason = last_round.review.reason if last_round else None + reviewer_next_action = last_round.review.next_action if last_round else None + + # Planner info from last round + planner_follow_up_required = None + planner_next_explore = None + planner_main_instruction = None + if last_round and last_round.plan is not None: + planner_follow_up_required = last_round.plan.follow_up_required + planner_next_explore = last_round.plan.next_explore + planner_main_instruction = last_round.plan.main_instruction + + # Round summaries for timeline + round_data = [] + for r in rounds: + 
round_data.append({ + "round_index": r.round_index, + "review_status": r.review.status, + "review_confidence": r.review.confidence, + "checks_passed": all(c.passed for c in r.checks) if r.checks else True, + "main_turn_failed": r.main_turn_failed, + }) + + objective_short = objective[:80] + "..." if len(objective) > 80 else objective + + return { + "objective": objective, + "objective_short": objective_short, + "session_id": session_id, + "date": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"), + "success": success, + "stop_reason": stop_reason, + "total_rounds": total_rounds, + "checks_passed": checks_passed, + "checks_failed": checks_failed, + "checks_total": checks_passed + checks_failed, + "reviewer_verdict": reviewer_verdict, + "reviewer_reason": reviewer_reason, + "reviewer_next_action": reviewer_next_action, + "planner_follow_up_required": planner_follow_up_required, + "planner_next_explore": planner_next_explore, + "planner_main_instruction": planner_main_instruction, + "plan_mode": plan_mode, + "final_checks": final_checks, + "rounds": round_data, + "operator_messages": operator_messages or [], + } + + +def generate_pptx_report( + *, + objective: str, + rounds: list[RoundSummary], + session_id: str | None, + success: bool, + stop_reason: str, + output_path: str, + operator_messages: list[str] | None = None, + plan_mode: str = "off", +) -> str | None: + """Generate a PPTX report. 
Returns the output path on success, or + ``None`` if generation failed (with a warning logged).""" + try: + data = build_report_data( + objective=objective, + rounds=rounds, + session_id=session_id, + success=success, + stop_reason=stop_reason, + operator_messages=operator_messages, + plan_mode=plan_mode, + ) + return _run_js_generator(data=data, output_path=output_path) + except Exception: + logger.warning("PPTX report generation failed", exc_info=True) + return None + + +def _run_js_generator(*, data: dict[str, Any], output_path: str) -> str | None: + """Write JSON to temp file, invoke node script, return output path.""" + if not _JS_SCRIPT.exists(): + logger.warning("PPTX JS script not found at %s", _JS_SCRIPT) + return None + + # Resolve to absolute so the node subprocess (which runs with cwd set to + # the project root) writes to the correct location. + output_path = str(Path(output_path).resolve()) + + # Ensure output dir exists + Path(output_path).parent.mkdir(parents=True, exist_ok=True) + + with tempfile.NamedTemporaryFile( + mode="w", suffix=".json", delete=False, encoding="utf-8" + ) as tmp: + json.dump(data, tmp, ensure_ascii=True) + tmp_path = tmp.name + + try: + result = subprocess.run( + ["node", str(_JS_SCRIPT), tmp_path, output_path], + capture_output=True, + text=True, + timeout=60, + cwd=str(_PROJECT_ROOT), + ) + if result.returncode != 0: + logger.warning( + "PPTX JS script exited with code %d: %s", + result.returncode, + result.stderr[:500], + ) + return None + if Path(output_path).exists(): + return output_path + logger.warning("PPTX output file not created at %s", output_path) + return None + except FileNotFoundError: + logger.warning("node binary not found; cannot generate PPTX report") + return None + except subprocess.TimeoutExpired: + logger.warning("PPTX generation timed out after 60s") + return None + finally: + try: + Path(tmp_path).unlink(missing_ok=True) + except OSError: + pass diff --git a/codex_autoloop/telegram_daemon.py 
b/codex_autoloop/telegram_daemon.py index ce7fe12..847432f 100644 --- a/codex_autoloop/telegram_daemon.py +++ b/codex_autoloop/telegram_daemon.py @@ -478,6 +478,8 @@ def main() -> None: scheduled_plan_request_at: dt.datetime | None = None pending_follow_up: PlanFollowUp | None = None pending_attachment_batches: dict[str, list[Any]] = {} + pending_pptx_run_objective: str | None = None + pending_pptx_run_source: str | None = None feishu_heartbeat_interval_seconds = max(0, int(args.feishu_heartbeat_interval_seconds)) last_feishu_heartbeat_monotonic = time.monotonic() run_copilot_proxy = config_from_args(args, prefix="run_") @@ -617,7 +619,7 @@ def update_status() -> None: }, ) - def start_child(objective: str, *, resume_last_session: bool = True) -> None: + def start_child(objective: str, *, resume_last_session: bool = True, pptx_report: bool = True) -> None: nonlocal child, child_objective, child_log_path, child_started_at, child_control_bus nonlocal child_run_id, child_control_path, child_resume_session_id nonlocal child_main_prompt_path, child_plan_report_path, child_plan_todo_path @@ -655,6 +657,7 @@ def start_child(objective: str, *, resume_last_session: bool = True) -> None: plan_todo_file=str(plan_todo_path), review_summaries_dir=str(review_summaries_dir), resume_session_id=resume_session_id, + pptx_report=pptx_report, ) log_file = log_path.open("w", encoding="utf-8") child = subprocess.Popen( @@ -828,7 +831,27 @@ def handle_command(command: TelegramCommand, source: str) -> None: nonlocal child, child_control_bus, pending_follow_up nonlocal plan_mode, planner_mode nonlocal pending_session_plan_goal, active_session_plan_goal + nonlocal pending_pptx_run_objective, pending_pptx_run_source log_event("command.received", source=source, kind=command.kind, text=command.text[:700]) + + # Handle pending PPTX confirmation reply + if pending_pptx_run_objective is not None: + reply = command.text.strip().lower() + if reply in ("y", "yes", "n", "no"): + objective = 
pending_pptx_run_objective + pptx_enabled = reply in ("y", "yes") + pending_pptx_run_objective = None + pending_pptx_run_source = None + if pending_plan_request or scheduled_plan_request_at is not None: + clear_planner_state(reason="manual_override") + start_child(objective, pptx_report=pptx_enabled) + return + # Non-Y/N reply while pending: cancel the pending run and fall through + send_reply(source, "[daemon] PPTX confirmation cancelled. Send /run again to start.") + pending_pptx_run_objective = None + pending_pptx_run_source = None + # fall through to handle the command normally + if command.kind == "help": send_reply(source, help_text()) return @@ -1117,18 +1140,10 @@ def on_complete(result) -> None: if pending_plan_request or scheduled_plan_request_at is not None: clear_planner_state(reason="manual_override") send_reply(source, "[daemon] pending plan request cleared by manual command.") - rewritten_objective = maybe_rewrite_run_objective( - enabled=bool(getattr(args, "run_objective_rewrite", False)), - objective=objective, - source=source, - run_cwd=run_cwd, - runner=daemon_runner, - model=(preset.main_model if preset is not None else args.run_main_model), - reasoning_effort=(preset.main_reasoning_effort if preset is not None else args.run_main_reasoning_effort), - send_reply=send_reply, - log_event=log_event, - ) - start_child(rewritten_objective) + # Ask about PPTX report before launching + pending_pptx_run_objective = objective + pending_pptx_run_source = source + send_reply(source, "Generate a PPTX run report at the end? 
Reply Y or N") return if command.kind == "stop": running = child is not None and child.poll() is None @@ -1544,6 +1559,7 @@ def build_child_command( plan_todo_file: str, review_summaries_dir: str = "", resume_session_id: str | None, + pptx_report: bool = True, ) -> list[str]: planner_mode = resolve_planner_mode(planner_enabled_flag=args.run_planner, planner_mode=args.run_planner_mode) preset = get_preset(args.run_model_preset) if args.run_model_preset else None @@ -1663,6 +1679,10 @@ def build_child_command( cmd.extend(["--state-file", args.run_state_file]) if args.run_no_dashboard: cmd.append("--no-dashboard") + if pptx_report: + cmd.append("--pptx-report") + else: + cmd.append("--no-pptx-report") if getattr(args, "run_plan_mode", PLAN_MODE_FULLY_PLAN) == PLAN_MODE_FULLY_PLAN: cmd.append("--follow-up-phase") else: diff --git a/package-lock.json b/package-lock.json new file mode 100644 index 0000000..0dc112f --- /dev/null +++ b/package-lock.json @@ -0,0 +1,755 @@ +{ + "name": "ArgusBot", + "lockfileVersion": 3, + "requires": true, + "packages": { + "": { + "dependencies": { + "pptxgenjs": "^4.0.1", + "react": "^19.2.4", + "react-dom": "^19.2.4", + "react-icons": "^5.6.0", + "sharp": "^0.34.5" + } + }, + "node_modules/@emnapi/runtime": { + "version": "1.9.0", + "resolved": "https://registry.npmjs.org/@emnapi/runtime/-/runtime-1.9.0.tgz", + "integrity": "sha512-QN75eB0IH2ywSpRpNddCRfQIhmJYBCJ1x5Lb3IscKAL8bMnVAKnRg8dCoXbHzVLLH7P38N2Z3mtulB7W0J0FKw==", + "license": "MIT", + "optional": true, + "dependencies": { + "tslib": "^2.4.0" + } + }, + "node_modules/@img/colour": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@img/colour/-/colour-1.1.0.tgz", + "integrity": "sha512-Td76q7j57o/tLVdgS746cYARfSyxk8iEfRxewL9h4OMzYhbW4TAcppl0mT4eyqXddh6L/jwoM75mo7ixa/pCeQ==", + "license": "MIT", + "engines": { + "node": ">=18" + } + }, + "node_modules/@img/sharp-darwin-arm64": { + "version": "0.34.5", + "resolved": 
"https://registry.npmjs.org/@img/sharp-darwin-arm64/-/sharp-darwin-arm64-0.34.5.tgz", + "integrity": "sha512-imtQ3WMJXbMY4fxb/Ndp6HBTNVtWCUI0WdobyheGf5+ad6xX8VIDO8u2xE4qc/fr08CKG/7dDseFtn6M6g/r3w==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-darwin-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-darwin-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-darwin-x64/-/sharp-darwin-x64-0.34.5.tgz", + "integrity": "sha512-YNEFAF/4KQ/PeW0N+r+aVVsoIY0/qxxikF2SWdp+NRkmMB7y9LBZAVqQ4yhGCm/H3H270OSykqmQMKLBhBJDEw==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-darwin-x64": "1.2.4" + } + }, + "node_modules/@img/sharp-libvips-darwin-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-darwin-arm64/-/sharp-libvips-darwin-arm64-1.2.4.tgz", + "integrity": "sha512-zqjjo7RatFfFoP0MkQ51jfuFZBnVE2pRiaydKJ1G/rHZvnsrHAOcQALIi9sA5co5xenQdTugCvtb1cuf78Vf4g==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "darwin" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-darwin-x64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-darwin-x64/-/sharp-libvips-darwin-x64-1.2.4.tgz", + "integrity": "sha512-1IOd5xfVhlGwX+zXv2N93k0yMONvUlANylbJw1eTah8K/Jtpi15KC+WSiaX/nBmbm2HxRM1gZ0nSdjSsrZbGKg==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "darwin" + ], + "funding": { + "url": 
"https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-arm": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-arm/-/sharp-libvips-linux-arm-1.2.4.tgz", + "integrity": "sha512-bFI7xcKFELdiNCVov8e44Ia4u2byA+l3XtsAj+Q8tfCwO6BQ8iDojYdvoPMqsKDkuoOo+X6HZA0s0q11ANMQ8A==", + "cpu": [ + "arm" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-arm64/-/sharp-libvips-linux-arm64-1.2.4.tgz", + "integrity": "sha512-excjX8DfsIcJ10x1Kzr4RcWe1edC9PquDRRPx3YVCvQv+U5p7Yin2s32ftzikXojb1PIFc/9Mt28/y+iRklkrw==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-ppc64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-ppc64/-/sharp-libvips-linux-ppc64-1.2.4.tgz", + "integrity": "sha512-FMuvGijLDYG6lW+b/UvyilUWu5Ayu+3r2d1S8notiGCIyYU/76eig1UfMmkZ7vwgOrzKzlQbFSuQfgm7GYUPpA==", + "cpu": [ + "ppc64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-riscv64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-riscv64/-/sharp-libvips-linux-riscv64-1.2.4.tgz", + "integrity": "sha512-oVDbcR4zUC0ce82teubSm+x6ETixtKZBh/qbREIOcI3cULzDyb18Sr/Wcyx7NRQeQzOiHTNbZFF1UwPS2scyGA==", + "cpu": [ + "riscv64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-s390x": { + 
"version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-s390x/-/sharp-libvips-linux-s390x-1.2.4.tgz", + "integrity": "sha512-qmp9VrzgPgMoGZyPvrQHqk02uyjA0/QrTO26Tqk6l4ZV0MPWIW6LTkqOIov+J1yEu7MbFQaDpwdwJKhbJvuRxQ==", + "cpu": [ + "s390x" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-x64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-x64/-/sharp-libvips-linux-x64-1.2.4.tgz", + "integrity": "sha512-tJxiiLsmHc9Ax1bz3oaOYBURTXGIRDODBqhveVHonrHJ9/+k89qbLl0bcJns+e4t4rvaNBxaEZsFtSfAdquPrw==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linuxmusl-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linuxmusl-arm64/-/sharp-libvips-linuxmusl-arm64-1.2.4.tgz", + "integrity": "sha512-FVQHuwx1IIuNow9QAbYUzJ+En8KcVm9Lk5+uGUQJHaZmMECZmOlix9HnH7n1TRkXMS0pGxIJokIVB9SuqZGGXw==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linuxmusl-x64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linuxmusl-x64/-/sharp-libvips-linuxmusl-x64-1.2.4.tgz", + "integrity": "sha512-+LpyBk7L44ZIXwz/VYfglaX/okxezESc6UxDSoyo2Ks6Jxc4Y7sGjpgU9s4PMgqgjj1gZCylTieNamqA1MF7Dg==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-linux-arm": { + "version": "0.34.5", + "resolved": 
"https://registry.npmjs.org/@img/sharp-linux-arm/-/sharp-linux-arm-0.34.5.tgz", + "integrity": "sha512-9dLqsvwtg1uuXBGZKsxem9595+ujv0sJ6Vi8wcTANSFpwV/GONat5eCkzQo/1O6zRIkh0m/8+5BjrRr7jDUSZw==", + "cpu": [ + "arm" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-arm": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-arm64/-/sharp-linux-arm64-0.34.5.tgz", + "integrity": "sha512-bKQzaJRY/bkPOXyKx5EVup7qkaojECG6NLYswgktOZjaXecSAeCWiZwwiFf3/Y+O1HrauiE3FVsGxFg8c24rZg==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-ppc64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-ppc64/-/sharp-linux-ppc64-0.34.5.tgz", + "integrity": "sha512-7zznwNaqW6YtsfrGGDA6BRkISKAAE1Jo0QdpNYXNMHu2+0dTrPflTLNkpc8l7MUP5M16ZJcUvysVWWrMefZquA==", + "cpu": [ + "ppc64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-ppc64": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-riscv64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-riscv64/-/sharp-linux-riscv64-0.34.5.tgz", + "integrity": "sha512-51gJuLPTKa7piYPaVs8GmByo7/U7/7TZOq+cnXJIHZKavIRHAP77e3N2HEl3dgiqdD/w0yUfiJnII77PuDDFdw==", + "cpu": [ + "riscv64" + ], + "license": 
"Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-riscv64": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-s390x": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-s390x/-/sharp-linux-s390x-0.34.5.tgz", + "integrity": "sha512-nQtCk0PdKfho3eC5MrbQoigJ2gd1CgddUMkabUj+rBevs8tZ2cULOx46E7oyX+04WGfABgIwmMC0VqieTiR4jg==", + "cpu": [ + "s390x" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-s390x": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-x64/-/sharp-linux-x64-0.34.5.tgz", + "integrity": "sha512-MEzd8HPKxVxVenwAa+JRPwEC7QFjoPWuS5NZnBt6B3pu7EG2Ge0id1oLHZpPJdn3OQK+BQDiw9zStiHBTJQQQQ==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-x64": "1.2.4" + } + }, + "node_modules/@img/sharp-linuxmusl-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linuxmusl-arm64/-/sharp-linuxmusl-arm64-0.34.5.tgz", + "integrity": "sha512-fprJR6GtRsMt6Kyfq44IsChVZeGN97gTD331weR1ex1c1rypDEABN6Tm2xa1wE6lYb5DdEnk03NZPqA7Id21yg==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + 
"@img/sharp-libvips-linuxmusl-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-linuxmusl-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linuxmusl-x64/-/sharp-linuxmusl-x64-0.34.5.tgz", + "integrity": "sha512-Jg8wNT1MUzIvhBFxViqrEhWDGzqymo3sV7z7ZsaWbZNDLXRJZoRGrjulp60YYtV4wfY8VIKcWidjojlLcWrd8Q==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linuxmusl-x64": "1.2.4" + } + }, + "node_modules/@img/sharp-wasm32": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-wasm32/-/sharp-wasm32-0.34.5.tgz", + "integrity": "sha512-OdWTEiVkY2PHwqkbBI8frFxQQFekHaSSkUIJkwzclWZe64O1X4UlUjqqqLaPbUpMOQk6FBu/HtlGXNblIs0huw==", + "cpu": [ + "wasm32" + ], + "license": "Apache-2.0 AND LGPL-3.0-or-later AND MIT", + "optional": true, + "dependencies": { + "@emnapi/runtime": "^1.7.0" + }, + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-win32-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-win32-arm64/-/sharp-win32-arm64-0.34.5.tgz", + "integrity": "sha512-WQ3AgWCWYSb2yt+IG8mnC6Jdk9Whs7O0gxphblsLvdhSpSTtmu69ZG1Gkb6NuvxsNACwiPV6cNSZNzt0KPsw7g==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0 AND LGPL-3.0-or-later", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-win32-ia32": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-win32-ia32/-/sharp-win32-ia32-0.34.5.tgz", + "integrity": 
"sha512-FV9m/7NmeCmSHDD5j4+4pNI8Cp3aW+JvLoXcTUo0IqyjSfAZJ8dIUmijx1qaJsIiU+Hosw6xM5KijAWRJCSgNg==", + "cpu": [ + "ia32" + ], + "license": "Apache-2.0 AND LGPL-3.0-or-later", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-win32-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-win32-x64/-/sharp-win32-x64-0.34.5.tgz", + "integrity": "sha512-+29YMsqY2/9eFEiW93eqWnuLcWcufowXewwSNIT6UwZdUUCrM3oFjMWH/Z6/TMmb4hlFenmfAVbpWeup2jryCw==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0 AND LGPL-3.0-or-later", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@types/node": { + "version": "22.19.15", + "resolved": "https://registry.npmjs.org/@types/node/-/node-22.19.15.tgz", + "integrity": "sha512-F0R/h2+dsy5wJAUe3tAU6oqa2qbWY5TpNfL/RGmo1y38hiyO1w3x2jPtt76wmuaJI4DQnOBu21cNXQ2STIUUWg==", + "license": "MIT", + "dependencies": { + "undici-types": "~6.21.0" + } + }, + "node_modules/core-util-is": { + "version": "1.0.3", + "resolved": "https://registry.npmjs.org/core-util-is/-/core-util-is-1.0.3.tgz", + "integrity": "sha512-ZQBvi1DcpJ4GDqanjucZ2Hj3wEO5pZDS89BWbkcrvdxksJorwUDDZamX9ldFkp9aw2lmBDLgkObEA4DWNJ9FYQ==", + "license": "MIT" + }, + "node_modules/detect-libc": { + "version": "2.1.2", + "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.1.2.tgz", + "integrity": "sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==", + "license": "Apache-2.0", + "engines": { + "node": ">=8" + } + }, + "node_modules/https": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/https/-/https-1.0.0.tgz", + "integrity": 
"sha512-4EC57ddXrkaF0x83Oj8sM6SLQHAWXw90Skqu2M4AEWENZ3F02dFJE/GARA8igO79tcgYqGrD7ae4f5L3um2lgg==", + "license": "ISC" + }, + "node_modules/image-size": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/image-size/-/image-size-1.2.1.tgz", + "integrity": "sha512-rH+46sQJ2dlwfjfhCyNx5thzrv+dtmBIhPHk0zgRUukHzZ/kRueTJXoYYsclBaKcSMBWuGbOFXtioLpzTb5euw==", + "license": "MIT", + "dependencies": { + "queue": "6.0.2" + }, + "bin": { + "image-size": "bin/image-size.js" + }, + "engines": { + "node": ">=16.x" + } + }, + "node_modules/immediate": { + "version": "3.0.6", + "resolved": "https://registry.npmjs.org/immediate/-/immediate-3.0.6.tgz", + "integrity": "sha512-XXOFtyqDjNDAQxVfYxuF7g9Il/IbWmmlQg2MYKOH8ExIT1qg6xc4zyS3HaEEATgs1btfzxq15ciUiY7gjSXRGQ==", + "license": "MIT" + }, + "node_modules/inherits": { + "version": "2.0.4", + "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", + "integrity": "sha512-k/vGaX4/Yla3WzyMCvTQOXYeIHvqOKtnqBduzTHpzpQZzAskKMhZ2K+EnBiSM9zGSoIFeMpXKxa4dYeZIQqewQ==", + "license": "ISC" + }, + "node_modules/isarray": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/isarray/-/isarray-1.0.0.tgz", + "integrity": "sha512-VLghIWNM6ELQzo7zwmcg0NmTVyWKYjvIeM83yjp0wRDTmUnrM678fQbcKBo6n2CJEF0szoG//ytg+TKla89ALQ==", + "license": "MIT" + }, + "node_modules/jszip": { + "version": "3.10.1", + "resolved": "https://registry.npmjs.org/jszip/-/jszip-3.10.1.tgz", + "integrity": "sha512-xXDvecyTpGLrqFrvkrUSoxxfJI5AH7U8zxxtVclpsUtMCq4JQ290LY8AW5c7Ggnr/Y/oK+bQMbqK2qmtk3pN4g==", + "license": "(MIT OR GPL-3.0-or-later)", + "dependencies": { + "lie": "~3.3.0", + "pako": "~1.0.2", + "readable-stream": "~2.3.6", + "setimmediate": "^1.0.5" + } + }, + "node_modules/lie": { + "version": "3.3.0", + "resolved": "https://registry.npmjs.org/lie/-/lie-3.3.0.tgz", + "integrity": "sha512-UaiMJzeWRlEujzAuw5LokY1L5ecNQYZKfmyZ9L7wDHb/p5etKaxXhohBcrw0EYby+G/NA52vRSN4N39dxHAIwQ==", + "license": "MIT", + "dependencies": { + "immediate": 
"~3.0.5" + } + }, + "node_modules/pako": { + "version": "1.0.11", + "resolved": "https://registry.npmjs.org/pako/-/pako-1.0.11.tgz", + "integrity": "sha512-4hLB8Py4zZce5s4yd9XzopqwVv/yGNhV1Bl8NTmCq1763HeK2+EwVTv+leGeL13Dnh2wfbqowVPXCIO0z4taYw==", + "license": "(MIT AND Zlib)" + }, + "node_modules/pptxgenjs": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/pptxgenjs/-/pptxgenjs-4.0.1.tgz", + "integrity": "sha512-TeJISr8wouAuXw4C1F/mC33xbZs/FuEG6nH9FG1Zj+nuPcGMP5YRHl6X+j3HSUnS1f3at6k75ZZXPMZlA5Lj9A==", + "license": "MIT", + "dependencies": { + "@types/node": "^22.8.1", + "https": "^1.0.0", + "image-size": "^1.2.1", + "jszip": "^3.10.1" + } + }, + "node_modules/process-nextick-args": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/process-nextick-args/-/process-nextick-args-2.0.1.tgz", + "integrity": "sha512-3ouUOpQhtgrbOa17J7+uxOTpITYWaGP7/AhoR3+A+/1e9skrzelGi/dXzEYyvbxubEF6Wn2ypscTKiKJFFn1ag==", + "license": "MIT" + }, + "node_modules/queue": { + "version": "6.0.2", + "resolved": "https://registry.npmjs.org/queue/-/queue-6.0.2.tgz", + "integrity": "sha512-iHZWu+q3IdFZFX36ro/lKBkSvfkztY5Y7HMiPlOUjhupPcG2JMfst2KKEpu5XndviX/3UhFbRngUPNKtgvtZiA==", + "license": "MIT", + "dependencies": { + "inherits": "~2.0.3" + } + }, + "node_modules/react": { + "version": "19.2.4", + "resolved": "https://registry.npmjs.org/react/-/react-19.2.4.tgz", + "integrity": "sha512-9nfp2hYpCwOjAN+8TZFGhtWEwgvWHXqESH8qT89AT/lWklpLON22Lc8pEtnpsZz7VmawabSU0gCjnj8aC0euHQ==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/react-dom": { + "version": "19.2.4", + "resolved": "https://registry.npmjs.org/react-dom/-/react-dom-19.2.4.tgz", + "integrity": "sha512-AXJdLo8kgMbimY95O2aKQqsz2iWi9jMgKJhRBAxECE4IFxfcazB2LmzloIoibJI3C12IlY20+KFaLv+71bUJeQ==", + "license": "MIT", + "dependencies": { + "scheduler": "^0.27.0" + }, + "peerDependencies": { + "react": "^19.2.4" + } + }, + "node_modules/react-icons": { + "version": "5.6.0", + 
"resolved": "https://registry.npmjs.org/react-icons/-/react-icons-5.6.0.tgz", + "integrity": "sha512-RH93p5ki6LfOiIt0UtDyNg/cee+HLVR6cHHtW3wALfo+eOHTp8RnU2kRkI6E+H19zMIs03DyxUG/GfZMOGvmiA==", + "license": "MIT", + "peerDependencies": { + "react": "*" + } + }, + "node_modules/readable-stream": { + "version": "2.3.8", + "resolved": "https://registry.npmjs.org/readable-stream/-/readable-stream-2.3.8.tgz", + "integrity": "sha512-8p0AUk4XODgIewSi0l8Epjs+EVnWiK7NoDIEGU0HhE7+ZyY8D1IMY7odu5lRrFXGg71L15KG8QrPmum45RTtdA==", + "license": "MIT", + "dependencies": { + "core-util-is": "~1.0.0", + "inherits": "~2.0.3", + "isarray": "~1.0.0", + "process-nextick-args": "~2.0.0", + "safe-buffer": "~5.1.1", + "string_decoder": "~1.1.1", + "util-deprecate": "~1.0.1" + } + }, + "node_modules/safe-buffer": { + "version": "5.1.2", + "resolved": "https://registry.npmjs.org/safe-buffer/-/safe-buffer-5.1.2.tgz", + "integrity": "sha512-Gd2UZBJDkXlY7GbJxfsE8/nvKkUEU1G38c1siN6QP6a9PT9MmHB8GnpscSmMJSoF8LOIrt8ud/wPtojys4G6+g==", + "license": "MIT" + }, + "node_modules/scheduler": { + "version": "0.27.0", + "resolved": "https://registry.npmjs.org/scheduler/-/scheduler-0.27.0.tgz", + "integrity": "sha512-eNv+WrVbKu1f3vbYJT/xtiF5syA5HPIMtf9IgY/nKg0sWqzAUEvqY/xm7OcZc/qafLx/iO9FgOmeSAp4v5ti/Q==", + "license": "MIT" + }, + "node_modules/semver": { + "version": "7.7.4", + "resolved": "https://registry.npmjs.org/semver/-/semver-7.7.4.tgz", + "integrity": "sha512-vFKC2IEtQnVhpT78h1Yp8wzwrf8CM+MzKMHGJZfBtzhZNycRFnXsHk6E5TxIkkMsgNS7mdX3AGB7x2QM2di4lA==", + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/setimmediate": { + "version": "1.0.5", + "resolved": "https://registry.npmjs.org/setimmediate/-/setimmediate-1.0.5.tgz", + "integrity": "sha512-MATJdZp8sLqDl/68LfQmbP8zKPLQNV6BIZoIgrscFDQ+RsvK/BxeDQOgyxKKoh0y/8h3BqVFnCqQ/gd+reiIXA==", + "license": "MIT" + }, + "node_modules/sharp": { + "version": "0.34.5", + "resolved": 
"https://registry.npmjs.org/sharp/-/sharp-0.34.5.tgz", + "integrity": "sha512-Ou9I5Ft9WNcCbXrU9cMgPBcCK8LiwLqcbywW3t4oDV37n1pzpuNLsYiAV8eODnjbtQlSDwZ2cUEeQz4E54Hltg==", + "hasInstallScript": true, + "license": "Apache-2.0", + "dependencies": { + "@img/colour": "^1.0.0", + "detect-libc": "^2.1.2", + "semver": "^7.7.3" + }, + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-darwin-arm64": "0.34.5", + "@img/sharp-darwin-x64": "0.34.5", + "@img/sharp-libvips-darwin-arm64": "1.2.4", + "@img/sharp-libvips-darwin-x64": "1.2.4", + "@img/sharp-libvips-linux-arm": "1.2.4", + "@img/sharp-libvips-linux-arm64": "1.2.4", + "@img/sharp-libvips-linux-ppc64": "1.2.4", + "@img/sharp-libvips-linux-riscv64": "1.2.4", + "@img/sharp-libvips-linux-s390x": "1.2.4", + "@img/sharp-libvips-linux-x64": "1.2.4", + "@img/sharp-libvips-linuxmusl-arm64": "1.2.4", + "@img/sharp-libvips-linuxmusl-x64": "1.2.4", + "@img/sharp-linux-arm": "0.34.5", + "@img/sharp-linux-arm64": "0.34.5", + "@img/sharp-linux-ppc64": "0.34.5", + "@img/sharp-linux-riscv64": "0.34.5", + "@img/sharp-linux-s390x": "0.34.5", + "@img/sharp-linux-x64": "0.34.5", + "@img/sharp-linuxmusl-arm64": "0.34.5", + "@img/sharp-linuxmusl-x64": "0.34.5", + "@img/sharp-wasm32": "0.34.5", + "@img/sharp-win32-arm64": "0.34.5", + "@img/sharp-win32-ia32": "0.34.5", + "@img/sharp-win32-x64": "0.34.5" + } + }, + "node_modules/string_decoder": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.1.1.tgz", + "integrity": "sha512-n/ShnvDi6FHbbVfviro+WojiFzv+s8MPMHBczVePfUpDJLwoLT0ht1l4YwBCbi8pJAveEEdnkHyPyTP/mzRfwg==", + "license": "MIT", + "dependencies": { + "safe-buffer": "~5.1.0" + } + }, + "node_modules/tslib": { + "version": "2.8.1", + "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.8.1.tgz", + "integrity": 
"sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==", + "license": "0BSD", + "optional": true + }, + "node_modules/undici-types": { + "version": "6.21.0", + "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.21.0.tgz", + "integrity": "sha512-iwDZqg0QAGrg9Rav5H4n0M64c3mkR59cJ6wQp+7C4nI0gsmExaedaYLNO44eT4AtBBwjbTiGPMlt2Md0T9H9JQ==", + "license": "MIT" + }, + "node_modules/util-deprecate": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", + "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==", + "license": "MIT" + } + } +} diff --git a/package.json b/package.json new file mode 100644 index 0000000..51730f3 --- /dev/null +++ b/package.json @@ -0,0 +1,9 @@ +{ + "dependencies": { + "pptxgenjs": "^4.0.1", + "react": "^19.2.4", + "react-dom": "^19.2.4", + "react-icons": "^5.6.0", + "sharp": "^0.34.5" + } +} diff --git a/research/README.md b/research/README.md new file mode 100644 index 0000000..bffae1f --- /dev/null +++ b/research/README.md @@ -0,0 +1,98 @@ +# Legal AI Research Package Index + +Snapshot date: `2026-03-20` + +This folder is the repository-side knowledge-base entrypoint for the completed legal AI research package on how legal LLM / AI currently intersects with law. 
+ +## Package Status + +- package id: `legal-ai-research-package-2026-03-20-v1` +- canonical role: `current legal-AI research main package` +- canonical version: `v1.0` +- status: `Active baseline` +- version-control status: `git-tracked delivery baseline` +- baseline tag: `legal-ai-research-package-2026-03-20-v1.0-baseline` +- baseline evidence receipt: `legal-ai-research-package-baseline-receipt-2026-03-20.md` +- source of truth: `research/` +- maintenance mode: dated tracker updates first, memo or matrix rewrites only when the boundary actually changes +- maintenance cadence: weekly tracker sweeps, monthly register / package review, and version-boundary review when `L2 / L3` events or package-governance rule changes occur + +## Start Here + +- `legal-llm-law-intersections-2026-03-20.md` + - direct synthesis of how legal LLM / AI intersects with law +- `legal-ai-opportunity-risk-matrix-2026-03-20.md` + - prioritized scenarios, safeguards, and go / no-go logic +- `legal-ai-research-package-register-2026-03-20.md` + - package registration, owner-of-record, maintenance cadence, versioning, and change-log rules +- `legal-ai-research-package-baseline-receipt-2026-03-20.md` + - git baseline receipt for the locked `2026-03-20` delivery snapshot, including manifest and checksum evidence + +## Monitoring Layer + +- `legal-ai-regulatory-monitoring-tracker-2026-03-20.md` + - cross-jurisdiction regulatory, filing, labeling, enforcement, and market-boundary monitoring +- `court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md` + - filing, disclosure, verification, evidence, and sanctions monitoring for court-facing workflows + +## Audit Trail + +- `legal-ai-research-package-register-2026-03-20.md` + - package registration log, owner-of-record rules, maintenance cadence, change-log discipline, and version boundary +- `legal-ai-research-package-baseline-receipt-2026-03-20.md` + - baseline evidence file: branch / parent-commit context, file manifest, SHA-256 checksums, and 
reproduction commands for the locked delivery baseline +- `legal-ai-regulatory-monitoring-tracker-2026-03-20.md` + - first dated China governance / labeling / filing / sector-rule update, summary-level source-access rollup, official-anchor split, per-anchor baseline evidence pack, baseline source-access backfill, anthropomorphic-interaction / professional-domain obligation anchor, open China queue, meaningful-change rubric, and standard weekly output template +- `court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md` + - first dated court-facing refresh log, summary-level source-access rollup, baseline source-access backfill, court-rule monitoring queue, minimum evidence pack, meaningful-change rubric, and standard weekly output template + +## China Operating Layer + +- `china-legal-ai-go-no-go-memo-2026-03-20.md` +- `china-contract-compliance-copilot-management-memo-2026-03-20.md` +- `china-contract-compliance-copilot-management-brief-2026-03-20.md` +- `china-contract-compliance-copilot-ops-checklist-2026-03-20.md` +- `china-contract-compliance-copilot-execution-tracker-2026-03-20.md` +- `china-contract-compliance-copilot-validation-plan-2026-03-20.md` + +## Jurisdiction Extensions + +- `singapore-legal-ai-go-no-go-memo-2026-03-20.md` +- `hong-kong-legal-ai-go-no-go-memo-2026-03-20.md` +- `hong-kong-legal-ai-management-brief-2026-03-20.md` +- `uk-australia-uae-legal-ai-market-comparison-2026-03-20.md` + +## Owner Roles + +- `Research owner` + - keeps the package baseline current and writes dated tracker entries +- `Legal / compliance owner` + - decides whether a rule change alters the legal or compliance boundary +- `Product owner` + - decides whether roadmap, feature scope, or stop conditions need to change +- `Market / GTM owner` + - decides whether jurisdiction priority or external positioning should change + +## Next Dated Checkpoints + +- `2026-03-27` + - next China governance / filing / labeling / sector-rule sweep +- `2026-03-27` + - next China `CAC` 
anthropomorphic-interaction / professional-domain-obligation checkpoint, including `MOJ / 12348 / ACLA` follow-on sweep plus `source unavailable / fallback official page used` logging if the `MOJ` legal-service channel pages self-redirect or time out; for `MOJ`, the default fallback order is the same-domain `pub/sfbgw/...` article-page family, then `pub/sfbgwapp/...`, and then manual browser verification if automation still fails +- `2026-03-27` + - next multi-jurisdiction court-facing weekly sweep +- `2026-04-03` + - next China sector-specific-rule sweep already queued in the regulatory tracker +- `2026-04-03` + - next China court-facing / public-legal-service rule sweep already queued in the regulatory tracker +- `2026-04-14 23:59` + - England-and-Wales consultation closing-time recheck already queued in the monitoring trackers after the early-completed `CJC` `L2` impact analysis + +## Operating Rule + +- Treat this folder as a dated research snapshot, not timeless doctrine. +- Record new official developments in the trackers first. +- For package-governance changes, use `legal-ai-research-package-register-2026-03-20.md` sections `3`, `3.1`, `4`, and `4.1` to decide owner, cadence, version impact, and whether the change must be logged in the register the same day. 
+- For China weekly sweeps, use `legal-ai-regulatory-monitoring-tracker-2026-03-20.md` sections `12.1`, `12.2`, and `12.3` to capture evidence, keep the `source access / fallback used` field explicit in both the dated summary row and the per-anchor output, log `source unavailable / fallback official page used` when official legal-sector pages are inaccessible, follow the recorded `MOJ` root-path to `pub/sfbgw/...` then `pub/sfbgwapp/...` fallback order, escalate to manual browser verification if automation still fails, distinguish official-domain local-practice / industry-dynamic AI articles from national formal legal-sector guidance before judging meaningful change or rewriting memos or the matrix, and when the judgment is `signal-only / L1 no-change`, record at least one concrete official-domain example page title or path. +- For court-facing weekly sweeps, start from `court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md` sections `17`, `17.1`, `17.2`, and `17.3`, keep the `source access / fallback used` field explicit in both the dated summary row and the per-anchor output, and then use section `14` when recording the dated event; for the already-completed England-and-Wales `CJC` `L2` item, if official materials only confirm the consultation remains open and the same evidence-stage proposal is still on foot before `2026-04-14 23:59`, record tracker-level `L1 no-change` only and do not reopen the prior `L2` writeback unless official materials materially change. +- Rewrite synthesis or memo documents only when the new development actually changes go / no-go, safeguards, filing boundaries, or jurisdiction priority. 
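diff --git a/research/china-contract-compliance-copilot-execution-tracker-2026-03-20.md b/research/china-contract-compliance-copilot-execution-tracker-2026-03-20.md new file mode 100644 index 0000000..34c5c1c --- /dev/null +++ b/research/china-contract-compliance-copilot-execution-tracker-2026-03-20.md @@ -0,0 +1,309 @@ +# China Enterprise Contract / Compliance Copilot Execution Tracker + +Date: 2026-03-20 + +Purpose: consolidate into a single execution document the Gates 0-5, owners, acceptance evidence, weekly reporting cadence, red-flag signals, shutdown actions, and `30 / 60 / 90`-day review mechanism that are currently scattered across the management memo, ops checklist, management brief, and overall China-jurisdiction memo in the existing China-first research package. + +Scope: + +- China jurisdiction +- Internal enterprise contract / compliance copilot +- First pilot customer or first business unit +- Not applicable to open-ended legal-opinion services aimed at the public + +Related documents: + +- `research/china-contract-compliance-copilot-management-memo-2026-03-20.md` +- `research/china-contract-compliance-copilot-ops-checklist-2026-03-20.md` +- `research/china-contract-compliance-copilot-management-brief-2026-03-20.md` +- `research/china-contract-compliance-copilot-validation-plan-2026-03-20.md` +- `research/china-legal-ai-go-no-go-memo-2026-03-20.md` + +This is not legal advice; it is an execution-governance document as of `2026-03-20`. + +## 1. How to use + +This tracker serves a single goal: turning "we can build a China enterprise contract / compliance copilot" into a pilot management process **with gates, evidence, cadence, and a shutdown mechanism**. + +Execution principles: + +- Every Gate must have an owner, archived evidence, and explicit exit criteria. +- Any red-flag signal that hits a shutdown threshold triggers shutdown or downgrade first; controls are not skipped to keep the schedule. +- Weekly reports, biweekly reviews, and the `30 / 60 / 90`-day reviews must all be based on the same set of metrics, evidence, and issue list. +- This tracker takes precedence over verbal consensus; if a verbal statement conflicts with the tracker, the tracker prevails. + +## 2. Roles and decision rights + +| Role | Main responsibilities | Key decision rights | +| --- | --- | --- | +| Management sponsor | Approves project initiation, pilot, expansion, or termination | Gate 0 initiation; Gate 5 expansion / termination | +| Product owner | Manages scope, weekly reports, user feedback, and review organization | Gate 2 launch recommendation; Gate 3 scope-reduction / HOLD recommendation | +| Legal owner | Defines human sign-off, escalation rules, and legal acceptance criteria | Gate 2 whether entry into real workflows is allowed; Gate 4 go / no-go recommendation | +| Compliance owner | Assesses scenario boundaries, regulatory obligations, and industry control items | Gate 1 / 3 / 4 compliance-boundary judgment | +| Security owner | Data flows, permissions, logs, export, and vendor governance | Gate 1 whether real data may be used; shutdown-incident severity grading | +| Architecture / engineering owner | Deployment mode, source back-linking, logs, feature downgrade | Shutdown execution; technical judgment on recovery | +| Research / knowledge-management owner | Benchmark, failure cases, rule-library versions, and capture of review lessons | Gate 2 evaluation readiness; Gate 4 failure-case closure | +| Legal operations / project manager | Weekly-report collection, user training, meeting cadence, evidence archiving | Weekly-report publication; biweekly-meeting tracking; action-item closure | +| Customer business owner | Provides scenario boundaries, workflow entry points, and final business acceptance feedback | Gate 3 whether to continue the pilot; Gate 4 business-side acceptance | + +## 3. 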
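Gate 0-5 execution summary table + +| Gate | Objective | Owner | Required acceptance evidence | Weekly report / review cadence | Red-flag signals | Shutdown / downgrade action | +| --- | --- | --- | --- | --- | --- | --- | +| Gate 0: pre-sales qualification screening | Decide whether the customer and scenario merit entering the solution stage | Sales, product owner, management sponsor | Business sponsor list, target-team description, usage-boundary statement, customer confirmation of "assisted judgment + human sign-off" | One review before initiation; if not passed, the weekly-report mechanism does not start | Customer demands automatic approval; target team is in fact a public-facing service; no clear owner; no existing workflow to plug into | Do not initiate; keep only a record of the requirement | +| Gate 1: security / legal review | Decide whether pre-real-data preparation can begin | Security owner, legal owner, compliance owner, architecture owner | Data-flow diagram, deployment-mode description, vendor-terms review record, draft permission matrix, retention / deletion / retraining statement | Weekly follow-up on open items; no real data before passing | Cannot clearly explain retention / deletion / sub-processor; cannot provide tenant isolation; personal information is involved but not assessed | Entry into the real-data environment is prohibited; downgrade to offline samples if necessary | +| Gate 2: pilot launch | Decide whether a restricted pilot can begin | Product owner, legal owner, research owner, engineering owner | Template / rule-library version list, failure-case list, approval-workflow integration plan, review queue, source back-link samples, training records | Pre-launch review; after passing, weekly reports and biweekly meetings begin | No human signer; no high-risk escalation rules; no benchmark; sources cannot be back-linked | Do not start the pilot; stay in offline validation | +| Gate 3: pilot midpoint | Decide whether to continue, reduce scope, or enter HOLD | Product owner, legal owner, compliance owner, security owner, customer business owner | Weekly reports, biweekly reports, human adoption rate, escalation hit rate, source back-link spot checks, records of newly added failure cases, log spot-check results | Weekly report every week; formal review every two weeks; ad hoc session when needed | Review time rises instead of falling; humans persistently decline to adopt outputs; high-risk items missed for escalation; unauthorized access; users bypass the review queue | Reduce scope; freeze high-risk features; move to `HOLD`; switch back to offline mode if necessary | +| Gate 4: pilot close-out | Reach a formal go / no-go / hold conclusion | Product owner, legal owner, security owner, research owner, management sponsor | Failure-case review, updated data-flow diagram, log samples, legal re-review feedback, unresolved-risk list, close-out conclusion minutes | One close-out review; formal conclusion issued | Core red flags not cleared; evidence gaps still large; key trends unstable; customer business side declines to continue | Do not proceed to expansion review; keep the pilot running or terminate | +| Gate 5: pre-expansion | Decide whether to expand from a single business unit to more teams / contract types | Management sponsor, product owner, legal owner, security owner, customer business owner | Metric trend charts, expansion technical assessment, post-expansion permission / logging plan, updated benchmark, final review conclusion | One formal review before expansion; weekly-report mechanism restarts after expansion | Expansion would also introduce public-facing service, automatic approval, or external sign-off; metrics not yet stable; controls cannot be reused | Do not expand; remain in pilot status; move to NO-GO if necessary | + +## 4. 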
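Gate-by-gate execution requirements + +### 4.1 Gate 0: pre-sales qualification screening + +Must confirm: + +- The customer is an in-house legal / compliance / contract-review team +- The scenario is contract review, clause comparison, obligation mapping, or pre-approval screening +- The customer accepts "assisted judgment + human sign-off" +- There is a business sponsor and an owner for subsequent acceptance + +Pass criteria: + +- A written boundary statement exists +- A customer owner list exists +- There is no request for automatic approval + +### 4.2 Gate 1: security / legal review + +Must confirm: + +- A data-flow diagram can be delivered +- A deployment-mode description can be delivered +- The retention / deletion / sub-processor / training policy can be clearly explained +- A permission and logging plan can be provided + +Pass criteria: + +- Pre-real-data preparation may begin +- But this must not yet be treated as approval to launch the pilot + +### 4.3 Gate 2: pilot launch + +Must confirm: + +- Templates / rule library and failure cases exist +- An approval-workflow integration plan and a review queue exist +- A human signer exists +- Source back-linking capability and samples exist +- Training materials for the first batch of users exist + +Pass criteria: + +- The pilot may launch only within a single business unit and `2-3` types of high-frequency standard contracts +- Pilot output is limited to risk flags, candidate redlines, obligation mappings, and pre-screening recommendations + +### 4.4 Gate 3: pilot midpoint + +Must confirm: + +- Weekly and biweekly reports are produced continuously +- High-risk escalation hit rate, human adoption rate, and source back-link availability are continuously monitored +- Permission and log spot checks are normal +- Failure cases continue to be added and archived + +Pass criteria: + +- Metrics have not deteriorated continuously +- No red-flag signal has hit a shutdown threshold +- Users are still willing to keep using the tool in real workflows + +### 4.5 Gate 4: pilot close-out + +Must confirm: + +- A complete failure-case review exists +- An updated data-flow diagram, log samples, and legal feedback exist +- An unresolved-risk list exists +- A formal `go / no-go / hold` conclusion exists + +Pass criteria: + +- Entry into Gate 5 is allowed only when core risks have been explained and core metric trends are stable + +### 4.6 Gate 5: pre-expansion + +Must confirm: + +- The positioning is still an internal enterprise assistive tool +- No high-risk uses such as public-facing service, automatic approval, or external sign-off have been introduced +- Permission, logging, and deletion policies can be reused after expansion +- Benchmark and failure-case coverage will expand in step + +Pass criteria: + +- After expansion, the original human sign-off, permission isolation, and audit capability can still be maintained + +## 5. Weekly report and biweekly review cadence + +### 5.1 Weekly report + +| Item | Owner | Content that must be updated | Form of evidence | +| --- | --- | --- | --- | +| Change in average review time per contract | Product owner | This week vs last week change, with explanation of causes | Weekly-report table | +| Human adoption rate of risk flags | Legal operations + product owner | Adoption rate, reasons for low adoption, rule items needing change | Weekly-report table + annotated samples | +| High-risk escalation hit rate | Legal owner | Missed-escalation / false-escalation samples, improvement actions | Escalation-sample review | +| Source back-link availability | Product owner + engineering | Back-link success rate, failure causes, fix status | Sampled screenshots / link checks | +| New failure cases | Research / knowledge management | Added this week, whether entered in the library, whether duplicates | Failure-case library change log | +| User feedback and complaints | Legal operations / project manager | Feedback categories, urgent issues, training needs | Issue list | + +Weekly-report publication: + +- Once a week on a fixed schedule +- Compiled by legal operations / project manager +- Sent to product, legal, compliance, security, engineering, and the customer business owner + +### 5.2 Biweekly review + +| Cadence | Participants | Required review content | Decision output | +| --- | --- | --- | --- | +| Biweekly review | Product owner, legal owner, compliance owner, security owner, customer business owner | Weekly-report trends, permission / log spot checks, source back-linking, escalation hit rate, failure-case recurrence rate | Keep current scope, reduce scope, add controls, or move to HOLD | + +## 6. 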
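Red-flag signal list + +When any of the following signals appears, it should not merely be noted in the weekly report; it should be escalated for dedicated handling: + +| Red-flag signal | Default severity | Owner | Default action | +| --- | --- | --- | --- | +| Review time per contract rises instead of falling for `2` consecutive weeks | Medium | Product owner | Check workflow friction points; reduce scope if necessary | +| Human adoption rate of risk flags is markedly low for `2` consecutive weeks | Medium | Legal owner + product owner | Adjust the rule library, flag presentation, and review queue | +| A high-risk contract is missed for escalation | High | Legal owner | Review the samples immediately; suspend the related feature if necessary | +| Source back-links fail or rule versions are inconsistent | High | Product owner + engineering owner | Suspend the related output; restore after the fix | +| Permission bypass, unauthorized access, or abnormal export | Critical | Security owner | Shut down immediately or switch back to offline mode | +| Customer demands automatic approval or removal of human sign-off | Critical | Management sponsor + legal owner | Go directly to shutdown / termination review | +| The same type of failure case recurs and stays open for two consecutive weeks | Medium-high | Research owner + product owner | Freeze related expansion actions; add evaluation and fixes first | +| Users persistently treat output as a final conclusion rather than assistive advice | High | Legal operations + legal owner | Strengthen training; tighten permissions or reduce scope if necessary | + +## 7. Shutdown and downgrade actions + +### 7.1 Immediate shutdown conditions + +Any one of the following directly triggers shutdown, or at least a feature-level downgrade: + +- Cross-customer / cross-matter data commingling occurs +- Permission bypass or unauthorized access occurs +- The retention, deletion, sub-processor, or logging policy cannot be explained or enforced +- The business side demands that the system directly auto-approve high-risk contracts +- Output cannot be reliably back-linked to clause, template, or rule sources +- High-risk escalation rules have clearly failed + +### 7.2 Shutdown action sequence + +1. Immediately disable the related feature or switch back to offline mode +2. Lock the related logs, samples, and version information +3. Notify security, legal, product, and the management sponsor +4. Provide a preliminary incident severity grading the same day +5. Provide a "restore / reduce scope / terminate" recommendation on the next business day + +### 7.3 Three possible conclusions after shutdown + +| Conclusion | Applicable conditions | Follow-up action | +| --- | --- | --- | +| Restore | Root cause identified and fixed; no red flag re-triggered | Restore the restricted scope and enter intensified monitoring | +| Reduce scope | The problem is controllable, but the current scope is too large | Keep only low-risk contract types or low-risk features | +| Terminate / NO-GO | The root cause is unacceptable, or the product positioning has drifted into prohibited territory | End the current pilot; freeze expansion and external commitments | + +## 8. 30 / 60 / 90-day review mechanism + +### 8.1 Day 30: first review after launch + +Objectives: + +- Confirm the pilot has not drifted off course +- Confirm weekly reports and biweekly meetings are functioning +- Confirm human sign-off and the escalation mechanism have not been bypassed + +Required outputs: + +- `30`-day weekly-report summary +- First-round failure-case set +- Permission / log spot-check results +- A judgment on whether to continue with the current scope + +Default questions: + +- Are users actually using it? +- What is the most common failure type? +- Is the existing rule library sufficient to support the first batch of contract types? + +### 8.2 Day 60: midpoint go / hold review + +Objectives: + +- Decide whether the pilot continues, reduces scope, or enters `HOLD` +- Decide whether the controls are actually taking effect in real workflows + +Required outputs: + +- `60`-day trend charts: review time, human adoption rate, escalation hit rate, source back-link availability +- Failure-case classification report +- Summary of user feedback and training gaps +- Unresolved-risk list + +Default decision rules: + +- If key metrics deteriorate continuously, move to `HOLD` first +- If high-risk escalation or source back-linking remains abnormal, do not enter close-out or expansion discussions + +### 8.3 Day 90: close-out / pre-expansion review + +Objectives: + +- Reach a formal `go / no-go / hold` +- Decide whether expansion from a single business unit to more teams / contract types is allowed + +Required outputs: + +- `90`-day overall review report +- Updated data-flow diagram +- Summary of log and permission spot checks +- Notes on benchmark and failure-case updates +- Joint feedback from the customer business owner and legal owner +- Expansion recommendation or termination recommendation + +Default output conclusions: + +- `GO`: metrics stable, controls reusable, risks explainable +- `HOLD`: value exists, but key evidence or controls are insufficient +- `NO-GO`: positioning, controls, or risk boundaries no longer hold + +## 9. 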
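Minimum record-keeping package + +Each project must retain at least the following materials: + +- Target-user and usage-boundary statement +- Business sponsor / owner list +- Data-flow diagram +- Deployment-mode description +- Vendor data-terms review record +- Permission matrix +- Logging-plan description +- Template / rule-library version list +- Benchmark and failure-case list +- Weekly / biweekly reports +- Shutdown-incident records +- `30 / 60 / 90`-day review minutes +- Pilot close-out conclusion +- Expansion or termination decision minutes + +## 10. One-page execution judgment + +Continuing is recommended only when all five of the following statements hold: + +- What we sell is an **internal enterprise contract / compliance assistive tool**, not a public legal-opinion service. +- We can deliver **controlled deployment + permission isolation + log retention + source back-linking**. +- We accept that **human sign-off will not be removed**. +- We have **benchmark, failure cases, weekly reports, and a shutdown mechanism** ready. +- We can keep explaining value, risk, and control status throughout the `30 / 60 / 90`-day reviews. + +If any one of these does not hold, this scenario should not be treated as a product line that can scale. diff --git a/research/china-contract-compliance-copilot-management-brief-2026-03-20.md b/research/china-contract-compliance-copilot-management-brief-2026-03-20.md new file mode 100644 index 0000000..ab1b1f0 --- /dev/null +++ b/research/china-contract-compliance-copilot-management-brief-2026-03-20.md @@ -0,0 +1,132 @@ +# China Enterprise Contract / Compliance Copilot Management Decision Brief + +Date: 2026-03-20 + +Audience: management, product owner, legal owner, compliance owner, security owner + +Purpose: compress the existing research/ package into a 1-2 page management brief that focuses on the **enterprise contract / compliance copilot in the China jurisdiction** as the first entry scenario, and sets out target customers, service model, go / no-go gates, required safeguards, evidence gaps, shutdown conditions, and monitoring triggers for the next 90 days. + +This is not legal advice; it is a product, compliance, and market-entry research snapshot as of `2026-03-20`. + +## 1. One-sentence conclusion + +Recommendation: **GO (P0)**. + +But this is a **GO under restricted preconditions**: + +- Build only an **internal enterprise** contract / compliance assistive tool +- Provide only **risk flags, clause comparison, obligation mapping, and pre-approval screening** +- No **automatic approval** +- No **public-facing legal-opinion service** +- Do not place customer documents into a shared training pool by default + +If these preconditions cannot be held, move to `NO-GO`. + +## 2. Why start with this scenario + +Core logic at the management level: + +- **Clear value**: maps directly to faster contract review, less rework, and fewer compliance omissions. +- **More controllable liability boundary**: this is an internal enterprise assistive tool, not a conclusion-type legal service delivered to the public. +- **Workflows already exist**: contract review and approval processes already have templates, audit trails, approvals, and escalation points. +- **More realistic procurement path**: large and mid-sized Chinese enterprises more readily accept products with private deployment, dedicated cloud, VPC, permission isolation, and complete audit logs. + +## 3. Target customers and service model + +### 3.1 Target customers + +Priority customers: + +- In-house legal teams at large and mid-sized enterprises +- Compliance teams +- Procurement / sales contract-review teams +- Business legal teams in heavily regulated industries + +Priority industries: + +- Finance +- Pharmaceuticals / healthcare +- Energy / infrastructure +- Platform companies +- Manufacturers expanding overseas + +### 3.2 Recommended service model + +| Item | Recommendation | +| --- | --- | +| Product positioning | Closed-loop internal enterprise assistive tool | +| Deployment | Private deployment / dedicated cloud / VPC / strict tenant isolation | +| First-phase scope | Contract intake, clause extraction, redline compare, template comparison, compliance-obligation mapping, pre-approval screening | +| Output positioning | Risk flags, candidate redlines, obligation-mapping suggestions | +| Human-machine boundary | Final confirmation by a human; no automatic approval; no automatic sign-off | +| Commercial path | Pilot / project first, then expansion and platformization | + +## 4. 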
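go / no-go gates + +| Gate | Question management should ask | Must be in place | Handling if not met | +| --- | --- | --- | --- | +| Gate 0: opportunity screening | Is this a controlled internal enterprise tool rather than a public service? | Clear business sponsor, clear usage boundary, customer accepts human sign-off | Do not initiate | +| Gate 1: security / legal review | Can pre-real-data preparation begin? | Data-flow diagram, deployment mode, vendor terms, permission plan | Do not enter the real-data environment | +| Gate 2: pilot launch | Can a restricted pilot begin? | Templates / rule library, failure cases, approval-workflow integration plan, review queue | Offline samples only | +| Gate 3: pilot midpoint | Continue, reduce scope, or enter HOLD? | Human adoption rate, escalation hit rate, user feedback, log spot checks | Reduce scope or HOLD | +| Gate 4: pilot close-out | Has a formal go / no-go conclusion been reached? | Failure-case review, updated data-flow diagram, log samples, legal re-review feedback | Do not proceed to expansion review | +| Gate 5: pre-expansion | Can we expand from a single business unit to more teams? | Metric trends, unresolved-risk list, final review conclusion | Remain in pilot status | + +## 5. Minimum safeguards that must be held + +| Control domain | Minimum requirement | +| --- | --- | +| Deployment | Private deployment / dedicated cloud / VPC / strict tenant isolation | +| Permissions | Document-level, matter-level, and customer-level permission control | +| Data | Sensitive-information detection, export restrictions, retention / deletion policy | +| Training and logs | Customer data is not used for retraining by default; access, output, and approval logs are retained | +| Output | Risk flags must link back to sources; high-risk conclusions must carry a "human confirmation required" notice | +| Workflow | Integrated with the approval workflow; high-risk items escalate to a human review queue | +| Vendor governance | Retention, deletion, sub-processor, and incident notification must be spelled out in the contract | +| Evaluation | Benchmark and failure cases exist; without them, no entry into real business | + +## 6. Current evidence gaps + +What management still lacks is not "a few more concepts" but the following four kinds of evidence: + +- **Benchmark**: clause extraction, redline identification, obligation mapping, and risk judgment in Chinese-language contract scenarios +- **Customer interviews**: at least 3 target customers, confirming CLM / DMS / OA workflows, ownership of approval authority, and procurement concerns +- **Data-flow diagram**: which documents enter the model, whether personal information is involved, whether data crosses borders, whether data enters vendor logs +- **Failure-case library**: missed major clauses, mislabeled regulatory obligations, template mismatches, false escalations / missed escalations + +## 7. Immediate shutdown conditions + +If any of the following occurs, immediately pause, downgrade, or move to `NO-GO`: + +- Cross-customer / cross-matter data commingling or permission bypass occurs +- The retention, deletion, sub-processor, or logging policy cannot be explained or enforced +- The customer demands that the system directly auto-approve high-risk contracts +- Output cannot be reliably linked back to sources +- Escalation rules for high-risk contracts have clearly failed +- Users persistently treat system output as a final conclusion rather than assistive advice + +## 8. Monitoring triggers for the next 90 days + +Over the next 90 days, management does not need to watch policy news every day, but it does need to watch the following kinds of triggers: + +| Trigger | Why it matters | Action once triggered | +| --- | --- | --- | +| The Cyberspace Administration of China (CAC) continues to publish generative-AI filing / registration announcements, or scope interpretation tightens | If the product boundary drifts toward public service, regulatory obligations rise significantly | Reconfirm whether the product is still a closed-loop internal enterprise tool | +| New requirements appear in the supporting implementation or enforcement stance for the Measures for Labeling AI-Generated Synthetic Content (《人工智能生成合成内容标识办法》) | Affects labeling, agreements, logs, and export policy | Re-check the product's labeling and audit-trail plan | +| Sector regulators issue more specific requirements for AI applications in finance, healthcare, platforms, and similar industries | The first customers will most likely come from these heavily regulated industries | Add industry-specific control items or re-rank the first batch of customers | +| Customer procurement due diligence generally becomes stricter on retention, sub-processor, and tenant isolation | Determines whether the first customers can be won | Prepare vendor terms and proof of isolation capability in advance | +| Pilot metrics show no improvement for 4 consecutive weeks | Suggests the product may lack real business value | Reduce scope, add benchmark coverage, or move to HOLD | +| Any of high-risk escalation hit rate, source back-link availability, or human adoption rate is abnormal for consecutive periods | Indicates a core control has failed | Enter a dedicated management review | + +## 9. 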
当前管理层建议 + +如果现在要决定“第一站做不做中国法律 AI”,我的建议是: + +- **做** +- **但只做中国企业合同 / 合规 copilot** +- **并且必须限定为受控部署、辅助判断、可审计、有人类签发的服务模式** + +更直接一点: + +- 把它当成“高价值企业软件 + AI 辅助风控工具”,可以推进。 +- 把它当成“自动给法律结论或自动审批合同的机器人”,不应推进。 diff --git a/research/china-contract-compliance-copilot-management-memo-2026-03-20.md b/research/china-contract-compliance-copilot-management-memo-2026-03-20.md new file mode 100644 index 0000000..0500f56 --- /dev/null +++ b/research/china-contract-compliance-copilot-management-memo-2026-03-20.md @@ -0,0 +1,190 @@ +# 中国法域企业合同 / 合规 copilot 管理层 go / no-go 决策 memo + +日期:2026-03-20 + +适用对象:管理层、产品负责人、法务负责人、合规负责人、安全负责人、售前与交付负责人 + +目的:在现有《法律 LLM / AI 与法律交叉调研》《法律 AI 机会 / 风险矩阵(可执行版)》和《中国法域法律 AI go / no-go 决策备忘录》的基础上,把**中国法域下的企业合同 / 合规 copilot**单独收口成一份更适合管理层拍板的 go / no-go 决策 memo,明确目标用户、服务模式、合规边界、必要 safeguards、pilot gate、停机条件与证据缺口。 + +这不是法律意见,而是 `2026-03-20` 的产品、合规和市场进入研究快照。 + +## 1. 结论先行 + +管理层建议:**GO(P0)** + +但这个 `GO` 不是“可以随便做”,而是下面这种**受限前提下的 GO**: + +- 只做**企业内部**合同 / 合规 copilot,不做面向公众的法律意见服务。 +- 只做**辅助审查、风险提示、条款比对、合规义务映射、审批前预筛查**,不做自动审批。 +- 只在**私有化 / 专属云 / VPC / 受控租户隔离**模式下进入真实业务流。 +- 只在**有人工签发人、有权限隔离、有来源回链、有日志**的前提下进入 pilot。 + +一句话判断: + +- 如果产品定位是“企业内、可审计、可复核的合同 / 合规辅助工具”,建议推进。 +- 如果产品定位滑向“自动审批”“开放式法律判断”“客户文档默认进入共享训练池”,应直接转 `NO-GO`。 + +## 2. 为什么这个场景值得先做 + +管理层层面的核心原因只有四个: + +- 商业价值清晰:合同审查、红线比对、审批前筛查和合规义务映射都能直接对应时间节省、返工减少和漏项下降。 +- 责任边界相对可控:它是**企业内辅助工具**,不是直接面向公众的结论型法律服务。 +- 工作流天然存在:企业法务和合规团队本来就有审批、留痕、模板、版本和升级节点,容易接入。 +- 更符合中国市场真实采购方式:大中型企业更容易接受本地部署、专属实例、权限隔离和审计日志完整的产品。 + +## 3. 推荐目标用户与服务模式 + +### 3.1 目标用户 + +优先用户: + +- 大中型企业法务团队 +- 合规团队 +- 采购 / 销售合同审核团队 +- 强监管行业业务法务 + +更适合作为首批客户的行业: + +- 金融 +- 医药 / 医疗 +- 能源 / 基础设施 +- 平台型企业 +- 出海制造 + +### 3.2 推荐服务模式 + +| 项目 | 建议 | +| --- | --- | +| 服务模式 | 企业内闭环辅助工具 | +| 部署方式 | 私有化、专属云、VPC 或至少 tenant-isolated 部署 | +| 首阶段产品范围 | 合同 intake、条款抽取、redline compare、模板比对、合规义务映射、审批前预筛查 | +| 输出定位 | 风险提示、候选 redline、义务映射建议、预筛查建议 | +| 人机边界 | 人工最终确认;不自动批准、不自动签发 | +| 商业模式 | 先以项目 / pilot 进入,再谈平台化扩容 | + +## 4. 
管理层 go / no-go 边界 + +### 4.1 可以做 + +- 企业内部合同草稿审查 +- 条款比对和风险提示 +- 企业内部政策与外部监管义务映射 +- 审批前预筛查和升级建议 + +### 4.2 现在不要做 + +- 面向公众开放问答并输出个案法律判断 +- 自动通过或拒绝高风险合同 +- 把客户合同和业务数据混入共享训练池 +- 直接输出可对外签发的正式法律意见 +- 直接接入法院、仲裁或监管提交链路 + +### 4.3 直接转 NO-GO 的信号 + +- 客户要求“系统自动审批合同” +- 供应商默认保留并再训练客户文档 +- 无法提供文档 / matter / 客户级权限隔离 +- 无法提供日志、版本和删除策略说明 +- 输出不能回链到条款、模板或规则来源 + +## 5. 决策所依赖的当前规则锚点 + +这份 memo 不把“内部企业工具”和“面向境内公众提供服务”混在一起。 + +管理层需要把握的关键点: + +- 《生成式人工智能服务管理暂行办法》第 2 条明确,**向中华人民共和国境内公众提供**生成内容服务时适用该办法;企业、教育科研机构等**未向境内公众提供服务**的研发和应用,不适用该办法本身,但并不等于可以脱离其他法律义务。官方文本同时要求提供和使用生成式 AI 服务遵守法律法规、尊重知识产权、保护商业秘密、个人信息和合法权益。 + 官方链接:https://www.miit.gov.cn/zcfg/qtl/art/2023/art_f4e8f71ae1dc43b0980b962907b7738f.html +- 同一办法第 7 条要求训练数据具有合法来源;涉及个人信息时,应取得个人同意或符合法律规定的其他情形。第 11 条要求不得收集非必要个人信息、不得非法留存可识别身份的输入信息和使用记录。 + 官方链接:https://www.miit.gov.cn/zcfg/qtl/art/2023/art_f4e8f71ae1dc43b0980b962907b7738f.html +- 《中华人民共和国个人信息保护法》第 6 条要求处理个人信息应具有明确、合理目的,并限于实现处理目的的最小范围。 + 官方链接:https://www.npc.gov.cn/npc/c2/c30834/202108/t20210820_313088.html +- 《人工智能生成合成内容标识办法》自 `2025-09-01` 起施行,并与配套标识标准同步实施。对落入相关网络信息服务情形的产品,需要提前把标识、协议、日志和导出策略考虑进去。 + 官方链接:https://www.gov.cn/zhengce/zhengceku/202503/content_7014286.htm +- 国家网信办 `2026-01-09` 公告显示,提供具有舆论属性或者社会动员能力的生成式 AI 服务,仍需履行备案或登记程序;这进一步支持“企业内闭环工具优先、公众开放服务后置”的产品排序。 + 官方链接:https://www.cac.gov.cn/2026-01/09/c_1769688009588554.htm + +对管理层的实际含义: + +- **企业内合同 / 合规 copilot 可以做,但要按企业软件 + AI 风险工具来做,而不是按开放公众机器人来做。** +- 一旦产品边界滑向公众服务、开放输出或高传播属性场景,监管要求和备案负担会明显上升。 + +## 6. 
必须具备的最低 safeguards + +| 控制域 | 最低要求 | 解释 | +| --- | --- | --- | +| 部署 | 私有化 / 专属云 / VPC / 严格 tenant isolation | 不把不同客户或不同 matter 放进同一共享上下文 | +| 权限 | 文档级、matter 级、客户级权限控制 | 符合法务真实工作流,不只做账号级权限 | +| 数据 | 敏感信息识别、导出限制、保留 / 删除策略 | 支撑客户安全审查和采购尽调 | +| 训练与日志 | 不默认用客户数据再训练;保留访问、输出、审批日志 | 既防泄露,也支持追责和复盘 | +| 输出 | 风险提示必须回链来源;高风险结论必须提示“需人工确认” | 避免系统输出被误当成最终结论 | +| 流程 | 接入审批流;高风险升级到人工 review queue | 不允许把模型直接放到自动批准位 | +| 供应商治理 | 合同明确 retention、deletion、sub-processor、事故通报 | 没有这些,售前就不应进入正式 pilot | +| 评测 | 有基础 benchmark 和失败用例 | 没有 benchmark,只能做离线样例,不应进真实业务 | + +## 7. 建议的 pilot gate + +| Gate | 目标 | 必须拿到的材料 | 不满足时的处理 | +| --- | --- | --- | --- | +| Gate 0:售前资格筛选 | 判断客户是否值得进入方案阶段 | 明确业务 sponsor、目标团队、使用边界 | 不立项,只保留问题定义 | +| Gate 1:安全 / 法务评审 | 判断是否可进入真实数据前准备 | 数据流图、部署模式、供应商条款、权限方案 | 不进真实数据环境 | +| Gate 2:pilot 启动 | 判断是否可进入受限 pilot | 模板 / 规则库、失败用例、审批流接入方案、review queue | 只做离线样例 | +| Gate 3:pilot 中期 | 判断是否继续、缩 scope 或补控制 | 看板指标、人工采纳率、升级命中率、用户反馈 | 进入 HOLD 或缩小 scope | +| Gate 4:pilot 收口 | 判断是否形成正式 go / no-go 结论 | 失败用例复盘、数据流图更新、日志样例、法务复核反馈 | 不进扩容评审 | +| Gate 5:扩容前 | 判断是否从单业务单元扩到更多团队 | 关键指标趋势、未解决风险清单、最终复盘结论 | 保持试点状态,不扩容 | + +## 8. 建议的停机条件 + +以下任一情况出现,都应触发**立即暂停、降级或转 NO-GO**: + +- 出现跨客户 / 跨 matter 数据混用或权限穿透 +- 无法解释或执行数据删除、日志留存、sub-processor 管理 +- 客户要求自动批准合同,且不接受人工签发 +- 模型输出无法稳定回链来源 +- 高风险合同升级规则明显失效 +- 用户持续把系统输出当成最终结论而非辅助建议 + +建议的处置顺序: + +1. 先停相关功能或切回离线模式 +2. 隔离受影响数据或租户 +3. 由安全、法务、产品共同复盘 +4. 给出“恢复 / 缩 scope / 终止”的正式结论 + +## 9. 当前最关键的证据缺口 + +在真正扩大投入前,还缺这些证据: + +- 一套面向中文合同场景的 benchmark: + - 条款抽取 + - 红线识别 + - 义务映射 + - 风险等级判断 +- 至少 3 家目标客户访谈: + - CLM / DMS / OA 流程现状 + - 审批权归属 + - 安全与采购最关注的问题 +- 清晰的数据流图: + - 哪些文档进入模型 + - 是否涉及个人信息 + - 是否跨境 + - 是否进入供应商日志 +- 失败用例集合: + - 漏识别重大条款 + - 错标监管义务 + - 模板错配 + - 错误升级或漏升级 + +## 10. 
管理层最终建议 + +如果管理层要决定“现在是否投入中国法域的法律 AI 第一站”,我的建议是: + +- **可以投** +- **但只能投企业内合同 / 合规 copilot** +- **并且必须限定为受控部署、辅助判断、可审计、有人类签发的服务模式** + +更直白一点: + +- 把它当成“高价值的企业软件 + AI 辅助风控工具”,可以推进。 +- 把它当成“自动给法律结论的机器人”,不应推进。 diff --git a/research/china-contract-compliance-copilot-ops-checklist-2026-03-20.md b/research/china-contract-compliance-copilot-ops-checklist-2026-03-20.md new file mode 100644 index 0000000..2865863 --- /dev/null +++ b/research/china-contract-compliance-copilot-ops-checklist-2026-03-20.md @@ -0,0 +1,136 @@ +# 中国法域企业合同 / 合规 copilot 操作级 checklist + +日期:2026-03-20 + +用途:作为《中国法域企业合同 / 合规 copilot 管理层 go / no-go 决策 memo》的配套执行清单,用于售前、立项、pilot 启动、运行、停机和扩容前评审。 + +适用范围: + +- 中国法域 +- 企业内部合同 / 合规 copilot +- 不适用于面向公众的开放式法律意见服务 + +使用方式: + +- 每一项都要有责任人和可留档的证据。 +- 任何红线项未满足,都不应进入下一 gate。 +- 这是执行 checklist,不替代法律意见或安全审查结论。 + +## 1. 售前 / 商机筛选 checklist + +| 检查项 | 责任人 | 需要的证据 | 不满足时怎么处理 | +| --- | --- | --- | --- | +| 明确客户是企业内法务 / 合规团队,而不是公众服务团队 | 销售 + 产品负责人 | 客户场景说明 | 不进入方案阶段 | +| 明确使用场景是合同审查、redline、义务映射或审批前预筛查 | 产品负责人 | 使用边界说明 | 不立项,只保留探索 | +| 确认客户接受“辅助判断 + 人工签发” | 销售 + 法务负责人 | 会议纪要 / 邮件确认 | 若客户坚持自动审批,转 NO-GO | +| 明确业务 sponsor 和法务 / 合规 owner | 销售 + 产品负责人 | 客户组织图 / owner 名单 | 不进入真实 pilot 讨论 | +| 判断客户是否已有 OA、CLM、DMS 或审批系统 | 产品负责人 + 解决方案架构师 | 系统现状清单 | 若完全没有可接流程,先降为概念验证 | + +## 2. 立项前 checklist + +| 检查项 | 责任人 | 需要的证据 | 不满足时怎么处理 | +| --- | --- | --- | --- | +| 完成目标用户与使用边界说明 | 产品负责人 + 法务负责人 | 立项说明文档 | 不立项 | +| 完成数据流图 | 架构负责人 + 安全负责人 | 数据流图 | 不进真实数据环境 | +| 明确部署模式:私有化 / 专属云 / VPC / 严格租户隔离 | 架构负责人 | 架构方案文档 | 不进入客户安全审查 | +| 完成供应商数据条款审查 | 采购 + 法务 + 安全 | 供应商条款审查记录 | 不签约、不接供应商 | +| 明确保留 / 删除 / 再训练策略 | 法务 + 安全 + 供应商接口人 | 条款摘要 / DPA / 安全问卷 | 不进入 pilot | +| 明确是否涉及个人信息、商业秘密、跨境传输 | 法务 + 安全 | 合规评估记录 | 需要补评估后再继续 | + +## 3. 
pilot 前 checklist + +| 检查项 | 责任人 | 需要的证据 | 不满足时怎么处理 | +| --- | --- | --- | --- | +| 完成文档级、matter 级、客户级权限设计 | 安全 + 法务运营 | 权限矩阵 | 不接真实文档 | +| 建立日志方案:访问、输出、审批、版本 | 安全 + 架构负责人 | 日志设计说明 | 不开 pilot | +| 明确 review queue 和人工签发人 | 法务负责人 | 流程图 / 角色说明 | 不开 pilot | +| 建立高风险升级规则 | 法务负责人 + 产品负责人 | 升级规则清单 | 仅能做离线样例 | +| 建立模板库、红线规则和政策规则库 | 知识管理 + 业务法务 | 规则库版本清单 | 不进入真实工作流 | +| 准备 benchmark 和失败用例清单 | 研究 / 知识管理 | 评测集与失败样例列表 | 不接真实业务流 | +| 确认输出可回链来源 | 产品负责人 + 研发 | 样例输出截图 | 不允许上线给真实用户 | + +## 4. pilot 启动 checklist + +| 检查项 | 责任人 | 需要的证据 | 不满足时怎么处理 | +| --- | --- | --- | --- | +| pilot 只覆盖 1 个业务单元 | 产品负责人 + 客户 owner | pilot scope 说明 | 缩 scope 后再启动 | +| pilot 只覆盖 2 到 3 类高频标准合同 | 产品负责人 + 业务法务 | 合同类型清单 | 不允许扩大范围 | +| 明确“只做风险提示 + redline 建议 + 预筛查” | 产品负责人 | 启动会材料 | 不允许加入自动审批 | +| 完成首批用户培训 | 产品负责人 + 法务运营 | 培训材料 / 参会记录 | 不放量 | +| 明确反馈入口和问题升级流程 | 产品负责人 | 反馈表单 / 群组 / 工单流程 | 先不上线 | + +## 5. pilot 运行 checklist + +| 检查项 | 责任人 | 需要的证据 | 红旗信号 | +| --- | --- | --- | --- | +| 每周复核单份合同平均审查时长变化 | 产品负责人 | 周报 | 时长不降反升 | +| 每周复核风险提示人工采纳率 | 法务运营 + 产品负责人 | 周报 / 标注结果 | 提示长期被视为噪声 | +| 每周复核高风险升级命中率 | 法务负责人 | 升级样本复盘 | 高风险样本漏升级 | +| 每周复核来源回链可用率 | 产品负责人 + 研发 | 样本抽查 | 回链失效或规则版本错乱 | +| 每周新增失败用例并归档 | 研究 / 知识管理 | 失败用例库 | 同类错误重复出现 | +| 双周复核权限、日志、导出限制 | 安全负责人 | 抽查记录 | 出现越权访问或异常导出 | + +## 6. 立即停机 / 降级 checklist + +以下任一情况出现,必须触发停机或降级动作: + +- [ ] 出现跨客户 / 跨 matter 数据混用 +- [ ] 出现权限穿透或未授权访问 +- [ ] 供应商无法解释 retention、deletion 或日志策略 +- [ ] 业务方要求系统直接自动审批高风险合同 +- [ ] 输出无法稳定回链到条款、模板或规则来源 +- [ ] 高风险升级规则明显失效 + +停机 / 降级动作: + +- [ ] 立即关闭相关功能或切回离线模式 +- [ ] 锁定相关日志和样本 +- [ ] 通知安全、法务、产品三方负责人 +- [ ] 在同日给出初步事件分级 +- [ ] 在下一个工作日给出“恢复 / 缩 scope / 终止”建议 + +## 7. pilot 收口 checklist + +| 检查项 | 责任人 | 需要的证据 | 不满足时怎么处理 | +| --- | --- | --- | --- | +| 完成失败用例复盘 | 研究负责人 + 法务负责人 | 失败用例报告 | 不进入扩容讨论 | +| 更新数据流图和部署说明 | 架构负责人 + 安全负责人 | 更新版数据流图 | 不进入扩容讨论 | +| 提供至少一轮法务复核反馈 | 法务负责人 | 反馈纪要 / 审核记录 | 继续试点,不扩容 | +| 提供日志样例和权限抽查结果 | 安全负责人 | 抽查报告 | 继续试点,不扩容 | +| 明确未解决风险清单 | 产品负责人 + 安全负责人 + 法务负责人 | 风险清单 | 进入 HOLD | + +## 8. 
扩容前 checklist + +| 检查项 | 责任人 | 需要的证据 | 不满足时怎么处理 | +| --- | --- | --- | --- | +| 管理层确认仍保持“企业内辅助工具”定位 | 管理层 sponsor + 产品负责人 | 扩容评审纪要 | 不扩容 | +| 未新增公众服务、自动审批、外部签发等高风险用途 | 产品负责人 + 法务负责人 | 新 scope 说明 | 不扩容 | +| 关键指标趋势稳定 | 产品负责人 | 指标趋势图 | 保持试点 | +| 权限、日志、删除策略可在更大范围复用 | 安全负责人 | 扩容技术评估 | 保持试点 | +| benchmark 和失败用例覆盖随扩容同步增加 | 研究负责人 | 新版评测说明 | 保持试点 | + +## 9. 最低留档包 + +每个项目至少保留以下材料: + +- 目标用户与使用边界说明 +- 数据流图 +- 部署模式说明 +- 供应商数据条款审查记录 +- 权限矩阵 +- 日志设计说明 +- benchmark 与失败用例清单 +- 周报 / 双周报 +- pilot 收口结论 +- 扩容或终止决策纪要 + +## 10. 一页执行判断 + +只有当下面四句话都成立时,才建议继续推进: + +- 我们卖的是**企业内合同 / 合规辅助工具**,不是公众法律意见服务。 +- 我们能做到**受控部署 + 权限隔离 + 日志留存 + 来源回链**。 +- 我们接受**人工签发永远不被拿掉**。 +- 我们已经准备好**benchmark、失败用例和停机机制**。 + +只要其中任一项不成立,就不应把这个场景当成可规模扩张的产品线。 diff --git a/research/china-contract-compliance-copilot-validation-plan-2026-03-20.md b/research/china-contract-compliance-copilot-validation-plan-2026-03-20.md new file mode 100644 index 0000000..40713cf --- /dev/null +++ b/research/china-contract-compliance-copilot-validation-plan-2026-03-20.md @@ -0,0 +1,343 @@ +# 中国企业合同 / 合规 copilot 验证计划 + +日期:2026-03-20 + +用途:把中国企业合同 / 合规 copilot 在进入真实 pilot 前后必须补齐的**benchmark 设计、失败样本库结构、目标客户访谈提纲、数据流图要求和 pilot pass / fail 阈值**收敛成一份验证计划,避免执行 tracker 里只写“要有 benchmark / 失败用例 / 数据流图”,但没有统一标准。 + +适用范围: + +- 中国法域 +- 企业内部合同 / 合规 copilot +- 首个 pilot 客户或首个业务单元 +- 不适用于面向公众的开放式法律意见服务 +- 不适用于法院 / 仲裁 / 监管提交链路自动化 + +关联文档: + +- `research/china-contract-compliance-copilot-management-memo-2026-03-20.md` +- `research/china-contract-compliance-copilot-management-brief-2026-03-20.md` +- `research/china-contract-compliance-copilot-ops-checklist-2026-03-20.md` +- `research/china-contract-compliance-copilot-execution-tracker-2026-03-20.md` +- `research/legal-ai-opportunity-risk-matrix-2026-03-20.md` + +这不是法律意见,而是 `2026-03-20` 的执行验证文档。 + +## 1. 
使用原则 + +- 没有验证计划,就不应把“有 benchmark”当成已经满足。 +- benchmark、失败样本库、客户访谈、数据流图和 pilot 阈值必须共用同一套场景边界。 +- 任何 `P0 / 极高` 级失败样本,不允许被“平均指标”掩盖。 +- 任何涉及自动审批、公众服务、外部签发或 court-facing 提交链路的诉求,都不在本计划通过范围内。 +- 这份计划的结论应直接服务 Gate 1、Gate 2、Gate 4、Gate 5 决策,以及 `30 / 60 / 90` 天复盘。 + +## 2. 验证工作流与 owner + +| 工作流 | 主要 owner | 共同参与 | 产出 | +| --- | --- | --- | --- | +| benchmark 设计与执行 | 研究 / 知识管理负责人 | 法务负责人、产品负责人 | benchmark 说明、样本清单、评测结果 | +| 失败样本库建设 | 研究 / 知识管理负责人 | 法务运营、研发负责人、法务负责人 | 失败样本库、回归清单、复盘结论 | +| 目标客户访谈 | 产品负责人 | 销售、法务运营、法务负责人 | 访谈提纲、访谈纪要、共性问题矩阵 | +| 数据流图与控制要求 | 架构负责人 + 安全负责人 | 法务负责人、合规负责人 | 数据流图、日志 / 保留 / 删除说明 | +| pilot pass / fail 判定 | 产品负责人 + 法务负责人 | 安全负责人、管理层 sponsor、客户业务 owner | Gate 结论、`HOLD / GO / NO-GO` 建议 | + +## 3. benchmark 设计 + +### 3.1 benchmark 只评什么 + +本计划只评首个 pilot 范围内的能力,不做泛化炫技: + +- 条款抽取 +- 候选 redline 建议 +- 合规义务映射 +- 高风险升级判断 +- 来源回链 +- 输出格式与人工 review queue 可用性 + +不在本轮 benchmark 里的内容: + +- 自动审批 +- 对外签发 +- 开放式法律判断 +- 面向法院、仲裁或监管提交链路的最终文书输出 + +### 3.2 样本切片维度 + +每个 in-scope 合同类型,至少按下面这些维度切片: + +- 标准模板样本 +- 谈判修改样本 +- 高风险条款样本 +- 边界 / 干扰样本 +- 如 OCR 在 scope 内,还要有扫描 / 版式噪声样本 + +推荐最低切片: + +| 切片 | 每个合同类型最低准备量 | 说明 | +| --- | --- | --- | +| 标准模板样本 | `10` | 验证基础流程是否稳定 | +| 谈判修改样本 | `10` | 验证 redline、条款冲突和版本差异 | +| 高风险条款样本 | `10` | 验证升级规则和风险提示 | +| 边界 / 干扰样本 | `5` | 验证错配模板、缺失字段、异常格式 | +| OCR / 版式噪声样本 | `5` | 仅在 OCR 进入 pilot scope 时必备 | + +如果 pilot 只覆盖 `2-3` 类高频标准合同,就只做这 `2-3` 类,不提前扩到长尾合同。 + +### 3.3 样本来源要求 + +benchmark 样本只能来自下列来源之一: + +- 已获批准可用于验证的脱敏历史合同 +- 内部模板库与红线模板 +- 法务人工构造的 stress / adversarial 样本 +- 已归档失败样本的脱敏重放版本 + +默认禁止: + +- 未脱敏的真实客户敏感文档直接进入通用验证集 +- 无法说明授权来源的外部合同文本 +- 把客户 pilot 文档默认沉淀进共享训练池 + +### 3.4 标注要求 + +每个样本至少要有下面这些标注字段: + +| 字段 | 必填说明 | +| --- | --- | +| `sample_id` | 唯一编号 | +| `contract_type` | 合同类型 | +| `slice_type` | 标准 / 谈判 / 高风险 / 干扰 / OCR | +| `risk_level` | `P0 / P1 / P2 / P3` | +| `expected_clauses` | 应抽到的关键条款 | +| `expected_red_flags` | 应命中的升级 / 风险提示 | +| `expected_obligations` | 应映射出的义务点 | +| `expected_source` | 应回链到的模板 / 规则 / 政策来源 | +| `reviewer` | 
标注法务或知识管理负责人 | +| `version_tag` | 规则库 / 模板库 / 模型版本 | + +### 3.5 核心评测指标 + +| 指标 | 定义 | 为什么必须看 | +| --- | --- | --- | +| 条款抽取完整率 | 抽到的关键条款数 / 应抽取关键条款数 | 判断基础抽取是否可用 | +| 高风险升级召回率 | 被正确升级的高风险样本数 / 全部高风险样本数 | 判断核心风险是否会漏掉 | +| 来源回链可用率 | 成功回链的输出数 / 应回链输出数 | 判断输出能否被人工审查 | +| 合规义务映射通过率 | 被法务接受的义务映射条目 / 全部映射条目 | 判断义务映射是否可信 | +| 候选 redline 可采纳率 | 被法务接受的 redline 条目 / 被审查 redline 条目 | 判断输出是否真能进入工作流 | +| 无支撑结论率 | 没有来源支撑却给出结论的输出数 / 审查输出数 | 判断 hallucination / unsupported claim 风险 | +| 输出格式合规率 | 满足指定输出结构的结果数 / 全部结果数 | 判断是否能进入 review queue | + +## 4. 失败样本库结构 + +### 4.1 失败样本库目标 + +失败样本库不是“问题清单”,而是下面三件事的统一底座: + +- Gate 2 是否可进真实 pilot +- Gate 3 / Gate 4 复盘时有没有重复犯错 +- Gate 5 扩容时有没有把旧错带进新范围 + +### 4.2 最低字段结构 + +每条失败样本至少要记录: + +| 字段 | 说明 | +| --- | --- | +| `failure_id` | 唯一编号 | +| `sample_id` | 关联 benchmark 或 live sample | +| `contract_type` | 合同类型 | +| `stage` | `benchmark / pilot / regression` | +| `issue_type` | 条款漏识别 / 错误升级 / 漏升级 / 来源失效 / 模板错配 / 输出越界 | +| `severity` | `P0 / P1 / P2 / P3` | +| `expected_behavior` | 应该发生什么 | +| `actual_behavior` | 实际发生了什么 | +| `impact` | 会造成什么业务或合规后果 | +| `root_cause_guess` | 初步根因 | +| `owner` | 当前修复 owner | +| `status` | `new / confirmed / fixing / fixed / regression-passed / waived` | +| `introduced_version` | 问题出现时的版本 | +| `fixed_version` | 修复后的版本 | +| `retest_date` | 最近一次回归时间 | + +### 4.3 失败样本分类 + +最低要分下面几类: + +- `P0`:高风险条款漏升级、权限 / 数据越界、无来源高风险结论 +- `P1`:关键条款漏抽取、义务映射严重错误、来源回链失效 +- `P2`:候选 redline 低质量、模板错配、输出格式不合规 +- `P3`:措辞、排序、说明性细节问题 + +默认规则: + +- 任意 `P0` 样本在 Gate 2 验收样本里出现,直接不通过 +- 同类 `P1` 样本连续两周未关闭,不进入扩容讨论 +- `P2 / P3` 可以存在,但必须说明是否影响业务采纳率 + +### 4.4 回归要求 + +- 每次修复后必须回放对应失败样本 +- 每次规则库或模板库版本升级,都至少回放全部 `P0 / P1` 样本 +- 新增合同类型前,必须把旧 `P0 / P1` 回归跑一遍 + +## 5. 
目标客户访谈提纲 + +### 5.1 最低访谈覆盖 + +在决定是否扩大投入前,至少完成 `3` 家目标客户访谈,每家至少覆盖: + +- 业务 sponsor +- 法务 / 合规实际 reviewer +- 安全 / IT 或采购接口人 + +### 5.2 访谈目标 + +- 验证场景是否真有高频、重复、可标准化的工作量 +- 验证人工签发和升级机制是否能接入现有流程 +- 验证客户对部署、日志、删除、sub-processor 的底线要求 +- 验证客户会不会把产品错误地当成自动审批工具 + +### 5.3 访谈问题框架 + +| 模块 | 必问问题 | +| --- | --- | +| 当前流程 | 合同从 intake 到签发经过哪些系统和角色? | +| 工作量结构 | 哪 `2-3` 类合同最适合先做?月均量级多少? | +| 风险结构 | 哪些条款或义务一旦漏掉,后果最严重? | +| 签发边界 | 谁对最终文本负责?哪些节点必须人工签发? | +| 安全与部署 | 是否接受 SaaS?是否必须私有化 / VPC? | +| 数据治理 | 是否允许供应商日志留存?是否允许再训练? | +| 集成条件 | 现有 CLM / DMS / OA / 审批流是什么? | +| 成功标准 | pilot 成功在客户眼里意味着什么?节时、漏项减少还是 review 透明度提升? | +| 失败条件 | 出现什么情况客户会立刻停 pilot? | + +### 5.4 访谈交付物 + +每次访谈至少产出: + +- 一页纪要 +- 客户当前流程图 +- 风险点清单 +- 采购 / 安全卡点清单 +- 对 pilot pass / fail 指标的反馈 + +## 6. 数据流图要求 + +### 6.1 数据流图不是只画架构框 + +本计划要求的数据流图,至少要能回答: + +- 什么文档在什么时候进入模型 +- 哪些数据进入日志 +- 是否涉及个人信息、商业秘密或跨境传输 +- 哪些处理在供应商边界内发生 +- 删除和导出怎么执行 + +### 6.2 必须出现的节点 + +| 节点 | 最低说明 | +| --- | --- | +| 文档入口 | 合同来源、上传方式、预处理 | +| 权限控制层 | 文档级 / matter 级 / 客户级权限 | +| 检索与规则层 | 模板库、规则库、知识库、版本号 | +| 模型推理层 | 模型类型、部署边界、是否出网 | +| 输出层 | 风险提示、redline、义务映射、review queue | +| 日志层 | 访问日志、输出日志、审批日志、版本日志 | +| 导出与删除层 | 删除策略、保留期限、导出限制 | +| 供应商边界 | sub-processor、支持人员访问、事故通报路径 | + +### 6.3 必须出现的标注 + +- 每个节点是否处理个人信息 +- 每个节点是否处理商业秘密 +- 数据是否会跨租户、跨客户、跨区域 +- 默认保留时间 +- 删除触发条件 +- 谁可访问 + +### 6.4 最低交付包 + +- 逻辑数据流图 +- 日志与保留表 +- 供应商边界说明 +- 删除 / 导出流程说明 +- 版本化知识库与规则库说明 + +## 7. 
pilot pass / fail 阈值 + +### 7.1 Gate 2:进入真实 pilot 前的离线阈值 + +| 指标 | 通过阈值 | 不通过阈值 | +| --- | --- | --- | +| `P0` 高风险漏升级 | `0` 个 | 任意 `1` 个都不通过 | +| 高风险升级召回率 | `>= 95%` | `< 90%` | +| 来源回链可用率 | `>= 98%` | `< 95%` | +| 无支撑结论率 | `0` | 任意 `1` 个高风险结论无支撑都不通过 | +| 条款抽取完整率 | `>= 90%` | `< 85%` | +| 合规义务映射通过率 | `>= 85%` | `< 75%` | +| 候选 redline 可采纳率 | `>= 70%` | `< 60%` | + +Gate 2 默认规则: + +- 只要 `P0` 漏升级、无支撑高风险结论、权限越界样本出现,就不进入真实 pilot +- 如果只是 `P2 / P3` 级问题偏多,可以补规则后重跑,不必直接终止 + +### 7.2 Gate 3:pilot 运行中的 HOLD / 缩 scope 阈值 + +| 指标 | 正常范围 | 进入 `HOLD / 缩 scope` 的阈值 | +| --- | --- | --- | +| 单份合同平均审查时长 | 不要求首周立刻下降,但应可解释 | 连续 `2` 周明显不降反升且无法解释 | +| 风险提示人工采纳率 | `>= 60%` | 连续 `2` 周 `< 50%` | +| 高风险升级召回率抽样 | `>= 95%` | `< 90%` 或出现 `P0` 漏升级 | +| 来源回链可用率 | `>= 97%` | 任一周 `< 95%` | +| 同类失败样本关闭时长 | `2` 周内关闭 | 连续 `2` 周未关闭 | +| 用户越界使用 | 不应出现 | 把输出当最终结论或要求去掉人工签发 | + +### 7.3 Gate 4:pilot 收口 / 是否可扩容的阈值 + +| 指标 | 建议通过阈值 | +| --- | --- | +| `P0 / 极高` 未解决问题 | `0` | +| 过去 `4` 周高风险升级召回率 | `>= 95%` | +| 过去 `4` 周来源回链可用率 | `>= 98%` | +| 过去 `4` 周风险提示人工采纳率 | `>= 65%` | +| 审查效率改进 | `>= 15%`,或虽未降时长但有书面证据证明风险识别显著提升 | +| 同类 `P1` 失败重复出现 | 不应连续复发 | +| 数据流图 / 日志 / 法务反馈 | 必须全部更新完毕 | + +### 7.4 Gate 5:扩容前阈值 + +- Gate 4 阈值至少持续 `4` 周稳定 +- 新合同类型加入前,先补 benchmark 和失败样本 +- 不允许一边扩容、一边引入公众服务、自动审批或 court-facing 用途 + +## 8. 30 / 60 / 90 天复盘与本计划的关系 + +| 节点 | 本计划必须提供的材料 | +| --- | --- | +| Day 30 | benchmark 首轮结果、失败样本初版、首批访谈纪要、数据流图初版 | +| Day 60 | 趋势图、失败样本分类报告、访谈共性问题矩阵、数据流图更新版 | +| Day 90 | Gate 4 / 5 验证结论、回归结果、扩容前差距清单 | + +## 9. 最低留档包 + +项目至少要保留下面这些验证材料: + +- benchmark 说明和样本切片定义 +- benchmark 结果表 +- 失败样本库 +- `P0 / P1` 回归记录 +- 目标客户访谈纪要 +- 数据流图与日志 / 保留说明 +- Gate 2 / Gate 4 / Gate 5 验证结论 + +## 10. 
一页判断 + +只有当下面五句话都成立时,才建议把 pilot 从“可演示”推进到“可进入真实业务流”: + +- 我们已经明确 pilot 只覆盖 `2-3` 类高频标准合同。 +- 我们有可复跑的 benchmark,而不是零散样例。 +- 我们有结构化失败样本库,而不是会后口头总结。 +- 我们有能落到供应商边界和日志保留的数据流图。 +- 我们已经把 `P0 / P1` 阈值写清楚,并接受一旦命中就停机 / HOLD。 + +只要其中任一项不成立,就不应把这个场景当成可扩容产品线。 diff --git a/research/china-legal-ai-go-no-go-memo-2026-03-20.md b/research/china-legal-ai-go-no-go-memo-2026-03-20.md new file mode 100644 index 0000000..c744a7d --- /dev/null +++ b/research/china-legal-ai-go-no-go-memo-2026-03-20.md @@ -0,0 +1,435 @@ +# 中国法域法律 AI go / no-go 决策备忘录 + +日期:2026-03-20 + +目的:基于现有《法律 LLM / AI 与法律交叉调研》和《法律 AI 机会 / 风险矩阵(可执行版)》,为一个明确法域和两个高价值场景给出可执行的 go / no-go 判断,帮助决定先做什么、暂时不做什么,以及上线前最低控制要求是什么。 + +结论先行: + +- 目标司法辖区:**中国** +- 评估场景 1:**企业合同 / 合规 copilot** +- 评估场景 2:**中文 Legal RAG / citation-safe drafting** +- 总体建议: + - 场景 1:**GO(P0)** + - 场景 2:**CONDITIONAL GO(P1)** + - 当前明确不建议切入:**面向公众的开放式法律意见 bot**、**直接进入法院提交链路的自动化文书工具** + +这不是法律意见,而是 `2026-03-20` 的产品与合规研究快照。 + +## 1. 为什么选中国法域 + +选中国,不是因为“监管更松”,而是因为: + +- 企业内闭环场景更明确,尤其适合本地部署、私有化、行业化知识工程和可审计产品。 +- 生成式 AI 的公共服务路径已经有较清晰的规则边界,适合把“内部企业工具”和“面向公众服务”明确分开。 +- 2025 年后,AI 生成内容标识和备案 / 登记路径更细化,意味着做产品时要先想清楚 deployment model,而不是只想模型能力。 + +因此,最稳的进入方式不是做“全民法律机器人”,而是做**企业内、可控、可复核**的法律 AI。 + +## 2. 目标用户 + +### 场景 1:企业合同 / 合规 copilot + +目标用户: + +- 大中型企业法务团队 +- 合规团队 +- 采购 / 销售合同审核团队 +- 监管要求较重行业的业务法务 + +核心工作流: + +- 合同 intake +- 条款抽取 +- 红线比对 +- 内部条款模板匹配 +- 合规义务映射 +- 审批流前置筛查 + +### 场景 2:中文 Legal RAG / citation-safe drafting + +目标用户: + +- 企业法务研究岗 +- 律所内部研究和起草团队 +- 监管跟踪和政策研究团队 + +核心工作流: + +- 法规 / 监管规则 / 案例检索 +- 研究备忘录草稿 +- 引用插入与出处回链 +- 法域限定和时效检查 +- 内部知识库问答 + +## 3. 
决策摘要 + +| 场景 | 决策 | 进入优先级 | 推荐部署方式 | 不建议的做法 | +| --- | --- | --- | --- | --- | +| 企业合同 / 合规 copilot | GO | P0 | 私有化 / 专属云 / VPC,先做企业内闭环 | 直接对公众提供法律判断;自动审批;把客户合同用于开放训练 | +| 中文 Legal RAG / citation-safe drafting | CONDITIONAL GO | P1 | 内部研究与起草辅助,必须带来源和人工复核 | 无引用裸答;直接输出对外正式法律意见;直接用于法院提交 | + +### 3.1 决策会一页检查表 + +| 检查问题 | 企业合同 / 合规 copilot | 中文 Legal RAG / citation-safe drafting | 如果答案是“否” | +| --- | --- | --- | --- | +| 能否做本地部署、专属云或 VPC? | 必须 | 强烈建议 | 不进入正式 pilot,只保留概念验证 | +| 能否做到文档 / matter / 客户级权限隔离? | 必须 | 必须 | 直接 no-go | +| 能否回链来源并暴露给用户? | 强烈建议 | 必须 | 不进入真实业务使用 | +| 能否保留日志、版本和审批痕迹? | 必须 | 必须 | 不进入法务或合规正式流程 | +| 是否已有明确人工签发人? | 必须 | 必须 | 不对外输出、不进入高风险场景 | +| 是否完成数据流图和供应商数据条款审查? | 必须 | 必须 | 只能做隔离环境内部测试 | +| 是否已有基础评测集和失败用例? | 强烈建议 | 必须 | 只能做受限试点,不能规模推广 | + +### 3.2 建议的最小 pilot 范围 + +企业合同 / 合规 copilot: + +- 只选 1 个业务单元 +- 只覆盖 2 到 3 类高频标准合同,例如 NDA、采购合同、销售框架协议 +- 只做“风险提示 + redline 建议 + 审批前预筛查”,不做自动审批 + +中文 Legal RAG / citation-safe drafting: + +- 只选 1 个明确法域主题包,例如数据合规或劳动用工合规 +- 只服务内部研究和备忘录草稿 +- 只允许输出带来源、带日期、带法域标签的回答 +- 不允许直接用于法院提交或对外正式法律意见 + +### 3.3 业务流程 -> 最低控制要求矩阵 + +| 场景 | 业务流程节点 | 允许自动化程度 | 最低控制要求 | 主要控制责任人 | +| --- | --- | --- | --- | --- | +| 企业合同 / 合规 copilot | 合同 intake / 上传 | 只允许自动分类与预处理 | matter / 客户标签、文档级权限、敏感信息识别、上传日志 | 法务运营 + IT / 安全 | +| 企业合同 / 合规 copilot | 条款抽取 / redline compare | 可自动生成候选结果,但不得自动定案 | 模板版本锁定、来源回链、差异高亮、人工 review queue | 业务法务负责人 | +| 企业合同 / 合规 copilot | 合规义务映射 | 只能做建议,不得自动批准 | 法规版本与日期显示、规则来源展示、unsupported claim 拦截 | 合规负责人 | +| 企业合同 / 合规 copilot | 审批前预筛查 | 只做风险分层,不做自动放行 | 高风险升级规则、审批流集成、审计日志、签发人明确 | 法务负责人 | +| 中文 Legal RAG / citation-safe drafting | 法规 / 监管规则检索 | 可自动召回与排序 | 法域过滤、时间过滤、来源展示、查询日志 | 知识管理 / 研究负责人 | +| 中文 Legal RAG / citation-safe drafting | 研究备忘录草稿 | 只允许生成 draft | citation verifier、unsupported claim 拦截、输出附日期和法域 | 起草律师 / 法务 | +| 中文 Legal RAG / citation-safe drafting | 内部知识库问答 | 只允许内部参考 | 权限隔离、权威来源优先、非权威内容标注、人工升级入口 | 知识管理 + 法务运营 | +| 中文 Legal RAG / citation-safe drafting | 对外交付前复核 | 不允许自动化签发 | 人工签发、修改留痕、最终版本归档、禁止直接用于法院提交 | 最终签发律师 / 法务负责人 
| + +### 3.4 pilot 验收 / 停机门槛 + +| 场景 | pilot 继续推进的信号 | 应立即暂停或降级的信号 | pilot 结束时必须拿到的证据 | +| --- | --- | --- | --- | +| 企业合同 / 合规 copilot | 法务团队愿意持续使用;风险提示能稳定回链来源;审批流能接住高风险升级;没有出现客户数据隔离事故 | 出现跨客户 / matter 数据混用;供应商不能解释日志和删除策略;业务方要求自动审批高风险合同 | 失败用例清单、审批流集成截图或流程说明、数据流图、至少一轮法务复核反馈 | +| 中文 Legal RAG / citation-safe drafting | 用户愿意点击来源;法域和日期过滤能减少误用;草稿输出对人工起草有明确提效;引用校验能抓到关键错误 | 出现无来源裸答进入真实工作流;用户持续绕过来源查看;高风险问题无法拒答或升级;被要求直接用于法院或对外交付 | 引用准确性抽样结果、来源使用日志、拒答 / 升级案例、知识库更新与版本记录 | + +### 3.5 立项 gate 必交件 + +| Gate | 必交件 | 责任人 | 不满足时的处理 | +| --- | --- | --- | --- | +| 立项前 | 目标用户与使用边界说明 | 产品负责人 + 法务负责人 | 不立项,只保留问题定义 | +| 立项前 | 数据流图与部署模式说明 | IT / 安全 + 架构负责人 | 不进入真实数据环境 | +| 立项前 | 供应商数据条款和保留 / 删除策略 | 采购 + 法务 + 安全 | 不签约,不接入供应商 | +| pilot 前 | 权限隔离与日志方案 | IT / 安全 + 法务运营 | 不开 pilot,只能做离线样例 | +| pilot 前 | 基础评测集和失败用例清单 | 研究 / 知识管理 + 业务法务 | 不允许接真实业务流 | +| pilot 中 | 用户反馈与人工复核记录 | 产品负责人 + 最终签发人 | 连续出现负反馈则暂停或缩 scope | +| pilot 收口 | 引用准确性 / 风险提示有效性抽样结果 | 研究负责人 + 法务负责人 | 不进入扩容决策 | +| 扩容前 | 最终 go / no-go 复盘结论 | 产品负责人 + 法务负责人 + 安全负责人 | 不扩容,保留试点状态 | + +### 3.6 决策状态定义 + +| 状态 | 含义 | 允许做什么 | 不允许做什么 | 下一步动作 | +| --- | --- | --- | --- | --- | +| GO | 关键控制、证据和责任人都已到位,可进入下一阶段 | 进入 pilot 或扩容;接入受控真实业务流 | 越过已定义边界做更高风险用途 | 按 gate 计划推进,并持续复核 | +| CONDITIONAL GO | 可以推进,但必须带条件或限 scope | 在限定业务单元、限定主题包、限定输出边界下试点 | 扩大到公众、法院、对外交付等高风险场景 | 先补齐缺失控制或证据,再评估是否升级为 GO | +| HOLD | 方向不必放弃,但当前证据或控制不足 | 保留离线评测、离线样例、设计验证 | 接入真实数据、进入真实工作流 | 明确缺口清单,补齐后再重开评审 | +| NO-GO | 当前不应继续推进该用途或该部署方式 | 终止当前方案,必要时退回需求定义 | 继续投入 pilot、上线或扩容 | 关闭当前路径,必要时改场景、改部署或改边界 | + +### 3.7 pilot 看板指标 + +| 场景 | 建议持续跟踪的指标 | 作用 | 红旗信号 | +| --- | --- | --- | --- | +| 企业合同 / 合规 copilot | 单份合同平均审查时长变化 | 判断是否真的提效 | 时长没有下降,反而增加法务返工 | +| 企业合同 / 合规 copilot | 风险提示被人工采纳的比例 | 判断提示是否有业务价值 | 大量提示被判定为噪声或误报 | +| 企业合同 / 合规 copilot | 高风险合同升级命中率 | 判断升级规则是否有效 | 高风险样本未被升级,或低风险样本大量误升 | +| 企业合同 / 合规 copilot | 来源回链可用率 | 判断输出是否可核验 | 回链失效、模板版本错乱、规则来源无法打开 | +| 中文 Legal RAG / citation-safe drafting | 来源点击率 / 查看率 | 判断用户是否真的在用来源 | 用户长期忽略来源,直接复制结论 | +| 中文 Legal RAG / citation-safe drafting | 
引用准确性抽样通过率 | 判断 citation-safe 能力是否成立 | 抽样中持续出现误引、漏引、过期引用 | +| 中文 Legal RAG / citation-safe drafting | 高风险问题拒答 / 升级命中率 | 判断安全边界是否有效 | 高风险问题被正常回答且未触发升级 | +| 中文 Legal RAG / citation-safe drafting | 草稿后人工修改幅度 | 判断草稿是否真能帮助起草 | 人工需要大面积重写,提效不明显 | + +### 3.8 pilot 评审节奏 + +| 节奏 | 参与角色 | 必看内容 | 触发动作 | +| --- | --- | --- | --- | +| 每周 | 产品负责人、法务运营、研究/知识管理 | 3.7 看板指标、用户反馈、失败用例新增情况 | 调整 prompt、知识库、review queue 或 pilot scope | +| 双周 | 法务负责人、合规负责人、IT/安全 | 数据隔离、日志、来源回链、升级命中率、拒答命中率 | 决定是否保留当前 scope、增加控制、或进入 HOLD | +| pilot 中期 | 产品负责人、法务负责人、最终签发人 | 已交付样本质量、人工修改幅度、真实工作流摩擦点 | 决定是否继续 pilot、缩 scope、或补额外控制 | +| pilot 收口 | 产品负责人、法务负责人、安全负责人、研究负责人 | 3.4 收口证据、3.5 gate 必交件完成情况、红旗信号复盘 | 输出正式 go / no-go / hold 结论,决定是否进入扩容评审 | +| 扩容前评审 | 产品负责人、法务负责人、安全负责人、业务 sponsor | 所有 gate 证据、关键指标趋势、未解决风险清单 | 批准扩容、维持试点、或转为 NO-GO | + +### 3.9 首批 pilot 客户筛选标准 + +| 场景 | 优先接入的首批客户画像 | 最低前置条件 | 暂缓 / 不接入信号 | +| --- | --- | --- | --- | +| 企业合同 / 合规 copilot | 大中型企业法务 / 合规团队;月度标准合同量高;已有 OA、CLM 或 DMS;愿意先在单业务单元试点 | 明确业务 sponsor;能谈本地部署、专属云或 VPC;可接审批流;可提供模板、红线规则和失败样本 | 目标是“自动审批”;拒绝权限和日志审查;希望默认用客户合同做共享训练 | +| 企业合同 / 合规 copilot | 强监管行业客户,例如金融、医药、能源、平台、出海制造;已有合规义务映射需求 | 有明确合规 owner;能提供当前政策清单;接受高风险条款自动升级到人工复核 | 没有合规 owner;政策版本长期无人维护;要求跳过最终人工签发 | +| 中文 Legal RAG / citation-safe drafting | 企业法务研究、政策研究、知识管理团队;愿意先围绕单一主题包做内部研究辅助 | 语料来源和更新 owner 明确;用户接受“先看来源再用答案”;对外输出必须人工签发 | 需要开放式全法域裸答;不接受来源展示;计划直接用于外部正式法律意见 | +| 中文 Legal RAG / citation-safe drafting | 律所内部研究 / 起草团队;愿意先做内部 memo、培训材料和草稿辅助 | matter / 客户隔离可落地;citation verifier 可嵌入流程;愿意保留人工修改和签发痕迹 | 期望直接替代律师定稿;不愿保留修改痕迹;要求直连法院或监管提交链路 | + +首批 pilot 更适合优先接的客户: + +- 已经存在较稳定工作流,而不是还在重新定义业务流程的客户。 +- 愿意接受“辅助判断 + 人工签发”边界,而不是把产品当成自动决策器的客户。 +- 愿意配合提供失败样本、审批流信息和知识更新机制的客户。 + +不建议作为首批 pilot 客户的典型信号: + +- 一开始就要求面向公众上线开放式法律意见 bot。 +- 一开始就要求进入法院、仲裁或监管提交链路。 +- 明确拒绝数据流图、权限、日志、来源回链或保留 / 删除策略审查。 + +## 4. 
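Scenario 1: Enterprise contract / compliance copilot
+
+### 4.1 Why this is a GO
+
+This is one of the legal AI scenarios best suited to land first in the China jurisdiction, because it meets four conditions at once:
+
+- Clear value: it saves review time, improves clause consistency, and reduces compliance omissions.
+- Relatively controllable liability boundary: it serves mainly as an internal enterprise assistance tool rather than delivering legal opinions directly to the public.
+- Clear workflow: contract and compliance processes naturally include approval, record-keeping, version management, and human review nodes.
+- Easier to deliver with on-premises deployment and permission isolation, which matches how customers actually buy.
+
+### 4.2 Compliance boundary
+
+Allowed:
+
+- Review of internal enterprise contract drafts
+- Clause risk alerts
+- Mapping between internal enterprise policies and external regulatory obligations
+- Pre-screening before approval
+
+Not allowed:
+
+- Replacing legal counsel in reaching final legal conclusions
+- Automatically approving or rejecting high-risk contracts
+- Mixing customer contracts and business data into a shared training pool
+- Offering open Q&A to the public and giving case-specific conclusions
+
+### 4.3 Minimum control requirements
+
+- Deployment:
+  - Prefer on-premises, dedicated cloud, or VPC
+  - Never mix documents from different customers / matters into the same shared context
+- Data:
+  - Document-level permission control
+  - Sensitive information detection, de-identification, and export restrictions
+  - Explicit retention, deletion, and retraining policies
+- Output:
+  - Every risk alert can be traced back to its clause, template, or rule source
+  - High-risk conclusions must display "requires human confirmation"
+- Process:
+  - Integrate with the approval workflow
+  - Keep audit logs
+  - Preserve version diffs
+- Vendor governance:
+  - The contract specifies data use, subcontractors, incident notification, and deletion obligations
+
+### 4.4 Evidence gaps
+
+Before the project is actually approved or launched, the following evidence is still missing:
+
+- At least one evaluation set for Chinese-language contract scenarios:
+  - Clause extraction
+  - Red-line identification
+  - Obligation mapping
+  - Risk-level judgment
+- Interviews with at least 3 target customers:
+  - What the existing CLM / DMS / OA process looks like
+  - Who holds approval authority
+  - What procurement worries about most
+- A clear data flow diagram:
+  - Which documents enter the model
+  - Whether personal information is included
+  - Whether data crosses borders
+  - Whether data enters vendor system logs
+- A collection of failure cases:
+  - Missed major clauses
+  - Mislabeled regulatory obligations
+  - Template mismatches
+
+### 4.5 go / no-go trigger conditions
+
+Conditions for staying GO:
+
+- Private or controlled deployment is achievable
+- Permissions and the approval workflow can be integrated
+- Sources can be traced back and audit logs are retained
+- Output is defined as "assisted judgment" rather than "automated decision"
+
+Conditions for switching to NO-GO:
+
+- The customer demands direct automatic approval of contracts
+- The vendor model requires retaining and retraining on customer documents by default
+- Document isolation, logging, and deletion capabilities cannot be demonstrated
+
+## 5. 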
场景 2:中文 Legal RAG / citation-safe drafting + +### 5.1 为什么是 CONDITIONAL GO + +这是法律 AI 的基础能力层,但在中国法域只能**有条件地做**。原因不是商业价值不高,而是: + +- 价值很高,几乎可服务所有法律工作流; +- 但只要引用失真、法域混淆、法规过期或无出处裸答,就会立即失去可用性; +- 一旦被用户当成正式法律意见或直接进入法院 / 监管提交链路,风险会陡增。 + +所以它适合先做成: + +- 内部研究辅助 +- 备忘录草稿工具 +- 引用检查器 +- 法规 / 政策更新提醒器 + +不适合一开始就做成: + +- 直接输出可对外签发的法律意见 +- 面向公众的“你该怎么打官司 / 怎么维权”的结论工具 +- 诉讼或行政提交的自动文书系统 + +### 5.2 合规边界 + +可以做: + +- 基于法规、部门规章、监管问答、企业内部规范的检索增强问答 +- 输出出处、版本、时间、法域 +- 仅作为内部起草辅助 + +不要做: + +- 允许无来源裸答 +- 混合法域知识而不标注适用范围 +- 直接承诺“本答案可直接用于正式法律意见或提交材料” + +### 5.3 最低控制要求 + +- 检索: + - 只从许可清晰、来源可信、可更新时间戳的语料中检索 + - 检索结果必须暴露给用户 +- 引用: + - 输出必须附来源 + - 标明法规 / 规则的版本和日期 + - 无依据时明确拒答或降级 +- 模型行为: + - 默认先检索再生成 + - 对 unsupported claim 做拦截 + - 对高风险问题强制“仅供内部参考” +- 流程: + - 用户可一键查看原文 + - 用户修改痕迹与最终签发人分离 + - 对外输出必须人工确认 + +### 5.4 证据缺口 + +- 需要一套中国法域专用的评测集: + - 法规定位 + - 版本时效 + - 条文引用 + - 政策口径摘要 + - 误引 / 幻觉率 +- 需要语料清单: + - 哪些法规库、监管文件、案例或内部资料可以合法接入 + - 更新 SLA 是什么 +- 需要产品验证: + - 用户是否真的会点击来源 + - 引用显示方式是否足够快 + - 法域和日期过滤是否能降低误用 + +### 5.5 go / no-go 触发条件 + +升级为更明确 GO 的条件: + +- 有稳定、可更新、可授权的中国法域知识库 +- 有 citation verifier 和时间 / 法域过滤 +- 有高风险问题拒答与人工复核策略 + +转为 NO-GO 的条件: + +- 产品定位成“开放式法律意见机器人” +- 无法保证来源回链和更新时间 +- 计划直接进入法院提交或对外正式法律意见链路 + +## 6. 中国法域下的统一 no-go 边界 + +在当前阶段,下列事情不建议做: + +- 面向公众提供开放式、结论型、个案化法律意见机器人 +- 把法律 AI 直接接进法院或监管机关提交链路 +- 自动替代律师 / 法务作最终签发 +- 缺乏标识、日志、权限控制和来源回链的开放生成 + +原因不是“不能用 AI”,而是这些路径同时叠加了: + +- 公众误导风险 +- 责任归属不清 +- 输入数据敏感 +- 输出需要强可核验性 +- 监管与平台义务更重 + +## 7. 中国法域下建议的进入顺序 + +1. **P0:企业合同 / 合规 copilot** +2. **P1:中文 Legal RAG / citation-safe drafting** +3. **P2:受控公共法律服务中的 intake / 分流助手** +4. **P3:法院或监管提交前的内部 QA 工具** + +判断逻辑: + +- 先做企业内闭环,再做对外工具 +- 先做辅助判断,再做高责任输出 +- 先把权限、来源、标识、日志做扎实,再扩展产品边界 + +## 8. 最低产品证明包(没有这些,不建议正式进入) + +- 合规包: + - 数据流图 + - 部署模式说明 + - 数据保留 / 删除策略 + - 标识和日志方案 +- 产品包: + - 用户角色和审批流 + - 来源回链设计 + - 高风险问题拒答策略 +- 评测包: + - 中文法域 benchmark + - 幻觉 / 误引率 + - 关键场景失败案例 +- 商业包: + - 目标行业名单 + - 典型客户访谈 + - 替代或增强的现有工作流说明 + +## 9. 
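Current regulatory anchors this memo relies on
+
+- Interim Measures for the Management of Generative AI Services (《生成式人工智能服务管理暂行办法》):
+  - Providing generative AI services to the public within China falls squarely within its regulatory scope
+  - Its institutional basis ties in with the Cybersecurity Law, the Data Security Law, the Personal Information Protection Law, and related statutes
+  - Official link: https://www.miit.gov.cn/zcfg/qtl/art/2023/art_f4e8f71ae1dc43b0980b962907b7738f.html
+- Measures for Labeling AI-Generated Synthetic Content (《人工智能生成合成内容标识办法》) and its supporting standard:
+  - In force from 2025-09-01
+  - Sets out explicit / implicit labeling and the obligations that apply at the dissemination stage
+  - Official link: https://www.gov.cn/zhengce/zhengceku/202503/content_7014286.htm
+- 2025 generative AI filing / registration announcement:
+  - Shows that relevant public-facing services already have an ongoing filing / registration path
+  - Official link: https://www.cac.gov.cn/2026-01/09/c_1769688009588554.htm
+- Personal Information Protection Law of the People's Republic of China (《中华人民共和国个人信息保护法》):
+  - Personal information processing must have a clear and reasonable purpose and be limited to the minimum scope needed to achieve that purpose
+  - Official link: https://www.npc.gov.cn/npc/c2/c30834/202108/t20210820_313088.html
+
+## 10. Final judgment
+
+If only two scenarios can be started in the China jurisdiction, my recommendation is:
+
+- **First: enterprise contract / compliance copilot**
+- **Next: Chinese-language Legal RAG / citation-safe drafting**
+
+The former is the most stable commercial entry point; the latter is the most critical long-term capability foundation.
+
+For now, resources should not be prioritized toward:
+
+- Public-facing open-ended legal opinion bots
+- Court filing automation
+
+These directions will run into heavier liability, process, and governance costs sooner than the first two scenarios.
diff --git a/research/court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md b/research/court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md
new file mode 100644
index 0000000..f9d4da2
--- /dev/null
+++ b/research/court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md
@@ -0,0 +1,431 @@
+# Court-facing AI rules and sanction risk tracker
+
+Date: 2026-03-20
+
+Purpose: add a dedicated tracking document **focused on the court / arbitration / tribunal submission chain**, watching filing, citation, fact verification, affidavit / witness material, disclosure, and sanction risk specifically. This document does not repeat the general regulatory checklist; it consolidates into one table the rules and risks most likely to change whether the court-facing workflow can be touched at all.
+
+Scope:
+
+- `research/legal-ai-opportunity-risk-matrix-2026-03-20.md`
+- `research/legal-ai-regulatory-monitoring-tracker-2026-03-20.md`
+- `research/singapore-legal-ai-go-no-go-memo-2026-03-20.md`
+- `research/hong-kong-legal-ai-go-no-go-memo-2026-03-20.md`
+- `research/hong-kong-legal-ai-management-brief-2026-03-20.md`
+- `research/uk-australia-uae-legal-ai-market-comparison-2026-03-20.md`
+- `research/china-legal-ai-go-no-go-memo-2026-03-20.md`
+- `research/legal-ai-research-package-register-2026-03-20.md`
+
+This is not legal advice; it is a research operations snapshot as of `2026-03-20`.
+
+## 1. 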
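Conclusion first
+
+At this stage, this research package's default conclusion on **court-facing AI** remains unchanged:
+
+- **Only internal pre-filing QA, citation check, fact-verification support, and filing checklist are allowed**
+- **No auto-file**
+- **AI output must not be packaged as a court-ready final product**
+- **AI output that has not been rigorously reviewed must not be written directly into affidavits, witness statements, expert reports, or formal submissions**
+
+If the common direction of this round of official materials had to be summed up in one sentence, it would be:
+
+- Courts have not broadly banned AI
+- But courts and judges keep stressing that **responsibility for the submission, for human verification, and for the authenticity of citations and facts still rests with the submitting party**
+- Once **fabricated citations, incorrect facts, or unverified AI output** enter court documents, the risk quickly escalates from a "product error" to a matter at the level of **costs / wasted costs / strike-out / regulator referral / contempt**
+
+## 2. Why this layer is tracked separately
+
+The existing research package already places the court-facing workflow in the high-risk stage that is entered last, but this kind of risk differs from general compliance risk for three reasons:
+
+- It directly touches **candor to tribunal / duties to court / evidence integrity**
+- Its consequences are not ordinary product complaints; they may be **procedural sanctions, costs sanctions, professional discipline, or being named directly by a judge**
+- Its rules are not always written in one place; they are scattered across **court guides, practice directions, judgments, disciplinary referrals, and judicial guidance**
+
+A separate tracker is therefore needed to list "whether filing, citation, and fact verification can be touched" on its own.
+
+## 3. Default product red lines and minimum permitted scope
+
+| Item | Current default conclusion |
+| --- | --- |
+| auto-file / automatic submission to the court | Not allowed |
+| Generating a final pleading / submission that can be filed directly | Not allowed |
+| Generating the final content of affidavits / witness statements / expert reports | Not allowed |
+| Unsourced bare answers entering court documents | Not allowed |
+| AI directly deciding whether facts / authorities are genuine | Not allowed |
+| Internal pre-filing QA | Allowed, but a human must confirm each item |
+| citation checker | Allowed, but must link back to the primary source |
+| fact-verification checklist | Allowed, but only as an aid to human verification |
+| filing checklist / disclosure checklist | Allowed, but cannot replace the final responsibility of the lawyer / court user |
+
+## 4. 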
法域专项跟踪表 + +| 法域 | 当前官方姿态 | filing / disclosure 规则 | citation / fact / evidence 规则 | sanction / 程序风险 | 当前默认产品姿态 | +| --- | --- | --- | --- | --- | --- | +| 中国 | 按当前研究包处理:**仍无单独的全国 court-user GenAI filing 指引被纳入本包**;继续盯最高法、互联网法院和法院公告 | 当前研究包里仍把法院提交链路工具放在 `P2 / 最后进入`,不把中国作为 court-facing 首发法域 | 当前重点仍是“不要直接进入法院提交链路”;citation / fact verifier 只允许作为内部 QA 思路,不允许对外承诺 | 一旦出现正式 court-facing 规则,默认按 `L3` 处理并专项复核 | 继续保持 `NO-GO / 不进入 filing chain` | +| 新加坡 | Singapore Courts **不禁止** court users 使用 GenAI 起草 court documents,但明确 guide 不改变既有法定义务、规则、职业规范和 practice directions | 除非法院特别要求,**不需要预先主动声明**使用了 GenAI;但 court user 对输出负全部责任 | 法律人和 SRP 都必须确保提交材料**独立核验、准确、真实、适当**;AI 不得被用来 fabricate / tamper evidence;现有要求继续适用于被引用的 case law、legislation、textbooks、articles | guide 本身没有单列 AI-only sanction ladder,但风险直接回到既有 court / professional consequences | 只允许 professional-internal drafting + pre-filing QA;不允许 auto-file | +| 香港 | 当前公开法庭材料里,最直接的官方文本仍是 **Judiciary 给 judges / staff 的 AI 指引**;其重点不是放开 court-facing use,而是提醒核验、保密和 legal analysis 风险 | 本轮 source set 未纳入单独的 public court-user filing guide;因此仍不把香港当作 court-facing workflow 已明确开放的法域 | Judiciary 明确提醒:AI 可能虚构 cases / citations / quotes,也可能给出 factual errors;法官在适当情形下应提醒律师履行义务并确认其已核验 AI research / citations,也可向 litigant 询问做了哪些 accuracy checks;在无 proven confidentiality 与 adequate verification 前,using GenAI for legal analysis **not recommended** | 当前公开材料更像“谨慎信号”而不是“可直接操作的 court-user safe harbour”;因此 sanction risk 暂按高风险待处理 | 保持 `NO-GO / not court-ready by default`;只保留内部 research / QA | +| 英格兰及威尔士 | CJC 已发布 **Interim Report and Consultation**,范围明确覆盖 pleadings、witness statements、expert reports;`current-work` page 显示 eight-week consultation 正在进行,`latest-news` page 进一步写明截止到 `2026-04-14 23:59`;同时 `Ayinde / Al-Haroun` 已把假引文 / 假材料的 sanction 路径讲清楚 | 当前不再只是抽象讨论“需不需要新规则”;consultation 的初步 proposal 已经是:在特定情形下要求 legal representatives 就 AI 使用作 declaration,核心落点是**AI 被用于生成法院拟采信的证据** | `Ayinde / Al-Haroun` 明确:律师对提交给法院的材料仍负专业责任;把 false citations / false material 放进 court 
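documents, even when caused by AI or not adequately checked, falls within the scope of the court's response | Court powers that clearly already exist include: public admonition, costs order, wasted costs order, strike-out, regulator referral, contempt, and in extreme cases even police referral; false material reaching the court may trigger the Hamid jurisdiction | Only internal pre-filing QA, citation / fact verifier, and draft review allowed; no auto-file or final court-doc automation of any kind |
+| Australia | The Federal Court is still forming its final position; the 2025-06-13 consultation has closed and the AI Project Group is reviewing submissions. Queensland has issued more specific public guidance and practice directions | At this stage the Federal Court stresses that parties remain responsible for tendered material; if a Judge / Registrar so requires, AI use should be disclosed. Queensland has pulled "accuracy of references" up to practice-direction level | Queensland expressly warns that GenAI will make up fake cases / citations / quotes, refer to non-existent texts, and get facts wrong; when AI is used to draft an affidavit / witness statement, the final text must accurately reflect the person's own knowledge and words | Queensland states publicly that bringing fake citations or inaccurate AI-generated information to court and causing a hearing delay **may result in a costs order** | Australia continues to be treated as "fragmented court rules": only single-court / single-workflow internal QA allowed, no nationally uniform court automation |
+
+## 5. Court-facing triggers to watch closely
+
+When the following events occur, they should not merely be logged in the general regulatory tracker but written straight back into this file:
+
+- A new court guide / practice direction / registrar circular that expressly addresses AI filing, AI disclosure, or AI certification
+- A new judgment dealing directly with fake citations, fabricated facts, or AI-generated witness material
+- A court for the first time expressly requires:
+  - mandatory disclosure
+  - special rules for affidavits / witness statements
+  - a leave requirement for using AI in expert reports
+  - item-by-item verification of citations or authorities
+- A court or regulator starts escalating AI misuse to:
+  - wasted costs
+  - strike-out
+  - regulator referral
+  - contempt
+
+## 6. 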
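Direct implications for product and delivery
+
+If the product is to touch court-facing workflow, the current minimum acceptable requirements include at least:
+
+- `citation verifier`: every authority must link back to a primary source
+- `fact verifier checklist`: facts, dates, amounts, parties, and procedural status must all be confirmed by a human
+- `document history`: retain prompts, retrieval results, outputs, edit trails, and sign-off
+- `no auto-file`: any filing can only be finally submitted by a human
+- `role-based sign-off`: the lawyer / in-house counsel / court user takes named responsibility for the final text
+- `evidence boundary`: AI must not be allowed to fabricate, embellish, strengthen, or dilute evidence
+- `disclosure readiness`: once a court or registrar requires disclosure, the product and team must be able to explain immediately at which step AI was used
+
+## 7. Red flags treated as L3 by default once hit
+
+- A fake citation, fake quote, or non-existent authority appears
+- Facts, dates, quotations, or legal bases filled in by AI appear in a court document
+- The team wants to hand the drafting of the final content of an affidavit / witness statement / expert report directly to AI
+- The team asks the system to auto-file directly or to generate a final package "ready to file with the court"
+- Courts start requiring disclosure / certification, but the product and process are not ready
+- A representative sanctions case appears, showing that court-facing risk has moved from "high in theory" to "actually being sanctioned"
+
+## 8. Which documents to write back to by default
+
+| Type of change | Default write-back documents |
+| --- | --- |
+| Singapore court-user guide / circular / sanctions changes | `singapore-legal-ai-go-no-go-memo`, `legal-ai-regulatory-monitoring-tracker`, this file |
+| Hong Kong Judiciary / Law Society / court-facing boundary changes | `hong-kong-legal-ai-go-no-go-memo`, `hong-kong-legal-ai-management-brief`, `legal-ai-regulatory-monitoring-tracker`, this file |
+| England and Wales court-doc AI rules / sanctions changes | `uk-australia-uae-legal-ai-market-comparison`, `legal-ai-opportunity-risk-matrix`, `legal-ai-regulatory-monitoring-tracker`, this file |
+| Australia court guidance / practice direction / sanctions changes | `uk-australia-uae-legal-ai-market-comparison`, `legal-ai-opportunity-risk-matrix`, this file |
+| China court-facing AI rules first made explicit | `china-legal-ai-go-no-go-memo`, `legal-ai-opportunity-risk-matrix`, `legal-ai-regulatory-monitoring-tracker`, this file |
+
+## 9. 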
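Dedicated monitoring cadence and division of responsibility
+
+### 9.1 Weekly cadence
+
+Sweep the following sources once every week:
+
+- Singapore: Singapore Courts, Registrar's Circular, relevant judgments / notices
+- Hong Kong: Judiciary, HKLII / published judgments, new Law Society guidance
+- England and Wales: CJC, Judiciary judgments, relevant practice direction / consultation updates
+- Australia: Federal Court notices, Queensland Courts practice directions / guidance
+- China: Supreme People's Court, Internet Courts, published rules and bulletins of local courts
+
+Default owners:
+
+- `Research owner`: weekly source sweep, de-duplication, logging new events
+- `Legal / compliance owner`: decides whether to change the filing / disclosure / verification boundary
+- `Product owner`: decides whether to freeze the court-facing roadmap, marketing claims, or demo scope
+
+owner-of-record (dedicated tracker):
+
+| Scope | primary owner of record | secondary owner | Required record |
+| --- | --- | --- | --- |
+| Weekly source sweep for this tracker | `Research owner` | `Legal / compliance owner` | Add at least one dated no-change or change event every week |
+| filing / disclosure / verification boundary decisions | `Legal / compliance owner` | `Product owner + Research owner` | On an `L2 / L3` hit, add an impact analysis and decide whether to change the default product red lines |
+| roadmap / demo freeze decisions | `Product owner` | `Management sponsor` | On an `L3` hit, freeze the relevant external claims or roadmap items at the same time |
+| Repository / knowledge-base archive sync | `Research owner` | `Product owner` | Sync `L2 / L3` events and the monthly summary to `legal-ai-research-package-register-2026-03-20` |
+
+### 9.2 Monthly cadence
+
+Run a sanctions / judgments look-back once a month, watching specifically for:
+
+- fake citation
+- fabricated facts
+- affidavit / witness statement risk
+- expert report risk
+- court-imposed disclosure / certification / costs / wasted costs / strike-out
+
+### 9.3 Quarterly cadence
+
+Review the court-facing conclusions at least once a quarter:
+
+- Whether to keep "internal pre-filing QA only, no auto-file"
+- Whether any jurisdiction has issued formal rules sufficient to change the roadmap
+- Whether any scenario needs to move from `HOLD` to `NO-GO`, or conversely from the watch list to pilot-eligible
+
+## 10. 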
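L2 / L3 trigger thresholds in detail
+
+The following changes are not treated as ordinary L1 by default:
+
+### 10.1 Treated as L2 by default
+
+- A new consultation, practice note, or registrar circular expressly mentions AI filing / AI disclosure
+- A new judgment discusses fake citation, hallucination, or fact verification, but no sanctions directly arise
+- A court or professional body for the first time singles out affidavits / witness statements / expert reports when discussing AI risk
+
+### 10.2 Treated as L3 by default
+
+- A court requires mandatory disclosure, certification, or an affidavit supporting AI use
+- A court or judgment expressly sets out costs / wasted costs / strike-out / regulator referral / contempt as applicable consequences
+- A representative sanctions case appears that is enough to change a jurisdiction's court-facing risk assessment
+- Any jurisdiction issues a formal practice direction or equivalent rule that can directly change the product boundary
+
+## 11. Default actions after a trigger
+
+### 11.1 On an L2 hit
+
+- Add an impact analysis within `5` working days
+- State clearly whether the following need updating:
+  - `legal-ai-opportunity-risk-matrix`
+  - this file
+  - the relevant jurisdiction memo / brief
+- Check whether existing product claims need to be more conservative
+
+### 11.2 On an L3 hit
+
+- Start a dedicated review within `48` hours
+- Immediately freeze any of the following external claims or roadmap items:
+  - court-ready drafting
+  - filing automation
+  - affidavit / witness statement generation
+  - automated citation / fact approval
+- Move the affected scenarios to `HOLD` first, then decide whether to resume
+- Where necessary, update pre-sales no-go areas, demo boundaries, and customer messaging in step
+
+### 11.3 Default question list on a sanctions case
+
+- Is this a fake citation, fake quote, fake fact, or a disclosure failure?
+- Did the court sanction the lawyer, the party, the institution, or all of them?
+- Did the court sanction under "existing rules", or because new guidance had already landed?
+- Would our current product / process let the same kind of problem recur?
+
+## 12. 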
Official anchors for this round
+
+Singapore:
+
+- Singapore Courts `Guide on the Use of Generative AI Tools by Court Users`
+  - `https://www.judiciary.gov.sg/docs/default-source/news-and-resources-docs/guide-on-the-use-of-generative-ai-tools-by-court-users.pdf?sfvrsn=3900c814_1`
+
+Hong Kong:
+
+- Hong Kong Judiciary `Guidelines on the Use of Generative Artificial Intelligence`
+  - `https://www.judiciary.hk/doc/en/court_services_facilities/guidelines_on_the_use_of_generative_ai.pdf`
+
+England and Wales:
+
+- Civil Justice Council `Use of AI in preparing court documents`
+  - `https://www.judiciary.uk/related-offices-and-bodies/advisory-bodies/cjc/current-work/use-of-ai-in-preparing-court-documents/`
+- Civil Justice Council `Latest news`
+  - `https://www.judiciary.uk/related-offices-and-bodies/advisory-bodies/cjc/latest-news/`
+- Civil Justice Council `Interim Report and Consultation - Use of AI for Preparing Court Documents`
+  - `https://www.judiciary.uk/wp-content/uploads/2026/03/Interim-Report-and-Consultation-Use-of-AI-for-Preparing-Court-Docume.pdf`
+- `Ayinde v London Borough of Haringey; Al-Haroun v Qatar National Bank`
+  - `https://www.judiciary.uk/wp-content/uploads/2025/06/Ayinde-v-London-Borough-of-Haringey-and-Al-Haroun-v-Qatar-National-Bank.pdf`
+
+Australia:
+
+- Federal Court of Australia `Notice to the Profession`
+  - `https://www.fedcourt.gov.au/news-and-events/29-april-2025`
+- Federal Court of Australia Annual Report 2024–25 (`Use of Generative Artificial Intelligence`)
+  - `https://www.fedcourt.gov.au/__data/assets/pdf_file/0011/572897/Part-1.pdf`
+- Queensland Courts `Using Generative AI`
+  - `https://www.courts.qld.gov.au/going-to-court/using-generative-ai`
+- Queensland Courts `Practice Direction 4 of 2025: Accuracy of References in Submissions`
+  - `https://www.courts.qld.gov.au/__data/assets/pdf_file/0011/882875/lc-pd-4-of-2025-Accuracy-of-References-in-Submissions.pdf`
+
+China:
+
+- Supreme People's Court, 《最高人民法院关于规范和加强人工智能司法应用的意见》 (Opinions on Regulating and Strengthening the Judicial Application of Artificial Intelligence)
+  - `https://www.court.gov.cn/zixun/xiangqing/382461.html`
+- 
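Supreme People's Court, 《推进现代科技与司法审判工作深度融合 最高法发布“法信法律基座大模型”研发成果》 (release on the R&D results of the Faxin legal foundation model)
+  - `https://www.court.gov.cn/zixun/xiangqing/447711.html`
+- Supreme People's Court official website
+  - `https://www.court.gov.cn/`
+- Supreme People's Court Intellectual Property Tribunal official website
+  - `https://ipc.court.gov.cn/`
+
+This set of China-side official materials currently reads more like internal court AI application norms, smart-adjudication / litigation-service build-out, and sources of adjudication rules in AI-related cases, rather than a standalone filing / disclosure / verification guide addressed to court users.
+
+## 13. Priority pages for the next refresh
+
+| Jurisdiction | Priority page | Key signal verified this round | What to look at first next refresh | Default frequency |
+| --- | --- | --- | --- | --- |
+| Singapore | Singapore Courts `Guide on the Use of Generative AI Tools by Court Users` | Court users may use GenAI but remain responsible for independent verification and for the filing; no advance proactive declaration is required by default; where it has doubts, the court may require an explanation of whether AI was used and how the output was verified | Whether new disclosure / certification requirements are added; whether supplementary rules on affidavits, witnesses, or experts are added | Weekly |
+| Hong Kong | Hong Kong Judiciary `Guidelines on the Use of Generative Artificial Intelligence` | Stresses the possibility of fictitious cases / citations / quotes and factual errors; absent proven confidentiality and adequate verification, legal analysis is not recommended; judges may ask lawyers or litigants to explain their verification | Whether it extends to court users / filing materials; whether clearer disclosure, verification, or sanctions signals are added | Weekly |
+| England and Wales | Civil Justice Council `Use of AI in preparing court documents` + `Latest news` + interim consultation paper | The consultation is still under way; the current-work page shows an eight-week consultation closing on `2026-04-14`, and the latest-news page further states a deadline of `2026-04-14 23:59`; the interim paper already proposes an AI-use declaration in specified circumstances, centred on AI being used to generate evidence the court is to rely on; the scope expressly covers pleadings, witness statements, and expert reports | Whether the consultation closes and moves to a final report; whether CPR / practice direction / court-document-specific disclosure requirements appear | Weekly |
+| England and Wales | `Ayinde / Al-Haroun` judgment | The sanctions levers are already fairly clear, enough to change the default risk assessment for court-facing drafting | Whether later similar cases further entrench or expand the sanctions ladder | Weekly |
+| Australia | Federal Court `Notice to the Profession` | The Court is developing a Guideline or Practice Note; parties remain responsible for tendered material; AI use should be disclosed if a Judge / Registrar so requires | Whether a formal Guideline / Practice Note lands after the consultation; whether a clearer disclosure obligation appears | Weekly |
+| Australia | Queensland Courts `Using Generative AI` + `Accuracy of References in Submissions` practice directions 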
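| Expressly warns of fake citations / quotes / facts risk; an affidavit / witness statement must not be turned into material that does not reflect the person's own knowledge and words; a delay can trigger a costs order | Whether other court levels or more sub-process rules are added; whether AI-related accuracy issues continue to be written into practice directions | Weekly |
+| China | Supreme People's Court / Internet Courts / local courts' published rules and announcements | What is currently easier to find is internal court AI application norms, litigation-service / adjudication-support build-out materials, and adjudication rules in AI-related cases; this round's source set still contains no standalone national court-user AI filing guide | Whether a formal rule text on court-facing AI submission, verification, disclosure, or sanctions appears for the first time | Weekly |
+
+## 14. Monitoring event record template
+
+Each time a court-facing event is hit, record it at least once using the table below:
+
+| Date | Jurisdiction | Source page | Type of change | Trigger level | What this change says | Will it change the current boundary | Documents to update | owner | Deadline |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| `YYYY-MM-DD` | China / Singapore / Hong Kong / England and Wales / Australia | Official link | disclosure / verification / sanctions / filing / evidence | `L1 / L2 / L3` | One sentence describing the new rule or case | `No / Possibly / Yes` | `matrix / memo / brief / this file` | `Research / Legal / Product` | `YYYY-MM-DD` |
+
+Minimum fields:
+
+- Must state whether this is a **rule update, case, practice direction, registrar circular, or consultation**
+- Must state whether the change affects **filing, citation, fact verification, evidence, disclosure, or sanction**
+- Must state the default action: **log, re-check, dedicated review, or move to HOLD first**
+
+## 15. 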
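Live refresh record for this round (2026-03-20 UTC)
+
+| Date | Jurisdiction | Source page | source access summary | Type of change | Trigger level | What this change says | Will it change the current boundary | Documents to update | owner | Deadline |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| `2026-03-20` | Singapore | Singapore Courts guide | Guide: `direct` | verification / disclosure | `L1` | This live re-check found no new court-user disclosure obligation; advance proactive declaration is still not required, but the court may ask for an explanation of AI use and how it was verified | `No` | `this file` | `Research` | `2026-03-27` |
+| `2026-03-20` | Hong Kong | Hong Kong Judiciary guidance | Guidance: `direct` | verification / filing | `L1` | This live re-check found no standalone public court-user filing guide; the position is still dominated by verification, confidentiality, and caution toward legal analysis | `No` | `this file` | `Research` | `2026-03-27` |
+| `2026-03-20` | China | SPC opinions on judicial application of AI + SPC official website / Intellectual Property Tribunal official website source sweep | `SPC` opinions on judicial application of AI: `source unavailable (automation)`, `court.gov.cn` currently returns `403` to shell fetches; Faxin material: `source unavailable (automation)`, the same kind of official `SPC` page currently returns `403` to shell fetches, so follow-up defaults to manual browser verification | filing / verification | `L1` | What this live re-check saw is still internal court AI application norms, smart-adjudication / litigation-service build-out, and sources of adjudication rules in AI-related cases; within the current official source set there is still no standalone national court-user AI filing / disclosure / verification guide | `No` | `this file` | `Research` | `2026-03-27` |
+| `2026-03-20` | England and Wales | CJC current-work page + latest-news page + interim report | current-work: `direct`; latest-news: `direct`; interim report: `direct` | disclosure / evidence | `L2` | The current-work page still shows an eight-week consultation under way, and the latest-news page further states that the consultation closes at `2026-04-14 23:59`; the interim paper's preliminary proposal still points to requiring legal representatives to make an AI-use declaration in specified circumstances, centred on AI being used to generate evidence the court is to rely on | `Possibly` | `uk-australia-uae-legal-ai-market-comparison` / `legal-ai-regulatory-monitoring-tracker` / `this file` | `Research + Legal` | `2026-03-25 L2 analysis completed early; next check 2026-04-14 23:59` |
+| `2026-03-20` | Australia | Federal Court AI notice / annual report + Queensland guidance / practice direction | `Federal Court` official notice / annual-report pages: `source unavailable (automation)`, `fedcourt.gov.au` currently returns a `403` challenge to shell fetches; Queensland guidance: `direct`; Queensland practice direction: `direct` | disclosure / 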
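sanctions | `L1` | This live re-check did not find that the Federal Court has issued a final guideline / practice note; Queensland's fake-citation, accuracy, and costs-risk signals remain valid | `No` | `this file` | `Research` | `2026-03-27` |
+
+### 15.1 Anchor-by-anchor assessment for the first court-facing sweep (backfilled per section 17.1 / 17.3)
+
+To avoid the first `2026-03-20` court-facing refresh having only jurisdiction-level summaries without an anchor-by-anchor `no-change / change` assessment, it is backfilled below in the minimum record format used for later weekly sweeps:
+
+| Date | Jurisdiction | Official anchors checked this round | source access / fallback used | `no-change / change` assessment per anchor | impact type | meaningful-change conclusion | Will it change the current unified boundary | Documents to write back | owner | Next action |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| `2026-03-20` | Singapore | `Singapore Courts` court-user guide | Guide: `direct` | Guide: `no-change`; still makes clear that the court does not prohibit the use of generative AI to prepare court documents; unless the court specifically requires it, advance proactive declaration of AI use is not a default obligation; the court user must still verify independently and must not use AI to generate evidence intended to be relied on before the court | `verification / disclosure / evidence` | `No, still L1 no-change` | `No` | `None` | `Research` | `2026-03-27` |
+| `2026-03-20` | Hong Kong | `Judiciary` generative AI guidance | Guidance: `direct` | Guidance: `no-change`; the current public anchor is still a set of internal-use principles addressed to Judges / Judicial Officers / support staff and has not become a standalone public court-user filing / disclosure guide; it is therefore still insufficient to rewrite the Hong Kong court-user boundary assessment | `verification / filing` | `No, still L1 no-change` | `No` | `None` | `Research` | `2026-03-27` |
+| `2026-03-20` | China | `SPC` opinions on judicial application of AI; `SPC` / Faxin public materials | Opinions: `source unavailable (automation)`, the official `court.gov.cn` page returned `403` to this round's shell fetch; Faxin material: `source unavailable (automation)`, the same kind of official `SPC` page returned `403` to this round's shell fetch, and any later re-check defaults to manual browser verification | Opinions: `no-change`; still positioned as court-side AI application governance and adjudication support; Faxin material: `no-change`; still a signal of legal large-model infrastructure and court-side capability building, and does not amount to a national court-user filing / disclosure / verification rule | `filing / verification` | `No, still L1 no-change` | `No` | `None` | `Research` | `2026-03-27` |
+| `2026-03-20` | England and Wales | `CJC current-work` page; `CJC latest-news` page; `CJC` interim report / consultation paper | current-work: `direct`; latest-news: `direct`; interim report: `direct` | current-work: `no-change`; still a rules consultation on pleadings, witness statements, and expert reports closing on `2026-04-14`; latest-news: `no-change`; still states the deadline as 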
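`2026-04-14 23:59`; interim report: `no-change`; still treats "a declaration may be required when AI is used to generate evidence the court is to rely on" as the current core proposal | `disclosure / evidence` | `Yes, treat as L2` | `Possibly` | `uk-australia-uae-legal-ai-market-comparison` / `legal-ai-regulatory-monitoring-tracker` / `this file` | `Research + Legal` | `2026-03-25 L2 analysis completed early; next check 2026-04-14 23:59` |
+| `2026-03-20` | Australia | `Federal Court` AI notice; `Queensland Courts` generative AI guidance; `Queensland` accuracy-of-references practice direction | Federal Court: `source unavailable (automation)`, the official `fedcourt.gov.au` page returned a `403` challenge to this round's shell fetch; Queensland guide: `direct`; Queensland practice direction: `direct` | Federal Court: `no-change`; the `2025-04-29` notice still says the Court is considering a guideline / practice note, and for now the position rests on existing obligations and on disclosure when a judge / registrar requires it; Queensland guide: `no-change`; still stresses that an affidavit / witness statement must reflect the person's own knowledge and words, and that fake citations may carry costs risk; Queensland practice direction: `no-change`; still treats AI inaccuracy, fabricated sources, and identification of the responsible person as the current control points | `disclosure / verification / sanctions` | `No, still L1 no-change` | `No` | `None` | `Research` | `2026-03-27` |
+
+### 15.2 England and Wales `CJC` `L2` impact analysis, completed ahead of `2026-03-25`
+
+Based on the `CJC current-work` page, the `latest-news` page, the interim report, and the consultation cover sheet, this round completes early the `L2` impact analysis originally queued for `2026-03-25`. The conclusions are:
+
+- The current change is **enough to sharpen the description of England and Wales court-facing evidence / disclosure risk**, but **not yet enough to rewrite the current unified boundary**.
+- The near-term risk is not "uniform disclosure for all court documents"; rather, the `CJC` consultation has narrowed the most likely tightening to evidence-stage documents:
+  - `trial witness statements`: the direction is to require legal representatives to declare that AI was not used to generate, alter, strengthen, weaken, or rephrase the witness's evidence
+  - `expert reports`: the direction is to require experts to identify and explain any AI used for the report (other than purely administrative use)
+  - `skeleton arguments / advocacy documents / disclosure lists and statements`: the consultation currently leans toward the view that **no additional court rule is needed for now**
+- Direct impact on the research package:
+  - The `NO-GO / no auto-file / no court-ready final output` conclusion is unchanged
+  - England and Wales court-facing risk should distinguish more clearly between `evidence-stage` and general drafting / research: the former is closer to future declaration territory, while the latter still mainly falls under existing court duties + human verification
+  - For any future England-and-Wales court-adjacent 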
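workflow, at least `document-type gating`, an `evidence-chain provenance log`, and a `disclosure readiness pack` must be added
+  - Do not write back to `legal-ai-opportunity-risk-matrix` for now, because the jurisdiction ranking and the unified go / no-go boundary have not changed
+- The original `2026-03-25` checkpoint is therefore treated as completed early; the next England and Wales official milestone stays at `2026-04-14 23:59`
+- The same-day live recheck also confirmed that the consultation PDF exposed on the `CJC current-work` page has switched to the `/2026/03/` official path; this only requires refreshing the source anchor, is not a new substantive change, and does not require reopening the `L2` analysis.
+
+## 16. China court-facing monitoring tier rules
+
+To avoid re-arguing at every China-side refresh whether something "counts as a real court-facing signal", the following three tiers apply by default:
+
+| Evidence tier | Typical sources | Default level | Default action |
+| --- | --- | --- | --- |
+| Internal court AI application / build-out tier | SPC opinions on judicial application of AI, smart court / Faxin / model releases, adjudication-support / litigation-service build-out articles | `L1` | Log in this file; note that this is court-side governance / infrastructure rather than a court-user filing rule; does not change the current `NO-GO / do not enter the filing chain` |
+| Process tier addressed to litigation participants or lawyers | Case-filing platform announcements, litigation service centre rules, Internet Court operating instructions, local courts' AI-use requirements for litigants / lawyers | `L2` | Add an impact analysis within `5` working days; assess whether disclosure / verification / certification signals have appeared; write back to the China memo and the matrix where necessary |
+| National or high-level court-user rule / sanctions tier | Formal SPC notices / judicial interpretations / national litigation-service rules, national texts expressly requiring AI disclosure / certification, representative sanctions cases | `L3` | Dedicated review within `48` hours; first keep any court-facing roadmap at `HOLD / NO-GO`; write back to the China memo, the matrix, and the master tracker in step |
+
+On the China side, only when second- or third-tier material appears should serious consideration be given to changing the assessment that "there is currently no standalone national court-user filing guide".
+
+## 17. 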
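Current open monitoring queue
+
+To make sure the section 15 refresh record is followed up after it is written, the current default open queue is:
+
+| Deadline | Jurisdiction | Required action | Default owner | Default write-back after trigger |
+| --- | --- | --- | --- | --- |
+| `2026-03-27` | China / Singapore / Hong Kong / Australia | Complete the next weekly source sweep and record at least one no-change or change event using the section 14 template | `Research` | `this file`, plus the relevant memo / tracker where necessary |
+| `2026-04-14 23:59` | England and Wales | On the day of the CJC consultation closing time, re-check the consultation page, latest-news page, and interim paper, and assess whether to move from `L2` to a more definite rule-change assessment | `Research + Legal` | `this file`, `legal-ai-regulatory-monitoring-tracker`, `uk-australia-uae-legal-ai-market-comparison`, and `legal-ai-opportunity-risk-matrix` where necessary |
+
+### 17.1 Minimum record pack for the next court-facing sweep
+
+To avoid later weekly court-facing refreshes leaving only "checked / no change" that cannot be re-verified, each subsequent `no-change` or `change` event must carry at least the following record items:
+
+- Check date (UTC) and executing owner
+- Official anchors checked, at minimum, this round
+  - Singapore `Singapore Courts` court-user guide
+  - Hong Kong `Judiciary` generative AI guidance
+  - England and Wales `CJC current-work` page
+  - England and Wales `CJC latest-news` page
+  - England and Wales relevant consultation / interim or final report page
+  - Australia `Federal Court` court-facing AI notice / annual-report entry
+  - Australia `Queensland Courts` guidance / practice direction
+  - China `SPC` judicial application of AI / published court rule sources
+- A source access status for each anchor:
+  - `direct`
+  - `redirected`
+  - `timeout`
+  - `source unavailable`
+  - `source unavailable (automation)`
+- If the intended official URL redirects, times out, hits an anti-bot challenge, or is temporarily unreachable:
+  - It must not be recorded directly as `no-change`
+  - `source unavailable` or `source unavailable (automation)` must be stated
+  - If a fallback official page is used, the specific page must be recorded
+  - If automated fetching is blocked by a challenge / `403`, manual browser re-check must be written into `Next action`
+  - If no substitute official page is found, the anchor must be marked "check not completed this round"
+- One sentence of `no-change / change` assessment for each anchor
+- If the assessment is `change`, state what is affected:
+  - filing
+  - disclosure
+  - verification
+  - evidence
+  - sanctions
+- State whether it will change the current unified court-facing boundary:
+  - `No`
+  - `Possibly`
+  - `Yes`
+- If the answer is `Possibly` or `Yes`, also specify whether to write back to:
+  - `legal-ai-opportunity-risk-matrix`
+  - the relevant jurisdiction memo / brief
+  - `legal-ai-regulatory-monitoring-tracker`
+  - `this file`
+- Finally, apply the rules in section `17.2` to decide whether the change is only an `L1` record or already a meaningful change that requires escalated analysis, a roadmap freeze, or a change to the default product red lines
+
+### 17.2 court-facing 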
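sweep: meaningful-change determination
+
+To avoid misreading every new court-facing document as a reason to rewrite the whole package, the following categories decide by default what counts as a meaningful change worth escalating:
+
+- `L1 no-change`
+  - Merely repeats or reinforces existing verification, confidentiality, duty, or accuracy requirements
+  - Is only internal court AI application / build-out / governance material, with no new court-user filing / disclosure / certification rule
+  - The consultation / guidance keeps its existing direction and does not push requirements to a new formal rule level
+  - For the England and Wales `CJC` matter, if `current-work`, `latest-news`, and the relevant consultation / interim material merely continue to confirm that the consultation is still open, that its scope still covers pleadings / witness statements / expert reports, and that the evidence-stage proposal has not materially advanced, record it by default only as a tracker-level `L1 no-change` status verification; before the `2026-04-14 23:59` closing-time recheck, do not reopen a new `L2` analysis or memo write-back
+- Official source reachability / substitute pages
+  - If a court-facing official page merely redirects, times out, returns a `403` challenge, or is temporarily unreachable during this round's automated fetch, but an official substitute page or manual browser re-check still supports the same conclusion, this is not a substantive meaningful change by default; record `source unavailable / source unavailable (automation)` and the fallback official page used at `L1`
+  - If a key official anchor can neither be reached automatically this round nor replaced by a substitute official page, this is still not a substantive rule change by default, but it must not be written as "no-change completed"; the anchor must be recorded as check not completed, and manual re-check or a follow-up refresh written into `Next action`
+- `L2 meaningful change`
+  - A new consultation, registrar circular, guidance, practice note, or judgment may change the filing / disclosure / verification / evidence risk of a sub-process
+  - New material starts to single out sub-processes such as pleadings, witness statements, expert reports, authorities, or disclosure review when discussing AI rules
+  - New material has not yet created a formal mandatory obligation but is already enough to require an impact analysis and an assessment of whether to raise the risk rating or update the memo / matrix
+- `L3 meaningful change`
+  - Mandatory disclosure, certification, a leave requirement, a practice direction, a formal court-user filing rule, a representative sanctions case, or other high-level material sufficient to change the default product boundary appears
+  - Any jurisdiction pushes fake citation, fabricated facts, AI-generated evidence, or court-user misuse directly to the level where costs / wasted costs / strike-out / regulator referral / contempt clearly apply
+  - China issues a national or high-level formal rule text on court-user AI filing / disclosure / verification
+
+If any of the following results is hit, treat it as a meaningful change rather than an ordinary record:
+
+- It would change the current unified red line of `no auto-file / no court-ready final product`
+- It would change the default permitted scope for affidavits, witness statements, expert reports, citation verifier, or fact verifier
+- It would change the minimum requirements for disclosure readiness, role-based sign-off, or evidence boundary
+- It would change 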
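`legal-ai-opportunity-risk-matrix` entries for go / no-go, priority, or jurisdiction entry order
+- It would require freezing the existing court-facing roadmap, demo scope, or external claims
+
+If new material merely repeats or reinforces existing requirements without changing any of the boundaries or controls above, keep it at `L1 no-change` by default; it is enough to leave re-verifiable evidence per section `17.1` and record a dated event per section `14`, without rewriting the memo / matrix.
+
+### 17.3 Standard weekly court-facing output template
+
+To keep the granularity of court-facing dated updates consistent across executors, later weekly sweeps output at least once in the template below by default, even when the conclusion is `no-change`:
+
+| Date | Jurisdiction | Official anchors checked this round | source access / fallback used | `no-change / change` assessment per anchor | impact type | meaningful-change conclusion | Will it change the current unified boundary | Documents to write back | owner | Next action |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| `YYYY-MM-DD` | China / Singapore / Hong Kong / England and Wales / Australia | List the official pages actually checked this round | For each anchor write `direct / redirected / timeout / source unavailable / source unavailable (automation)`; if a fallback official page is used, record the specific page | At least one sentence stating whether each anchor is `no-change` or `change` | `filing / disclosure / verification / evidence / sanctions` | `No` / `Yes, treat as L2` / `Yes, treat as L3` | `No / Possibly / Yes` | `matrix / memo / brief / regulatory tracker / this file / None` | `Research / Legal / Product` | `YYYY-MM-DD` |
+
+Minimum fields:
+
+- If the conclusion is `not a meaningful change`, still state why it remains `L1 no-change`
+- The `source access / fallback used` column must not be left blank; even without a fallback, state `direct`, `redirected`, `timeout`, `source unavailable`, or `source unavailable (automation)` explicitly
+- If the official page is blocked by a `403` challenge, a timeout, or another automated-fetch restriction, state `source unavailable (automation)` explicitly and write manual browser re-check into `Next action`
+- If the conclusion is `Yes, treat as L2` or `Yes, treat as L3`, state clearly whether the trigger is filing, disclosure, verification, evidence, or sanctions
+- If the decision is "no write-back to any document", still write `None` explicitly, so that it is later possible to tell "no write-back needed" from "write-back missed"
+- If this round adds a new official court-facing anchor beyond section `12`, add the new source to section `12` or explain in this round's record why it can replace an existing anchor
diff --git a/research/hong-kong-legal-ai-go-no-go-memo-2026-03-20.md b/research/hong-kong-legal-ai-go-no-go-memo-2026-03-20.md
new file mode 100644
index 0000000..b250c8f
--- /dev/null
+++ b/research/hong-kong-legal-ai-go-no-go-memo-2026-03-20.md
@@ -0,0 +1,331 @@
+# Hong Kong legal AI go / no-go decision memo
+
+Date: 2026-03-20
+
+Purpose: Building on the existing 《法律 LLM / AI 与法律交叉调研》《法律 AI 机会 / 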
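风险矩阵(可执行版)》 and the Singapore-Hong Kong comparison passages, this adds a go / no-go memo for **Hong Kong as a standalone jurisdiction**, so that Hong Kong no longer exists only as a comparison passage. The memo focuses on the 2 high-value legal AI scenarios better suited to a Hong Kong start, and sets out target users, service model, compliance boundaries, required safeguards, evidence gaps, and entry priority.
+
+This is not legal advice; it is a product, compliance, and market-entry research snapshot as of `2026-03-20`.
+
+## 1. Conclusions first
+
+- Target jurisdiction: **Hong Kong**
+- Scenario 1 assessed: **in-house legal / compliance copilot**
+- Scenario 2 assessed: **bilingual Hong Kong law Legal RAG / citation-aware drafting**
+- Overall recommendation:
+  - Scenario 1: **GO (P0)**
+  - Scenario 2: **CONDITIONAL GO (P1)**
+- Currently not recommended as a priority entry point:
+  - **Open-ended legal opinion bots for the public**
+  - **Automation tools that go directly into the court filing chain or the finalization chain for litigation documents**
+
+In one sentence:
+
+- If the first Hong Kong product is defined as an **in-house, auditable, tightly access-controlled, strongly human-reviewed** legal / compliance workflow tool, it can proceed.
+- If the first Hong Kong product is defined as **giving legal conclusions directly to the public**, or **replacing lawyers / in-house counsel in completing court or formal legal analysis**, it should not proceed.
+
+## 2. Why Hong Kong deserves its own memo
+
+Hong Kong is not a "lower-risk" jurisdiction, but it has three layers of structure that merit separate assessment:
+
+- **The AI / privacy governance framework is relatively clear**:
+  - The PCPD has published `AI: Model Personal Data Protection Framework`
+  - The PCPD later published checklist-style guidelines on employee use of generative AI
+  - This shows that Hong Kong's governance focus for internal organizational AI use has settled on risk assessment, human oversight, what may be input, how output is retained, and how incident response is done
+- **The push for legal-sector digitalization and LawTech is real**:
+  - The DoJ has written LawTech (including AI / Gen AI) into a formal policy paper
+  - The DoJ has also added `Legal Knowledge Engineers` to the talent list, showing that legal technology and AI are already treated as part of building legal service capability
+- **The court chain, by contrast, sends a stronger "do not expand outward too early" signal**:
+  - The Hong Kong Judiciary's first set of generative AI guidelines is addressed to judges, judicial officers, and support staff
+  - It expressly states that without proven ability to protect confidential / restricted / private information and an adequate verification mechanism, using generative AI for legal analysis is **not recommended**
+
+Taken together, these three points mean:
+
+- Hong Kong is well suited to starting with **internal organizational tools**
+- But it is not suited to betting the first product on **external legal opinions** or **court-facing automation**
+
+## 3. Target users and service model
+
+### Scenario 1: in-house legal / compliance copilot
+
+Target users:
+
+- Legal teams at medium and large enterprises
+- Compliance teams
+- Regional compliance or business legal teams in heavily regulated industries such as finance, insurance, platforms, healthcare, retail, and logistics
+- Teams with cross-border contracts, bilingual materials, and internal policy mapping needs
+
+Recommended service model:
+
+- **Closed-loop in-house assistant**
+- Prefer `tenant-isolated SaaS`, dedicated instances, VPC, or other strongly isolated deployments
+- Limit to clause comparison, risk flagging, obligation mapping, policy consistency checks, and pre-approval screening
+- No automatic approval and no automatic external issuance
+
+### Scenario 2: bilingual Hong Kong law Legal RAG / citation-aware drafting
+
+Target users:
+
+- Law firm research / drafting teams
+- In-house legal research and knowledge management teams
+- Arbitration / dispute resolution support teams
+
+Recommended service model:
+
+- **Internal research / drafting assistant for professionals**
+- "Retrieve first, then generate" by default
+- Every output carries source, date, jurisdiction, and citation
+- Prioritize internal research memos, statute / case-law retrieval, bilingual summaries, and citation-aware drafting
+
+## 4. 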
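Decision summary
+
+| Scenario | Decision | Entry priority | Recommended service model | Currently not recommended |
+| --- | --- | --- | --- | --- |
+| In-house legal / compliance copilot | GO | P0 | In-house, controlled, auditable workflow copilot, serving in-house legal / compliance teams first | Automatic approval; opening to the public; mixing data across clients / matters; shared training by default |
+| Bilingual Hong Kong law Legal RAG / citation-aware drafting | CONDITIONAL GO | P1 | Internal research / drafting assistant for professionals, with sources and human review required | Unsourced bare answers; case-specific legal opinions to the public; use in court / formal legal analysis without strong review |
+
+## 5. Scenario 1: in-house legal / compliance copilot
+
+### 5.1 Why it is a GO
+
+These are the reasons I consider this the most suitable first landing scenario for Hong Kong:
+
+- The PCPD's `Model Framework` and its guidelines on employee use of GenAI both read more like telling organizations **how to use AI under control** than encouraging open-ended external output.
+- The DoJ's `Promoting LawTech` paper shows that the Hong Kong government is promoting LawTech adoption, including AI / Gen AI, in the legal and dispute resolution sector, and plans to add further ethical and security guidance.
+- Hong Kong in-house legal and compliance scenarios fit naturally into:
+  - access management
+  - log retention
+  - human review
+  - bilingual document handling
+  - procurement and information security review
+
+In other words, Hong Kong at this stage is better suited to **in-house legal / compliance software with AI controls** than to a public-facing "legal answer bot".
+
+### 5.2 Compliance boundaries
+
+Allowed:
+
+- Contract intake and triage
+- Clause extraction, template comparison, redline flags
+- Mapping internal policies to regulatory obligations
+- Pre-approval screening
+- Bilingual contract summaries and structured key-point extraction
+
+Not allowed:
+
+- Automatically approving high-risk contracts
+- Automatically producing formal legal conclusions for clients / business units
+- Feeding client material into open public models by default
+- Handling multiple clients or multiple sensitive matters in the same shared context
+
+### 5.3 Required safeguards
+
+- Deployment and data:
+  - tenant isolation
+  - matter / document / business-unit level access control
+  - clear retention, deletion, training, and sub-processor terms
+- Output and process:
+  - risk flags traceable back to the clause, template, policy, or rule source
+  - high-risk items must enter a human review queue
+  - output clearly marked as an internal aid, not final issued text
+- Organizational governance:
+  - internal AI policy
+  - permitted tools list
+  - incident response
+  - staff training and feedback mechanism
+
+### 5.4 Evidence gaps
+
+- What Hong Kong customers actually prefer:
+  - tenant-isolated SaaS
+  - dedicated instance
+  - VPC
+  - local hosting
+- Interviews with at least 3 target customers:
+  - in-house legal
+  - compliance teams
+  - business owners in heavily regulated industries
+- A Hong Kong scenario benchmark:
+  - bilingual clause extraction
+  - risk flagging
+  - obligation mapping
+  - escalation hit rate
+
+### 5.5 go / no-go triggers
+
+Conditions for staying GO:
+
+- Can be clearly limited to an in-house tool
+- Has strong access isolation and a logging plan
+- High-risk output must be human-reviewed
+- Vendor terms clearly explain retention / deletion / training
+
+Conditions for moving to NO-GO:
+
+- The customer demands automatic approval
+- Tenant or matter isolation cannot be achieved
+- The vendor retains or retrains on customer data by default
+- Incident response and data handling policy cannot be accounted for
+
+## 6. 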
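Scenario 2: bilingual Hong Kong law Legal RAG / citation-aware drafting
+
+### 6.1 Why it is a CONDITIONAL GO
+
+This scenario is high-value, but I put it at `P1` rather than `P0`, not because the commercial value is insufficient but because the official signals from Hong Kong currently lean toward "cautious adoption":
+
+- The Hong Kong Judiciary's AI guidelines expressly note:
+  - generative AI chatbots are limited by date range, jurisdictional coverage, and the types of legal material they can access
+  - without proven confidentiality protection and an adequate verification mechanism, using generative AI for legal analysis **is not recommended**
+- The Law Society's AI position paper expressly stresses:
+  - hallucination risk
+  - data protection / data governance
+  - transparency / disclosure
+  - proper human oversight
+
+This means Hong Kong law Legal RAG is not off the table, but it must be built to be:
+
+- **source-grounded**
+- **bilingual aware**
+- **human-reviewed**
+- **not court-ready by default**
+
+### 6.2 Compliance boundaries
+
+Allowed:
+
+- Retrieval-augmented Q&A over Hong Kong legislation, case law, and practice materials
+- Internal research memo drafts
+- Bilingual summaries, translation assistance, citation-aware drafting
+- Regulatory update monitoring and knowledge management
+
+Not allowed:
+
+- Case-specific legal opinions to the public
+- Unsourced bare answers
+- Use in court, arbitration, or formal external legal analysis without verification by a lawyer / in-house counsel
+- Packaging model output as a "complete legal analysis substitute"
+
+### 6.3 Required safeguards
+
+- Retrieval and citation:
+  - source pinning
+  - date filter
+  - jurisdiction filter
+  - citation verifier
+- Data and access:
+  - confidential / privileged / restricted information stays out of unapproved tools
+  - isolation by matter / client
+- Output control:
+  - draft / internal use marking
+  - no unsourced output
+  - high-risk questions are escalated or refused
+- Human review:
+  - research memos, client deliverables, and arbitration / court materials must be signed off by a human
+
+### 6.4 Evidence gaps
+
+- A Hong Kong law evaluation set:
+  - citation accuracy
+  - detection of outdated / repealed material
+  - bilingual retrieval
+  - jurisdiction mix-up error rate
+- Corpus strategy:
+  - primary law
+  - case law
+  - regulatory materials
+  - practice materials
+  - Chinese-English bilingual coverage
+- User evidence:
+  - whether lawyers / in-house counsel actually click through to sources
+  - whether bilingual presentation really improves efficiency
+  - whether output is accepted as a "draft aid" rather than a "final answer"
+
+### 6.5 go / no-go triggers
+
+Conditions for staying CONDITIONAL GO:
+
+- Output reliably links back to sources
+- Citation and date filters are in place
+- Confidential data stays out of open tools
+- All formal external deliverables keep human sign-off
+
+Conditions for moving to NO-GO:
+
+- Only unsourced bare answers are possible
+- Corpus sources and update mechanism cannot be explained
+- Users insist on treating output as final legal analysis
+- The product is required to go directly into the court / arbitration filing chain
+
+## 7. 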
Unified no-go boundary for Hong Kong
+
+At this stage, the following are not recommended as priorities:
+
+- Open-ended legal opinion bots for the public
+- Automation tools that go directly into the court filing chain
+- litigation / arbitration drafting automation without strong review
+- Sending client / matter data to open public models by default
+
+The reason is not that "legal AI cannot be done in Hong Kong", but that:
+
+- Official materials currently favour **controlled internal organizational use**
+- The courts have already sent a clear signal that "legal analysis should not be lightly handed to GenAI"
+- The Law Society's existing materials also put human oversight, transparency, and data governance at the core
+
+## 8. Entry order within Hong Kong, as I would rank it
+
+1. **In-house legal / compliance copilot**
+2. **Bilingual Hong Kong law Legal RAG / citation-aware drafting**
+3. **Bilingual document summarization / translation assistance**
+4. **Research assistant for the stage before dispute resolution**
+
+Only after these should the following be considered:
+
+- Public-facing legal Q&A
+- Court / arbitration filing-chain automation
+
+## 9. Current rules and market anchors this memo relies on
+
+Hong Kong:
+
+- PCPD "Artificial Intelligence: Model Personal Data Protection Framework"
+  - Stresses risk assessment, human oversight, customisation and AI model management, communication and engagement
+  - Official link: https://www.pcpd.org.hk/english/resources_centre/publications/files/ai_protection_framework.pdf
+- PCPD "Checklist on Guidelines for the Use of Generative AI by Employees"
+  - Requires organizations to define permitted tools, input / output boundaries, lawful and ethical use, bias prevention, data security, incident response, training, and feedback mechanism
+  - Official link: https://www.pcpd.org.hk/english/resources_centre/publications/files/guidelines_ai_employees.pdf
+- PCPD `2025-05-08` AI compliance checks
+  - Shows that the PCPD has carried out AI-related compliance checks on `60` organizations, going beyond statements of principle
+  - Official link: https://www.pcpd.org.hk/english/news_events/media_statements/press_20250508.html
+- Hong Kong Judiciary "Guidelines on the Use of Generative Artificial Intelligence"
+  - Expressly states that without proven confidentiality protection and an adequate checking / verification mechanism, using generative AI for legal analysis is not recommended
+  - Official link: https://www.judiciary.hk/doc/en/court_services_facilities/guidelines_on_the_use_of_generative_ai.pdf
+- DoJ "Promoting LawTech"
+  - Shows that the Hong Kong government is promoting LawTech (including AI / Gen AI) adoption in the legal and dispute resolution sector and plans to add ethical and security guidance
+  - Official link: https://www.doj.gov.hk/en/legco/pdf/ajls20250602e1.pdf
+- DoJ `2025-05-07` press release
+  - Shows that `Legal Knowledge Engineers` 
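have been added to the talent list, reflecting policy support in Hong Kong for building legal AI / legaltech capability
+  - Official link: https://www.doj.gov.hk/en/community_engagement/press/20250507_pr1.html
+- The Law Society of Hong Kong "The Impact of Artificial Intelligence on the Legal Profession"
+  - Stresses hallucination, data protection, data governance, transparency, disclosure, and human oversight
+  - Official link: https://www.hklawsoc.org.hk/-/media/HKLS/Home/News/2024/LSHK-Position-Paper_AI_EN.pdf
+
+## 10. Final assessment
+
+If the decision to make now is "can Hong Kong be entered as a standalone expansion jurisdiction for legal AI", my assessment is:
+
+- **Yes**
+- **But start with in-house, auditable, tightly access-controlled, strongly human-reviewed tools**
+- **Do not bet the first Hong Kong product on public legal opinions or court-chain automation**
+
+More specifically:
+
+- First priority: **in-house legal / compliance copilot**
+- Second priority: **bilingual Hong Kong law Legal RAG / citation-aware drafting**
+
+If these two scenario types can deliver:
+
+- secure deployment
+- source-grounded output
+- human sign-off
+- clear data handling policy
+
+then Hong Kong is worth advancing as the next-stage jurisdiction after Singapore.
diff --git a/research/hong-kong-legal-ai-management-brief-2026-03-20.md b/research/hong-kong-legal-ai-management-brief-2026-03-20.md
new file mode 100644
index 0000000..2c62b77
--- /dev/null
+++ b/research/hong-kong-legal-ai-management-brief-2026-03-20.md
@@ -0,0 +1,148 @@
+# Hong Kong legal AI management decision brief
+
+Date: 2026-03-20
+
+Audience: management, product lead, head of legal, head of compliance, head of security
+
+Purpose: Compress the existing `research/` package into a 1-2 page management brief focused on the first-round entry decision for **Hong Kong**, setting out target customers, service model, go / no-go gates, required safeguards, evidence gaps, stop conditions, and monitoring triggers for the next 90 days.
+
+This is not legal advice; it is a product, compliance, and market-entry research snapshot as of `2026-03-20`.
+
+## 1. One-sentence conclusion
+
+Recommendation: **Hong Kong can proceed, but the first product is limited to an in-house legal / compliance copilot, rated `GO (P0)`.**
+
+A restricted second priority is retained as well:
+
+- **Bilingual Hong Kong law Legal RAG / citation-aware drafting: `CONDITIONAL GO (P1)`**
+
+Currently not recommended:
+
+- Open-ended legal opinion bots for the public
+- Automation tools that go directly into the court / arbitration filing chain
+- litigation / arbitration drafting automation without strong review
+
+If these boundaries cannot be held, move to `NO-GO` or at least `HOLD`.
+
+## 2. 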
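Why Hong Kong is worth entering, but not aggressively
+
+The core logic at management level:
+
+- **Clear governance signals**: `PCPD`'s AI / privacy governance framework and its guidelines on employee use of GenAI both favor adoption that is internal to the organization, auditable, and under human oversight.
+- **Real policy push**: `DoJ` has put LawTech and AI on a formal promotion track, which shows Hong Kong is not a watch-list market "with no market infrastructure at all".
+- **Courts are, if anything, more conservative**: `Judiciary` has clearly signaled caution about legal analysis, so the first Hong Kong launch should not be staked on court-facing automation.
+- **More realistic market entry point**: in-house legal / compliance teams, bilingual documents, and cross-border processes are scenarios that fit more easily into permission, logging, human sign-off, and procurement-review regimes.
+
+So what suits Hong Kong is:
+
+- In-house tools that are auditable, strongly access-controlled, and subject to strong human review
+
+rather than:
+
+- Giving legal conclusions directly to the public
+- Replacing lawyers / in-house counsel in final legal analysis
+- Feeding directly into the court / arbitration filing chain
+
+## 3. Target customers and service model
+
+### 3.1 Target customers
+
+Priority customers:
+
+- Legal teams at large and mid-sized enterprises
+- Compliance teams
+- Regional legal / compliance teams in heavily regulated industries such as finance, insurance, platforms, healthcare, retail, and logistics
+- Teams that need to handle bilingual materials, cross-border contracts, and internal policy mapping
+
+### 3.2 Recommended service model
+
+| Item | Recommendation |
+| --- | --- |
+| Launch scenario | In-house legal / compliance copilot |
+| Second-stage scenario | Bilingual Hong Kong-law Legal RAG / citation-aware drafting |
+| Product positioning | Closed-loop in-house assistive tool / internal research assistant for professionals |
+| Deployment | `tenant-isolated SaaS`, dedicated instance, `VPC`, or another strongly isolated deployment |
+| First-stage scope | Clause comparison, risk flags, obligation mapping, pre-approval screening, bilingual summaries |
+| Output positioning | Risk flags, candidate redlines, internal research / drafting drafts |
+| Human-machine boundary | Final confirmation by a human; no automatic approval; no automatic external issuance; not court-ready by default |
+
+## 4. go / no-go gate
+
+| Gate | Question management should ask | Must have | If not met |
+| --- | --- | --- | --- |
+| Gate 0: scenario boundary | Is this a controlled in-house tool rather than a public legal service? | Defined target team, defined usage boundary, customer accepts human sign-off | Do not proceed |
+| Gate 1: deployment / data | Can deployment, access, logging, and retention / deletion / training policy be explained clearly? | Strongly isolated deployment, matter / document-level access control, clear data-processing terms | No real data, or no procurement |
+| Gate 2: output / control | Can output be traced back to sources, and can high-risk questions be escalated or refused? | source-grounded output, review queue, human sign-off | Do not open a pilot; stay in offline validation |
+| Gate 3: pilot evidence | Are there interviews, benchmarks, failure cases, and real usage feedback? | Customer interviews, Hong Kong-law evaluation set, failure-case library, log sampling | Narrow the scope or move to `HOLD` |
+| Gate 4: expansion decision | Is it still an internal-only model that has not drifted toward the court / public-service boundary? | Stable internal-tool positioning, reusable controls, no major change in monitored items | Do not expand; stay in pilot or move to `NO-GO` |
+
+## 5. 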
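Minimum safeguards that must hold
+
+| Control domain | Minimum requirement |
+| --- | --- |
+| Deployment | `tenant isolation`, dedicated instance / `VPC`, or another strongly isolated option |
+| Access | Access control at matter / client / document / business-unit level |
+| Data | Confidential / privileged / restricted information does not go into unapproved tools |
+| Logging and governance | Retain access, output, review, and export logs; have incident response and a permitted tools list |
+| Output | No unsourced bare answers; high-risk questions must be escalated, refused, or sent to a human review queue |
+| Citations and updates | `source pinning`, date filter, jurisdiction filter, citation verifier |
+| Human-machine boundary | Client deliverables, formal legal analysis, and court / arbitration materials must have human sign-off |
+| Vendor governance | Clear retention, deletion, training, and sub-processor terms |
+
+## 6. Current evidence gaps
+
+What management still lacks is not more concepts but the following kinds of evidence:
+
+- **Customer interviews**: at least 3 target customers, to confirm deployment preferences, procurement concerns, the human sign-off boundary, and real demand for bilingual workflows.
+- **Evaluation set**: a Hong Kong-law-specific evaluation set covering citation accuracy, detection of outdated / repealed materials, bilingual retrieval, and the jurisdiction mix-up error rate.
+- **Corpus strategy**: Chinese-English bilingual coverage and update mechanisms for primary law, case law, regulatory materials, and practice materials.
+- **Usage evidence**: whether lawyers / in-house counsel actually click through to sources, whether they accept that output is only a draft aid, and whether bilingual presentation brings real efficiency gains.
+
+## 7. Immediate stop conditions
+
+If any of the following occurs, pause, downgrade, or move to `NO-GO` immediately:
+
+- The product is asked to become a public-facing legal-opinion service
+- The product is asked to feed directly into the court / arbitration filing chain
+- Retention, deletion, training, sub-processor, or logging policies cannot be explained or enforced
+- Confidential data is sent into an unapproved tool
+- Output cannot be reliably traced back to sources
+- Users persistently treat output as final legal analysis rather than assistive suggestions
+- The customer requires the system to auto-approve high-risk contracts or directly generate final, fileable materials
+
+## 8. Monitoring triggers for the next 90 days
+
+Over the next 90 days, what management should watch is not all the news but the triggers that would change the conclusion:
+
+| Trigger | Why it matters | Action once triggered |
+| --- | --- | --- |
+| `PCPD` publishes new formal AI / privacy governance guidance, FAQs, or templates | Directly changes organizational controls, permitted tools, incident response, and input / output boundaries | Re-review minimum safeguards, customer terms, and the deployment plan |
+| `PCPD` AI compliance checks or enforcement statements hit data handling / retention / governance issues | Directly affects pre-sales exclusions and stop conditions | Update stop conditions, data-processing wording, and sales boundaries |
+| `Judiciary` extends AI guidance to court users, filed materials, disclosure, or sanctions | Directly affects what Hong Kong-law Legal RAG can do under `P1` | Re-review whether `CONDITIONAL GO` still holds; move to `HOLD` if necessary |
+| `DoJ`'s LawTech / AI push clearly widens or tightens | Changes Hong Kong's entry order and service model as an expansion jurisdiction | Update Hong Kong's priority and target customers |
+| `Law Society` publishes more formal rules on AI use | Raises disclosure, supervision, training, and human oversight requirements | Update the professional-internal boundary and customer-communication requirements |
+| A representative case or public judicial statement changes the court-facing risk assessment | Changes the core assumption of "not court-ready by default" | Re-review the unified no-go boundary and verifier controls |
+
+## 9. 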
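Current recommendation to management
+
+If the decision today is whether Hong Kong can be advanced as a standalone expansion jurisdiction for legal AI, my recommendation is:
+
+- **Yes, it can be done**
+- **But start with an in-house legal / compliance copilot**
+- **And it must be confined to an internal-only service model that is auditable, strongly access-controlled, and subject to strong human review**
+
+More bluntly:
+
+- Treating Hong Kong as a market for "in-house legal / compliance software + AI-assisted workflow tools" is fine to advance.
+- Treating Hong Kong as the starting point for "public legal-opinion services or court-chain automation" is not.
+
+Only in the second stage should the following be considered:
+
+- Bilingual Hong Kong-law Legal RAG / citation-aware drafting
+
+The preconditions are:
+
+- source-grounded output is stable
+- human sign-off can actually be enforced
+- the data handling policy can be clearly explained
+- the monitoring tracker has not triggered any major change sufficient to alter the conclusion
diff --git a/research/legal-ai-opportunity-risk-matrix-2026-03-20.md b/research/legal-ai-opportunity-risk-matrix-2026-03-20.md
new file mode 100644
index 0000000..f2c5d8f
--- /dev/null
+++ b/research/legal-ai-opportunity-risk-matrix-2026-03-20.md
@@ -0,0 +1,211 @@
+# Legal AI opportunity / risk matrix (actionable version)
+
+Date: 2026-03-20
+
+Scope: building on "Legal LLM / AI and Law Intersections Survey", this maps high-value legal AI scenarios onto specific legal workflows, target jurisdictions, core compliance requirements, required safeguards, barriers to entry, and commercial value, to support product selection, market entry, and governance design.
+
+How to use:
+
+- This is not legal advice; it is a research snapshot as of 2026-03-20.
+- The ranking logic weighs five things first: commercial value, whether the product can enter a controlled workflow as an assistive tool, liability exposure, data and integration barriers, and whether a stable human-review loop can be established.
+- "Suitable to do first" does not mean "low risk"; it only means better suited to landing first inside a closed in-house loop or an auditable workflow.
+
+## 1. Executive conclusion: priority recommendations
+
+### P0: enter now
+
+1. Enterprise contract / compliance copilot
+2. Legal research / Legal RAG / citation-safe drafting
+3. AI governance / provenance / labeling and audit tools
+
+### P1: enter conditionally
+
+4. e-discovery / document review / internal-investigation assistance
+5. Law-firm internal drafting / due diligence assistant
+
+### P2: enter late
+
+6. Public-facing legal Q&A / triage bot
+7. Litigation-document assistant for the court filing chain
+
+Why this order:
+
+- Build tools that "help people make judgments" before tools that "users may treat as formal legal advice".
+- Build closed-loop in-house workflows with controllable access and auditability before high-liability workflows that face consumers or courts.
+- Turn citation, jurisdiction, audit, and handoff into infrastructure first, then extend to public Q&A and the litigation filing chain.
+
+## 2. 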
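Main matrix: ranked by scenario, jurisdiction, compliance, barriers, and commercial value
+
+| Rank | Scenario | Specific legal workflow | Target jurisdictions | Core compliance requirements | Required safeguards | Barriers to entry | Commercial value | Risk level | Priority recommendation |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| 1 | Enterprise contract / compliance copilot | Contract intake, clause extraction, redline compare, regulatory obligation mapping, approval support | China / EU / United States | Trade secrets, personal information, vendor governance, version management of internal policies | Private or VPC deployment, document-level access control, source links, versioned knowledge base, human approval, full logging | Requires integration with CLM / DMS and permission systems, accumulated industry templates, enterprise security review | Very high | Medium-high | P0: the best candidate to land first; start with "assisted review" rather than automatic decisions |
+| 2 | Legal research / Legal RAG / citation-safe drafting | Statute and case research, research memo drafts, citation check, knowledge-update alerts | United States / EU / China | Citation authenticity, jurisdiction filtering, currency control, copyright and licensing boundaries, lawyers' supervisory responsibility | RAG, source pinning, citation verifier, date and jurisdiction filters, unsupported-claim blocking, human review | Requires high-quality legal corpora, jurisdiction adaptation, continuous updates, citation parsing, and editing experience | Very high | Medium-high | P0: should be built first as the base-layer capability of legal AI |
+| 3 | AI governance / provenance / labeling and audit tools | Model inventory, usage-policy enforcement, output labeling, audit, incident response, vendor governance | EU / China / global | AI Act governance requirements, content labeling, audit trail, traceable model versions, vendor management | model registry, policy engine, metadata or labels, tamper-proof logs, role separation, incident-response process | Requires cross-system integration, tracking of changing standards, and the ability to sell to compliance teams | High | Medium | P0: suitable as a platform-layer or compliance-infrastructure entry point |
+| 4 | e-discovery / document review / internal investigation | Custodian triage, relevance and privilege tagging, timeline building, sampling QA, investigation summaries | United States / EU | Chain of evidence, privilege, confidentiality, data retention, cross-border transfer, and access control | matter isolation, privilege segregation, chain-of-custody logs, sampling QA, retention and deletion policies, human override | Requires integration with review platforms, security certifications, expert tuning, and cross-border governance capability | Very high | High | P1: large enterprise opportunity, but both the entry threshold and the liability cost are high |
+| 5 | Law-firm internal drafting / due diligence assistant | Data room summaries, issue lists, due diligence memos, clause drafts, internal knowledge reuse | Primarily the United States, also EU / China | Lawyers' professional responsibility, client confidentiality, vendor due diligence, supervisory duties, reasonableness of fees | matter isolation, client-level access isolation, outside counsel guideline checks, human sign-off, vendor assessment | Requires law-firm trust, long procurement cycles, high cost of changing work habits | High | High | P1: feasible, but only with strong governance and strong delivery capability |
+| 6 | Public-facing legal Q&A / triage bot | Intake, FAQ, document checklists, procedural path hints, handoff to a human or an institution | China / United States / EU | Consumer protection, unauthorized-practice-of-law risk, privacy, log retention, risk of misleading users | jurisdiction gating, refusal and escalation rules, risk-topic blocking, human handoff, clear boundary notices, log retention | High complaint and liability exposure, high factual complexity, needs customer-service and escalation systems | Medium-high | Very high | P2: suitable only for routing and intake; not for early "conclusion-type legal opinions" |
+| 7 | Litigation-document assistant for the court filing chain | Brief / pleading drafts, citation and fact checking, filing checklist | Especially sensitive in the United States; equally high-risk in other jurisdictions | candor to tribunal, court-specific AI disclosure or certification requirements, accuracy of facts and citations; England and Wales is currently still at the `CJC` court-documents consultation stage | No auto-file, citation verifier, fact verifier, item-by-item lawyer confirmation, court-rule 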
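checklist, complete draft history | Fragmented court rules, sanctions risk, extremely high trust threshold | Medium | Very high | P2: enter last; only internal draft assistance and pre-filing QA are recommended |
+
+## 3. Mapping scenarios to areas of law: where things most easily go wrong, and where the opportunity is greatest
+
+| Area of law | High-value legal AI scenarios | Corresponding workflows | Minimum safeguard baseline | Notes |
+| --- | --- | --- | --- | --- |
+| Privacy / Confidentiality | Contract / compliance copilot, law-firm drafting, e-discovery | Contract review, due diligence, internal investigations, case-file analysis | VPC or on-premises deployment, RBAC, matter isolation, DLP, retention and deletion policies, vendor due diligence | This is the first procurement hurdle for law firms and in-house legal teams |
+| Copyright / IP | Legal RAG, citation-safe drafting, provenance tools | Statute / case citation, knowledge-base construction, output labeling | Clearly licensed corpora, source attribution, traceable knowledge sources, stated training and output boundaries | Copyright issues often determine whether a product can be sold at scale |
+| Professional Responsibility | Legal RAG, law-firm drafting, court-document assistance | Memos, briefs, due diligence, litigation drafts | Lawyer training, human sign-off, citation verification, fact verification, supervision, and vendor policy | In US jurisdictions especially, "the lawyer bears final responsibility" must be made explicit |
+| Liability | Public triage, contract / compliance copilot, court-document assistance | Public Q&A, compliance advice, external documents | Scope limitation, risk tiering, escalation to humans, complete logs, error-correction process, allocation of responsibility | A disclaimer is not a universal shield; process controls matter more |
+| Evidence / Procedure | e-discovery, internal investigations, court-document assistance | Evidence organization, privilege screening, timelines, filing QA | chain-of-custody, tamper-proof logs, version management, sampling review, human confirmation | Errors here often lead straight to procedural disputes or sanctions risk |
+
+## 4. Workflow-level go / no-go gate
+
+| Scenario | Minimum conditions before launch | If missing, recommended handling |
+| --- | --- | --- |
+| Enterprise contract / compliance copilot | Document access control, versioned statute / policy library, source links, human approval flow | Demo or internal pilot only; do not enter the formal approval chain |
+| Legal RAG / citation-safe drafting | Jurisdiction filter, date filter, citation verifier, unsupported-claim blocking | Do not use for external memos, client deliverables, or high-risk legal analysis |
+| AI governance / provenance / labeling and audit | Model inventory, logs, labeling capability, vendor governance, incident-response mechanism | Do not market it externally as a "compliance platform" or "audit platform" |
+| e-discovery / internal investigation | matter isolation, privilege rules, sampling QA, retention policy, human override | Do not take on real investigations, litigation holds, or high-risk evidence handling |
+| Law-firm internal drafting / due diligence | Client-level isolation, lawyer sign-off, usage policy, billing and supervision rules | Suitable only for informal internal drafts, not for the pre-delivery stage of client work |
+| Public triage bot | Refusal and escalation mechanisms, jurisdiction detection, risk-topic blocking, human handoff | Information navigation only; no case-specific conclusions or probability estimates |
+| Court filing-chain tools | Court-rule checklist, fact / citation verifier, no automatic filing, final lawyer certification | Stay out of the filing chain; keep it only as an internal drafting or QA tool |
+
+## 5. 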
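By jurisdiction: how to choose the entry point
+
+| Jurisdiction | Scenarios better suited to go first | Scenarios to do later | Key reasons |
+| --- | --- | --- | --- |
+| United States | Legal RAG, law-firm internal drafting, e-discovery, contract / compliance assistance | Public legal Q&A, direct entry into the court filing chain | Lawyer-ethics and court-procedure requirements are specific, which favors "strong review, strong citation" tools first |
+| EU | AI governance / audit, contract / compliance copilot, Legal RAG | Open-ended public Q&A, black-box high-risk automation | The unified governance framework is stronger; documentation, auditability, and supplier governance matter more |
+| China | In-house legal / compliance copilot, Chinese-language Legal RAG, controlled public legal-service assistant, labeling and audit tools | Open-ended public "legal-opinion robots" | Better suited to on-premises deployment, traceable labeling, controlled scenarios, and industry-specific knowledge engineering |
+| Singapore | Singapore-law Legal RAG, citation-safe drafting, law-firm / enterprise contract and matter copilot | Public-facing conclusion-type legal-opinion bots, automation tools that feed directly into the court filing chain | It has a final-version legal-sector GenAI guide, a court-user AI guide, and official and industry infrastructure closer to real legal workflows such as LawNet AI / LTP + Copilot, which favors internal tools for professionals first |
+| Hong Kong | In-house legal / compliance copilot, controlled bilingual Legal RAG, document summarization / translation assistance | Open-ended public legal-opinion bots, litigation filing assistance without strong review | Current official materials foreground PCPD's AI / privacy governance framework and its guidelines on employee use of GenAI, plus the Judiciary's internal AI guidelines, which favors in-house, auditable, strongly access-controlled scenarios first |
+| England and Wales | England & Wales Legal RAG, citation-safe drafting, regulated drafting assistant with human sign-off | Open-ended public legal-opinion bots, court filing automation without strong review | There are four layers of material from the Law Society, SRA, Judiciary, and Civil Justice Council; `CJC` current-work / latest-news / interim report has moved court-facing disclosure / evidence rules to a consultation stage with a defined deadline, which favors professional-internal tools first while continuing to watch the court-facing boundary |
+| Australia | professional-internal Legal RAG, contract / compliance copilot, due diligence / drafting assistant | One-size-fits-all cross-state court-facing automation, public conclusion-type legal-opinion bots | Materials from the Law Council, Federal Court, state courts, and privacy regulators are all being updated, but court protocols and practice boundaries are more fragmented across federal / state / territory levels, which favors a single-state or single-workflow entry |
+| UAE common-law hubs (DIFC / ADGM) | Bilingual regulatory / knowledge assistant, in-house contract / compliance copilot, research assistant for the stage preceding dispute resolution | General legal-opinion bots for the onshore public, nationally unified court automation | The common-law hubs, AI governance, and digital justice infrastructure are strong, but legal services licensing, court guidance, and data boundaries depend more on the specific zone (DIFC / ADGM and others) than on a single national framework |
+
+Looking only at near-neighbor expansion priority after China, I would rank:
+
+1. **Singapore first**
+2. **Hong Kong next**
+
+Reasons:
+
+- Singapore currently has clearer official / industry materials across three layers, "legal sector + courts + legaltech infrastructure", which makes it suitable to define the product directly as an internal workflow tool for professionals.
+- Hong Kong also has clear AI / data governance signals, but the official materials I saw in this round lean toward employee-use rules, privacy safeguards, and the Judiciary's internal governance, which makes it suitable as a second-stage expansion jurisdiction rather than the first market to replicate externally.
+
+Looking further at the next batch of expansion markets beyond Hong Kong and Singapore, I would rank:
+
+1. 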
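**England and Wales**
+2. **Australia**
+3. **UAE common-law hubs (DIFC / ADGM)**
+
+Reasons:
+
+- England and Wales is the most complete on professional rules, court boundaries, and regulated innovation pathways, making it the best place to replicate professional-internal Legal RAG / citation-safe drafting first.
+- Australia is worth doing, but federal / state court protocols and practice boundaries are more fragmented, which favors a single-state or single-workflow entry over a nationally unified approach.
+- The UAE is better treated as a watch-list market for common-law hub + bilingual compliance / dispute-adjacent tools; the DIFC / ADGM / onshore boundary questions need to be resolved before considering larger-scale replication.
+
+For a more detailed comparison, see also:
+
+- `research/hong-kong-legal-ai-go-no-go-memo-2026-03-20.md`
+- `research/uk-australia-uae-legal-ai-market-comparison-2026-03-20.md`
+
+## 6. How I would order the roadmap from a product-entry perspective
+
+### Phase 1: build the infrastructure first
+
+1. Legal RAG
+2. citation-safe drafting
+3. Access control, logging, versioning, audit
+
+Goal:
+
+- First settle "can it cite real sources, can it separate jurisdictions, can it leave an audit trail".
+
+### Phase 2: then build high-ROI closed-loop in-house workflows
+
+1. Contract review
+2. Compliance obligation mapping
+3. Internal policy Q&A and approval support
+
+Goal:
+
+- Enter the existing workflows of in-house legal and compliance teams first, and secure clear ROI and measurable risk-control benefits.
+
+### Phase 3: then move into higher-threshold professional scenarios
+
+1. Due diligence
+2. e-discovery
+3. Internal-investigation assistance
+
+Goal:
+
+- Building on existing access control, audit, and jurisdiction adaptation, enter workflows that are heavier on evidence and privilege.
+
+### Phase 4: enter external and court-chain scenarios last
+
+1. Public intake / triage
+2. Pre-filing QA for court documents
+3. Internal draft assistance before court filing
+
+Goal:
+
+- Enter high-liability external scenarios only after strong review, strong escalation, and strong responsibility-allocation mechanisms are in place.
+
+## 7. Safeguard baseline that every scenario should have by default
+
+- Jurisdiction scoping: every output should state the applicable jurisdiction, to avoid mixing rules from different jurisdictions.
+- Time scoping: set versions and update dates for statutes, cases, and regulatory obligations, to avoid answering new questions with old law.
+- Source traceability: output should link back to statutes, cases, policies, or internal knowledge sources.
+- Human review: any external document, client deliverable, or court material must have final human confirmation.
+- Access and isolation: control access by matter, client, department, and document level; do not pool all materials into one shared context.
+- Logs and versions: retain prompts, retrieval results, outputs, model versions, and approval records, to support accountability and correction.
+- Escalation and refusal: set escalation rules for high-risk individual cases and for scenarios involving chain of evidence, minors, criminal matters, labor disputes, and the like.
+- Vendor governance: define data retention, retraining, subcontractors, incident notification, and deletion mechanisms.
+
+## 8. Which kinds of products this matrix resembles
+
+Products that look more like "a good business":
+
+- Enterprise contract / compliance copilot
+- Legal RAG + citation-safe drafting
+- AI governance / audit / provenance tools
+
+Products that look more like "high-threshold professional services + heavy delivery":
+
+- e-discovery
+- Internal-investigation assistance
+- Law-firm internal drafting / due diligence assistant
+
+Products that look more like "liability exposure too large to bet on early":
+
+- Public-facing legal Q&A bots
+- Court filing-chain automation
+
+## 9. 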
Snapshot notes and materials for continued tracking
+
+Note:
+
+- This is a snapshot as of 2026-03-20, not a permanently fixed conclusion.
+- US court requirements on generative AI in filings are still evolving.
+- The various obligations under the EU AI Act are still taking effect in phases, and interpretation and enforcement practice on the enterprise-deployment side will continue to be refined.
+
+Official / regulatory:
+
+- European Commission AI Act FAQ: https://digital-strategy.ec.europa.eu/en/faqs/navigating-ai-act
+- European Commission GPAI obligations guidance: https://digital-strategy.ec.europa.eu/en/faqs/guidelines-obligations-general-purpose-ai-providers
+- ABA Formal Opinion 512: https://www.americanbar.org/content/dam/aba/administrative/professional_responsibility/ethics-opinions/aba-formal-opinion-512.pdf
+- NIST AI RMF: https://www.nist.gov/itl/ai-risk-management-framework
+- U.S. Copyright Office AI project: https://www.copyright.gov/ai/
+- China "Interim Measures for the Management of Generative AI Services": https://www.miit.gov.cn/zcfg/qtl/art/2023/art_f4e8f71ae1dc43b0980b962907b7738f.html
+- China "Measures for Labeling AI-Generated and Synthetic Content": https://www.gov.cn/zhengce/zhengceku/202503/content_7014286.htm
+- Singapore Ministry of Law "Guide for Using Generative AI in the Legal Sector": https://www.mlaw.gov.sg/files/Guide_for_using_Generative_AI_in_the_Legal_Sector__Published_on_6_Mar_2026_.pdf
+- Singapore Judiciary "Guide on the Use of Generative AI Tools by Court Users": https://www.judiciary.gov.sg/docs/default-source/news-and-resources-docs/guide-on-the-use-of-generative-ai-tools-by-court-users.pdf?sfvrsn=3900c814_1
+- Hong Kong PCPD "Artificial Intelligence: Model Personal Data Protection Framework": https://www.pcpd.org.hk/english/resources_centre/publications/files/ai_protection_framework.pdf
+- Hong Kong PCPD "Checklist on Guidelines for the Use of Generative AI by Employees": https://www.pcpd.org.hk/english/resources_centre/publications/files/guidelines_ai_employees.pdf
+- Hong Kong Judiciary "Guidelines on the Use of Generative Artificial Intelligence": https://www.judiciary.hk/doc/en/court_services_facilities/guidelines_on_the_use_of_generative_ai.pdf
+
+Research / technical:
+
+- LegalBench: https://arxiv.org/abs/2308.11462
+- LegalBench-RAG: https://arxiv.org/abs/2408.10343
+- InternLM-Law: https://arxiv.org/abs/2406.14887
+- Singapore Academy of Law / LawNet 
AI: https://sal.org.sg/articles/singapore-academy-of-law-signs-global-content-partnerships-to-expand-worldwide-access-of-singapore-law-and-unveils-ai-powered-lawnet-4-0-at-techlaw-fest-2025/
diff --git a/research/legal-ai-regulatory-monitoring-tracker-2026-03-20.md b/research/legal-ai-regulatory-monitoring-tracker-2026-03-20.md
new file mode 100644
index 0000000..f60b804
--- /dev/null
+++ b/research/legal-ai-regulatory-monitoring-tracker-2026-03-20.md
@@ -0,0 +1,429 @@
+# Legal AI regulatory monitoring checklist (China / Singapore / Hong Kong / England and Wales)
+
+Date: 2026-03-20
+
+Purpose: building on the existing `research/` research package, set up an actionable monitoring checklist for **China, Singapore, Hong Kong, and England and Wales** covering regulations, enforcement, case law, court rules, and lawyer-ethics opinions, specifying what is monitored, the refresh cadence, trigger thresholds, owners, and the process for updating conclusions.
+
+Scope:
+
+- `/home/v-boxiuli/PPT/ArgusBot/research/china-legal-ai-go-no-go-memo-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/china-contract-compliance-copilot-management-memo-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/china-contract-compliance-copilot-ops-checklist-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/china-contract-compliance-copilot-management-brief-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/hong-kong-legal-ai-management-brief-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/hong-kong-legal-ai-go-no-go-memo-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/singapore-legal-ai-go-no-go-memo-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/uk-australia-uae-legal-ai-market-comparison-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/legal-ai-opportunity-risk-matrix-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/legal-llm-law-intersections-2026-03-20.md`
+- `/home/v-boxiuli/PPT/ArgusBot/research/legal-ai-research-package-register-2026-03-20.md`
+
+This is not legal advice; it is a research-operations document as of `2026-03-20`.
+
+## 1. 
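How to use
+
+This tracker does not "collect all the news"; it watches only changes that would alter go / no-go conclusions, minimum safeguards, market-entry order, or pilot boundaries.
+
+Standard owner roles:
+
+- `Research owner`: responsible for weekly monitoring, first-pass classification, and updating the tracker
+- `Legal / compliance owner`: responsible for judging whether compliance boundaries or minimum control requirements change
+- `Product owner`: responsible for judging whether the service model, default features, and stop conditions change
+- `Market / GTM owner`: responsible for judging whether target customers, first-batch jurisdictions, or external messaging change
+
+owner-of-record (package level):
+
+| Scope | primary owner of record | secondary owner | Minimum record-keeping |
+| --- | --- | --- | --- |
+| Weekly maintenance of this tracker | `Research owner` | `Legal / compliance owner` | At least one dated update or no-change record per week |
+| Boundary judgments for the China jurisdiction | `Legal / compliance owner` | `Research owner + Product owner` | On an `L2 / L3` hit, write back to the China memo family in sync |
+| Market-entry order and external messaging | `Market / GTM owner` | `Product owner + Research owner` | When jurisdiction priority is affected, write back to the matrix and the overview |
+| Repository / knowledge-base register sync | `Research owner` | `Product owner` | Sync `L2 / L3` events and monthly summaries to `legal-ai-research-package-register-2026-03-20` |
+
+Standard refresh cadence:
+
+- `Weekly`: regulatory announcements, regulatory enforcement, court announcements, professional-body updates
+- `Monthly`: case-law scan, review of research-package conclusions, de-duplication of monitored items
+- `Quarterly`: one full-refresh decision on the relevant memos / matrix
+
+## 2. Unified trigger levels
+
+| Level | Meaning | Typical triggers | Default action |
+| --- | --- | --- | --- |
+| L1: record | New material, but boundaries do not change for now | New speeches, new non-binding statements, repeated announcements in the same direction | Record in the tracker; summarize monthly |
+| L2: re-review | May change a specific control or priority | New guidance, consultation papers, new court circulars, new AI guidance from industry bodies, a single representative case | Add analysis within `5` working days; decide within `10` working days whether to revise a memo |
+| L3: update immediately | Already sufficient to change the existing go / no-go or launch boundary | New laws / rules taking effect, formal practice directions, regulatory penalties, changes to filing / registration requirements, clear disciplinary action or high-authority case law | Open a dedicated review within `48` hours; if necessary, first move the affected scenario to `HOLD` |
+
+## 3. 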
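China monitoring checklist
+
+In the current research package, the most sensitive points for China are: the boundary between in-house tools and services offered to the domestic public, labeling obligations, the scope of filing / registration, handling of personal information and logs, and judicial adjudication rules involving AI. On court-facing risk, the Chinese official materials seen in this `2026-03-20` refresh still consist mainly of **rules for courts' internal AI applications, smart-adjudication / litigation-service development, and sources of AI-related adjudication rules**, rather than a standalone national court-user filing / disclosure guide.
+
+For China court-facing monitoring, one further tiering rule applies by default:
+
+- Materials on courts' internal AI applications / development: recorded as `L1` by default; they do not change the boundary of "no direct entry into the court filing chain"
+- Case-filing / submission / verification / explanation requirements addressed to litigation participants or lawyers: handled as `L2` by default, with an added impact analysis
+- National or high-authority court-user filing / disclosure / certification rules, or a representative sanctions case: handled as `L3` by default, with a dedicated review
+
+| Monitoring track | Main sources | What to watch | Refresh cadence | Trigger threshold | Owner | What to change once triggered |
+| --- | --- | --- | --- | --- | --- | --- |
+| Laws and departmental rules | Cyberspace Administration of China (CAC), MIIT, gov.cn, NPC website | "Interim Measures for the Management of Generative AI Services", `PIPL`, new draft AI legislation, related supporting rules | Weekly | Any new interpretation affecting "whether the service counts as one offered to the domestic public", or any new rule affecting input, logs, training data, or deletion obligations | Research owner + Legal / compliance owner | Update the unified no-go boundary and the deployment and data terms in the China memo |
+| Labeling and dissemination governance | CAC, gov.cn | "Measures for Labeling AI-Generated and Synthetic Content", supporting standards, export / metadata / agreement / platform-verification requirements | Weekly | New mandatory standards, enforcement notices, tighter platform-side verification requirements | Legal / compliance owner + Product owner | Update the labeling, export, and logging requirements in the China memo, ops checklist, and management brief |
+| Filing / registration and product boundary | CAC filing / registration announcements | Number of filings, scope of registration, prominent-display requirements, registration requirements for applications that call APIs | Weekly | A new scope of application for filing / registration appears, or existing in-house tools may be reclassified into the filing / registration scope | Legal / compliance owner + Market / GTM owner | Re-review whether the China scenario still holds its `closed-loop in-house tool` positioning |
+| Enforcement and penalties | CAC, local cyberspace administrations, the MIIT system | Enforcement notices on labeling, filing, personal information, and content governance | Weekly | Any penalty that directly hits a current product control domain, such as labeling, export, logs, user statements, or retention of input information | Legal / compliance owner | Update safeguards, stop conditions, and pre-sales exclusions |
+| Case law and judicial policy | Supreme People's Court (SPC), SPC Intellectual Property Tribunal, public channels of the Internet Courts | Training-data copyright, platform liability, labeling of AI-generated content, protection of AI-generated works, platforms' duty of care | Monthly | A representative case / policy document appears that can rewrite the boundary of platform liability, the legality assessment of training data, or the boundary of output liability | Research owner + Legal / compliance owner | Update the rule anchors in the overview document, the matrix, and the China memo |
+| Court / public legal-service rules | SPC, Faxin, local courts, the judicial administration system, `12348` China Legal Services Network | Whether courts publish rules on AI in filing / review / case-handling assistance; boundaries of AI use in public legal services; whether `12348` or local public legal-service systems begin issuing dedicated rules on AI assistants, intelligent Q&A, anthropomorphic interaction, or user-notice obligations | Monthly | Formal court-facing or public-facing AI rules appear that affect the current conclusion of "no direct entry into the court filing chain" | Legal / compliance owner + Product owner | Re-review the entry order for China scenarios and the unified no-go boundary |
+| Lawyer ethics and discipline | Ministry of Justice (MOJ) lawyer-work channel, All China Lawyers Association, local lawyers associations | Practice guidance on lawyers' use of AI, disciplinary action, reminders on confidentiality and practice boundaries, and whether AI rules for the legal-services field follow once the `CAC` professional-domain provisions appear | Monthly | National AI guidance for lawyers is issued, or a disciplinary action / notice with demonstrative effect appears | Legal / compliance owner | 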
Update the human review, sign-off, and training requirements in the China memo |
+
+Recommended priority sources for China:
+
+- `https://www.miit.gov.cn/zcfg/qtl/art/2023/art_f4e8f71ae1dc43b0980b962907b7738f.html`
+- `https://www.cac.gov.cn/2025-03/14/c_1743654684782215.htm`
+- `https://www.cac.gov.cn/2025-03/14/c_1743654685896173.htm`
+- `https://www.cac.gov.cn/2026-01/09/c_1769688009588554.htm`
+- `https://www.cac.gov.cn/2025-11/25/c_1765795550841819.htm`
+- `https://www.cac.gov.cn/2025-12/27/c_1768571207311996.htm`
+- `https://www.cma.gov.cn/zfxxgk/zc/gz/202504/W020250429613208378185.pdf`
+- `https://www.npc.gov.cn/npc/c2/c30834/202108/t20210820_313088.html`
+- `https://www.court.gov.cn/zixun/xiangqing/382461.html`
+- `https://www.court.gov.cn/zixun/xiangqing/447711.html`
+- `https://www.court.gov.cn/`
+- `https://ipc.court.gov.cn/`
+- `https://www.moj.gov.cn/fzgz/fzgzggflfwx/fzgzlsgz/`
+- `https://www.moj.gov.cn/fzgz/fzgzggflfwx/fzgzggflfw/`
+- `https://www.moj.gov.cn/pub/sfbgw/fzgz/fzgzggflfwx/fzgzlsgz/202505/t20250515_519403.html`
+- `https://www.moj.gov.cn/pub/sfbgw/fzgz/fzgzggflfwx/fzgzggflfw/202509/t20250910_524914.html`
+- `https://www.moj.gov.cn/pub/sfbgwapp/fzgzapp/ggfzfwapp/lsgzapp/202601/t20260129_531169.html`
+- `https://www.moj.gov.cn/pub/sfbgwapp/fzgzapp/ggfzfwapp/ggfzfwapp2/202503/t20250321_516166.html`
+- `https://www.12348.gov.cn/`
+- `https://www.acla.org.cn/`
+
+Among these, the `2026-01-09` `CAC` announcement is the default filing / registration anchor, the `2025-11-25` `CAC` notice is the default labeling-enforcement anchor, and the `2025-12-27` `CAC` consultation draft on the management of anthropomorphic interaction services is the default anchor for **anthropomorphic interaction / additional professional-domain obligations**; if a later question concerns the legal-services industry's own rules, the default is to additionally sweep the `MOJ lawyer-work channel`, the `MOJ public-legal-services channel`, `12348`, and the `All China Lawyers Association`.
+
+If the root paths of the two `MOJ` legal-services sections continue to loop on self-redirects, the default is still to record the root path's `source access` status first, then run the supplementary sweep in the following order of same-domain fallback official pages:
+
+- `MOJ lawyer-work channel`: prefer the article-page family `https://www.moj.gov.cn/pub/sfbgw/fzgz/fzgzggflfwx/fzgzlsgz/YYYYMM/tYYYYMMDD_*.html`
+- `MOJ public-legal-services channel`: prefer the article-page family `https://www.moj.gov.cn/pub/sfbgw/fzgz/fzgzggflfwx/fzgzggflfw/YYYYMM/tYYYYMMDD_*.html`
+- If the `pub/sfbgw/...` article-page family also cannot be located directly or still fails to fetch, additionally sweep the same-domain `pub/sfbgwapp/...` article-page family
+- 
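If the `MOJ` root paths and both article-page families above all still fail under automated fetching, the anchor must be recorded as `source unavailable (automation)` and a manual browser review of the same-domain official pages must be scheduled; until the manual review is complete, it must not be written up as "no-change completed"
+- If a reviewable same-domain official page still cannot be obtained, additionally record same-domain site-search result pages or official pages on the same topic under the competent authority's home page
+
+The minimum official anchors for the first dated China update are in section `11.1`; every subsequent China `L1 / L2 / L3` update should link back to at least one of them or add a new official source of the same level.
+
+## 4. Singapore monitoring checklist
+
+In the current research package, the most sensitive points for Singapore are: whether the legal-sector GenAI guide is updated, whether court-user rules change, whether professional-responsibility and disclosure boundaries tighten, and whether requirements on secure tools / client confidentiality / auditability change.
+
+| Monitoring track | Main sources | What to watch | Refresh cadence | Trigger threshold | Owner | What to change once triggered |
+| --- | --- | --- | --- | --- | --- | --- |
+| Legal-sector guidance | Ministry of Law | Updates to `Guide for Using Generative AI in the Legal Sector`, expanded appendices and examples, changes to risk tiering and due diligence | Weekly | The final version is updated, or new requirements are added that bear directly on confidentiality, tool due diligence, client disclosure, or human oversight | Research owner + Legal / compliance owner | Update the safeguards, evidence gaps, and service model in the Singapore memo |
+| Court rules and circulars | Singapore Courts, Supreme Court, State Courts, Family Justice Courts | `Guide on the Use of Generative AI Tools by Court Users`, Registrar's Circulars, verification and disclosure requirements for filed materials | Weekly | The courts move to mandatory disclosure, mandatory affidavits, or a ban on a category of AI use, or add statements on sanctions / costs risk | Legal / compliance owner + Product owner | Re-review whether Legal RAG / drafting is still `GO (P0)` |
+| Lawyer ethics / professional conduct | References to the PCR / SCCA Code in the MinLaw guide, formal Law Society / SAL resources | competence, confidentiality, honesty, client disclosure, opt-out, approved tools | Monthly | New formal industry guidance or formal association rules on AI use are published, or the existing guide is substantially revised | Legal / compliance owner | Update the professional-internal boundary and client terms in the Singapore memo |
+| Case law and sanctions | Singapore Courts judgments, court notices | Cases on AI citation, hallucination, confidentiality breach, and court submission verification | Monthly | A representative judgment or sanction appears that spells out the liability consequences of AI use for lawyers / court users | Research owner + Legal / compliance owner | Update the no-go boundary and stop conditions in the Singapore memo |
+| Industry infrastructure and real-world adoption | Official releases from MinLaw, SAL, LawNet / LTP | LawNet AI, LTP + Copilot, industry adoption projects, secure legal workflow infrastructure | Monthly | Changes to official infrastructure alter the way to enter the market, for example a clear strengthening or narrowing of the secure-tool route | Product owner + Market / GTM owner | Update the entry order, target customers, and deployment recommendations |
+
+Recommended priority sources for Singapore:
+
+- 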
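`https://www.mlaw.gov.sg/files/Guide_for_using_Generative_AI_in_the_Legal_Sector__Published_on_6_Mar_2026_.pdf`
+- `https://www.judiciary.gov.sg/docs/default-source/news-and-resources-docs/guide-on-the-use-of-generative-ai-tools-by-court-users.pdf?sfvrsn=3900c814_1`
+- `https://www.mlaw.gov.sg/driving-the-next-stage-of-digitalisation-through-lift/`
+- `https://www.mlaw.gov.sg/enhanced-productivity-for-law-firms-in-singapore-with-the-legal-technology-platform/`
+- `https://sal.org.sg/technology/`
+- `https://www.judiciary.gov.sg/`
+
+## 5. Hong Kong monitoring checklist
+
+In the current research package, the most sensitive points for Hong Kong are: whether `PCPD`'s AI / privacy governance framework is upgraded into more specific implementation requirements, whether `Judiciary` extends its currently internally oriented AI guidance to the court-facing usage boundary, whether `DoJ`'s LawTech push moves into more explicit industry rules, and whether `Law Society` further details the human oversight, transparency, and data governance requirements for lawyers' use of AI.
+
+| Monitoring track | Main sources | What to watch | Refresh cadence | Trigger threshold | Owner | What to change once triggered |
+| --- | --- | --- | --- | --- | --- | --- |
+| AI / privacy governance framework | PCPD | `AI: Model Personal Data Protection Framework`, guidelines on employee use of GenAI, subsequent FAQs / guidance / templates | Weekly | New formal guidance is published, principle-level requirements become more specific organizational control requirements, or a higher bar is set for permitted tools, input / output, or incident response | Research owner + Legal / compliance owner | Update the minimum safeguards, deployment requirements, and customer terms in the Hong Kong memo |
+| Enforcement and compliance checks | PCPD | AI compliance checks, investigations, enforcement notices, industry reminders | Weekly | Check results or enforcement statements appear that directly hit a current product control domain, such as data handling, employee use, retention, or governance failure | Legal / compliance owner + Product owner | Update stop conditions, pre-sales exclusions, and data-processing wording |
+| Court rules and judicial guidance | Hong Kong Judiciary | generative AI guidance, court announcements, court-facing AI usage boundaries, confidentiality / verification requirements | Weekly | `Judiciary` extends the current internal guidance to court users, filed materials, disclosure, or sanctions, or adds more explicit restrictions on litigation / legal analysis | Legal / compliance owner + Product owner | Re-review whether Legal RAG / drafting in the Hong Kong scenario still holds `CONDITIONAL GO (P1)`, and update the unified no-go boundary |
+| LawTech policy and market infrastructure | DoJ | `Promoting LawTech`, `Legal Knowledge Engineers`, LawTech funding / talent / industry projects | Monthly | Official policy significantly widens or tightens the scope of AI / LawTech support, enough to change Hong Kong's entry order as an expansion jurisdiction | Product owner + Market / GTM owner | Update Hong Kong's positioning, service model, and expansion priority in the matrix |
+| Lawyer ethics and professional conduct | The Law Society of Hong Kong | AI 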
position paper, practice guidance, confidentiality, hallucination, transparency, human oversight | Monthly | New formal AI guidance is published, or requirements on disclosure, supervision, training, or client communication are significantly raised | Legal / compliance owner | Update the professional-internal boundary, human review, and client-communication requirements in the Hong Kong memo |
+| Case law and judicial attitude | Judiciary judgments, HKLII, and other public judgment sources | AI hallucination, citation error, confidentiality, professional competence, court misuse | Monthly | A representative case or public judicial statement appears that is sufficient to change the court-facing risk assessment | Research owner + Legal / compliance owner | Update the no-go boundary, required verifier controls, and risk description in the Hong Kong memo |
+
+Recommended priority sources for Hong Kong:
+
+- `https://www.pcpd.org.hk/english/resources_centre/publications/files/ai_protection_framework.pdf`
+- `https://www.pcpd.org.hk/english/resources_centre/publications/files/guidelines_ai_employees.pdf`
+- `https://www.pcpd.org.hk/english/news_events/media_statements/press_20250508.html`
+- `https://www.judiciary.hk/doc/en/court_services_facilities/guidelines_on_the_use_of_generative_ai.pdf`
+- `https://www.doj.gov.hk/en/legco/pdf/ajls20250602e1.pdf`
+- `https://www.doj.gov.hk/en/community_engagement/press/20250507_pr1.html`
+- `https://www.hklawsoc.org.hk/-/media/HKLS/Home/News/2024/LSHK-Position-Paper_AI_EN.pdf`
+
+## 6. 
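England and Wales monitoring checklist
+
+In the current research package, the most sensitive points for England and Wales are: new instances of regulated AI legal services, formal court rules on AI court documents, the requirements of professional bodies and regulators on hallucination / confidentiality / supervision, and whether high-authority cases change the judgment to "build professional-internal tools first". In this `2026-03-20` refresh, the live signal most worth watching is that the `CJC` current-work page still shows an eight-week consultation under way, the latest-news page further states a deadline of `2026-04-14 23:59`, and the interim report has already moved the discussion to more specific proposals that begin to touch on **a declaration obligation when AI is used to generate evidence intended to be relied on in court**.
+
+### 6.1 Completing the `CJC` `L2` impact analysis ahead of `2026-03-25`
+
+- The current `CJC` signal is already sufficient to narrow the court-facing risk for England and Wales from the generic "AI disclosure / evidence may tighten" to a more specific `evidence-stage gating` judgment.
+- Under the current consultation paper / cover sheet, the items closest to a future formal declaration are:
+  - `trial witness statements`: the direction is to require a declaration that AI was not used to generate, rewrite, strengthen, weaken, or restate witness evidence
+  - `expert reports`: the direction is to require identifying and explaining any AI used in the report (purely administrative uses excepted)
+  - `skeleton arguments / advocacy documents / disclosure lists and statements`: by contrast, there is currently no pressing case for adding further court rules
+- This does not change the package's placement of England and Wales in the `P0` professional-internal entry order, nor does it relax the court-facing `NO-GO`; on the contrary, it makes clearer that any witness / expert evidence generation or reshaping should remain excluded from the default product scope.
+- This write-back therefore needs to update the evidence / disclosure descriptions in the `court-facing` tracker and the `uk-australia-uae` memo, but leaves `legal-ai-opportunity-risk-matrix` unchanged for now, because the jurisdiction ranking and the unified go / no-go boundary have not changed.
+- Before the consultation closing time of `2026-04-14 23:59`, if `CJC current-work`, `latest-news`, and the related consultation / interim material only continue to confirm that the consultation is still open, that the scope still covers pleadings / witness statements / expert reports, and that the evidence-stage proposal has not materially advanced, the default is to record only a tracker-level `L1 no-change` status verification; unless new official material, a change to the closing time, a final report / practice direction, or a proposal that clearly changes the current evidence-stage judgement appears, no new `L2` write-back is opened.
+- A same-day live recheck also confirmed that the consultation PDF currently exposed on the `CJC current-work` page has switched to the `/2026/03/` official path, but this is only a source-anchor refresh and does not change the `L2` judgment above, the unified `NO-GO` boundary, or the entry order for England and Wales.
+
+| Monitoring track | Main sources | What to watch | Refresh cadence | Trigger threshold | Owner | What to change once triggered |
+| --- | --- | --- | --- | --- | --- | --- |
+| Lawyer regulation and innovation access | SRA | Authorisation of AI-driven law firms, regulatory statements, Risk Outlook, disciplinary action, consumer protection 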
conditions | Weekly | A new AI firm authorisation model, formal disciplinary action, or new SRA requirements on client confidentiality / supervision / insurance | Research owner + Legal / compliance owner | Update the position of England and Wales in the expansion ranking and the feasible scenarios |
+| Court rules and judicial guidance | Courts and Tribunals Judiciary, Civil Justice Council | `AI Judicial Guidance`, CJC current-work / latest-news / interim report / final report, practice directions, court-documents rules, direction of travel on AI-generated evidence declarations | Weekly | The consultation reaches a final report, a practice direction takes effect, a court filing AI disclosure obligation formally lands, or the interim proposal clearly tightens into specific evidence / declaration requirements | Legal / compliance owner + Product owner | Update the boundary of "do not build court filing automation first", and the court-facing evidence / disclosure controls |
+| Case law and sanctions | Judiciary judgments, National Archives judgments | fake citations, AI misuse, professional competence, costs / contempt / referral | Weekly | A new high-authority or widely publicized case appears that is sufficient to change the risk assessment for legal research / drafting | Research owner + Legal / compliance owner | Update the case-law risk and required verifier controls in the memo |
+| Lawyer ethics and industry guidance | Law Society, Bar Council, BSB | generative AI guidance, confidentiality, privilege, verification, court duties, training expectations | Monthly | A major update to industry guidance, or a clear rise in the profession-wide training / supervision expectation | Legal / compliance owner | Update minimum safeguards, training, and the review requirement |
+| Court-chain sub-scenarios | Official Civil / Family / Tribunal rules | Rules for sub-processes such as pleadings, witness statements, expert reports, and disclosure review | Monthly | A sub-process gets its own stricter AI rules, for example disclosure, expert evidence, or family bundles | Product owner + Legal / compliance owner | Split the England and Wales scenario from generic Legal RAG into finer-grained workflow gating |
+
+Recommended priority sources for England and Wales:
+
+- `https://media.sra.org.uk/sra/news/press/garfield-ai-authorised/`
+- `https://www.sra.org.uk/sra/research-publications/artificial-intelligence-legal-market/`
+- `https://www.lawsociety.org.uk/Topics/AI-and-lawtech/Guides/Generative-AI-the-essentials`
+- `https://www.judiciary.uk/guidance-and-resources/artificial-intelligence-ai-judicial-guidance-october-2025/`
+- 
`https://www.judiciary.uk/related-offices-and-bodies/advisory-bodies/cjc/current-work/use-of-ai-in-preparing-court-documents/`
+- `https://www.judiciary.uk/related-offices-and-bodies/advisory-bodies/cjc/latest-news/`
+- `https://www.judiciary.uk/wp-content/uploads/2026/03/Interim-Report-and-Consultation-Use-of-AI-for-Preparing-Court-Docume.pdf`
+- `https://www.judiciary.uk/wp-content/uploads/2025/06/Ayinde-v-London-Borough-of-Haringey-and-Al-Haroun-v-Qatar-National-Bank.pdf`
+- `https://www.barcouncil.org.uk/resource/updated-guidance-on-generative-ai-for-the-bar.html`
+
+## 7. Process for updating conclusions
+
+### 7.1 Weekly process
+
+1. The `Research owner` sweeps the official sources of the four jurisdictions once a week on a fixed schedule.
+2. New material is graded uniformly as `L1 / L2 / L3`.
+3. When several items on the same topic appear in the same week, merge them into one monitoring event rather than logging them repeatedly.
+
+### 7.2 Actions once triggered
+
+| Trigger level | Required action | Deadline |
+| --- | --- | --- |
+| L1 | Record in the tracker with a one-sentence impact judgment | Within the same week |
+| L2 | Produce a one-paragraph analysis of "does this affect existing conclusions", and state whether a given memo / matrix should be revised | Within `5` working days |
+| L3 | Open a dedicated review; if necessary, first move the affected scenario to `HOLD` or freeze external messaging | Within `48` hours |
+
+### 7.3 Document update mapping
+
+| Jurisdiction where the change occurs | Documents to write back to by default |
+| --- | --- |
+| China | `china-legal-ai-go-no-go-memo`, `china-contract-compliance-copilot-management-memo`, `china-contract-compliance-copilot-ops-checklist`, `china-contract-compliance-copilot-management-brief`, `legal-ai-opportunity-risk-matrix` |
+| Singapore | `singapore-legal-ai-go-no-go-memo`, `legal-ai-opportunity-risk-matrix` |
+| Hong Kong | `hong-kong-legal-ai-go-no-go-memo`, `hong-kong-legal-ai-management-brief`, `legal-ai-opportunity-risk-matrix` |
+| England and Wales | `uk-australia-uae-legal-ai-market-comparison`, `legal-ai-opportunity-risk-matrix` |
+| Cross-jurisdiction common changes | `legal-llm-law-intersections`, `legal-ai-opportunity-risk-matrix`, `court-facing-ai-rules-sanction-risk-tracker`, and the jurisdiction memos where necessary |
+
+## 8. 
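Currently recommended first-round monitoring priorities
+
+If only the highest-priority monitoring items can be picked, I would watch these first:
+
+- China:
+  - whether the scope of filing / registration announcements keeps tightening
+  - whether enforcement of the labelling measures moves into routine penalties
+  - whether the Supreme People's Court / Internet Courts keep issuing adjudication rules on AI copyright, platform liability, and training data
+  - whether there is a first shift from internal court AI-application norms / build-out materials to explicit formal rule text on court-user filing / disclosure / verification
+- Singapore:
+  - whether MinLaw's legal-sector GenAI guide keeps updating its annexes, examples, and due-diligence requirements
+  - whether Singapore Courts tighten disclosure or affidavit requirements
+  - whether a representative AI citation / confidentiality case emerges
+- Hong Kong:
+  - whether the PCPD keeps pushing its AI / privacy governance framework toward more specific organisational control requirements
+  - whether the Judiciary extends its current AI guidance to court users, filed materials, or clearer litigation boundaries
+  - whether the Law Society turns human oversight, transparency, and data governance into more specific requirements for lawyers' use of AI
+- England and Wales:
+  - whether the proposal on AI-generated evidence declarations in the CJC interim report becomes a formal rule or a CPR / practice direction
+  - whether the SRA keeps approving new types of AI legal service model
+  - whether more judgments like `Ayinde` appear that directly affect lawyers' competence / supervision
+
+## 9. Maintenance principles
+
+- Record only changes that affect product boundaries, compliance boundaries, external messaging, go / no-go decisions, or pilot control requirements.
+- Check official sources first, then industry bodies; media reports are leads only and cannot replace primary material.
+- If new material only reinforces an existing conclusion, there is no need to rewrite every document, but leave a timestamp and a one-sentence assessment in the tracker.
+- Treat any change touching court filings, regulatory submissions, or public-service boundaries as `L3` by default: review first, then decide whether to proceed.
+
+## 10. Currently open court-facing monitoring actions
+
+This section lists only court-facing items already under explicit deadline management, so that they do not stay buried in the dedicated tracker without surfacing in the package-level tracking view.
+
+| Deadline | Jurisdiction | Event / action | Default owner | Default write-back |
+| --- | --- | --- | --- | --- |
+| `2026-03-27` | China / Singapore / Hong Kong / Australia | Complete the next weekly court-facing source sweep and record at least one no-change or change event | `Research owner` | `court-facing-ai-rules-sanction-risk-tracker`; write back to the relevant memo where needed |
+| `2026-04-14 23:59` | England and Wales | On the day of the `CJC` consultation closing time, recheck the current-work page, latest-news page, and interim paper, and decide whether to move from `L2` to a firmer rule-change conclusion | `Research + Legal / compliance owner` | `court-facing-ai-rules-sanction-risk-tracker`, `uk-australia-uae-legal-ai-market-comparison`, and `legal-ai-opportunity-risk-matrix` where needed |
+
+## 11. 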
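First dated monitoring update (2026-03-20 UTC)
+
+To turn this tracker from a "research checklist with a cadence" into a genuinely auditable operating asset, the results of the first baseline source sweep are recorded at least as follows:
+
+| Date | Jurisdiction | Topic | Source pages | source access summary | Trigger level | Change / verification conclusion | Changes current boundary? | owner | Next action |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| `2026-03-20` | China | Labelling governance | `CAC` labelling-measures release page + press Q&A + `2025-11-25` notice on concentrated enforcement against labelling violations | Labelling-measures release page: `direct`; press Q&A: `direct`; enforcement notice: `direct` | `L1` | China's labelling requirements should still be read as "explicit labels + implicit labels in file metadata + platform listing verification + user-agreement disclosure"; the `2025-11-25` concentrated-enforcement notice further shows that the labelling requirements have reached actual enforcement and takedown. For the research package this reinforces the importance of export, agreement, logging, and prompt-wording design, but it does not change the existing product boundary | `No` | `Research + Legal / compliance owner` | `2026-03-27`: recheck for new implementation notices, enforcement, or supporting-standard changes |
+| `2026-03-20` | China | Filing / registration and product boundary | `CAC` announcement of filed generative AI services for `2025` | Filing / registration announcement: `direct` | `L1` | The filing / registration scope still centres on generative AI services with public-opinion attributes or social-mobilisation capacity; launched applications or features must still display the model name, filing number, or launch number. This further supports placing the initial China legal-AI launch boundary at `enterprise-internal closed-loop tool`, while continuing to watch whether the API / application-layer registration scope spills over | `No` | `Research + Legal / compliance owner + Market / GTM owner` | `2026-03-27`: recheck whether later filing / registration announcements change the enterprise-internal tool boundary |
+| `2026-03-20` | China | Sector-specific rules and court-facing | `人工智能气象应用服务办法` + Supreme People's Court AI judicial-application / Faxin materials | Meteorological measures: `direct`; `SPC` AI judicial-application opinion: `source unavailable (automation)`, shell fetch of `court.gov.cn` currently returns `403`; Faxin materials: `source unavailable (automation)`, shell fetch of the same family of official `SPC` pages currently returns `403` | `L1` | Two points can be confirmed: first, sector-specific measures have begun to appear in China's AI compliance requirements, so some vertical scenarios will carry extra filing, algorithm, and security duties; second, the SPC's public materials are still mainly internal court AI-application norms, smart-adjudication / litigation-service build-out, and legal-model infrastructure, not a standalone national court-user filing / disclosure guide | `No` | `Research + Legal / compliance owner + Product owner` | `2026-03-27`: keep rechecking for other sector-specific measures or national court-user rule text |
+| `2026-03-20` | China | Anthropomorphic interaction and additional professional-sector duties | `CAC` 《人工智能拟人化互动服务管理暂行办法(征求意见稿)》 + `司法部律师工作条线` / `司法部公共法律服务条线` / `12348` / `中华全国律师协会` follow-on source sweep | Consultation draft: `direct`; `司法部律师工作条线` column root path: `source unavailable (automation)`, shell fetch currently loops on a self-redirect; lawyer-work `pub/sfbgw/...` fallback article: `source unavailable (automation)`, shell fetch in the same round still loops on a self-redirect; `司法部公共法律服务条线` column root path: `source unavailable (automation)`, shell fetch currently loops on a self-redirect; public-legal-service `pub/sfbgw/...` fallback article: `source unavailable (automation)`, shell fetch in the same round still loops on a self-redirect; `12348` 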
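home page: `direct`; `中华全国律师协会` home page: `direct` | `L2` | The consultation draft shows that China's AI governance has extended from general generative-AI rules to specific application forms: it requires a prominent notice that the user is interacting with AI, restricts providing user-interaction data to third parties or using it for model training, requires a security assessment when trigger conditions are met, and requires app-distribution platforms to verify security-assessment and filing status; it also states that services in professional sectors such as health, finance, and law must additionally comply with the rules of the competent authorities. For this research package it reinforces the boundary judgement **do not build China legal AI as a public-facing anthropomorphic interaction service**, but for now it does not change the current unified boundary centred on enterprise-internal closed-loop tools. A same-day recheck of the official `CAC` page shows that the feedback deadline is still stated as `2026-01-25` and that Article 32 still reads `本办法自2026年 月 日起施行` (effective date left blank), so this round still treats it as a draft-stage signal rather than a finalised rule. The same-day follow-on sweep shows that the root paths of the two `MOJ` legal-service columns and the currently recorded `pub/sfbgw/...` fallback articles remain a source-access problem under automated fetching, not a rule change; under same-day official-domain search, the AI-related pages visible on `MOJ` are still mainly local judicial-administration practice, law-firm digitalisation experience, or public-legal-service build-out pieces, for example `当律师遇上AI,会擦出怎样的火花`, and should continue to be treated as `signal-only / L1 no-change`; the AI-related pages currently visible on the official `中华全国律师协会` domain are still mainly industry news or practice-advancement content, for example `业务进阶` pages such as `律师如何借助人工智能开展法律业务`, and by default are likewise treated only as `signal-only / L1 no-change`, not as formal national AI guidance for the legal profession; `12348` and `中华全国律师协会` are currently directly accessible, but this round found no formal national legal-sector AI guidance or supporting norms sufficient to rewrite the current boundary | `No` | `Research + Legal / compliance owner + Product owner` | `2026-03-27`: recheck for a final text, a supplementary press Q&A, or supporting requirements aimed at law and other professional sectors |
+
+### 11.1 Minimum official anchors for this China dated update
+
+- `关于印发〈人工智能生成合成内容标识办法〉的通知`
+  - `https://www.cac.gov.cn/2025-03/14/c_1743654684782215.htm`
+- `《人工智能生成合成内容标识办法》答记者问`
+  - `https://www.cac.gov.cn/2025-03/14/c_1743654685896173.htm`
+- `网信部门依法集中查处一批存在人工智能生成合成内容标识违法违规问题的移动互联网应用程序`
+  - `https://www.cac.gov.cn/2025-11/25/c_1765795550841819.htm`
+- `国家互联网信息办公室关于发布2025年生成式人工智能服务已备案信息的公告`
+  - `https://www.cac.gov.cn/2026-01/09/c_1769688009588554.htm`
+- `国家互联网信息办公室关于《人工智能拟人化互动服务管理暂行办法(征求意见稿)》公开征求意见的通知`
+  - `https://www.cac.gov.cn/2025-12/27/c_1768571207311996.htm`
+- `司法部` official-domain `signal-only` example page: `当律师遇上AI,会擦出怎样的火花`
+  - `https://www.moj.gov.cn/pub/sfbgw/fzgz/fzgzggflfwx/fzgzlsgz/202504/t20250410_517208.html`
+- `中华全国律师协会` official-domain `signal-only` example page: `律师如何借助人工智能开展法律业务`
+  - `https://www.acla.org.cn/info/f6e55c6cc22f4f88917ca31970f01663`
+- `人工智能气象应用服务办法`
+  - `https://www.cma.gov.cn/zfxxgk/zc/gz/202504/W020250429613208378185.pdf`
+- `《最高人民法院关于规范和加强人工智能司法应用的意见》全文(中英文版)`
+  - `https://www.court.gov.cn/zixun/xiangqing/382461.html`
+- 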
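`推进现代科技与司法审判工作深度融合 最高法发布“法信法律基座大模型”研发成果`
+  - `https://www.court.gov.cn/zixun/xiangqing/447711.html`
+
+### 11.2 Per-anchor judgements for the first China sweep (backfilled per section 12.1 / 12.3)
+
+So that the first `2026-03-20` record does not consist only of topic-level summaries without per-anchor `no-change / change` judgements, it is backfilled below in the minimum evidence format used for later weekly China sweeps:
+
+| Date | Topic | Official anchors checked this round | source access / fallback used | `no-change / change` judgement per anchor | meaningful-change conclusion | Changes current unified boundary? | Documents to write back | owner | Next action |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| `2026-03-20` | Labelling governance | `CAC` labelling-measures notice; `CAC` labelling-measures press Q&A; `2025-11-25` `CAC` labelling enforcement notice | Notice: `direct`; press Q&A: `direct`; enforcement notice: `direct` | Notice: `no-change`; explicit labels, implicit labels, export-file labels, platform verification, user-agreement, and logging requirements remain the current baseline; press Q&A: `no-change`; still an explanatory supplement to the existing labelling mechanism; enforcement notice: `no-change`; still mainly shows that explicit labels, implicit labels, platform verification, and the user-declaration function have become enforcement priorities | `No, still L1 no-change` | `No` | `None (keeping the section 11 summary is enough)` | `Research + Legal / compliance owner` | `2026-03-27` |
+| `2026-03-20` | Filing / registration | `2026-01-09` `CAC` filing / registration announcement | Announcement: `direct` | Announcement: `no-change`; generative AI services with public-opinion attributes or social-mobilisation capacity remain the main filing / registration target, and the requirement to display the model name, filing number, or launch number prominently in the application or feature does not change the current enterprise-internal closed-loop tool boundary | `No, still L1 no-change` | `No` | `None (keeping the section 11 summary is enough)` | `Research + Legal / compliance owner + Market / GTM owner` | `2026-03-27` |
+| `2026-03-20` | Sector-specific rules | `人工智能气象应用服务办法` | Measures: `direct` | Measures: `no-change`; still mainly shows that China's AI compliance has entered vertical, ministry-level sector governance, with data identity labelling, data management, and inter-agency coordination ideas worth watching for transfer, but no dedicated boundary aimed directly at legal services has formed yet | `No, still L1 no-change` | `No` | `None (keeping the section 11 summary is enough)` | `Research + Legal / compliance owner + Product owner` | `2026-04-03` |
+| `2026-03-20` | Anthropomorphic interaction / additional professional-sector duties | `2025-12-27` `CAC` consultation draft on anthropomorphic interaction services; `司法部律师工作条线` column root path; lawyer-work `pub/sfbgw/...` fallback article; `司法部公共法律服务条线` column root path; public-legal-service `pub/sfbgw/...` fallback article; `12348` China Legal Service Network home page; `中华全国律师协会` home page | Consultation draft: `direct`; lawyer-work column root path: `source unavailable (automation)`, shell fetch currently loops on a self-redirect; lawyer-work `pub/sfbgw/...` fallback article: `source unavailable (automation)`, shell fetch in the same round still loops on a self-redirect; public-legal-service column root path: `source unavailable (automation)`, shell fetch currently loops on a self-redirect; public-legal-service `pub/sfbgw/...` fallback article: `source unavailable (automation)`, shell fetch in the same round still loops on a self-redirect; `12348` home page: `direct`; `中华全国律师协会` 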
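home page: `direct` | Consultation draft: `change`; it adds a standalone governance framework for anthropomorphic interaction services, requiring a prominent notice that the user is interacting with AI, restricting the provision of user-interaction data to third parties and its use for training, requiring a security assessment in specified circumstances, and requiring app-distribution platforms to verify security-assessment and filing status; Article 31 also states that services in professional sectors such as health, finance, and law must additionally comply with the rules of the competent authorities. This is a tighter constraint on any public-facing anthropomorphic legal AI, but it does not change the current enterprise-internal closed-loop tool boundary; a same-day recheck of the official page shows the feedback deadline is still `2026-01-25` and Article 32 still leaves the effective date blank, so this round still treats it as a draft-stage signal rather than a finalised rule; lawyer-work column root path / fallback article: `no-change`; this round only confirms that the official `MOJ` column is still unreachable under automated fetching, so it should continue to be handled under the `source unavailable (automation)` + fallback / manual-review rule, and this kind of source-access problem must not be read as existing legal-sector AI guidance; public-legal-service column root path / fallback article: `no-change`, same as above; `12348` home page: `no-change`; currently directly accessible, but the AI / smart legal-service material visible on the same domain is still mainly local public-legal-service or smart-service practice and should be treated as `signal-only / L1 no-change`; no dedicated national norm for public-legal-service AI assistants was seen this round; `中华全国律师协会` home page: `no-change`; currently directly accessible; the same-day official-domain example pages are still mainly `业务进阶` pages such as `律师如何借助人工智能开展法律业务` and should be treated as `signal-only / L1 no-change`; no formal national guidance or disciplinary-norm update on lawyers' use of AI was seen this round; under same-day official-domain search, the AI-related pages visible on `MOJ` are still mainly lawyer-practice / law-firm digitalisation reports such as `当律师遇上AI,会擦出怎样的火花`, should likewise be treated as `signal-only / L1 no-change`, and do not amount to a formal national norm | `Yes, treat as L2` | `No` | `None (keeping the section 11 summary is enough; decide on memo / matrix write-back once a final text or a formal norm from the legal-sector authority appears)` | `Research + Legal / compliance owner + Product owner` | `2026-03-27` |
+| `2026-03-20` | Court / public-legal-service boundary | `SPC` AI judicial-application opinion; `SPC` Faxin legal foundation-model materials | Opinion: `source unavailable (automation)`, shell fetch of `court.gov.cn` currently returns `403`, and the boundary judgement relies on the official text already included in this package; Faxin materials: `source unavailable (automation)`, shell fetch of the same family of official `SPC` pages currently returns `403`, with manual browser verification as the default follow-up | Opinion: `no-change`; still court-side governance of AI judicial applications and an adjudication-support positioning; Faxin materials: `no-change`; still a signal of legal foundation-model infrastructure and court-side capability build-out, not a national filing / disclosure rule aimed at court users or public-facing legal AI | `No, still L1 no-change` | `No` | `None (keeping the section 11 summary is enough)` | `Research + Legal / compliance owner + Product owner` | `2026-04-03` |
+
+## 12. 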
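Currently open monitoring queue for China
+
+To bring China-side changes in governance, labelling, filing / registration, and sector-specific rules under explicit deadline management, the current default open queue is:
+
+| Deadline | Topic | Required action | Default owner | Default write-back after trigger |
+| --- | --- | --- | --- | --- |
+| `2026-03-27` | Labelling governance | Recheck whether the `CAC` labelling measures have a new press Q&A, implementation note, supporting-standard interpretation, or enforcement notice | `Research + Legal / compliance owner` | `china-legal-ai-go-no-go-memo`, `china-contract-compliance-copilot-ops-checklist`, and `legal-ai-opportunity-risk-matrix` where needed |
+| `2026-03-27` | Filing / registration scope | Recheck whether `CAC` has published a new filing / registration announcement, focusing on whether the API / feature-level registration scope and the prominent-display requirement have changed | `Research + Legal / compliance owner + Market / GTM owner` | `china-legal-ai-go-no-go-memo`, `china-contract-compliance-copilot-management-memo`, and `legal-ai-opportunity-risk-matrix` where needed |
+| `2026-03-27` | Anthropomorphic interaction / additional professional-sector duties | Recheck whether the `CAC` consultation draft on anthropomorphic interaction services has a final text, press Q&A, or supplementary note, and in the same pass sweep `司法部律师工作条线`, `司法部公共法律服务条线`, `12348`, and `中华全国律师协会`, focusing on whether the provisions on law and other professional sectors, the user-interaction data restrictions, security assessment, app-distribution platform verification, and algorithm-filing requirements tighten further; if a `司法部` column page loops on a self-redirect, times out, or is temporarily unreachable, record `source unavailable` and switch to an official article page or on-site search results page on the same domain, so that a fetch failure is not mis-recorded as `no-change` | `Research + Legal / compliance owner + Product owner` | `china-legal-ai-go-no-go-memo`, `legal-ai-opportunity-risk-matrix`; write back to the China memo family where needed |
+| `2026-04-03` | Sector-specific rules | Sweep for further ministry rules or sector-specific norms similar to `人工智能气象应用服务办法`, and assess whether legal AI would be affected by cross-sector control approaches | `Research + Legal / compliance owner + Product owner` | `legal-llm-law-intersections`, `legal-ai-opportunity-risk-matrix`; write back to the China memo family where needed |
+| `2026-04-03` | Court / public-legal-service rules | Continue rechecking the Supreme People's Court, Faxin, local courts, and the judicial-administration system to confirm whether new rule text aimed at court users or public-facing legal AI has appeared | `Research + Legal / compliance owner + Product owner` | `china-legal-ai-go-no-go-memo`, `court-facing-ai-rules-sanction-risk-tracker`, and `legal-ai-opportunity-risk-matrix` where needed |
+
+### 12.1 Minimum evidence pack for the next China sweep
+
+So that later weekly refreshes do not leave only unverifiable records such as "checked / no change", every subsequent China `no-change` or `change` event must carry at least the following evidence items:
+
+- Check date (UTC) and executing owner
+- Official anchors checked in this round, at minimum
+  - `CAC` labelling-measures notice
+  - `CAC` labelling-measures press Q&A
+  - the latest `CAC` labelling enforcement notice already included in this package
+  - the latest `CAC` filing / registration announcement
+  - the anthropomorphic-interaction / additional professional-sector duty anchors already included in this package
+  - the sector-specific rule anchors already included in this package
+  - `SPC` AI judicial-application / Faxin materials
+  - if this round's topic touches services in law or other professional sectors, also check at least:
+    - `司法部律师工作条线`
+    - `司法部公共法律服务条线`
+    - `12348` China Legal Service Network
+    - `中华全国律师协会`
+- 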
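Add a source-access status for each anchor:
+  - `direct`
+  - `redirected`
+  - `timeout`
+  - `source unavailable`
+- If the planned official URL loops on a self-redirect, times out, or is temporarily unreachable:
+  - do not record it directly as `no-change`
+  - state `source unavailable` explicitly
+  - record the fallback official page used, for example an official article page on the same domain, an on-site search results page, or an official same-topic page under the competent authority's home page
+  - for `司法部律师工作条线` and `司法部公共法律服务条线`, the default fallback order is: column root path -> same-domain `pub/sfbgw/...` article-page family -> same-domain `pub/sfbgwapp/...` article-page family -> same-domain on-site search results page
+  - if all of the automated `MOJ` fetch paths above fail, additionally state `source unavailable (automation)` and schedule a manual browser review
+  - if no substitute official page on the same domain is found, mark the anchor as "verification not completed this round" and set a re-sweep date under `Next action`
+- Write a one-sentence `no-change / change` judgement for each anchor
+- If the judgement is `change`, state which area is affected:
+  - Labelling
+  - Filing / registration
+  - Anthropomorphic interaction / additional professional-sector duties
+  - Sector-specific rules
+  - Court / public-legal-service boundary
+- State whether the current unified boundary would change:
+  - `No`
+  - `Possibly`
+  - `Yes`
+- If the answer is `Possibly` or `Yes`, also state whether to write back to:
+  - `china-legal-ai-go-no-go-memo`
+  - `china-contract-compliance-copilot-management-memo`
+  - `china-contract-compliance-copilot-ops-checklist`
+  - `legal-ai-opportunity-risk-matrix`
+  - `court-facing-ai-rules-sanction-risk-tracker`
+- Finally, apply the rules in section `12.2` to decide whether the change is only an `L1` record or already a meaningful change that requires escalated analysis or a boundary change
+
+### 12.2 Meaningful-change determination for China sweeps
+
+So that later weekly China refreshes do not misjudge every "new announcement / new notice / new wording" as grounds for rewriting the whole package, use the following six categories by default to decide what counts as a meaningful change worth escalating:
+
+- Labelling governance
+  - If new material changes the explicit-label position, implicit-label fields, platform verification requirements, user-declaration / agreement liability, export-file handling, or record-keeping requirements, treat as at least `L2`
+  - If an enforcement notice directly hits this package's current control areas, for example exports without explicit labels, missing metadata fields, no platform verification, or a missing user-declaration function, and is enough to rewrite the existing safeguard design, treat as `L2`
+  - If national rules, mandatory standards, or high-frequency enforcement signals are already enough to change the current default product boundary, treat as `L3`
+- Filing / registration
+  - If a new announcement or local practice changes whether API / application / feature-level calls need registration, or the requirement to display the model name / filing number / launch number, treat as at least `L2`
+  - If the existing "enterprise-internal closed-loop tool" boundary may be pulled back into the filing / registration scope, treat as `L3`
+- Anthropomorphic interaction / additional professional-sector duties
+  - If new material only confirms that the `CAC` consultation draft is still moving in its original direction, with no final text and no new supporting norm from the legal-services authorities, it can by default stay at `L1` or keep the recorded `L2` pending-follow-up status
+  - If a final text, press Q&A, supporting note, or sector-authority rule tightens the requirements on user-interaction data handling, prominent notice, security assessment, platform verification, or algorithm filing, treat as at least `L2`
+  - If `司法部`, `12348`, or `中华全国律师协会` publishes formal national rules / guidance aimed directly at legal-service AI, lawyers' use of AI, public-legal-service AI assistants, or anthropomorphic-interaction legal AI, treat as at least `L2`
+  - If the AI-related material newly appearing under the official domains of `司法部`, `12348`, or `中华全国律师协会` is only local practice reporting, practice-advancement articles, industry news, or project promotion rather than formal national rules / guidance / disciplinary norms, record it by default as `L1 no-change` signal-only and do not rewrite the current unified boundary on that basis alone
+  - If these legal-service-sector rules are enough to change the current 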
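`enterprise-internal closed-loop tool` boundary, or clearly push public-facing legal AI / public-legal-service AI to a higher compliance threshold or `NO-GO`, treat as `L3`
+- Official-source reachability / substitute pages
+  - If a column page of `司法部 / 12348 / 中华全国律师协会` is only temporarily redirected, timed out, or unreachable, but a substitute official page on the same domain still supports the same conclusion, it is not a substantive meaningful change by default; record `source unavailable + fallback official page used` as `L1`
+  - If the `MOJ` root path and both same-domain article-page families all fail under automated fetching in this round, it is still not a substantive rule change by default, but `source unavailable (automation)` must be recorded as `L1` and a manual browser review listed as the next action
+  - If a key official anchor can neither be accessed directly nor replaced by a substitute official page on the same domain in this round, it is still not a substantive rule change by default, but it must not be written up as "no-change completed"; record the anchor as verification not completed and schedule a re-sweep under `Next action`
+- Sector-specific rules
+  - If a new ministry rule or sector-specific norm only shows that "AI compliance is starting to subdivide by vertical sector" without yet changing the current legal-AI boundary, record it as `L1`
+  - If sector-specific rules begin to show explicit control logic that could transfer to legal AI, for example additional quality assessment, model / algorithm management, data identity labelling, logging, or security duties, treat as at least `L2`
+  - If dedicated AI rules appear that are aimed directly at legal services, compliance services, public legal services, or adjacent high-sensitivity scenarios, treat as `L3`
+- Court / public-legal-service boundary
+  - If the material is still only internal court AI-application norms, smart-adjudication / litigation-service build-out, or legal-model infrastructure, treat as `L1` by default
+  - If AI filing / verification / disclosure requirements appear that are aimed at lawyers, litigation participants, court users, or public-legal-service users, treat as at least `L2`
+  - If formal national or high-authority rule text on court-user / public-facing legal AI appears, treat as `L3` by default
+
+If any of the following outcomes applies, treat the change as a meaningful change rather than a routine record:
+
+- It would change the unified `enterprise-internal closed-loop tool` boundary
+- It would change the minimum control requirements for anthropomorphic-interaction legal AI, public-legal-service AI assistants, or lawyers' use of AI
+- It would change minimum safeguards such as labelling, export, logging, user declaration, or human review
+- It would change the assessment of API / application / feature-level filing or registration
+- It would change the current `NO-GO / HOLD` court-facing conclusion
+- It would change go / no-go, priority, or jurisdiction entry order in `legal-ai-opportunity-risk-matrix`
+
+If new material only repeats and reinforces existing requirements without changing any boundary or control item above, it stays at `L1 no-change` by default; leave verifiable evidence per section `12.1`, with no need to rewrite the memo / matrix / overview.
+
+### 12.3 Standard output template for China sweeps
+
+So that dated updates left by different executors do not vary in granularity, every later weekly China sweep outputs at least one record in the template below, even when the conclusion is `no-change`:
+
+| Date | Topic | Official anchors checked this round | source access / fallback used | `no-change / change` judgement per anchor | meaningful-change conclusion | Changes current unified boundary? | Documents to write back | owner | Next action |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| `YYYY-MM-DD` | Labelling / filing-registration / anthropomorphic interaction or additional professional-sector duties / sector-specific rules / court or public legal service | List the official pages actually checked this round, such as `CAC / CMA / SPC / 司法部 / 12348 / 律协` | For each anchor write `direct / redirected / timeout / source unavailable`; if a fallback was used, record the specific official page; if automated `MOJ` fetching failed, write `source unavailable (automation)` + `manual browser verification` | At least one sentence making clear whether each anchor is `no-change` or `change` | `No` / `Yes, treat as L2` / 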
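`Yes, treat as L3` | `No` / `Possibly` / `Yes` | `memo / checklist / matrix / court-facing tracker / None` | `Research / Legal / Product` | `YYYY-MM-DD` |
+
+Minimum completion requirements:
+
+- If the conclusion is `No` (not a meaningful change), still state why it remains `L1 no-change`
+- If any anchor is redirected, timed out, or unreachable, state `source unavailable` and the fallback official page used explicitly under "Official anchors checked this round" or "judgement per anchor"; if there is no fallback, say so as well and put the re-sweep date under `Next action`
+- If `司法部律师工作条线` or `司法部公共法律服务条线` used a fallback official page, also record both the column root path and the `pub/sfbgw/...` or `pub/sfbgwapp/...` page actually used, so that later executors can verify that the substitute path is consistent
+- If the `MOJ` root path and both article-page families cannot be opened under automated fetching, state `source unavailable (automation)` explicitly and put the manual browser review under `Next action`
+- Minimum table granularity: the `source access / fallback used` column must not be left empty; if no fallback was used this round, still write `direct` or `redirected` explicitly
+- If the AI-related material newly added this round under the official domains of `MOJ`, `12348`, or `中华全国律师协会` is only local practice reporting, practice-advancement articles, industry news, or project promotion, write `signal-only / L1 no-change` explicitly under "judgement per anchor", so that an official-domain article is not mis-recorded as a formal national rule
+- If a `signal-only / L1 no-change` judgement for `MOJ`, `12348`, or `中华全国律师协会` is based on an official-domain article example, record at least one specific page name or path, so that the record is not an abstract statement that cannot be verified
+- If the conclusion is `Yes, treat as L2` or `Yes, treat as L3`, state which area the trigger belongs to:
+  - Labelling
+  - Filing / registration
+  - Anthropomorphic interaction / additional professional-sector duties
+  - Sector-specific rules
+  - Court / public-legal-service boundary
+- If the decision is "write back to no document", still write `None` explicitly, so that it is later clear whether this was "no write-back needed" or "write-back missed"
+- If this round only checked new official sources outside `11.1`, add the new sources to `11.1` or explain in this round's record why they can replace the existing anchors
diff --git a/research/legal-ai-research-package-baseline-receipt-2026-03-20.md b/research/legal-ai-research-package-baseline-receipt-2026-03-20.md
new file mode 100644
index 0000000..a78b356
--- /dev/null
+++ b/research/legal-ai-research-package-baseline-receipt-2026-03-20.md
@@ -0,0 +1,56 @@
+# Legal AI Research Package Baseline Receipt
+
+Date: `2026-03-20`
+
+Purpose: establish the first git-tracked delivery baseline for `legal-ai-research-package-2026-03-20-v1` under `research/`, so that later dated tracker / memo / matrix updates can be traced back to a fixed, verifiable delivery starting point.
+
+## 1. 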
Baseline identity
+
+- package id: `legal-ai-research-package-2026-03-20-v1`
+- canonical version: `v1.0`
+- snapshot date: `2026-03-20`
+- source of truth: `/home/v-boxiuli/PPT/ArgusBot/research`
+- baseline tag target: `legal-ai-research-package-2026-03-20-v1.0-baseline`
+- repository branch at lock time: `ppt-feature`
+- parent repo commit before baseline lock: `121c29d`
+- baseline evidence carrier: `git commit + git tag + this receipt file`
+
+## 2. Notes on the lock point
+
+- The table below covers the `16` core research documents locked when they were placed under version control; including this receipt file, the current baseline inventory is `17` Markdown documents.
+- This receipt file does not hash itself; its integrity is provided by the git baseline commit and baseline tag that carry it.
+- After the baseline lock, `wc -l research/*.md` recorded `17` Markdown files and `4263` total lines.
+- The unresolved-marker scan returned no hits before the lock.
+
+## 3. File inventory and SHA-256
+
+| File | Lines | SHA-256 |
+| --- | ---: | --- |
+| `research/README.md` | `98` | `02eeb1bf1c13e2fc9164c52b772abcf3d10da79bcafd9be8f70d444354c0d9a9` |
+| `research/china-contract-compliance-copilot-execution-tracker-2026-03-20.md` | `309` | `4c51f4e1fe438074fbf4c4d110e7c6f6a6018a4b73fb41784d089b1e7eab6349` |
+| `research/china-contract-compliance-copilot-management-brief-2026-03-20.md` | `132` | `3373aa12ae5921132c75c45c7db313010987d64bce9c0cd194002abd39028ff5` |
+| `research/china-contract-compliance-copilot-management-memo-2026-03-20.md` | `190` | `a28cc39f53c13c61ee8e01fcf104187e7d91b2ddaa64ce3d5f3dfc813924a763` |
+| `research/china-contract-compliance-copilot-ops-checklist-2026-03-20.md` | `136` | `7ee68ce1217998baed3a8995a136b5c4fa3029576815a447e0b60c84723c7345` |
+| `research/china-contract-compliance-copilot-validation-plan-2026-03-20.md` | `343` | `60570b1fb0e2c10038a3f352898fb5d3e3a1dabf862fb4a15f0e15a128b202f7` |
+| `research/china-legal-ai-go-no-go-memo-2026-03-20.md` | `435` | `c27881f5ac59ea88b27a19ad6a37bb93fd2cd3c512e7b9435cc1d6ab7551bb21` |
+| `research/court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md` | `431` | `1382dbef46eed1fe9e13863489bf2869658ea24d8863d93ccc48ae12d1a76128` |
+| `research/hong-kong-legal-ai-go-no-go-memo-2026-03-20.md` 
| `331` | `71f0f259197e5a9a3774c8b4a514cf16f1c8d5612a908f3c1d0785d970c4ec24` |
+| `research/hong-kong-legal-ai-management-brief-2026-03-20.md` | `148` | `99a4dfc98e79186e97768b090d0b9f967f4fd57ac8b663340c16915d03f1cf92` |
+| `research/legal-ai-opportunity-risk-matrix-2026-03-20.md` | `211` | `a22d76aa27306afa3a993c23ce9d73553fcf67cfbc62f58d4b87dc998ff128b9` |
+| `research/legal-ai-regulatory-monitoring-tracker-2026-03-20.md` | `429` | `b4cfc796cf0efb987ed21c29dcefd3e8fd58584de37def34765ff7c585ad8c05` |
+| `research/legal-ai-research-package-register-2026-03-20.md` | `136` | `61705437deed619a9d14e0789f1c28d8ea37f76959de4aacb0252525449b5cad` |
+| `research/legal-llm-law-intersections-2026-03-20.md` | `246` | `e2f9599b99c1d124e68a071e1477f2a474e1f8a7a474eb425d25bb6f57889125` |
+| `research/singapore-legal-ai-go-no-go-memo-2026-03-20.md` | `296` | `ddded14d34a38a32505e3f3d5bfdd393548ecb45a42c15da2765be4c78d54e75` |
+| `research/uk-australia-uae-legal-ai-market-comparison-2026-03-20.md` | `336` | `aca590867c6f4e16993b122ab44d0eef2d648f90b12cf26cd62ed9704736b6e3` |
+
+## 4. Verification commands
+
+The following commands can be used in a later audit to verify this baseline receipt:
+
+```bash
+find research -maxdepth 1 -name '*.md' ! -name 'legal-ai-research-package-baseline-receipt-2026-03-20.md' -print0 | sort -z | xargs -0 sha256sum
+find research -maxdepth 1 -name '*.md' ! 
-name 'legal-ai-research-package-baseline-receipt-2026-03-20.md' -print0 | sort -z | xargs -0 wc -l
+rg -n '' research/*.md
+git tag -l 'legal-ai-research-package-2026-03-20-v1.0-baseline'
+git show --stat 'legal-ai-research-package-2026-03-20-v1.0-baseline'
+```
diff --git a/research/legal-ai-research-package-register-2026-03-20.md b/research/legal-ai-research-package-register-2026-03-20.md
new file mode 100644
index 0000000..5121fe5
--- /dev/null
+++ b/research/legal-ai-research-package-register-2026-03-20.md
@@ -0,0 +1,136 @@
+# Legal AI Research Package Register
+
+Date: 2026-03-20
+
+Purpose: register the completed legal AI research package under `research/` as an **auditable operating asset inside the repository**, making explicit the source of truth, version boundary, owner-of-record, monitoring write-back paths, and first-round ledger entries, so that the package does not stop at "written" with no operational record afterwards.
+
+Scope:
+
+- `research/README.md`
+- `research/legal-llm-law-intersections-2026-03-20.md`
+- `research/legal-ai-opportunity-risk-matrix-2026-03-20.md`
+- `research/legal-ai-regulatory-monitoring-tracker-2026-03-20.md`
+- `research/court-facing-ai-rules-sanction-risk-tracker-2026-03-20.md`
+- `research/china-legal-ai-go-no-go-memo-2026-03-20.md`
+- `research/china-contract-compliance-copilot-management-memo-2026-03-20.md`
+- `research/china-contract-compliance-copilot-management-brief-2026-03-20.md`
+- `research/china-contract-compliance-copilot-ops-checklist-2026-03-20.md`
+- `research/china-contract-compliance-copilot-execution-tracker-2026-03-20.md`
+- `research/china-contract-compliance-copilot-validation-plan-2026-03-20.md`
+- `research/singapore-legal-ai-go-no-go-memo-2026-03-20.md`
+- `research/hong-kong-legal-ai-go-no-go-memo-2026-03-20.md`
+- `research/hong-kong-legal-ai-management-brief-2026-03-20.md`
+- `research/uk-australia-uae-legal-ai-market-comparison-2026-03-20.md`
+
+This is not legal advice; it is the research-asset registration document as of `2026-03-20`.
+
+## 1. 
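Package ID and source of truth
+
+- package id: `legal-ai-research-package-2026-03-20-v1`
+- canonical package role: `current legal-AI research main package`
+- canonical version: `v1.0`
+- source of truth: `/home/v-boxiuli/PPT/ArgusBot/research`
+- repository entrypoint: `/home/v-boxiuli/PPT/ArgusBot/research/README.md`
+- current status: `Active baseline`
+- version-control status: `git-tracked delivery baseline`
+- baseline tag: `legal-ai-research-package-2026-03-20-v1.0-baseline`
+- baseline receipt: `/home/v-boxiuli/PPT/ArgusBot/research/legal-ai-research-package-baseline-receipt-2026-03-20.md`
+- carrier format: repository-side Markdown knowledge base
+- time boundary: this register corresponds to the `2026-03-20` research snapshot; later changes are governed by the dated tracker records
+
+## 2. Registered delivery scope
+
+As of this registration, the research package contains the following four layers:
+
+- Overview layer: main conclusions on the intersection of legal LLMs / AI and law, and the follow-on research roadmap
+- Prioritisation layer: opportunity / risk matrix, jurisdiction entry order, go / no-go judgements
+- Jurisdiction layer: memos and management briefs for China, Singapore, Hong Kong, England and Wales / Australia / UAE, and others
+- Operations layer: regulatory monitoring tracker, court-facing sanctions tracker, and management, execution, and validation materials for the China contract / compliance copilot
+
+## 3. Owner-of-record assignment
+
+No individual names are available in the current context, so this package is registered by **role owner** for now; if actual names become available, add them to this table or the relevant tracker without changing the role-allocation logic.
+
+| Scope | primary owner of record | secondary owner | Minimum record requirement |
+| --- | --- | --- | --- |
+| Overall package registration and repository archiving | `Research owner` | `Product owner` | Review at least monthly; register any `L3` event within `48` hours |
+| `legal-ai-regulatory-monitoring-tracker` | `Research owner` | `Legal / compliance owner` | At least one dated update or no-change record per week |
+| `court-facing-ai-rules-sanction-risk-tracker` | `Research owner` | `Legal / compliance owner + Product owner` | Weekly court-facing source sweep; `L2 / L3` events must be logged |
+| Write-back to China memos / checklists / briefs | `Legal / compliance owner` | `Research owner` | Write back in step whenever a China `L2 / L3` event is hit |
+| Write-back to matrix and overview conclusions | `Research owner` | `Product owner + Legal / compliance owner` | Update only when go / no-go, priority, or the unified boundary changes |
+
+## 3.1 Maintenance cadence baseline
+
+To avoid "knowing who is responsible but not when an entry is due", this research package runs on the following cadence:
+
+| Asset / action | owner of record | Default cadence | Hard trigger | Default ledger location |
+| --- | --- | --- | --- | --- |
+| Package register review and change-log sync | `Research owner` | Monthly | Any `L3` event within `48` hours; any owner / cadence / versioning / 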
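change-log rule change is recorded the same day | Section `5` of this file |
+| `research/README.md` canonical package status review | `Research owner` | Monthly | Rewrite the same day when the package id, canonical role, canonical version, or maintenance rules change | `research/README.md` + this file |
+| `legal-ai-regulatory-monitoring-tracker` weekly sweep | `Research owner` | At least weekly | Any official change in China or cross-jurisdiction regulation / labelling / filing / sector-specific rules | `legal-ai-regulatory-monitoring-tracker` section `11` / `12`; also sync this file if it hits a rule change in this table or `L2 / L3` |
+| `court-facing-ai-rules-sanction-risk-tracker` weekly sweep | `Research owner` | At least weekly | Any official court-facing change related to filing / disclosure / evidence / sanction | `court-facing-ai-rules-sanction-risk-tracker` section `15` / `17`; also sync this file if it hits a rule change in this table or `L2 / L3` |
+| memo / matrix / synthesis boundary review | `Research owner + Legal / compliance owner + Product owner` | Monthly boundary judgement | Any change affecting go / no-go, the court-facing filing boundary, minimum safeguards, or jurisdiction priority | Affected memo / matrix / synthesis + this file |
+
+## 4. Versioning rules
+
+- `v1.0`: baseline registration version, corresponding to the complete `2026-03-20` research package
+- `v1.x`: incremental updates that do not change the overall package boundary, for example `L1` records, supplementary `L2` analysis, owner adjustments, or source-list strengthening
+- `v2.0`: a change to go / no-go, the product boundary, the court-facing rule judgement, the first-batch jurisdiction ranking, or the unified boundary for China scenarios
+- Any `L2 / L3` event must leave at least the following in the relevant tracker: date, source, change type, impact assessment, owner, deadline
+- Any `L3` event must, in addition to updating the relevant tracker, also be written back to this register
+
+## 4.1 Change-log rules
+
+- `L1 no-change` weekly sweep: log at least to the relevant tracker; if package metadata, owner, cadence, versioning, or template rules are unchanged, no separate register row is needed.
+- `L1`-level strengthening of package governance / template / owner / cadence / source-access recording rules: add a register row the same day, because such changes alter how later work is executed.
+- `L2` event: log to the relevant tracker the same day; if it changes the default checkpoint, judgement criteria, write-back path, or maintenance rules, also add it to section `5` of this file the same day.
+- `L3` event or any version upgrade: the tracker, `research/README.md`, this file, and all affected memos / matrix / synthesis must be changed together in the same update.
+- Each change-log entry in section `5` must state at least: date, change type, entry content, owner, evidence; if it also changes the version boundary, inventory, or canonical role, state that explicitly in the entry content or evidence.
+
+## 5. 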
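Baseline ledger entries
+
+| Date | Type | Entry | owner | Evidence |
+| --- | --- | --- | --- | --- |
+| `2026-03-20` | knowledge-base entrypoint | Created a package index under `research/`, organising the main synthesis, matrix, register, trackers, and China execution-layer materials into a formal repository entrypoint | `Research owner` | `research/README.md` + this file |
+| `2026-03-20` | canonical main-package designation | Explicitly marked this package as the current legal-AI research main package and wrote the canonical role / version into the README and register, so that later parallel memos are not mistaken for the main package | `Research owner` | `research/README.md` + this file |
+| `2026-03-20` | baseline registration | Registered the completed `research/` package as a repo-side operating asset; made the tracker owner-of-record explicit; required later changes to be written back to the tracker first, before deciding whether to change the memo / matrix / overview | `Research owner` | This file + `legal-ai-regulatory-monitoring-tracker` + `court-facing-ai-rules-sanction-risk-tracker` |
+| `2026-03-20` | first dated monitoring update | The baseline source sweep for China governance / labelling / filing / sector-specific rules and court-facing items has been logged to the main tracker; the court-facing risk baseline has been logged to the dedicated tracker | `Research owner + Legal / compliance owner` | `legal-ai-regulatory-monitoring-tracker` section 11 + `court-facing-ai-rules-sanction-risk-tracker` section 15 |
+| `2026-03-20` | audit-trail hardening | Added `Audit Trail` navigation to the repository entrypoint; aligned the China priority source list with the minimum official anchors of the first dated update, and required later China `L1 / L2 / L3` updates to link back at least to `section 11.1` or add official sources of the same standing | `Research owner` | `research/README.md` + `legal-ai-regulatory-monitoring-tracker` section 3 + section 11.1 |
+| `2026-03-20` | China anchor-role split hardening | Explicitly split the `filing / registration` anchor from the `labelling enforcement` anchor in China monitoring: the `2026-01-09` `CAC` announcement is the filing / registration anchor and the `2025-11-25` `CAC` notice is the labelling-enforcement anchor; both were also written into the dated update and the minimum evidence pack for the next sweep | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 3 + section 11 + section 11.1 + section 12.1 |
+| `2026-03-20` | China meaningful-change rubric hardening | Added explicit meaningful-change rules for the weekly China sweep, requiring later changes in labelling, filing / registration, sector-specific rules, and court / public legal service to be assessed first under `section 12.2` for whether they really trigger `L2 / L3` or a document write-back, rather than escalating merely because "there is new material" | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 12.1 + section 12.2 |
+| `2026-03-20` | README monitoring-rule surfacing | Surfaced the `section 12.1 / 12.2` rules for the weekly China sweep at the repository entrypoint, so that operators entering via the 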
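`README` do not overlook the maintenance order "first separate no-change from meaningful change, then decide whether to change the memo / matrix" | `Research owner` | `research/README.md` + `legal-ai-regulatory-monitoring-tracker` section 12.1 + section 12.2 |
+| `2026-03-20` | China sweep output-template hardening | Added a unified output template for the weekly China sweep and upgraded the `README` operating rules to `section 12.1 / 12.2 / 12.3`, so that dated updates left by different executors do not vary in granularity | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 12.3 + `research/README.md` |
+| `2026-03-20` | README court-facing checkpoint surfacing | Surfaced in the `README` the `2026-03-25` England and Wales `CJC` court-facing `L2` impact-analysis checkpoint and the default entry rule for the court-facing sweep, so that the nearest court-facing deadline does not stay only inside the dedicated tracker | `Research owner` | `research/README.md` + `court-facing-ai-rules-sanction-risk-tracker` section 17 |
+| `2026-03-20` | court-facing sweep hardening | Added a minimum evidence pack, meaningful-change rules, and a unified output template for the weekly court-facing sweep, and upgraded the default operating path in the `README` to `section 17 / 17.1 / 17.2 / 17.3` | `Research owner` | `court-facing-ai-rules-sanction-risk-tracker` section 17.1 + section 17.2 + section 17.3 + `research/README.md` |
+| `2026-03-20` | README China second-wave checkpoint surfacing | Surfaced in the `README` the `2026-04-03` recheck checkpoints already queued in the regulatory tracker for China sector-specific rules and court-facing / public-legal-service rules, so that the second wave of China checkpoints does not stay only inside the dedicated tracker | `Research owner` | `research/README.md` + `legal-ai-regulatory-monitoring-tracker` section 12 |
+| `2026-03-20` | first-sweep evidence-pack backfill | Backfilled the per-anchor `no-change / change` judgements for the first `2026-03-20` China and court-facing sweeps, so that the first baseline record itself meets the minimum evidence format of `section 12.1 / 12.3` and `section 17.1 / 17.3`, avoiding an audit gap where "later weekly rules are stricter but the first baseline has only summaries" | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 11.2 + `court-facing-ai-rules-sanction-risk-tracker` section 15.1 |
+| `2026-03-20` | China anthropomorphic-interaction anchor hardening | Added the `2025-12-27` `CAC` 《人工智能拟人化互动服务管理暂行办法(征求意见稿)》 to the China monitoring sources, dated update, per-anchor evidence pack, and open queue, and surfaced the checkpoint in the `README`; the reason is that it introduces a standalone governance framework for anthropomorphic interaction services and states that services in professional sectors such as health, finance, and law must additionally comply with the rules of the competent authorities | `Research owner` | 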
`legal-ai-regulatory-monitoring-tracker` section 3 + section 11 + section 11.1 + section 11.2 + section 12 + `research/README.md` |
+| `2026-03-20` | China legal-sector follow-on source hardening | Explicitly added `司法部律师工作条线`, `司法部公共法律服务条线`, `12348`, and `中华全国律师协会` to the China monitoring sources, the `2026-03-27` open queue, and the evidence-pack / meaningful-change rules, so that when the `CAC` professional-sector provisions land on legal-service scenarios there is a clear follow-on sweep path to the competent authorities, rather than watching `CAC` alone | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 3 + section 12 + section 12.1 + section 12.2 + section 12.3 + `research/README.md` |
+| `2026-03-20` | China source-availability logging hardening | Added the `source unavailable / fallback official page used` rule to the weekly China sweep; the reason is that direct fetches of the root paths of the two `司法部` legal-service columns both looped on a self-redirect in this round, while `12348` and `中华全国律师协会` remained directly accessible, so "source unreachable" and "no change" must be recorded separately from now on | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 12 + `research/README.md` |
+| `2026-03-20` | China `MOJ` fallback-path mapping hardening | Made the default fallback official page path for `司法部律师工作条线` and `司法部公共法律服务条线` more specific, namely the same-domain `pub/sfbgw/...` article-page family, and required later dated sweeps to record both the column root path and the fallback page actually used, so that the `source unavailable` rule is no longer too abstract | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 3 + section 12.1 + section 12.3 + `research/README.md` |
+| `2026-03-20` | China `MOJ` automation-failure escalation hardening | Further refined the `MOJ` fallback order to `column root path -> pub/sfbgw/... -> pub/sfbgwapp/... 
-> manual browser verification`, and required later dated sweeps to record `source unavailable (automation)` explicitly when automated fetching fails repeatedly, so that a fetch-environment problem is not mistaken for no regulatory change | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 3 + section 12.1 + section 12.2 + section 12.3 + `research/README.md` |
+| `2026-03-20` | China source-access column hardening | Added a dedicated `source access / fallback used` column to the standard output template for the weekly China sweep, so that the `MOJ` statuses `direct / redirected / timeout / source unavailable / source unavailable (automation)` and the specific fallback page land directly in the dated-event table rather than only in the prose notes | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 12.3 + `research/README.md` |
+| `2026-03-20` | first-sweep source-access baseline backfill | Backfilled the `source access / fallback used` column in the first `2026-03-20` China per-anchor table, aligning the first dated baseline with the current output template in section `12.3` and avoiding an audit gap where "later weekly updates have the field but the first baseline lacks the column" | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 11.2 + `research/README.md` |
+| `2026-03-20` | court-facing source-access hardening | Added a `source access / fallback used` column and minimum completion rules to the court-facing tracker, and backfilled the first `2026-03-20` court-facing per-anchor table with the current official access status as `direct`, `source unavailable`, or `source unavailable (automation)`, so that the court-facing baseline no longer lags behind the audit granularity of the China tracker | `Research owner` | `court-facing-ai-rules-sanction-risk-tracker` section 15.1 + section 17.1 + section 17.2 + section 17.3 + `research/README.md` |
+| `2026-03-20` | dated-summary source-access rollup hardening | Added a source-access rollup column to the summary-level dated tables in regulatory tracker section `11` and court-facing tracker section `15`, and aligned the automated-fetch status of `court.gov.cn` in the China court-facing row with the dedicated tracker as `source unavailable (automation)`; the summary layer and the per-anchor evidence pack therefore no longer disagree on access status | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section 11 + section 11.2 + `court-facing-ai-rules-sanction-risk-tracker` section 15 + section 15.1 + `research/README.md` |
+| `2026-03-20` | package-governance rule hardening | Made the owner, maintenance cadence, versioning, and change-log rules more explicit: added section `3.1` and section `4.1` to spell out, for the register / README / 
tracker / memo 的默认节奏、何时必须进 register,以及何时需要同次更新完成 version-boundary 同步写清,避免后续维护仍靠隐含约定 | `Research owner` | 本文件 section `3.1` + section `4.1` + `research/README.md` | +| `2026-03-20` | England-and-Wales `CJC` `L2` impact-analysis backwrite | 提前完成原排队到 `2026-03-25` 的 `CJC` court-facing `L2` 影响分析;结论是当前 consultation 已足以上调英格兰及威尔士 evidence / disclosure 风险描述的精度,但仍不足以改写统一 `NO-GO / no auto-file / no court-ready final output` 边界,因此回写 court-facing tracker、总监管 tracker、`uk-australia-uae` memo,并关闭 README / tracker 中过期的 `2026-03-25` 开放动作 | `Research owner + Legal / compliance owner` | `court-facing-ai-rules-sanction-risk-tracker` section `15.2` + `legal-ai-regulatory-monitoring-tracker` section `6.1` + `uk-australia-uae-legal-ai-market-comparison` section `3.1.1` + `research/README.md` | +| `2026-03-20` | China legal-sector signal-only filter hardening | 将中国拟人化互动 / 专业领域附加义务的 meaningful-change 规则进一步细化:如果 `MOJ`、`12348` 或 `中华全国律师协会` 官方域名下出现的 AI 相关材料只是地方实践报道、业务进阶文章、行业动态或项目宣传,而不是全国性正式规则 / 指引 / 纪律规范,默认按 `L1 no-change` 的 signal-only 处理,不单独据此改写统一边界;同时把这一执行规则挂到 `README`,并补到中国周度标准输出模板的最低填写要求里 | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section `12.2` + section `12.3` + `research/README.md` | +| `2026-03-20` | England-and-Wales `CJC` status-only recheck rule hardening | 基于同日对 `CJC current-work`、`latest-news` 与 consultation / interim material 的 live recheck,补充一条执行规则:在 `2026-04-14 23:59` closing-time 之前,如果官方材料只是继续确认 consultation 仍 open 且 evidence-stage proposal 没有实质推进,默认只记为 tracker-level `L1 no-change` status verification,不重复开启新的 `L2` 分析或 memo 回写;只有 official material、closing-time、final output 或 evidence-stage judgement 实质变化时才重新升级 | `Research owner` | `court-facing-ai-rules-sanction-risk-tracker` section `17.2` + `legal-ai-regulatory-monitoring-tracker` section `6.1` + `research/README.md` | +| `2026-03-20` | England-and-Wales `CJC` live source-path refresh | 将包内仍指向旧 `/2026/02/` interim-report PDF 的英格兰及威尔士 `CJC` 源清单,统一刷新为当前 official current-work page 暴露的 `/2026/03/` consultation 
PDF 路径;这次更新只修正 source anchor,不改变既有 `L2` 结论或统一边界 | `Research owner` | `legal-ai-regulatory-monitoring-tracker` source list + `court-facing-ai-rules-sanction-risk-tracker` source list + `uk-australia-uae-legal-ai-market-comparison` source list | +| `2026-03-20` | England-and-Wales `CJC` no-change verification backwrite | 将同日 live recheck 的 no-change 结论回写到英格兰及威尔士 `CJC` 的三处分析文本:official materials 仍确认 consultation open through `2026-04-14 23:59`,而 `/2026/03/` PDF path refresh 只属于 source-anchor 更新,不改变既有 `L2` judgement、`P0` 排序或统一 `NO-GO` 边界 | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section `6.1` + `court-facing-ai-rules-sanction-risk-tracker` section `15.2` + `uk-australia-uae-legal-ai-market-comparison` section `3.1.1` | +| `2026-03-20` | China legal-sector signal-only example-anchor hardening | 将中国拟人化互动 / 专业领域附加义务的 dated baseline 补上同日 official-domain example page anchors,使 `MOJ` / `中华全国律师协会` 的 `signal-only / L1 no-change` 判断不只停留在抽象口径;同时把“如按 official-domain article 判为 signal-only,至少记录一条具体页名或路径”的要求写入中国 tracker 模板与 `README`,减少后续审计争议 | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section `11` + section `11.1` + section `11.2` + section `12.3` + `research/README.md` | +| `2026-03-20` | China `ACLA` signal-only example refresh | 将中国 dated baseline 中的 `中华全国律师协会` `signal-only` example anchor 从泛合规文章刷新为更直接的 AI/律师业务 `业务进阶` 页面,以便 future sweep 在说明“official-domain article != national formal guidance”时使用更贴近主题的 same-day example,而不改变既有 `L1 no-change` / `signal-only` 判断 | `Research owner` | `legal-ai-regulatory-monitoring-tracker` section `11.1` + section `11.2` | +| `2026-03-20` | version-control baseline lock | 将 `research/` 下 `2026-03-20` 法律 AI 研究包纳入 git 版本控制;新增 baseline receipt,明确 baseline tag `legal-ai-research-package-2026-03-20-v1.0-baseline`,使后续 dated tracker / memo / matrix 更新可以回溯到固定的 repository baseline | `Research owner` | `research/README.md` + `research/legal-ai-research-package-baseline-receipt-2026-03-20.md` + `git tag -l` / `git show` | 
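+| `2026-03-20` | inventory snapshot | Recorded the current repo-side research-package fingerprint: `17` Markdown files, `4263` total lines, and no hits in the open-work-marker sweep; the newly added `legal-ai-research-package-baseline-receipt-2026-03-20.md` serves as the baseline receipt file | `Research owner` | `wc -l research/*.md` + unresolved-marker sweep recorded in execution log |
+
+## 6. Minimum operating rules
+
+- Add at least one dated no-change or change event to at least one tracker every week, so trackers are not created and then left without entries.
+- When labeling, filing / registration, sector-specific rules, or court-facing rules change in the China jurisdiction, update the master tracker first, then decide whether to write back to the China memo family.
+- For court-facing `L2 / L3` events, also consider whether to write back to `legal-ai-opportunity-risk-matrix` and the jurisdiction comparison memos.
+- At the end of each month, judge at least once whether the overall package still stays at `v1.x`; if the conclusion boundary has changed, bump the major version instead of adding scattered notes.
+- If another dated package is to replace the current one as the new canonical main package, the same update must also change:
+  - the canonical role / version in `research/README.md`
+  - the canonical package role / version in this register
+  - a new dated change entry in this register
diff --git a/research/legal-llm-law-intersections-2026-03-20.md b/research/legal-llm-law-intersections-2026-03-20.md
new file mode 100644
index 0000000..ba75e0c
--- /dev/null
+++ b/research/legal-llm-law-intersections-2026-03-20.md
@@ -0,0 +1,246 @@
+# Research on the intersection of legal LLMs / AI and law
+
+Date: 2026-03-20
+
+## One-sentence conclusion
+
+The intersection of legal LLMs / AI and law is no longer just "using AI for legal Q&A"; it now runs along four lines at once:
+
+1. AI itself is regulated by law
+2. AI is used as a working tool by lawyers, in-house counsel, courts, and regulators
+3. AI raises new questions of liability, evidence, copyright, privacy, and professional ethics
+4. The legal industry in turn pushes LLM technology toward being citable, auditable, and accountable
+
+## 1. How the field currently intersects with law
+
+### 1. AI as a regulated object
+
+- The EU AI Act has entered its phased application period; obligations for general-purpose AI (GPAI) models, AI literacy, prohibited practices, and high-risk systems are taking effect step by step.
+- China has built a governance framework that combines generative-service rules, deep-synthesis rules, content labeling, and links to its data, cybersecurity, and personal-information laws.
+- The United States has no unified comprehensive federal AI statute, but is moving quickly on copyright, court rules, professional ethics, and risk-management frameworks.
+
+This means that building a legal LLM is a matter of compliance capability as well as model capability.
+
+### 2. AI as a legal-services tool
+
+The most mature deployments today are not fully automated legal advice but assistive workflows:
+
+- Legal research
+- Contract review and clause comparison
+- Due diligence
+- e-discovery / document review
+- Drafting pleadings, memos, and compliance documents
+- Summaries and Q&A over statutes and cases
+- Analytics such as judge profiling, litigation analysis, and outcome prediction
+
+In other words, what the legal industry really needs from LLMs is automation of high-cost text work combined with verifiable conclusions.
+
+### 3.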
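AI as a source of professional-ethics and procedural-law risk
+
+Once an LLM enters a lawyer's, court's, or in-house workflow, it runs straight into professional obligations:
+
+- competence: whether the lawyer understands the tool's capabilities and limits
+- confidentiality: whether client information may be entered into the model, and whether it is then retained, used for retraining, or leaked
+- candor to tribunal: whether what is filed with the court is true and free of fabricated cases
+- supervision: how the firm manages lawyers, assistants, and outside AI vendors
+- reasonable fees: what billing is reasonable once AI improves efficiency
+
+This is why the legal industry stresses "human in the loop" more than many other industries.
+
+### 4. AI as a source of intellectual-property and data-governance problems
+
+At least four legal questions come up frequently here:
+
+- Whether training data raises copyright and licensing issues
+- Whether outputs are copyrightable
+- Whether generated content must be labeled and traceable
+- How to handle personal information, trade secrets, and lawyers' confidentiality duties
+
+For the legal industry these are not peripheral issues; they decide whether a product can launch and whether it makes a law firm's or legal department's procurement list.
+
+### 5. AI in courts and public legal services
+
+Courts and judicial bodies are not only scrutinizing AI; they have also begun to regulate or adopt it directly:
+
+- Some US courts have issued specific rules or certification requirements for the use of generative AI in litigation materials
+- China has begun promoting large-model deployment in government and governance settings, with legal services and governance support among the typical use cases
+
+So this space is not limited to law firm tech; it is extending into court tech, gov tech, and reg tech.
+
+## 2. Five sub-directions most worth watching now
+
+### 1. Legal RAG
+
+Rather than letting the model answer without sources, legal use cases emphasize:
+
+- Retrieval over statutes, case law, and contract repositories
+- Citations attached to outputs
+- Control of context contamination
+- Correct versions and currency
+
+This route is likely to stay more reliable in the long run than relying on a large model's memorized legal knowledge.
+
+### 2. Citation-safe drafting
+
+A legal LLM must do more than write; it must:
+
+- Never fabricate citations
+- Keep citations traceable
+- Keep the chain of reasoning clear
+- Allow quick review by lawyers and judges
+
+The dividing line for future legal AI products will likely be whether they can safely enter the document chain for filings with courts and regulators.
+
+### 3. Contract / compliance copilot
+
+Enterprises are more willing to pay for these scenarios:
+
+- Contract redline comparison
+- Extraction of regulatory obligations
+- Mapping internal policies to external regulations
+- Cross-jurisdiction compliance checks
+
+ROI is clearer here, and the work suits an assist-first, semi-automate-later path.
+
+### 4. Legal LLMs for Chinese-language jurisdictions
+
+Chinese legal use cases are not just a translation of English legal LLMs:
+
+- Statute structure
+- The system of judicial interpretations and guiding cases
+- The language of judgments
+- Administrative regulatory texts
+- Local regulations and departmental rules
+
+all require their own data, evaluation, and workflow design.
+
+### 5. AI governance / audit / provenance
+
+Future legal AI will be judged not only on whether answers are correct but increasingly on:
+
+- Audit logs
+- Model / version management
+- Data provenance
+- Risk tiering
+- Output labeling
+- The chain of responsibility for human review
+
+This is where law, compliance, platform governance, and engineering system design meet.
+
+## 3. The hardest parts of this field
+
+### 1. Law is not general-purpose Q&A
+
+- Strong jurisdiction dependence
+- Strong time sensitivity
+- Strong procedural constraints
+- Strong format constraints
+
+The same question can have different answers in different jurisdictions, at different times, and at different procedural stages.
+
+### 2. A good benchmark score does not mean production readiness
+
+Public benchmarks can show whether a model understands legal text, but that does not mean it can be used directly for:
+
+- Giving formal advice to clients
+- Filing materials with a court
+- Making regulatory filings
+- Handling confidential documents
+
+The real deployment difficulty lies in the workflow, not only in the base model.
+
+### 3. Hallucination is extremely costly in legal settings
+
+In most industries a hallucination may just be a wrong answer;
+in legal settings it can turn into:
+
+- Wrong litigation strategy
+- Fabricated case citations
+- Wrong compliance advice
+- Client losses
+- Professional-liability risk for lawyers
+
+### 4. Data and confidentiality boundaries are hard to manage
+
+The legal industry's most valuable data is usually also its most sensitive:
+
+- Client case files
+- Non-public transaction documents
+- Internal investigation materials
+- Sensitive personal information
+- Privileged / confidential communications
+
+So the core advantage of many legal AI products is really their deployment model, access control, logging, and ability to leave no data trace.
+
+## 4.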
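If you want to dig deeper, follow these three tracks
+
+### Track A: research
+
+- Build a Chinese legal benchmark
+- Evaluate Legal RAG
+- Study legal reasoning chains and explainability
+- Study the "regulation update -> knowledge-base update -> output correction" mechanism
+
+### Track B: product
+
+- Citation-safe drafting for law firms
+- Contract / compliance copilot for in-house legal teams
+- Regulation Q&A assistant for public legal services
+- Document assistance and structured review for courts / regulators
+
+### Track C: governance / compliance
+
+- AI usage policy
+- law firm / in-house AI governance checklist
+- Vendor due-diligence template
+- Classification and de-identification of sensitive data before it enters a model
+- AI output labeling, audit, and allocation of responsibility
+
+## 5. My assessment of this field
+
+If you ask what the most important intersection of legal LLMs / AI and law is right now, my assessment is:
+
+It is not whether the model can answer legal questions, but these three questions:
+
+1. Can legal-knowledge retrieval, citation, updating, and auditing be built into an engineering system
+2. Can lawyers' ethics, court requirements, and copyright / privacy / labeling obligations be built into product design
+3. Can AI in high-risk legal settings move from a toy to infrastructure that can be signed off, procured, and held accountable
+
+## 6. Primary / core references used in this research
+
+### Official / regulatory
+
+- European Commission AI Act page and FAQ
+- American Bar Association (ABA) Formal Opinion 512
+- NIST AI Risk Management Framework 1.0
+- U.S. Copyright Office Copyright and Artificial Intelligence project and its multi-part report
+- China's Interim Measures for the Management of Generative Artificial Intelligence Services (《生成式人工智能服务管理暂行办法》)
+- China's Measures for Labeling AI-Generated Synthetic Content (《人工智能生成合成内容标识办法》) and interpretations of the supporting standard
+
+### Research / technical
+
+- LegalBench
+- LegalBench-RAG
+- LawBench
+- InternLM-Law
+- 2025 legal LLM survey (for an overview of models, frameworks, data, and benchmarks)
+
+### Links that can be opened directly
+
+- European Commission AI Act FAQ: https://digital-strategy.ec.europa.eu/en/faqs/navigating-ai-act
+- European Commission GPAI obligations: https://digital-strategy.ec.europa.eu/en/faqs/guidelines-obligations-general-purpose-ai-providers
+- ABA Formal Opinion 512: https://www.americanbar.org/content/dam/aba/administrative/professional_responsibility/ethics-opinions/aba-formal-opinion-512.pdf
+- NIST AI RMF: https://www.nist.gov/itl/ai-risk-management-framework
+- U.S. Copyright Office AI project: https://www.copyright.gov/ai/
+- China's 《生成式人工智能服务管理暂行办法》: https://www.miit.gov.cn/zcfg/qtl/art/2023/art_f4e8f71ae1dc43b0980b962907b7738f.html
+- China's 《人工智能生成合成内容标识办法》: https://www.gov.cn/zhengce/zhengceku/202503/content_7014286.htm
+- LegalBench: https://arxiv.org/abs/2308.11462
+- LegalBench-RAG: https://arxiv.org/abs/2408.10343
+- InternLM-Law: https://arxiv.org/abs/2406.14887
+
+## 7.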
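Outputs suited to the next step
+
+- A "legal LLM / AI landscape map" PPT
+- A "China / US / EU legal AI regulation comparison table"
+- An "AI usage checklist for law firms / in-house legal teams"
+- A "Chinese legal RAG product architecture sketch"
diff --git a/research/singapore-legal-ai-go-no-go-memo-2026-03-20.md b/research/singapore-legal-ai-go-no-go-memo-2026-03-20.md
new file mode 100644
index 0000000..428e6c6
--- /dev/null
+++ b/research/singapore-legal-ai-go-no-go-memo-2026-03-20.md
@@ -0,0 +1,296 @@
+# Singapore legal AI go / no-go decision memo
+
+Date: 2026-03-20
+
+Purpose: building on the existing "Research on the intersection of legal LLMs / AI and law" and the "Legal AI opportunity / risk matrix (actionable version)", add a jurisdiction memo after China that better supports decisions on Asian expansion. This memo compares the official materials for **Singapore** and **Hong Kong** that were additionally reviewed in this round, selects one target jurisdiction better suited to enter first, and for 2 high-value legal AI scenarios gives a go / no-go call, target users, service model, compliance boundaries, required safeguards, evidence gaps, and entry priority.
+
+Conclusions first:
+
+- Candidate expansion jurisdictions compared in this round: **Singapore**, **Hong Kong**
+- Target jurisdiction selected in this round: **Singapore**
+- Scenario 1 assessed: **Singapore-law Legal RAG / citation-safe drafting**
+- Scenario 2 assessed: **enterprise contract / matter workflow copilot (in-house legal / compliance teams first)**
+- Overall recommendation:
+  - Scenario 1: **GO (P0)**
+  - Scenario 2: **CONDITIONAL GO (P1)**
+  - Explicitly not recommended as a first entry point for now: **open-ended public-facing legal-advice bots** and **automation tools that feed directly into court filings**
+
+This is not legal advice; it is a product, compliance, and market-entry research snapshot as of `2026-03-20`.
+
+## 1. Why this round picks Singapore ahead of Hong Kong
+
+Both are worth continuing to track, but if only one nearby jurisdiction can be chosen for expansion after China, I would choose **Singapore** first.
+
+| Jurisdiction | Official / industry materials additionally reviewed this round | Best-suited first service model | Current call |
+| --- | --- | --- | --- |
+| Singapore | Ministry of Law final "Guide for Using Generative AI in the Legal Sector", Singapore Courts "Guide on the Use of Generative AI Tools by Court Users", LawNet AI / GPT-Legal, LTP + Copilot for SG law firms | Professional-internal tools, source-grounded research / drafting, controlled contract / workflow copilot | **Enter first** |
+| Hong Kong | PCPD "Model Personal Data Protection Framework", PCPD guidelines on employee use of GenAI, Judiciary internal generative AI guidelines | In-enterprise, auditable document / compliance tools with strong access control | **Next candidate jurisdiction** |
+
+Why Singapore first:
+
+- The Singapore official materials I reviewed this round more directly cover three layers: **GenAI use in the legal sector**, **how court users may use GenAI**, and **digital infrastructure for legal workflows**.
+- This makes it easier to define the product as a professional-internal workflow tool rather than a vague public AI assistant.
+- Hong Kong is still worth pursuing, but the official materials I read this round lean toward **AI / data governance frameworks** and **employee-use rules**, which makes it a better fit as a second-phase expansion jurisdiction, especially for bilingual document processing, cross-border compliance, and arbitration-adjacent scenarios.
+
+## 2.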
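Target users and service model
+
+### Scenario 1: Singapore-law Legal RAG / citation-safe drafting
+
+Target users:
+
+- Research / drafting teams at Singapore law firms
+- In-house legal research and knowledge-management teams
+- Legal / compliance teams at regional headquarters
+
+Recommended service model:
+
+- A **professional-internal tool**, not open to the public
+- Prefer `tenant-isolated SaaS`, a dedicated instance, or a VPC
+- Retrieve first, then generate, by default; every output carries **source, date, jurisdiction, and citation**
+- Used only for research, memo drafts, internal Q&A, and tracking regulatory change
+
+### Scenario 2: enterprise contract / matter workflow copilot
+
+Target users:
+
+- In-house legal / compliance teams at Singapore regional headquarters
+- Contract-management and approval teams at mid-sized and large enterprises
+- Small and mid-sized law firms only in a second phase
+
+Recommended service model:
+
+- An **in-enterprise workflow copilot**
+- Embedded in CLM, DMS, matter-management, or approval systems
+- Prefer a firm / company-approved secure tool over an open public model
+- Start with intake, clause comparison, risk flags, and pre-approval screening; no automatic approval
+
+## 3. Decision summary
+
+| Scenario | Decision | Entry priority | Recommended service model | Not recommended for now |
+| --- | --- | --- | --- | --- |
+| Singapore-law Legal RAG / citation-safe drafting | GO | P0 | Professional-internal research / drafting assistant that must link back to sources and keep human review | Unsourced answers; case-specific legal advice delivered directly to the public; use in court or regulatory filings without verification by a lawyer / in-house counsel |
+| Enterprise contract / matter workflow copilot | CONDITIONAL GO | P1 | In-enterprise, controlled, auditable workflow copilot, starting with in-house legal / compliance teams | Automatic contract approval; mixing data across clients / matters; processing confidential / highly confidential information directly in public models |
+
+## 4. Scenario 1: Singapore-law Legal RAG / citation-safe drafting
+
+### 4.1 Why it is GO
+
+Why this is the best first scenario to land in Singapore:
+
+- The Singapore Ministry of Law's final GenAI guide for the legal sector already lists tasks such as **legal research, document drafting, and contract review** as explicit legal-sector use cases.
+- Singapore Courts have given court users a workable set of boundaries: GenAI is not prohibited, but users remain responsible for the facts, citations, legal authorities, and content they submit.
+- LawNet AI / GPT-Legal shows that local legal research and source-grounded AI have already entered real legal workflows in Singapore, so the cost of market education is lower than in many jurisdictions.
+
+### 4.2 Compliance boundaries
+
+Allowed:
+
+- Retrieval-augmented Q&A over Singapore statutes, case law, regulations, and regulatory materials
+- Research memo drafts
+- Internal regulatory monitoring and change summaries
+- Citation insertion, source backlinks, and version and date labels
+
+Not allowed:
+
+- Open-ended, conclusory, case-specific legal advice to the public
+- Unsourced answers
+- Entering court or regulatory filings without review by a lawyer / in-house counsel
+- Presenting generated output as an authoritative conclusion without exposing the retrieved sources
+
+### 4.3 Required safeguards
+
+- Retrieval and citation:
+  - Retrieve only from licensable, traceable Singapore legal corpora
+  - Show source, date, jurisdiction, and citation by default
+  - Verify that each citation really exists and, where possible, check that the cited cases are still good law
+- Human review:
+  - Research memos, client deliverables, and court materials always require human sign-off
+  - Outputs are marked draft / internal use unless reviewed
+- Vendors and data:
+  - Assess retention, training, sub-processor, and deletion policies
+  - Confidential / highly confidential data goes only into approved secure tools
+- Process and audit trail:
+  - Log queries, retrieval results, model versions, and approval outcomes
+  - Make the sources and revision history of key drafts replayable
+
+### 4.4 Evidence gaps
+
+-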
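A **Singapore-specific benchmark** is needed:
+  - Locating cases and statutes
+  - Citation accuracy
+  - Detection of outdated / repealed materials
+  - Jurisdiction mix-up errors
+- A clear corpus plan is needed:
+  - Whether to connect directly to a licensed legal database or build a licensed corpus layer in-house
+  - How far coverage extends across primary law, subordinate legislation, regulatory materials, and practice guidance
+- Pilot evidence is needed:
+  - Whether users actually click through to sources
+  - Whether citations display fast enough
+  - Whether research and drafting staff are willing to put it in their daily workflow
+
+### 4.5 go / no-go triggers
+
+Conditions for staying GO:
+
+- Can reliably link back to authoritative Singapore sources
+- Has a citation verifier, date filtering, and jurisdiction filtering
+- High-risk outputs must go to lawyer / in-house review
+- Confidential / highly confidential data never lands in unapproved tools
+
+Conditions that flip it to NO-GO:
+
+- Can only produce unsourced answers
+- Cannot show that knowledge sources are licensable, updatable, and traceable
+- The product is positioned as a public legal-advice bot
+- Plans to feed court filings directly without item-by-item verification
+
+## 5. Scenario 2: enterprise contract / matter workflow copilot
+
+### 5.1 Why it is CONDITIONAL GO
+
+This scenario is high value but depends more than Scenario 1 on deployment, integration, and data governance, so I place it at `P1`.
+
+Reasons:
+
+- Singapore's legal-sector GenAI guide already lists practice management, matter / case management, contract lifecycle management, document review, and contract analysis / review as explicit use cases.
+- But once inside contract and matter workflows, data sensitivity, client confidentiality, approval responsibility, and vendor due-diligence requirements all rise significantly.
+- The Ministry of Law guide is clearer about tool boundaries for confidential / highly confidential data, which makes the ability to deliver a secure deployment the key go / no-go factor.
+
+### 5.2 Compliance boundaries
+
+Allowed:
+
+- Contract intake and triage
+- Clause extraction, template comparison, and risk flags
+- Matter-level task assignment, status tracking, and pre-approval screening
+- Consistency checks between internal policies and contract clauses
+
+Not allowed:
+
+- Automatic contract approval
+- Replacing lawyers / in-house counsel for final sign-off
+- Processing confidential / highly confidential information directly in public models
+- Mixing context across clients or matters
+
+### 5.3 Required safeguards
+
+- Deployment:
+  - Confidential data uses only firm / company-approved secure GenAI tools
+  - For highly sensitive scenarios, prefer a dedicated instance, a VPC, or a controlled on-premises environment
+- Access and isolation:
+  - Permission isolation at matter / client / business-unit level
+  - least-privilege access
+  - Information barriers and DLP
+- Output and process:
+  - Every risk flag links back to its clause, template, policy, or rule source
+  - Version history, approval logs, and reviewer identity are retained in full
+  - High-risk contracts must be escalated to human review
+- Vendor governance:
+  - The contract specifies retention, training, deletion, incident notification, and sub-processors
+  - Assess the limits of the model's suitability for confidential / highly confidential data
+
+### 5.4 Evidence gaps
+
+- Technical validation of whether client-level / matter-level isolation is achievable
+- At least one contract evaluation set for Singapore and regional-headquarters scenarios:
+  - Clause extraction
+  - deviation detection
+  - fallback clause suggestion
+  - Escalation hit rate
+- Product-side evidence:
+  - How to integrate with approval flows
+  - How to design the review queue
+  - Whether reviewers actually act on risk flags
+- Legal and procurement evidence:
+  - Whether clients / internal business owners accept GenAI in the contract process
+  - Whether client opt-out or additional disclosure is required
+
+### 5.5 go / no-go triggers
+
+Conditions for staying CONDITIONAL GO and moving to a pilot:
+
+- Can deliver an approved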
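secure tool + data isolation + approval-flow integration
+- Risk flags can be reviewed and rolled back by a human
+- The vendor can make clear commitments on retention / training / deletion boundaries
+
+Conditions that flip it to NO-GO:
+
+- The client or business owner demands automatic approval
+- Matter / client isolation cannot be demonstrated
+- The vendor retains or retrains on confidential data by default
+- Version history, approval logs, and reviewer traces cannot be retained
+
+## 6. Unified no-go boundaries for Singapore
+
+At this stage, these are not recommended as priorities:
+
+- Open-ended public-facing legal-advice bots
+- Automation tools that feed directly into court filings
+- Legal drafting tools with no sources, no citations, and no human sign-off
+- Sending client / matter data to unapproved public models by default
+
+The reason is not that legal AI cannot be done in Singapore, but that:
+
+- Singapore's official materials place responsibility squarely back on users and professionals
+- This makes "source-grounded + human-reviewed + secure deployment" the more natural first entry mode
+- Any product that skips these basic controls will hit liability, trust, and procurement barriers sooner
+
+## 7. If I ranked the order of entry for Asian expansion, I would do this
+
+1. **China: enterprise contract / compliance copilot**
+2. **China: Chinese Legal RAG / citation-safe drafting**
+3. **Singapore: Singapore-law Legal RAG / citation-safe drafting**
+4. **Singapore: enterprise contract / matter workflow copilot**
+5. **Hong Kong: in-enterprise compliance / document workflow tools**
+
+Reasoning:
+
+- Professional-internal tools before public tools
+- Source-grounded research / drafting before more sensitive data workflows
+- Start with the professional-internal model in Singapore, then treat Hong Kong as the second-phase common law + bilingual expansion jurisdiction
+
+## 8. Current rules and market anchors this memo relies on
+
+Singapore:
+
+- Ministry of Law "Guide for Using Generative AI in the Legal Sector":
+  - Addressed to legal practitioners, law firms, in-house teams, and other relevant organizations
+  - Explicitly lists use cases such as legal research, document drafting, contract review, and matter / case management
+  - Stresses human verification, professional responsibility, confidentiality, and tool due diligence
+  - Official link: https://www.mlaw.gov.sg/files/Guide_for_using_Generative_AI_in_the_Legal_Sector__Published_on_6_Mar_2026_.pdf
+- Singapore Courts "Guide on the Use of Generative AI Tools by Court Users":
+  - Users remain responsible for submitted content, legal authorities, citations, quotations, and facts
+  - Official link: https://www.judiciary.gov.sg/docs/default-source/news-and-resources-docs/guide-on-the-use-of-generative-ai-tools-by-court-users.pdf?sfvrsn=3900c814_1
+- Ministry of Law / Legal Technology Platform with Copilot:
+  - Shows that Singapore law firms already have a legaltech entry path closer to real workflows
+  - Official link: https://www.mlaw.gov.sg/enhanced-productivity-for-law-firms-in-singapore-with-the-legal-technology-platform/
+- Singapore Academy of Law / LawNet AI and GPT-Legal:
+  - Shows that AI research and source-grounded entry points for Singapore law are already in real use
+  -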
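Official link: https://sal.org.sg/articles/singapore-academy-of-law-signs-global-content-partnerships-to-expand-worldwide-access-of-singapore-law-and-unveils-ai-powered-lawnet-4-0-at-techlaw-fest-2025/
+
+Hong Kong:
+
+- PCPD "Artificial Intelligence: Model Personal Data Protection Framework"
+  - Official link: https://www.pcpd.org.hk/english/resources_centre/publications/files/ai_protection_framework.pdf
+- PCPD "Checklist on Guidelines for the Use of Generative AI by Employees"
+  - Official link: https://www.pcpd.org.hk/english/resources_centre/publications/files/guidelines_ai_employees.pdf
+- Hong Kong Judiciary "Guidelines on the Use of Generative Artificial Intelligence"
+  - Official link: https://www.judiciary.hk/doc/en/court_services_facilities/guidelines_on_the_use_of_generative_ai.pdf
+
+## 9. Final call
+
+If only one nearby jurisdiction is chosen for expansion after China, I would choose **Singapore** first.
+
+If only two high-value scenarios are chosen to start in Singapore, my recommendation is:
+
+- **First: Singapore-law Legal RAG / citation-safe drafting**
+- **Then: enterprise contract / matter workflow copilot**
+
+The reason is not that Singapore is lower risk, but that:
+
+- The boundaries of legal-sector use are clearer
+- The responsibility boundaries for court users are more explicit
+- legaltech infrastructure is more mature
+- It is better suited to confining the product to a **professional-internal, source-grounded, auditable, human-signed-off** service model
+
+Hong Kong is still worth pursuing, but given the structure of the official materials seen this round, it fits better after Singapore as a second-phase expansion jurisdiction.
diff --git a/research/uk-australia-uae-legal-ai-market-comparison-2026-03-20.md b/research/uk-australia-uae-legal-ai-market-comparison-2026-03-20.md
new file mode 100644
index 0000000..66a69d4
--- /dev/null
+++ b/research/uk-australia-uae-legal-ai-market-comparison-2026-03-20.md
@@ -0,0 +1,336 @@
+# England and Wales / Australia / UAE common-law hubs legal AI expansion market comparison memo
+
+Date: 2026-03-20
+
+Purpose: building on the existing calls for China, Singapore, and Hong Kong, add a comparison of candidate expansion markets **beyond Hong Kong and Singapore**. This round looks only at jurisdictions better suited to replicating the professional-internal tool / controlled in-enterprise tool route, and does not treat public legal-advice bots or automated entry into court filings as a launch model.
+
+This is not legal advice; it is a product, compliance, and market-entry research snapshot as of `2026-03-20`.
+
+## 1. Conclusions first
+
+- Jurisdictions compared this round:
+  - **England and Wales**
+  - **Australia**
+  - **UAE common-law hubs (DIFC / ADGM)**
+- Recommended order of entry:
+  1. **England and Wales: GO (P0)**
+  2. **Australia: CONDITIONAL GO (P1)**
+  3.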
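**UAE common-law hubs (DIFC / ADGM): CONDITIONAL GO (P2 / watchlist)**
+- Scenarios best suited to replicate first:
+  - **Legal RAG / citation-safe drafting**
+  - Only then the **enterprise contract / compliance copilot**
+- Scenarios not recommended to replicate first:
+  - Open-ended public-facing legal-advice bots
+  - Automation of court filings without strong review
+  - "General-purpose assistants" that send client data into a shared training pool by default
+
+## 2. Side-by-side comparison: why this ranking
+
+| Jurisdiction | Maturity of official materials | Clarity of practice boundaries | Best-suited first service model | Should not do first | Current call | Key reasons |
+| --- | --- | --- | --- | --- | --- | --- |
+| England and Wales | High | High | England & Wales Legal RAG, citation-safe drafting, drafting assistant with regulated human sign-off | Open-ended public-facing legal-advice bots, automated entry into court filing | GO (P0) | Four layers of material from the Law Society, SRA, Judiciary, and Civil Justice Council; both the professional-internal route and the court-facing responsibility boundaries are visible |
+| Australia | Medium-high | Medium | professional-internal Legal RAG, contract / compliance copilot, due diligence / drafting assistant | One-size-fits-all court-facing automation across states, conclusory public bots | CONDITIONAL GO (P1) | Professional bodies and courts are both issuing materials, but federal / state court protocols and practice materials are more fragmented, so it suits entry through a single state or a single workflow first |
+| UAE common-law hubs (DIFC / ADGM) | Medium | Medium-low | Bilingual regulatory / knowledge assistant, in-enterprise contract / compliance copilot, pre-dispute research assistant | General legal-advice bots for the onshore public, nationwide unified court automation | CONDITIONAL GO (P2 / watchlist) | AI governance and common-law hub infrastructure are strong, but legal services licensing, court guidance, and data and practice boundaries depend more on specific zones such as DIFC / ADGM than on a single national framework |
+
+## 3.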
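Why England and Wales ranks first
+
+### 3.1 The strongest signals this round
+
+- The **Law Society** has placed generative AI inside a clear practice-governance framework: comply with SRA rules, review vendors' data management, address confidentiality / data governance / liability / insurance, and verify outputs for accuracy.
+- The **SRA** has approved the first AI-driven law firm, but the approval is not a free pass; its conditions are:
+  - Not an autonomous mode
+  - Steps proceed only after client approval
+  - Supervision and monitoring are in place
+  - Named regulated solicitors bear final responsibility for system outputs and consequences
+  - Regulated law firms must still maintain minimum insurance
+- The **Judiciary / Civil Justice Council** has put the use of AI for court documents on an explicit rule-making track; this round's `2026-03-20` refresh found that the `CJC` current-work page shows an eight-week consultation under way, the `latest-news` page states that it closes at `2026-04-14 23:59`, and the interim report's preliminary proposals already touch on **a declaration about AI use in specific circumstances**, focused on AI used to generate evidence the court is asked to rely on. This indicates the court-facing boundary is moving from reminders of principle toward more specific rules.
+
+### 3.1.1 `CJC` `L2` impact analysis completed ahead of the `2026-03-25` date
+
+- The near-term risk from the current `CJC` consultation is not uniform disclosure for all court documents; the changes most likely to land are narrowed to evidence-stage documents.
+- The items most worth treating as heading toward a potential formal declaration are:
+  - `trial witness statements`: the direction is to require legal representatives to declare that AI was not used to generate, alter, strengthen, weaken, or rephrase witness evidence
+  - `expert reports`: the direction is to require experts to identify and explain any AI used for the report (other than purely administrative uses)
+- For `skeleton arguments / advocacy documents / disclosure lists and statements`, by contrast, the current consultation sees no pressing case for an additional court rule.
+- England and Wales therefore remains **professional-internal P0**, but court-adjacent products must mark `evidence-stage workflows` explicitly as a separate high-risk red line rather than writing only a generic "stronger human review".
+- The same-day live recheck also confirmed that the consultation PDF now posted on the `CJC current-work` page has moved to the `/2026/03/` official path; this is only a source-anchor refresh and does not change the evidence-stage judgement above, the `P0` ranking, or the court-facing red lines.
+
+### 3.2 What is better to do first
+
+- **England & Wales Legal RAG / citation-safe drafting**
+- Aimed at law-firm research and drafting staff, knowledge-management teams, and large in-house legal departments
+- Retrieve first, then generate, by default; every output carries:
+  - Source
+  - Date
+  - Jurisdiction
+  - citation
+
+### 3.3 What the practice boundaries are here
+
+Can do first:
+
+- Internal research memo drafts
+- citation-safe drafting
+- Statute / case-law update alerts
+- Firm-approved or in-house-approved internal tools
+
+Do not do first:
+
+- Open-ended public bots giving case-specific legal advice
+- Unsourced answers
+- Automated flows that feed pleadings, witness statements, or expert reports without item-by-item verification by a lawyer
+- Any workflow that generates, alters, strengthens, weakens, or rephrases `trial witness evidence`
+- Any expert-report drafting / review workflow without `AI-use disclosure readiness`
+- Testing, templating, or retraining on client data by default
+
+### 3.4 Minimum safeguards that must hold
+
+- England & Wales jurisdiction filter
+- source pinning + citation verifier
+- Permission isolation at client / matter / document level
+- No retraining on client data by default
+- Final human sign-off
+- Outputs, citations, model versions, and approval logs are traceable
+- Court-facing documents get a higher level of review by default, not automatic filing
+- `document-type gating`: separate general drafting / research from `trial witness statements`, `expert reports`, and other evidence-stage documents; the latter are off by default or limited to checklists / verifier
+- `evidence-chain provenance log`: keep the scope of input materials, where AI was used, citation sources, human edits, and sign-off
+- `disclosure readiness pack`: if a court, the opposing party, or expert-governance rules require an account of AI use, the team should be able to state quickly the purpose, scope, human verification, and the person ultimately responsible
+
+### 3.5 What evidence is still missing
+
+- An England & Wales-specific benchmark:
+  - case citation accuracy
+  - statute / regulation pinning
+  - Detection of overruled / outdated authority
+- Interviews with at least 3 target customers:
+  - Mid-sized law firm
+  - Knowledge-management team at a large law firm
+  - In-house legal department
+- Procurement and compliance evidence:
+  - Whether client-data reuse is a veto item
+  - Whether a VPC / dedicated tenant is required
+  - How insurance and liability clauses are written into the contract
+
+## 4. Why Australia ranks second
+
+### 4.1 Key signals seen this round
+
+- The **Law Council of Australia** has put AI on the national professional agenda and maintains a portal collecting resources from the states and professional bodies.
+- The **Federal Court of Australia** has stated explicitly:
+  - AI use must be consistent with existing obligations to the court and to other parties
+  - If a judge or Registrar so requires, the party / practitioner should disclose AI use
+  - The court is continuing to consider whether Guidelines or a Practice Note are needed
+- **Queensland Courts** have written the accuracy of references in submissions into a Practice Direction and require the **individual legal practitioner responsible** to be named and to verify sources, failing which costs orders or regulatory referral may follow.
+- The **OAIC** is clear about privacy obligations for training and fine-tuning generative AI: where personal information is involved, a vague reference to "research" is not enough, and use for AI training must be stated explicitly.
+
+### 4.2 What is better to do first
+
+- professional-internal Legal RAG
+- Drafting assistant within a single state or a single federal workflow
+- Contract / compliance copilot for in-house legal / compliance teams
+
+The more realistic entry is not an Australia-wide court bot from day one, but:
+
+- Pick one jurisdiction slice first
+- Pick one type of professional-internal workflow first
+- Prove value under strong human review first
+
+### 4.3 What the practice boundaries are here
+
+Can do first:
+
+- Internal research / drafting
+- Contract clause extraction and risk flags
+- Compliance obligation mapping
+- due diligence / data room summaries
+
+Do not do first:
+
+- One-size-fits-all court-facing automation across states / courts
+- Conclusory public-facing legal-advice bots
+- Sending confidential / privileged data into public models
+- Giving a single "Australian law" answer without jurisdiction slicing
+
+### 4.4 Minimum safeguards that must hold
+
+- Federal / state /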
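territory jurisdiction filter
+- Citation verification and date filtering
+- Isolation at matter / client / document level
+- Public tools may not process confidential / privileged materials
+- Training, retention, and notification terms verified up front
+- Court-facing flows default to human confirmation and a named responsible person
+
+### 4.5 What evidence is still missing
+
+- Which state / court system to enter first:
+  - NSW
+  - Victoria
+  - Queensland
+  - Federal Court
+- Which customers are more willing to buy first:
+  - In-house legal departments
+  - Mid-sized and large law firms
+  - ALSP / litigation support
+- A local Australian benchmark:
+  - Error rate from mixing state / federal law
+  - Citation accuracy
+  - User adoption rate under the public / secure tool boundary
+
+## 5. Why UAE common-law hubs go on the watchlist for now
+
+### 5.1 Scope first
+
+This round does not assess the whole UAE onshore legal-services market.
+
+This round I treat only **common-law hubs such as DIFC / ADGM** as candidate entry points, because the official materials I could verify are concentrated there:
+
+- **DIFC Courts** have explicit guidance on the use of AI in proceedings
+- **ADGM** is clearer on legal service providers, English common law, data protection, and governance requirements
+- **UAE Cabinet / UAE Legislation** signal, at a higher level, a governance direction toward AI-native regulation and regulatory intelligence
+
+### 5.2 Why it is worth watching but does not rank first
+
+Worth watching because:
+
+- ADGM applies English common law directly, so its legal system is more familiar ground for international legal-tech products
+- DIFC Courts have made clear:
+  - AI use in proceedings should be disclosed early
+  - Issues are best resolved before the CMC
+  - Witness statements should still be in the witness's own words
+- ADGM's licence conditions for legal service providers are more explicit and require:
+  - Managing partner qualifications
+  - professional indemnity insurance
+  - annual return
+  - defined principles
+- ADGM's DPIA guidance already lists AI, machine learning, and automated decision making among the triggers for high-risk processing
+- The UAE's 2026 regulatory intelligence whitepaper shows a clear policy appetite for "AI + regulation / law"
+
+Not first because:
+
+- The better entry point is a **specific common-law hub such as DIFC / ADGM**
+- It is not a nationally unified legal AI market with fully clear practice boundaries
+- The boundaries between free zone / onshore, English / Arabic, court / non-court, and regulated legal services / enterprise tooling are more complex
+
+### 5.3 What is better to do first
+
+- Bilingual regulatory / knowledge assistant
+- In-enterprise contract / compliance copilot
+- Pre-dispute research / drafting assistant
+
+That is, more like:
+
+- Internal tools for international law firms or regional headquarters
+- Tools around DIFC / ADGM workflows
+
+and less like:
+
+- A nationwide legal-advice bot for the UAE public
+
+### 5.4 What the practice boundaries are here
+
+Can do first:
+
+- bilingual research / drafting
+- internal compliance / regulatory monitoring
+- dispute-support research
+
+Do not do first:
+
+- Legal-advice bots for the onshore public
+- Court-facing automation without disclosure and review
+- A unified "UAE legal AI" that does not distinguish DIFC / ADGM / onshore boundaries
+
+### 5.5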
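Minimum safeguards that must hold
+
+- DIFC / ADGM / onshore jurisdiction gating
+- Arabic + English source grounding
+- Run a DPIA first for AI / new technology / high-risk personal-data processing
+- No retraining on client data by default
+- Involve local counsel / a local regulated partner
+- Use in proceedings defaults to early disclosure, strong logging, and human sign-off
+
+### 5.6 What evidence is still missing
+
+- Whether to start with DIFC or ADGM
+- How to set up local practice partners
+- Whether bilingual corpus licences can be obtained
+- Whether enterprise customers care more about privacy / hosting or about professional liability / insurance
+- Which scenarios can stay as internal tooling and which would be treated as regulated legal services
+
+## 6. Recommended expansion order
+
+If expansion continues beyond Hong Kong and Singapore, I would rank as follows:
+
+1. **England and Wales**
+2. **Australia**
+3. **UAE common-law hubs (DIFC / ADGM)**
+
+The logic is:
+
+- Go first to markets where **professional rules, court boundaries, and vendor-governance requirements are all clearer**
+- Build **professional-internal** tools first, before touching public-facing products or court filings
+- Choose markets with a **clearer single jurisdiction** first, then enter markets where **multiple layers of boundaries coexist**
+
+## 7. Unified no-go boundaries across markets
+
+None of the following is suitable as an early entry mode in any of these three jurisdictions:
+
+- Open-ended public-facing bots giving case-specific legal advice
+- Automated entry into court / arbitration / regulatory filings
+- Legal drafting with no sources, no citations, and no human sign-off
+- Sending client data into a shared training pool by default
+- Tools that cannot explain their retention, deletion, sub-processors, and logs
+
+## 8. If work continues, the most valuable next steps
+
+- A standalone go / no-go memo for England and Wales
+- A "single-state entry" memo for Australia, rather than a national version first
+- A "free zone / onshore / court / legal-services licensing" boundary map for DIFC / ADGM
+- Benchmark specifications and customer interview outlines for each of the three jurisdictions
+
+## 9.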
Core official anchors used this round
+
+England and Wales:
+
+- Law Society "Generative AI – the essentials"
+  https://www.lawsociety.org.uk/en/Topics/AI-and-lawtech/Guides/Generative-AI-the-essentials
+- SRA "SRA approves first AI-driven law firm"
+  https://media.sra.org.uk/sra/news/press/garfield-ai-authorised/
+- Courts and Tribunals Judiciary "Artificial Intelligence (AI) – Judicial Guidance (October 2025)"
+  https://www.judiciary.uk/guidance-and-resources/artificial-intelligence-ai-judicial-guidance-october-2025/
+- Civil Justice Council "Use of AI in preparing court documents"
+  https://www.judiciary.uk/related-offices-and-bodies/advisory-bodies/cjc/current-work/use-of-ai-in-preparing-court-documents/
+- Civil Justice Council "Latest news from the Civil Justice Council"
+  https://www.judiciary.uk/related-offices-and-bodies/advisory-bodies/cjc/latest-news/
+- Civil Justice Council "Interim Report and Consultation - Use of AI for Preparing Court Documents"
+  https://www.judiciary.uk/wp-content/uploads/2026/03/Interim-Report-and-Consultation-Use-of-AI-for-Preparing-Court-Docume.pdf
+
+Australia:
+
+- Law Council of Australia "Artificial Intelligence and the Legal Profession"
+  https://lawcouncil.au/policy-agenda/advancing-the-profession/artificial-intelligence-and-the-legal-profession
+- Federal Court of Australia "Notice to the Profession: Artificial intelligence use in the Federal Court of Australia"
+  https://www.fedcourt.gov.au/news-and-events/29-april-2025
+- Queensland Courts "Practice Direction 4 of 2025 - Accuracy of References in Submissions"
+  https://www.courts.qld.gov.au/__data/assets/pdf_file/0011/882875/lc-pd-4-of-2025-Accuracy-of-References-in-Submissions.pdf
+- Queensland Courts "Guidelines for Responsible Use of Generative AI by Non-Lawyers"
+  https://www.courts.qld.gov.au/going-to-court/using-generative-ai
+- OAIC "Guidance on privacy and developing and training generative AI models"
+  https://www.oaic.gov.au/privacy/privacy-guidance-for-organisations-and-government-agencies/guidance-on-privacy-and-developing-and-training-generative-ai-models
+
+UAE common-law hubs:
+
+- DIFC Courts "Practical Guidance Note No. 2 of 2023 Guidelines on the use of large language models and generative AI in proceedings before the DIFC Courts"
+  https://www.difccourts.ae/rules-decisions/practice-directions/practical-guidance-note-no-2-2023-guidelines-use-large-language-models-and-generative-ai-proceedings-difc-courts
+- ADGM "The ADGM Legal Framework"
+  https://www.adgm.com/legal-framework
+- ADGM "English Common Law"
+  https://www.adgm.com/adgm-courts/english-common-law
+- ADGM "Enhanced Controls for Legal, Tax and Company Service Providers"
+  https://www.adgm.com/media/announcements/adgm-registration-authority-publishes-enhanced-controls-for-legal-tax-and-company-service-providers
+- ADGM ODP "How to Conduct a Data Protection Impact Assessment (DPIA)"
+  https://assets.adgm.com/download/assets/ADGM%2B-%2BHow%2Bto%2BConduct%2Ba%2BData%2BProtection%2BImpact%2BAssessment%2B%28DPIA%29%2B%28Explainer%29.pdf/16a7bedc58ad11efa80cb2570a3a6e3c
+- UAE Legislation "Federal Decree by Law No. (45) of 2021 Concerning the Protection of Personal Data"
+  https://uaelegislation.gov.ae/en/legislations/1972
+- UAE Cabinet / UAE Legislation "UAE Government launches its 1st Whitepaper on shaping future of regulatory intelligence"
+  https://uaelegislation.gov.ae/en/news/uae-government-launches-at-world-economic-forum-in-davos-its-1st-whitepaper-on-shaping-future-of-regulatory-intelligence
diff --git a/skills/pptx-run-report/LICENSE.txt b/skills/pptx-run-report/LICENSE.txt
new file mode 100644
index 0000000..c55ab42
--- /dev/null
+++ b/skills/pptx-run-report/LICENSE.txt
@@ -0,0 +1,30 @@
+© 2025 Anthropic, PBC. All rights reserved.
+
+LICENSE: Use of these materials (including all code, prompts, assets, files,
+and other components of this Skill) is governed by your agreement with
+Anthropic regarding use of Anthropic's services. If no separate agreement
+exists, use is governed by Anthropic's Consumer Terms of Service or
+Commercial Terms of Service, as applicable:
+https://www.anthropic.com/legal/consumer-terms
+https://www.anthropic.com/legal/commercial-terms
+Your applicable agreement is referred to as the "Agreement." "Services" are
+as defined in the Agreement.
+
+ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the
+contrary, users may not:
+
+- Extract these materials from the Services or retain copies of these
+  materials outside the Services
+- Reproduce or copy these materials, except for temporary copies created
+  automatically during authorized use of the Services
+- Create derivative works based on these materials
+- Distribute, sublicense, or transfer these materials to any third party
+- Make, offer to sell, sell, or import any inventions embodied in these
+  materials
+- Reverse engineer, decompile, or disassemble these materials
+
+The receipt, viewing, or possession of these materials does not convey or
+imply any license or right beyond those expressly granted above.
+
+Anthropic retains all right, title, and interest in these materials,
+including all copyrights, patents, and other intellectual property rights.
diff --git a/skills/pptx-run-report/SKILL.md b/skills/pptx-run-report/SKILL.md
new file mode 100644
index 0000000..a00f308
--- /dev/null
+++ b/skills/pptx-run-report/SKILL.md
@@ -0,0 +1,301 @@
+---
+name: pptx-run-report
+description: "Use this skill any time a .pptx file is involved: creating, reading, editing, or generating a run report presentation. This includes: creating slide decks for mentors/colleagues/classmates, generating automated run report presentations, reading or parsing .pptx files, editing existing presentations, and using PptxGenJS. Trigger whenever the user mentions 'deck', 'slides', 'presentation', 'PPT', or references a .pptx filename."
+license: Proprietary. LICENSE.txt has complete terms
+---
+
+# PPTX Skill + Run Report Presentation
+
+## Quick Reference
+
+| Task | Guide |
+|------|-------|
+| Read/analyze content | `python -m markitdown presentation.pptx` |
+| Edit or create from template | Read [editing.md](editing.md) |
+| Create from scratch | Read [pptxgenjs.md](pptxgenjs.md) |
+| Generate run report presentation | See [Run Report Presentation](#run-report-presentation) below |
+
+---
+
+## Reading Content
+
+```bash
+# Text extraction
+python -m markitdown presentation.pptx
+
+# Visual overview
+python scripts/thumbnail.py presentation.pptx
+
+# Raw XML
+python scripts/office/unpack.py presentation.pptx unpacked/
+```
+
+---
+
+## Editing Workflow
+
+**Read [editing.md](editing.md) for full details.**
+
+1. Analyze template with `thumbnail.py`
+2. Unpack → manipulate slides → edit content → clean → pack
+
+---
+
+## Creating from Scratch
+
+**Read [pptxgenjs.md](pptxgenjs.md) for full details.**
+
+Use when no template or reference presentation is available.
+
+---
+
+## Run Report Presentation
+
+When generating a PPTX after an ArgusBot run completes, you are making a **work presentation**: something suitable for showing to a mentor, colleague, or classmate.
+
+### Mindset
+
+You are NOT generating a "loop run report." Nobody cares about Round 1/2/3, reviewer confidence scores, or acceptance check exit codes.
+
+You are generating a **work presentation**: what was the problem, what did we do, what are the results.
+
+Think: "If I had to present this to my mentor in 5 minutes, what would the slides say?"
+ +### Input + +The caller provides: +- The **Markdown final task report** (already written) — this is your primary content source +- The original objective +- Key metadata (session ID, date, checks summary) + +### Slide Structure (adapt to content) + +Not every run needs the same slides. Choose what fits: + +#### For research / experiment runs: +1. **Title** — project name, date, one-line goal +2. **Background / Motivation** — why this work matters +3. **Approach** — what method/strategy was used +4. **Key Changes** — what was actually modified (files, configs, code) +5. **Results** — experiment outcomes, test results, metrics +6. **Findings / Takeaways** — what we learned +7. **Next Steps** — follow-up work, open questions + +#### For feature / implementation runs: +1. **Title** — feature name, date +2. **Problem Statement** — what gap was being filled +3. **Solution Overview** — architecture or approach summary +4. **Implementation Details** — key code changes, design decisions +5. **Validation** — tests passed, checks cleared, demo-ready status +6. **Next Steps** — remaining work, deployment notes + +#### For bug fix / investigation runs: +1. **Title** — issue summary, date +2. **Symptom** — what was broken +3. **Root Cause** — what caused it +4. **Fix** — what was changed +5. **Verification** — how we confirmed the fix +6. **Lessons Learned** — what to watch for in the future + +### Language + +Match the language of the original task and final report: +- Chinese task → Chinese presentation +- English task → English presentation +- Mixed → bilingual headings are OK + +### Write Contract + +1. Use PptxGenJS via a Node.js script (see [pptxgenjs.md](pptxgenjs.md)). +2. Write to the exact file path provided. +3. Generate 6-10 slides. Quality over quantity. +4. After writing, reply with: + ``` + PPTX_REPORT_PATH: + PPTX_REPORT_STATUS: written + ``` + +--- + +## Design Ideas + +**Don't create boring slides.** Plain bullets on a white background won't impress anyone. 
Consider ideas from this list for each slide. + +### Before Starting + +- **Pick a bold, content-informed color palette**: The palette should feel designed for THIS topic. If swapping your colors into a completely different presentation would still "work," you haven't made specific enough choices. +- **Dominance over equality**: One color should dominate (60-70% visual weight), with 1-2 supporting tones and one sharp accent. Never give all colors equal weight. +- **Dark/light contrast**: Dark backgrounds for title + conclusion slides, light for content ("sandwich" structure). Or commit to dark throughout for a premium feel. +- **Commit to a visual motif**: Pick ONE distinctive element and repeat it — rounded image frames, icons in colored circles, thick single-side borders. Carry it across every slide. + +### Color Palettes + +Choose colors that match your topic — don't default to generic blue. Use these palettes as inspiration: + +| Theme | Primary | Secondary | Accent | +|-------|---------|-----------|--------| +| **Midnight Executive** | `1E2761` (navy) | `CADCFC` (ice blue) | `FFFFFF` (white) | +| **Forest & Moss** | `2C5F2D` (forest) | `97BC62` (moss) | `F5F5F5` (cream) | +| **Coral Energy** | `F96167` (coral) | `F9E795` (gold) | `2F3C7E` (navy) | +| **Warm Terracotta** | `B85042` (terracotta) | `E7E8D1` (sand) | `A7BEAE` (sage) | +| **Ocean Gradient** | `065A82` (deep blue) | `1C7293` (teal) | `21295C` (midnight) | +| **Charcoal Minimal** | `36454F` (charcoal) | `F2F2F2` (off-white) | `212121` (black) | +| **Teal Trust** | `028090` (teal) | `00A896` (seafoam) | `02C39A` (mint) | +| **Berry & Cream** | `6D2E46` (berry) | `A26769` (dusty rose) | `ECE2D0` (cream) | +| **Sage Calm** | `84B59F` (sage) | `69A297` (eucalyptus) | `50808E` (slate) | +| **Cherry Bold** | `990011` (cherry) | `FCF6F5` (off-white) | `2F3C7E` (navy) | + +### For Each Slide + +**Every slide needs a visual element** — image, chart, icon, or shape. Text-only slides are forgettable. 
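The palette table lists each color as a six-digit hex string without a leading `#`. As a rough sketch, assuming the WCAG 2 relative-luminance and contrast-ratio formulas are an acceptable yardstick (they are an outside reference, not something this skill prescribes), a candidate text/background pair can be checked before any slide is built. The helper names `luminance` and `contrast_ratio` are hypothetical.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    """Relative luminance of a six-digit hex color written without '#'."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors; the argument order does not matter."""
    lighter, darker = sorted(
        (luminance(foreground), luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Hex values taken from the "Midnight Executive" row of the palette table.
for text, background in [("FFFFFF", "1E2761"), ("CADCFC", "FFFFFF")]:
    ratio = contrast_ratio(text, background)
    print(f"{text} on {background}: {ratio:.2f}")
```

With the Midnight Executive palette, white text on the navy primary clears the 4.5 threshold that WCAG uses for normal-size text, while ice blue on white does not. Treat that threshold as a starting point for slides, not as a rule from this document.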
+ +**Layout options:** +- Two-column (text left, illustration on right) +- Icon + text rows (icon in colored circle, bold header, description below) +- 2x2 or 2x3 grid (image on one side, grid of content blocks on other) +- Half-bleed image (full left or right side) with content overlay + +**Data display:** +- Large stat callouts (big numbers 60-72pt with small labels below) +- Comparison columns (before/after, pros/cons, side-by-side options) +- Timeline or process flow (numbered steps, arrows) + +**Visual polish:** +- Icons in small colored circles next to section headers +- Italic accent text for key stats or taglines + +### Typography + +**Choose an interesting font pairing** — don't default to Arial. Pick a header font with personality and pair it with a clean body font. + +| Header Font | Body Font | +|-------------|-----------| +| Georgia | Calibri | +| Arial Black | Arial | +| Calibri | Calibri Light | +| Cambria | Calibri | +| Trebuchet MS | Calibri | +| Impact | Arial | +| Palatino | Garamond | +| Consolas | Calibri | + +| Element | Size | +|---------|------| +| Slide title | 36-44pt bold | +| Section header | 20-24pt bold | +| Body text | 14-16pt | +| Captions | 10-12pt muted | + +### Spacing + +- 0.5" minimum margins +- 0.3-0.5" between content blocks +- Leave breathing room—don't fill every inch + +### Avoid (Common Mistakes) + +- **Don't repeat the same layout** — vary columns, cards, and callouts across slides +- **Don't center body text** — left-align paragraphs and lists; center only titles +- **Don't skimp on size contrast** — titles need 36pt+ to stand out from 14-16pt body +- **Don't default to blue** — pick colors that reflect the specific topic +- **Don't mix spacing randomly** — choose 0.3" or 0.5" gaps and use consistently +- **Don't style one slide and leave the rest plain** — commit fully or keep it simple throughout +- **Don't create text-only slides** — add images, icons, charts, or visual elements; avoid plain title + bullets +- **Don't 
forget text box padding** — when aligning lines or shapes with text edges, set `margin: 0` on the text box or offset the shape to account for padding +- **Don't use low-contrast elements** — icons AND text need strong contrast against the background; avoid light text on light backgrounds or dark text on dark backgrounds +- **NEVER use accent lines under titles** — these are a hallmark of AI-generated slides; use whitespace or background color instead + +--- + +## QA (Required) + +**Assume there are problems. Your job is to find them.** + +Your first render is almost never correct. Approach QA as a bug hunt, not a confirmation step. If you found zero issues on first inspection, you weren't looking hard enough. + +### Content QA + +```bash +python -m markitdown output.pptx +``` + +Check for missing content, typos, wrong order. + +**When using templates, check for leftover placeholder text:** + +```bash +python -m markitdown output.pptx | grep -iE "xxxx|lorem|ipsum|this.*(page|slide).*layout" +``` + +If grep returns results, fix them before declaring success. + +### Visual QA + +**⚠️ USE SUBAGENTS** — even for 2-3 slides. You've been staring at the code and will see what you expect, not what's there. Subagents have fresh eyes. + +Convert slides to images (see [Converting to Images](#converting-to-images)), then use this prompt: + +``` +Visually inspect these slides. Assume there are issues — find them. 
+ +Look for: +- Overlapping elements (text through shapes, lines through words, stacked elements) +- Text overflow or cut off at edges/box boundaries +- Decorative lines positioned for single-line text but title wrapped to two lines +- Source citations or footers colliding with content above +- Elements too close (< 0.3" gaps) or cards/sections nearly touching +- Uneven gaps (large empty area in one place, cramped in another) +- Insufficient margin from slide edges (< 0.5") +- Columns or similar elements not aligned consistently +- Low-contrast text (e.g., light gray text on cream-colored background) +- Low-contrast icons (e.g., dark icons on dark backgrounds without a contrasting circle) +- Text boxes too narrow causing excessive wrapping +- Leftover placeholder content + +For each slide, list issues or areas of concern, even if minor. + +Read and analyze these images: +1. /path/to/slide-01.jpg (Expected: [brief description]) +2. /path/to/slide-02.jpg (Expected: [brief description]) + +Report ALL issues found, including minor ones. +``` + +### Verification Loop + +1. Generate slides → Convert to images → Inspect +2. **List issues found** (if none found, look again more critically) +3. Fix issues +4. **Re-verify affected slides** — one fix often creates another problem +5. Repeat until a full pass reveals no new issues + +**Do not declare success until you've completed at least one fix-and-verify cycle.** + +--- + +## Converting to Images + +Convert presentations to individual slide images for visual inspection: + +```bash +python scripts/office/soffice.py --headless --convert-to pdf output.pptx +pdftoppm -jpeg -r 150 output.pdf slide +``` + +This creates `slide-01.jpg`, `slide-02.jpg`, etc. 
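The zero-padding in names such as `slide-01.jpg` may vary with the page count and the Poppler version (an assumption worth checking locally), so a QA script is safer matching the numeric suffix than hard-coding filenames. A minimal sketch, where `order_slide_images` is a hypothetical helper and the `slide` prefix matches the `pdftoppm` command above:

```python
import re


def order_slide_images(names, prefix="slide"):
    """Return the slide image names sorted by page number.

    Only names of the form '<prefix>-<number>.jpg' are kept, so re-rendered
    files such as 'slide-fixed-3.jpg' are ignored for the default prefix.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)\.jpg$")
    numbered = []
    for name in names:
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


# Example listing; in practice this could come from os.listdir(".").
listing = ["slide-10.jpg", "slide-2.jpg", "notes.txt", "slide-1.jpg"]
print(order_slide_images(listing))
```

Sorting on the parsed integer keeps page 10 after page 2 even when the names are not zero-padded.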
+
+To re-render specific slides after fixes:
+
+```bash
+pdftoppm -jpeg -r 150 -f N -l N output.pdf slide-fixed
+```
+
+---
+
+## Dependencies
+
+- `pip install "markitdown[pptx]"` - text extraction
+- `pip install Pillow` - thumbnail grids
+- `npm install -g pptxgenjs` - creating from scratch
+- LibreOffice (`soffice`) - PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
+- Poppler (`pdftoppm`) - PDF to images
diff --git a/skills/pptx-run-report/editing.md b/skills/pptx-run-report/editing.md
new file mode 100644
index 0000000..f873e8a
--- /dev/null
+++ b/skills/pptx-run-report/editing.md
@@ -0,0 +1,205 @@
+# Editing Presentations
+
+## Template-Based Workflow
+
+When using an existing presentation as a template:
+
+1. **Analyze existing slides**:
+   ```bash
+   python scripts/thumbnail.py template.pptx
+   python -m markitdown template.pptx
+   ```
+   Review `thumbnails.jpg` to see layouts, and markitdown output to see placeholder text.
+
+2. **Plan slide mapping**: For each content section, choose a template slide.
+
+   ⚠️ **USE VARIED LAYOUTS** — monotonous presentations are a common failure mode. Don't default to basic title + bullet slides. Actively seek out:
+   - Multi-column layouts (2-column, 3-column)
+   - Image + text combinations
+   - Full-bleed images with text overlay
+   - Quote or callout slides
+   - Section dividers
+   - Stat/number callouts
+   - Icon grids or icon + text rows
+
+   **Avoid:** Repeating the same text-heavy layout for every slide.
+
+   Match content type to layout style (e.g., key points → bullet slide, team info → multi-column, testimonials → quote slide).
+
+3. **Unpack**: `python scripts/office/unpack.py template.pptx unpacked/`
+
+4. **Build presentation** (do this yourself, not with subagents):
+   - Delete unwanted slides (remove from `<p:sldIdLst>`)
+   - Duplicate slides you want to reuse (`add_slide.py`)
+   - Reorder slides in `<p:sldIdLst>`
+   - **Complete all structural changes before step 5**
+
+5. **Edit content**: Update text in each `slide{N}.xml`.
+   **Use subagents here if available** — slides are separate XML files, so subagents can edit in parallel.
+
+6. **Clean**: `python scripts/clean.py unpacked/`
+
+7. **Pack**: `python scripts/office/pack.py unpacked/ output.pptx --original template.pptx`
+
+---
+
+## Scripts
+
+| Script | Purpose |
+|--------|---------|
+| `unpack.py` | Extract and pretty-print PPTX |
+| `add_slide.py` | Duplicate slide or create from layout |
+| `clean.py` | Remove orphaned files |
+| `pack.py` | Repack with validation |
+| `thumbnail.py` | Create visual grid of slides |
+
+### unpack.py
+
+```bash
+python scripts/office/unpack.py input.pptx unpacked/
+```
+
+Extracts PPTX, pretty-prints XML, escapes smart quotes.
+
+### add_slide.py
+
+```bash
+python scripts/add_slide.py unpacked/ slide2.xml      # Duplicate slide
+python scripts/add_slide.py unpacked/ slideLayout2.xml # From layout
+```
+
+Prints the `<p:sldId>` element to add to `<p:sldIdLst>` at the desired position.
+
+### clean.py
+
+```bash
+python scripts/clean.py unpacked/
+```
+
+Removes slides not in `<p:sldIdLst>`, unreferenced media, orphaned rels.
+
+### pack.py
+
+```bash
+python scripts/office/pack.py unpacked/ output.pptx --original input.pptx
+```
+
+Validates, repairs, condenses XML, re-encodes smart quotes.
+
+### thumbnail.py
+
+```bash
+python scripts/thumbnail.py input.pptx [output_prefix] [--cols N]
+```
+
+Creates `thumbnails.jpg` with slide filenames as labels. Default 3 columns, max 12 per grid.
+
+**Use for template analysis only** (choosing layouts). For visual QA, use `soffice` + `pdftoppm` to create full-resolution individual slide images—see SKILL.md.
+
+---
+
+## Slide Operations
+
+Slide order is in `ppt/presentation.xml` → `<p:sldIdLst>`.
+
+**Reorder**: Rearrange `<p:sldId>` elements.
+
+**Delete**: Remove the slide's `<p:sldId>`, then run `clean.py`.
+
+**Add**: Use `add_slide.py`. Never manually copy slide files—the script handles notes references, Content_Types.xml, and relationship IDs that manual copying misses.
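In PresentationML, the slide order described above is the order of the `<p:sldId>` children in the slide ID list (`<p:sldIdLst>`) of `ppt/presentation.xml`. The sketch below reads that order from an XML string. It is illustrative only: `slide_order` is a hypothetical helper, the sample document is made up, and the standard-library `xml.dom.minidom` is used to keep the example dependency-free, whereas this guide recommends `defusedxml.minidom` for real files.

```python
from xml.dom import minidom

# Standard OOXML namespace URIs for PresentationML and relationships.
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def slide_order(presentation_xml: str) -> list:
    """Return the r:id of every <p:sldId>, in presentation order."""
    doc = minidom.parseString(presentation_xml)
    return [
        node.getAttributeNS(R_NS, "id")
        for node in doc.getElementsByTagNameNS(P_NS, "sldId")
    ]


# Made-up sample standing in for the contents of ppt/presentation.xml.
SAMPLE = f"""<p:presentation xmlns:p="{P_NS}" xmlns:r="{R_NS}">
  <p:sldIdLst>
    <p:sldId id="256" r:id="rId2"/>
    <p:sldId id="257" r:id="rId5"/>
  </p:sldIdLst>
</p:presentation>"""

print(slide_order(SAMPLE))
```

Each returned relationship ID maps to a slide part through `ppt/_rels/presentation.xml.rels`, which is one of the links that hand-copied slide files tend to miss.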
+
+---
+
+## Editing Content
+
+**Subagents:** If available, use them here (after completing step 4). Each slide is a separate XML file, so subagents can edit in parallel. In your prompt to subagents, include:
+- The slide file path(s) to edit
+- **"Use the Edit tool for all changes"**
+- The formatting rules and common pitfalls below
+
+For each slide:
+1. Read the slide's XML
+2. Identify ALL placeholder content—text, images, charts, icons, captions
+3. Replace each placeholder with final content
+
+**Use the Edit tool, not sed or Python scripts.** The Edit tool forces specificity about what to replace and where, yielding better reliability.
+
+### Formatting Rules
+
+- **Bold all headers, subheadings, and inline labels**: Use `b="1"` on `<a:rPr>`. This includes:
+  - Slide titles
+  - Section headers within a slide
+  - Inline labels (e.g., "Status:", "Description:") at the start of a line
+- **Never use unicode bullets (•)**: Use proper list formatting with `<a:buChar>` or `<a:buAutoNum>`
+- **Bullet consistency**: Let bullets inherit from the layout. Only specify `<a:buChar>` or `<a:buNone>`.
+
+---
+
+## Common Pitfalls
+
+### Template Adaptation
+
+When source content has fewer items than the template:
+- **Remove excess elements entirely** (images, shapes, text boxes), don't just clear text
+- Check for orphaned visuals after clearing text content
+- Run visual QA to catch mismatched counts
+
+When replacing text with content of a different length:
+- **Shorter replacements**: Usually safe
+- **Longer replacements**: May overflow or wrap unexpectedly
+- Test with visual QA after text changes
+- Consider truncating or splitting content to fit the template's design constraints
+
+**Template slots ≠ Source items**: If the template has 4 team members but the source has 3 users, delete the 4th member's entire group (image + text boxes), not just the text.
+
+### Multi-Item Content
+
+If the source has multiple items (numbered lists, multiple sections), create separate `<a:p>` elements for each — **never concatenate into one string**.
+
+**❌ WRONG** — all items in one paragraph:
+```xml
+<a:p>
+  <a:r><a:t>Step 1: Do the first thing. Step 2: Do the second thing.</a:t></a:r>
+</a:p>
+```
+
+**✅ CORRECT** — separate paragraphs with bold headers:
+```xml
+<a:p>
+  <a:pPr/>
+  <a:r><a:rPr b="1"/><a:t>Step 1</a:t></a:r>
+</a:p>
+<a:p>
+  <a:pPr/>
+  <a:r><a:rPr/><a:t>Do the first thing.</a:t></a:r>
+</a:p>
+<a:p>
+  <a:pPr/>
+  <a:r><a:rPr b="1"/><a:t>Step 2</a:t></a:r>
+</a:p>
+<!-- continue the pattern for the remaining items -->
+```
+
+Copy `<a:pPr>` from the original paragraph to preserve line spacing. Use `b="1"` on headers.
+
+### Smart Quotes
+
+Unpack and pack handle smart quotes automatically, but the Edit tool converts smart quotes to ASCII.
+
+**When adding new text with quotes, use XML entities:**
+
+```xml
+<a:t>the &#x201C;Agreement&#x201D;</a:t>
+```
+
+| Character | Name | Unicode | XML Entity |
+|-----------|------|---------|------------|
+| `“` | Left double quote | U+201C | `&#x201C;` |
+| `”` | Right double quote | U+201D | `&#x201D;` |
+| `‘` | Left single quote | U+2018 | `&#x2018;` |
+| `’` | Right single quote | U+2019 | `&#x2019;` |
+
+### Other
+
+- **Whitespace**: Use `xml:space="preserve"` on `<a:t>` elements with leading/trailing spaces
+- **XML parsing**: Use `defusedxml.minidom`, not `xml.etree.ElementTree` (corrupts namespaces)
diff --git a/skills/pptx-run-report/generate_run_report.js b/skills/pptx-run-report/generate_run_report.js
new file mode 100644
index 0000000..d226367
--- /dev/null
+++ b/skills/pptx-run-report/generate_run_report.js
@@ -0,0 +1,548 @@
+#!/usr/bin/env node
+/**
+ * generate_run_report.js
+ *
+ * Reads a JSON data file produced by pptx_report.py and generates a
+ * styled PPTX run report using PptxGenJS.
+ * + * Usage: node pptx/generate_run_report.js + */ + +const fs = require("fs"); +const path = require("path"); +const pptxgen = require("pptxgenjs"); + +// ── Color Palette (Teal Trust — matches existing ArgusBot presentation) ── +const C = { + navy: "1B2A4A", + ice: "C8E6F5", + white: "FFFFFF", + dark: "0D1B2E", + accent: "0891B2", + muted: "64748B", + lightBg: "F0F9FF", + cardBg: "FFFFFF", + green: "059669", + orange: "D97706", + red: "DC2626", + teal: "0D9488", + purple: "7C3AED", + yellow: "EAB308", +}; + +const FONT = "Calibri"; +const FONT_BOLD = "Calibri"; +const FONT_MONO = "Consolas"; + +function mkShadow() { + return { type: "outer", color: "000000", blur: 4, offset: 2, angle: 135, opacity: 0.10 }; +} + +function defineMasters(pres) { + pres.defineSlideMaster({ title: "DARK", background: { color: C.dark } }); + pres.defineSlideMaster({ title: "LIGHT", background: { color: C.lightBg } }); + pres.defineSlideMaster({ title: "WHITE", background: { color: C.white } }); +} + +function addSlideNum(slide, num, total, dark = false) { + slide.addText(`${num} / ${total}`, { + x: 8.8, y: 5.2, w: 1, h: 0.3, + fontSize: 9, fontFace: FONT, color: dark ? "7BA3C4" : "94A3B8", + align: "right", + }); +} + +function truncate(str, maxLen) { + if (!str) return ""; + return str.length > maxLen ? str.slice(0, maxLen - 3) + "..." 
: str; +} + +function statusColor(status) { + if (status === "done") return C.green; + if (status === "blocked") return C.red; + return C.orange; +} + +function main() { + const args = process.argv.slice(2); + if (args.length < 2) { + console.error("Usage: node generate_run_report.js "); + process.exit(1); + } + + const dataPath = args[0]; + const outputPath = args[1]; + const raw = fs.readFileSync(dataPath, "utf-8"); + const data = JSON.parse(raw); + + const pres = new pptxgen(); + pres.layout = "LAYOUT_16x9"; + pres.author = "ArgusBot"; + pres.title = `ArgusBot Run Report — ${data.objective_short || "Run Report"}`; + defineMasters(pres); + + const hasOperatorMessages = data.operator_messages && data.operator_messages.length > 0; + const TOTAL = hasOperatorMessages ? 8 : 7; + let sn = 0; + + // ════════════════════════════════════════ + // SLIDE 1: Title + // ════════════════════════════════════════ + sn++; + { + const s = pres.addSlide({ masterName: "DARK" }); + addSlideNum(s, sn, TOTAL, true); + s.addText("ArgusBot Run Report", { + x: 0.8, y: 1.2, w: 8.4, h: 1.0, + fontSize: 42, fontFace: FONT_BOLD, color: C.white, bold: true, + }); + s.addText(truncate(data.objective || "", 200), { + x: 0.8, y: 2.3, w: 8.4, h: 0.8, + fontSize: 16, fontFace: FONT, color: C.ice, + }); + const meta = []; + if (data.date) meta.push(data.date); + if (data.session_id) meta.push(`Session: ${truncate(data.session_id, 40)}`); + s.addText(meta.join(" | "), { + x: 0.8, y: 3.5, w: 8.4, h: 0.4, + fontSize: 12, fontFace: FONT, color: C.muted, + }); + // Status badge + const badge = data.success ? "SUCCESS" : "INCOMPLETE"; + const badgeColor = data.success ? 
C.green : C.orange; + s.addShape(pres.shapes.ROUNDED_RECTANGLE, { + x: 0.8, y: 4.2, w: 1.8, h: 0.45, + fill: { color: badgeColor }, rectRadius: 0.1, + }); + s.addText(badge, { + x: 0.8, y: 4.2, w: 1.8, h: 0.45, + fontSize: 14, fontFace: FONT_BOLD, color: C.white, bold: true, + align: "center", valign: "middle", + }); + } + + // ════════════════════════════════════════ + // SLIDE 2: Executive Summary + // ════════════════════════════════════════ + sn++; + { + const s = pres.addSlide({ masterName: "LIGHT" }); + addSlideNum(s, sn, TOTAL); + s.addText("Executive Summary", { + x: 0.8, y: 0.3, w: 8.4, h: 0.7, + fontSize: 32, fontFace: FONT_BOLD, color: C.navy, bold: true, + }); + + // Stat boxes + const stats = [ + ["Total Rounds", String(data.total_rounds || 0), C.accent], + ["Final Status", data.success ? "Success" : "Incomplete", data.success ? C.green : C.orange], + ["Checks Passed", `${data.checks_passed || 0}/${data.checks_total || 0}`, data.checks_passed === data.checks_total ? C.green : C.red], + ["Reviewer", data.reviewer_verdict || "N/A", statusColor(data.reviewer_verdict)], + ]; + stats.forEach(([label, value, color], i) => { + const xx = 0.5 + i * 2.3; + s.addShape(pres.shapes.RECTANGLE, { + x: xx, y: 1.3, w: 2.05, h: 1.4, + fill: { color: C.cardBg }, shadow: mkShadow(), + }); + s.addShape(pres.shapes.RECTANGLE, { + x: xx, y: 1.3, w: 2.05, h: 0.08, + fill: { color }, + }); + s.addText(value, { + x: xx, y: 1.5, w: 2.05, h: 0.7, + fontSize: 22, fontFace: FONT_BOLD, color, bold: true, + align: "center", valign: "middle", + }); + s.addText(label, { + x: xx, y: 2.2, w: 2.05, h: 0.4, + fontSize: 12, fontFace: FONT, color: C.muted, + align: "center", + }); + }); + + // Stop reason + s.addShape(pres.shapes.RECTANGLE, { + x: 0.8, y: 3.0, w: 8.4, h: 0.7, + fill: { color: C.cardBg }, shadow: mkShadow(), + }); + s.addText("Stop Reason", { + x: 1.0, y: 3.05, w: 2, h: 0.3, + fontSize: 11, fontFace: FONT_BOLD, color: C.muted, bold: true, + }); + 
s.addText(truncate(data.stop_reason || "N/A", 200), { + x: 1.0, y: 3.35, w: 8.0, h: 0.3, + fontSize: 12, fontFace: FONT, color: "444444", + }); + + // Objective recap + s.addShape(pres.shapes.RECTANGLE, { + x: 0.8, y: 3.9, w: 8.4, h: 1.3, + fill: { color: C.cardBg }, shadow: mkShadow(), + }); + s.addText("Objective", { + x: 1.0, y: 3.95, w: 2, h: 0.3, + fontSize: 11, fontFace: FONT_BOLD, color: C.muted, bold: true, + }); + s.addText(truncate(data.objective || "", 400), { + x: 1.0, y: 4.25, w: 8.0, h: 0.85, + fontSize: 11, fontFace: FONT, color: "444444", + }); + } + + // ════════════════════════════════════════ + // SLIDE 3: Round Timeline + // ════════════════════════════════════════ + sn++; + { + const s = pres.addSlide({ masterName: "WHITE" }); + addSlideNum(s, sn, TOTAL); + s.addText("Round Timeline", { + x: 0.8, y: 0.3, w: 8.4, h: 0.7, + fontSize: 28, fontFace: FONT_BOLD, color: C.navy, bold: true, + }); + s.addText("Visual pipeline of round execution and reviewer decisions", { + x: 0.8, y: 0.95, w: 8.4, h: 0.35, + fontSize: 14, fontFace: FONT, color: C.muted, italic: true, + }); + + const rounds = data.rounds || []; + // Show up to 12 rounds in a grid (3 rows x 4 cols) + const maxDisplay = 12; + const displayed = rounds.slice(0, maxDisplay); + const cols = 4; + displayed.forEach((r, i) => { + const col = i % cols; + const row = Math.floor(i / cols); + const xx = 0.5 + col * 2.35; + const yy = 1.5 + row * 1.15; + const color = statusColor(r.review_status); + + s.addShape(pres.shapes.RECTANGLE, { + x: xx, y: yy, w: 2.1, h: 0.95, + fill: { color: C.cardBg }, shadow: mkShadow(), + }); + s.addShape(pres.shapes.RECTANGLE, { + x: xx, y: yy, w: 2.1, h: 0.06, + fill: { color }, + }); + s.addText(`Round ${r.round_index}`, { + x: xx + 0.1, y: yy + 0.1, w: 1.2, h: 0.3, + fontSize: 12, fontFace: FONT_BOLD, color: C.navy, bold: true, margin: 0, + }); + // Status badge + s.addShape(pres.shapes.ROUNDED_RECTANGLE, { + x: xx + 1.2, y: yy + 0.12, w: 0.8, h: 0.25, + fill: { 
color }, rectRadius: 0.05, + }); + s.addText(r.review_status || "?", { + x: xx + 1.2, y: yy + 0.12, w: 0.8, h: 0.25, + fontSize: 9, fontFace: FONT_BOLD, color: C.white, bold: true, + align: "center", valign: "middle", + }); + // Check result + const checkText = r.checks_passed ? "checks: pass" : "checks: fail"; + const checkColor = r.checks_passed ? C.green : C.red; + s.addText(checkText, { + x: xx + 0.1, y: yy + 0.5, w: 1.9, h: 0.2, + fontSize: 9, fontFace: FONT, color: checkColor, margin: 0, + }); + // Confidence + if (r.review_confidence !== undefined) { + s.addText(`confidence: ${(r.review_confidence * 100).toFixed(0)}%`, { + x: xx + 0.1, y: yy + 0.7, w: 1.9, h: 0.2, + fontSize: 9, fontFace: FONT, color: C.muted, margin: 0, + }); + } + }); + + if (rounds.length > maxDisplay) { + s.addText(`+ ${rounds.length - maxDisplay} more rounds not shown`, { + x: 0.8, y: 5.0, w: 8.4, h: 0.3, + fontSize: 11, fontFace: FONT, color: C.muted, italic: true, align: "center", + }); + } + } + + // ════════════════════════════════════════ + // SLIDE 4: Acceptance Checks + // ════════════════════════════════════════ + sn++; + { + const s = pres.addSlide({ masterName: "LIGHT" }); + addSlideNum(s, sn, TOTAL); + s.addText("Acceptance Checks", { + x: 0.8, y: 0.3, w: 8.4, h: 0.7, + fontSize: 28, fontFace: FONT_BOLD, color: C.navy, bold: true, + }); + s.addText("Final round check command results", { + x: 0.8, y: 0.95, w: 8.4, h: 0.35, + fontSize: 14, fontFace: FONT, color: C.muted, italic: true, + }); + + const checks = data.final_checks || []; + if (checks.length === 0) { + s.addText("No acceptance checks configured for this run.", { + x: 0.8, y: 2.0, w: 8.4, h: 0.5, + fontSize: 16, fontFace: FONT, color: C.muted, align: "center", + }); + } else { + // Table header + s.addShape(pres.shapes.RECTANGLE, { + x: 0.8, y: 1.5, w: 8.4, h: 0.45, + fill: { color: C.navy }, + }); + s.addText("Command", { + x: 0.8, y: 1.5, w: 5.5, h: 0.45, + fontSize: 12, fontFace: FONT_BOLD, color: C.white, bold: 
true, + valign: "middle", margin: [0, 0, 0, 10], + }); + s.addText("Exit Code", { + x: 6.3, y: 1.5, w: 1.4, h: 0.45, + fontSize: 12, fontFace: FONT_BOLD, color: C.white, bold: true, + align: "center", valign: "middle", + }); + s.addText("Result", { + x: 7.7, y: 1.5, w: 1.5, h: 0.45, + fontSize: 12, fontFace: FONT_BOLD, color: C.white, bold: true, + align: "center", valign: "middle", + }); + + checks.slice(0, 8).forEach((check, i) => { + const yy = 2.0 + i * 0.45; + const bg = i % 2 === 0 ? C.cardBg : C.lightBg; + s.addShape(pres.shapes.RECTANGLE, { + x: 0.8, y: yy, w: 8.4, h: 0.42, + fill: { color: bg }, + }); + s.addText(truncate(check.command || "", 60), { + x: 0.8, y: yy, w: 5.5, h: 0.42, + fontSize: 11, fontFace: FONT_MONO, color: "444444", + valign: "middle", margin: [0, 0, 0, 10], + }); + s.addText(String(check.exit_code), { + x: 6.3, y: yy, w: 1.4, h: 0.42, + fontSize: 11, fontFace: FONT_MONO, color: "444444", + align: "center", valign: "middle", + }); + const passed = check.passed; + s.addShape(pres.shapes.ROUNDED_RECTANGLE, { + x: 7.9, y: yy + 0.06, w: 1.1, h: 0.3, + fill: { color: passed ? C.green : C.red }, rectRadius: 0.05, + }); + s.addText(passed ? 
"PASS" : "FAIL", { + x: 7.9, y: yy + 0.06, w: 1.1, h: 0.3, + fontSize: 10, fontFace: FONT_BOLD, color: C.white, bold: true, + align: "center", valign: "middle", + }); + }); + } + } + + // ════════════════════════════════════════ + // SLIDE 5: Reviewer & Planner Summary + // ════════════════════════════════════════ + sn++; + { + const s = pres.addSlide({ masterName: "WHITE" }); + addSlideNum(s, sn, TOTAL); + s.addText("Reviewer & Planner Summary", { + x: 0.8, y: 0.3, w: 8.4, h: 0.7, + fontSize: 28, fontFace: FONT_BOLD, color: C.navy, bold: true, + }); + + // Reviewer card + s.addShape(pres.shapes.RECTANGLE, { + x: 0.8, y: 1.2, w: 4.0, h: 3.5, + fill: { color: C.cardBg }, shadow: mkShadow(), + }); + s.addShape(pres.shapes.RECTANGLE, { + x: 0.8, y: 1.2, w: 4.0, h: 0.08, + fill: { color: C.orange }, + }); + s.addText("Reviewer", { + x: 1.1, y: 1.35, w: 3.4, h: 0.4, + fontSize: 18, fontFace: FONT_BOLD, color: C.navy, bold: true, + }); + const reviewerLines = [ + `Verdict: ${data.reviewer_verdict || "N/A"}`, + `Reason: ${truncate(data.reviewer_reason || "N/A", 150)}`, + `Next Action: ${truncate(data.reviewer_next_action || "N/A", 150)}`, + ]; + s.addText(reviewerLines.join("\n\n"), { + x: 1.1, y: 1.85, w: 3.4, h: 2.6, + fontSize: 11, fontFace: FONT, color: "555555", + }); + + // Planner card + s.addShape(pres.shapes.RECTANGLE, { + x: 5.2, y: 1.2, w: 4.0, h: 3.5, + fill: { color: C.cardBg }, shadow: mkShadow(), + }); + s.addShape(pres.shapes.RECTANGLE, { + x: 5.2, y: 1.2, w: 4.0, h: 0.08, + fill: { color: C.purple }, + }); + s.addText("Planner", { + x: 5.5, y: 1.35, w: 3.4, h: 0.4, + fontSize: 18, fontFace: FONT_BOLD, color: C.navy, bold: true, + }); + const plannerLines = [ + `Follow-up Required: ${data.planner_follow_up_required !== undefined ? 
data.planner_follow_up_required : "N/A"}`, + `Next Explore: ${truncate(data.planner_next_explore || "N/A", 150)}`, + `Main Instruction: ${truncate(data.planner_main_instruction || "N/A", 150)}`, + ]; + s.addText(plannerLines.join("\n\n"), { + x: 5.5, y: 1.85, w: 3.4, h: 2.6, + fontSize: 11, fontFace: FONT, color: "555555", + }); + } + + // ════════════════════════════════════════ + // SLIDE 6: Key Metrics + // ════════════════════════════════════════ + sn++; + { + const s = pres.addSlide({ masterName: "LIGHT" }); + addSlideNum(s, sn, TOTAL); + s.addText("Key Metrics", { + x: 0.8, y: 0.3, w: 8.4, h: 0.7, + fontSize: 28, fontFace: FONT_BOLD, color: C.navy, bold: true, + }); + + const metrics = [ + [String(data.total_rounds || 0), "Total Rounds", C.accent], + [String(data.checks_passed || 0), "Checks Passed", C.green], + [String(data.checks_failed || 0), "Checks Failed", data.checks_failed > 0 ? C.red : C.muted], + [data.plan_mode || "off", "Plan Mode", C.purple], + ]; + metrics.forEach(([value, label, color], i) => { + const xx = 0.5 + i * 2.3; + s.addShape(pres.shapes.RECTANGLE, { + x: xx, y: 1.3, w: 2.05, h: 2.0, + fill: { color: C.cardBg }, shadow: mkShadow(), + }); + s.addText(value, { + x: xx, y: 1.5, w: 2.05, h: 1.0, + fontSize: 36, fontFace: FONT_BOLD, color, bold: true, + align: "center", valign: "middle", + }); + s.addText(label, { + x: xx, y: 2.5, w: 2.05, h: 0.5, + fontSize: 14, fontFace: FONT, color: C.muted, + align: "center", valign: "middle", + }); + }); + + // Duration if available + if (data.duration_display) { + s.addShape(pres.shapes.RECTANGLE, { + x: 0.8, y: 3.6, w: 8.4, h: 0.6, + fill: { color: C.cardBg }, shadow: mkShadow(), + }); + s.addText(`Duration: ${data.duration_display}`, { + x: 0.8, y: 3.6, w: 8.4, h: 0.6, + fontSize: 16, fontFace: FONT, color: C.navy, + align: "center", valign: "middle", + }); + } + } + + // ════════════════════════════════════════ + // SLIDE 7 (optional): Operator Messages + // ════════════════════════════════════════ 
+ if (hasOperatorMessages) { + sn++; + const s = pres.addSlide({ masterName: "WHITE" }); + addSlideNum(s, sn, TOTAL); + s.addText("Operator Messages", { + x: 0.8, y: 0.3, w: 8.4, h: 0.7, + fontSize: 28, fontFace: FONT_BOLD, color: C.navy, bold: true, + }); + s.addText("Messages sent during the run via control channels", { + x: 0.8, y: 0.95, w: 8.4, h: 0.35, + fontSize: 14, fontFace: FONT, color: C.muted, italic: true, + }); + + const msgs = data.operator_messages.slice(0, 10); + msgs.forEach((msg, i) => { + const yy = 1.5 + i * 0.42; + const bg = i % 2 === 0 ? C.cardBg : C.lightBg; + s.addShape(pres.shapes.RECTANGLE, { + x: 0.8, y: yy, w: 8.4, h: 0.38, + fill: { color: bg }, + }); + s.addText(truncate(msg, 120), { + x: 1.0, y: yy, w: 8.0, h: 0.38, + fontSize: 10, fontFace: FONT, color: "444444", + valign: "middle", + }); + }); + + if (data.operator_messages.length > 10) { + s.addText(`+ ${data.operator_messages.length - 10} more messages not shown`, { + x: 0.8, y: 5.0, w: 8.4, h: 0.3, + fontSize: 11, fontFace: FONT, color: C.muted, italic: true, align: "center", + }); + } + } + + // ════════════════════════════════════════ + // SLIDE 8: Conclusion + // ════════════════════════════════════════ + sn++; + { + const s = pres.addSlide({ masterName: "DARK" }); + addSlideNum(s, sn, TOTAL, true); + s.addText("Conclusion", { + x: 0.8, y: 0.5, w: 8.4, h: 0.8, + fontSize: 36, fontFace: FONT_BOLD, color: C.white, bold: true, + }); + + const finalStatus = data.success ? "Run Completed Successfully" : "Run Did Not Complete"; + const finalColor = data.success ? 
C.green : C.orange; + s.addShape(pres.shapes.ROUNDED_RECTANGLE, { + x: 0.8, y: 1.5, w: 8.4, h: 0.6, + fill: { color: finalColor }, rectRadius: 0.1, + }); + s.addText(finalStatus, { + x: 0.8, y: 1.5, w: 8.4, h: 0.6, + fontSize: 20, fontFace: FONT_BOLD, color: C.white, bold: true, + align: "center", valign: "middle", + }); + + // Summary items + const items = [ + `Stop Reason: ${truncate(data.stop_reason || "N/A", 100)}`, + `Rounds Executed: ${data.total_rounds || 0}`, + `Final Reviewer Status: ${data.reviewer_verdict || "N/A"}`, + `Checks: ${data.checks_passed || 0} passed, ${data.checks_failed || 0} failed`, + ]; + if (data.planner_next_explore && data.planner_next_explore !== "N/A") { + items.push(`Planner Next Explore: ${truncate(data.planner_next_explore, 80)}`); + } + + s.addText(items.map((t, i) => ({ + text: t, options: { bullet: true, breakLine: i < items.length - 1, color: "C8E6F5" } + })), { + x: 0.8, y: 2.4, w: 8.4, h: 2.5, + fontSize: 14, fontFace: FONT, color: C.ice, paraSpaceAfter: 10, + }); + + s.addText("Generated by ArgusBot", { + x: 0.8, y: 5.0, w: 8.4, h: 0.3, + fontSize: 10, fontFace: FONT, color: C.muted, align: "center", + }); + } + + // ── Save ── + pres.writeFile({ fileName: outputPath }).then(() => { + console.log("PPTX report generated:", outputPath); + }).catch(err => { + console.error("Failed to write PPTX:", err); + process.exit(1); + }); +} + +main(); diff --git a/skills/pptx-run-report/pptxgenjs.md b/skills/pptx-run-report/pptxgenjs.md new file mode 100644 index 0000000..6bfed90 --- /dev/null +++ b/skills/pptx-run-report/pptxgenjs.md @@ -0,0 +1,420 @@ +# PptxGenJS Tutorial + +## Setup & Basic Structure + +```javascript +const pptxgen = require("pptxgenjs"); + +let pres = new pptxgen(); +pres.layout = 'LAYOUT_16x9'; // or 'LAYOUT_16x10', 'LAYOUT_4x3', 'LAYOUT_WIDE' +pres.author = 'Your Name'; +pres.title = 'Presentation Title'; + +let slide = pres.addSlide(); +slide.addText("Hello World!", { x: 0.5, y: 0.5, fontSize: 36, color: "363636" 
}); + +pres.writeFile({ fileName: "Presentation.pptx" }); +``` + +## Layout Dimensions + +Slide dimensions (coordinates in inches): +- `LAYOUT_16x9`: 10" × 5.625" (default) +- `LAYOUT_16x10`: 10" × 6.25" +- `LAYOUT_4x3`: 10" × 7.5" +- `LAYOUT_WIDE`: 13.3" × 7.5" + +--- + +## Text & Formatting + +```javascript +// Basic text +slide.addText("Simple Text", { + x: 1, y: 1, w: 8, h: 2, fontSize: 24, fontFace: "Arial", + color: "363636", bold: true, align: "center", valign: "middle" +}); + +// Character spacing (use charSpacing, not letterSpacing which is silently ignored) +slide.addText("SPACED TEXT", { x: 1, y: 1, w: 8, h: 1, charSpacing: 6 }); + +// Rich text arrays +slide.addText([ + { text: "Bold ", options: { bold: true } }, + { text: "Italic ", options: { italic: true } } +], { x: 1, y: 3, w: 8, h: 1 }); + +// Multi-line text (requires breakLine: true) +slide.addText([ + { text: "Line 1", options: { breakLine: true } }, + { text: "Line 2", options: { breakLine: true } }, + { text: "Line 3" } // Last item doesn't need breakLine +], { x: 0.5, y: 0.5, w: 8, h: 2 }); + +// Text box margin (internal padding) +slide.addText("Title", { + x: 0.5, y: 0.3, w: 9, h: 0.6, + margin: 0 // Use 0 when aligning text with other elements like shapes or icons +}); +``` + +**Tip:** Text boxes have internal margin by default. Set `margin: 0` when you need text to align precisely with shapes, lines, or icons at the same x-position. + +--- + +## Lists & Bullets + +```javascript +// ✅ CORRECT: Multiple bullets +slide.addText([ + { text: "First item", options: { bullet: true, breakLine: true } }, + { text: "Second item", options: { bullet: true, breakLine: true } }, + { text: "Third item", options: { bullet: true } } +], { x: 0.5, y: 0.5, w: 8, h: 3 }); + +// ❌ WRONG: Never use unicode bullets +slide.addText("• First item", { ... 
}); // Creates double bullets + +// Sub-items and numbered lists +{ text: "Sub-item", options: { bullet: true, indentLevel: 1 } } +{ text: "First", options: { bullet: { type: "number" }, breakLine: true } } +``` + +--- + +## Shapes + +```javascript +slide.addShape(pres.shapes.RECTANGLE, { + x: 0.5, y: 0.8, w: 1.5, h: 3.0, + fill: { color: "FF0000" }, line: { color: "000000", width: 2 } +}); + +slide.addShape(pres.shapes.OVAL, { x: 4, y: 1, w: 2, h: 2, fill: { color: "0000FF" } }); + +slide.addShape(pres.shapes.LINE, { + x: 1, y: 3, w: 5, h: 0, line: { color: "FF0000", width: 3, dashType: "dash" } +}); + +// With transparency +slide.addShape(pres.shapes.RECTANGLE, { + x: 1, y: 1, w: 3, h: 2, + fill: { color: "0088CC", transparency: 50 } +}); + +// Rounded rectangle (rectRadius only works with ROUNDED_RECTANGLE, not RECTANGLE) +// ⚠️ Don't pair with rectangular accent overlays — they won't cover rounded corners. Use RECTANGLE instead. +slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { + x: 1, y: 1, w: 3, h: 2, + fill: { color: "FFFFFF" }, rectRadius: 0.1 +}); + +// With shadow +slide.addShape(pres.shapes.RECTANGLE, { + x: 1, y: 1, w: 3, h: 2, + fill: { color: "FFFFFF" }, + shadow: { type: "outer", color: "000000", blur: 6, offset: 2, angle: 135, opacity: 0.15 } +}); +``` + +Shadow options: + +| Property | Type | Range | Notes | +|----------|------|-------|-------| +| `type` | string | `"outer"`, `"inner"` | | +| `color` | string | 6-char hex (e.g. `"000000"`) | No `#` prefix, no 8-char hex — see Common Pitfalls | +| `blur` | number | 0-100 pt | | +| `offset` | number | 0-200 pt | **Must be non-negative** — negative values corrupt the file | +| `angle` | number | 0-359 degrees | Direction the shadow falls (135 = bottom-right, 270 = upward) | +| `opacity` | number | 0.0-1.0 | Use this for transparency, never encode in color string | + +To cast a shadow upward (e.g. on a footer bar), use `angle: 270` with a positive offset — do **not** use a negative offset. 
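
The shadow constraints above can be sketched as a small factory function. This is a minimal illustration and not part of PptxGenJS: `makeShadow` and `footerShadow` are hypothetical names, and the range checks simply mirror the table above.

```javascript
// Hypothetical helper (not a PptxGenJS API): builds a fresh shadow options
// object on every call and rejects values outside the ranges in the table.
function makeShadow({ angle = 135, offset = 2, blur = 6, opacity = 0.15, color = "000000" } = {}) {
  // Exactly six hex digits: no "#" prefix and no 8-char hex with alpha.
  if (!/^[0-9A-Fa-f]{6}$/.test(color)) {
    throw new RangeError(`color must be 6-char hex without "#": ${color}`);
  }
  if (offset < 0 || offset > 200) {
    throw new RangeError(`offset must be 0-200 pt: ${offset}`);
  }
  if (blur < 0 || blur > 100) {
    throw new RangeError(`blur must be 0-100 pt: ${blur}`);
  }
  if (angle < 0 || angle > 359) {
    throw new RangeError(`angle must be 0-359 degrees: ${angle}`);
  }
  if (opacity < 0 || opacity > 1) {
    throw new RangeError(`opacity must be 0.0-1.0: ${opacity}`);
  }
  return { type: "outer", color, blur, offset, angle, opacity };
}

// Upward shadow for a footer bar: angle 270 with a positive offset.
const footerShadow = makeShadow({ angle: 270, offset: 2 });

// Possible usage, assuming `pres` and `slide` already exist:
// slide.addShape(pres.shapes.RECTANGLE, {
//   x: 0, y: 5.1, w: 10, h: 0.5,
//   fill: { color: "FFFFFF" }, shadow: makeShadow({ angle: 270, offset: 2 }),
// });
```

Calling the factory once per shape, rather than storing one shadow object and passing it to several shapes, keeps each shape's options independent.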
+ +**Note**: Gradient fills are not natively supported. Use a gradient image as a background instead. + +--- + +## Images + +### Image Sources + +```javascript +// From file path +slide.addImage({ path: "images/chart.png", x: 1, y: 1, w: 5, h: 3 }); + +// From URL +slide.addImage({ path: "https://example.com/image.jpg", x: 1, y: 1, w: 5, h: 3 }); + +// From base64 (faster, no file I/O) +slide.addImage({ data: "image/png;base64,iVBORw0KGgo...", x: 1, y: 1, w: 5, h: 3 }); +``` + +### Image Options + +```javascript +slide.addImage({ + path: "image.png", + x: 1, y: 1, w: 5, h: 3, + rotate: 45, // 0-359 degrees + rounding: true, // Circular crop + transparency: 50, // 0-100 + flipH: true, // Horizontal flip + flipV: false, // Vertical flip + altText: "Description", // Accessibility + hyperlink: { url: "https://example.com" } +}); +``` + +### Image Sizing Modes + +```javascript +// Contain - fit inside, preserve ratio +{ sizing: { type: 'contain', w: 4, h: 3 } } + +// Cover - fill area, preserve ratio (may crop) +{ sizing: { type: 'cover', w: 4, h: 3 } } + +// Crop - cut specific portion +{ sizing: { type: 'crop', x: 0.5, y: 0.5, w: 2, h: 2 } } +``` + +### Calculate Dimensions (preserve aspect ratio) + +```javascript +const origWidth = 1978, origHeight = 923, maxHeight = 3.0; +const calcWidth = maxHeight * (origWidth / origHeight); +const centerX = (10 - calcWidth) / 2; + +slide.addImage({ path: "image.png", x: centerX, y: 1.2, w: calcWidth, h: maxHeight }); +``` + +### Supported Formats + +- **Standard**: PNG, JPG, GIF (animated GIFs work in Microsoft 365) +- **SVG**: Works in modern PowerPoint/Microsoft 365 + +--- + +## Icons + +Use react-icons to generate SVG icons, then rasterize to PNG for universal compatibility. 
+ +### Setup + +```javascript +const React = require("react"); +const ReactDOMServer = require("react-dom/server"); +const sharp = require("sharp"); +const { FaCheckCircle, FaChartLine } = require("react-icons/fa"); + +function renderIconSvg(IconComponent, color = "#000000", size = 256) { + return ReactDOMServer.renderToStaticMarkup( + React.createElement(IconComponent, { color, size: String(size) }) + ); +} + +async function iconToBase64Png(IconComponent, color, size = 256) { + const svg = renderIconSvg(IconComponent, color, size); + const pngBuffer = await sharp(Buffer.from(svg)).png().toBuffer(); + return "image/png;base64," + pngBuffer.toString("base64"); +} +``` + +### Add Icon to Slide + +```javascript +const iconData = await iconToBase64Png(FaCheckCircle, "#4472C4", 256); + +slide.addImage({ + data: iconData, + x: 1, y: 1, w: 0.5, h: 0.5 // Size in inches +}); +``` + +**Note**: Use size 256 or higher for crisp icons. The size parameter controls the rasterization resolution, not the display size on the slide (which is set by `w` and `h` in inches). + +### Icon Libraries + +Install: `npm install -g react-icons react react-dom sharp` + +Popular icon sets in react-icons: +- `react-icons/fa` - Font Awesome +- `react-icons/md` - Material Design +- `react-icons/hi` - Heroicons +- `react-icons/bi` - Bootstrap Icons + +--- + +## Slide Backgrounds + +```javascript +// Solid color +slide.background = { color: "F1F1F1" }; + +// Color with transparency +slide.background = { color: "FF3399", transparency: 50 }; + +// Image from URL +slide.background = { path: "https://example.com/bg.jpg" }; + +// Image from base64 +slide.background = { data: "image/png;base64,iVBORw0KGgo..." 
}; +``` + +--- + +## Tables + +```javascript +slide.addTable([ + ["Header 1", "Header 2"], + ["Cell 1", "Cell 2"] +], { + x: 1, y: 1, w: 8, h: 2, + border: { pt: 1, color: "999999" }, fill: { color: "F1F1F1" } +}); + +// Advanced with merged cells +let tableData = [ + [{ text: "Header", options: { fill: { color: "6699CC" }, color: "FFFFFF", bold: true } }, "Cell"], + [{ text: "Merged", options: { colspan: 2 } }] +]; +slide.addTable(tableData, { x: 1, y: 3.5, w: 8, colW: [4, 4] }); +``` + +--- + +## Charts + +```javascript +// Bar chart +slide.addChart(pres.charts.BAR, [{ + name: "Sales", labels: ["Q1", "Q2", "Q3", "Q4"], values: [4500, 5500, 6200, 7100] +}], { + x: 0.5, y: 0.6, w: 6, h: 3, barDir: 'col', + showTitle: true, title: 'Quarterly Sales' +}); + +// Line chart +slide.addChart(pres.charts.LINE, [{ + name: "Temp", labels: ["Jan", "Feb", "Mar"], values: [32, 35, 42] +}], { x: 0.5, y: 4, w: 6, h: 3, lineSize: 3, lineSmooth: true }); + +// Pie chart +slide.addChart(pres.charts.PIE, [{ + name: "Share", labels: ["A", "B", "Other"], values: [35, 45, 20] +}], { x: 7, y: 1, w: 5, h: 4, showPercent: true }); +``` + +### Better-Looking Charts + +Default charts look dated. 
Apply these options for a modern, clean appearance: + +```javascript +slide.addChart(pres.charts.BAR, chartData, { + x: 0.5, y: 1, w: 9, h: 4, barDir: "col", + + // Custom colors (match your presentation palette) + chartColors: ["0D9488", "14B8A6", "5EEAD4"], + + // Clean background + chartArea: { fill: { color: "FFFFFF" }, roundedCorners: true }, + + // Muted axis labels + catAxisLabelColor: "64748B", + valAxisLabelColor: "64748B", + + // Subtle grid (value axis only) + valGridLine: { color: "E2E8F0", size: 0.5 }, + catGridLine: { style: "none" }, + + // Data labels on bars + showValue: true, + dataLabelPosition: "outEnd", + dataLabelColor: "1E293B", + + // Hide legend for single series + showLegend: false, +}); +``` + +**Key styling options:** +- `chartColors: [...]` - hex colors for series/segments +- `chartArea: { fill, border, roundedCorners }` - chart background +- `catGridLine/valGridLine: { color, style, size }` - grid lines (`style: "none"` to hide) +- `lineSmooth: true` - curved lines (line charts) +- `legendPos: "r"` - legend position: "b", "t", "l", "r", "tr" + +--- + +## Slide Masters + +```javascript +pres.defineSlideMaster({ + title: 'TITLE_SLIDE', background: { color: '283A5E' }, + objects: [{ + placeholder: { options: { name: 'title', type: 'title', x: 1, y: 2, w: 8, h: 2 } } + }] +}); + +let titleSlide = pres.addSlide({ masterName: "TITLE_SLIDE" }); +titleSlide.addText("My Title", { placeholder: "title" }); +``` + +--- + +## Common Pitfalls + +⚠️ These issues cause file corruption, visual bugs, or broken output. Avoid them. + +1. **NEVER use "#" with hex colors** - causes file corruption + ```javascript + color: "FF0000" // ✅ CORRECT + color: "#FF0000" // ❌ WRONG + ``` + +2. **NEVER encode opacity in hex color strings** - 8-char colors (e.g., `"00000020"`) corrupt the file. Use the `opacity` property instead. 
+ ```javascript + shadow: { type: "outer", blur: 6, offset: 2, color: "00000020" } // ❌ CORRUPTS FILE + shadow: { type: "outer", blur: 6, offset: 2, color: "000000", opacity: 0.12 } // ✅ CORRECT + ``` + +3. **Use `bullet: true`** - NEVER unicode symbols like "•" (creates double bullets) + +4. **Use `breakLine: true`** between array items or text runs together + +5. **Avoid `lineSpacing` with bullets** - causes excessive gaps; use `paraSpaceAfter` instead + +6. **Each presentation needs fresh instance** - don't reuse `pptxgen()` objects + +7. **NEVER reuse option objects across calls** - PptxGenJS mutates objects in-place (e.g. converting shadow values to EMU). Sharing one object between multiple calls corrupts the second shape. + ```javascript + const shadow = { type: "outer", blur: 6, offset: 2, color: "000000", opacity: 0.15 }; + slide.addShape(pres.shapes.RECTANGLE, { shadow, ... }); // ❌ second call gets already-converted values + slide.addShape(pres.shapes.RECTANGLE, { shadow, ... }); + + const makeShadow = () => ({ type: "outer", blur: 6, offset: 2, color: "000000", opacity: 0.15 }); + slide.addShape(pres.shapes.RECTANGLE, { shadow: makeShadow(), ... }); // ✅ fresh object each time + slide.addShape(pres.shapes.RECTANGLE, { shadow: makeShadow(), ... }); + ``` + +8. **Don't use `ROUNDED_RECTANGLE` with accent borders** - rectangular overlay bars won't cover rounded corners. Use `RECTANGLE` instead. 
+ ```javascript + // ❌ WRONG: Accent bar doesn't cover rounded corners + slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x: 1, y: 1, w: 3, h: 1.5, fill: { color: "FFFFFF" } }); + slide.addShape(pres.shapes.RECTANGLE, { x: 1, y: 1, w: 0.08, h: 1.5, fill: { color: "0891B2" } }); + + // ✅ CORRECT: Use RECTANGLE for clean alignment + slide.addShape(pres.shapes.RECTANGLE, { x: 1, y: 1, w: 3, h: 1.5, fill: { color: "FFFFFF" } }); + slide.addShape(pres.shapes.RECTANGLE, { x: 1, y: 1, w: 0.08, h: 1.5, fill: { color: "0891B2" } }); + ``` + +--- + +## Quick Reference + +- **Shapes**: RECTANGLE, OVAL, LINE, ROUNDED_RECTANGLE +- **Charts**: BAR, LINE, PIE, DOUGHNUT, SCATTER, BUBBLE, RADAR +- **Layouts**: LAYOUT_16x9 (10"×5.625"), LAYOUT_16x10, LAYOUT_4x3, LAYOUT_WIDE +- **Alignment**: "left", "center", "right" +- **Chart data labels**: "outEnd", "inEnd", "center" diff --git a/skills/pptx-run-report/scripts/__init__.py b/skills/pptx-run-report/scripts/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/skills/pptx-run-report/scripts/add_slide.py b/skills/pptx-run-report/scripts/add_slide.py new file mode 100755 index 0000000..13700df --- /dev/null +++ b/skills/pptx-run-report/scripts/add_slide.py @@ -0,0 +1,195 @@ +"""Add a new slide to an unpacked PPTX directory. + +Usage: python add_slide.py + +The source can be: + - A slide file (e.g., slide2.xml) - duplicates the slide + - A layout file (e.g., slideLayout2.xml) - creates from layout + +Examples: + python add_slide.py unpacked/ slide2.xml + # Duplicates slide2, creates slide5.xml + + python add_slide.py unpacked/ slideLayout2.xml + # Creates slide5.xml from slideLayout2.xml + +To see available layouts: ls unpacked/ppt/slideLayouts/ + +Prints the element to add to presentation.xml. 
+""" + +import re +import shutil +import sys +from pathlib import Path + + +def get_next_slide_number(slides_dir: Path) -> int: + existing = [int(m.group(1)) for f in slides_dir.glob("slide*.xml") + if (m := re.match(r"slide(\d+)\.xml", f.name))] + return max(existing) + 1 if existing else 1 + + +def create_slide_from_layout(unpacked_dir: Path, layout_file: str) -> None: + slides_dir = unpacked_dir / "ppt" / "slides" + rels_dir = slides_dir / "_rels" + layouts_dir = unpacked_dir / "ppt" / "slideLayouts" + + layout_path = layouts_dir / layout_file + if not layout_path.exists(): + print(f"Error: {layout_path} not found", file=sys.stderr) + sys.exit(1) + + next_num = get_next_slide_number(slides_dir) + dest = f"slide{next_num}.xml" + dest_slide = slides_dir / dest + dest_rels = rels_dir / f"{dest}.rels" + + slide_xml = ''' + + + + + + + + + + + + + + + + + + + + + +''' + dest_slide.write_text(slide_xml, encoding="utf-8") + + rels_dir.mkdir(exist_ok=True) + rels_xml = f''' + + +''' + dest_rels.write_text(rels_xml, encoding="utf-8") + + _add_to_content_types(unpacked_dir, dest) + + rid = _add_to_presentation_rels(unpacked_dir, dest) + + next_slide_id = _get_next_slide_id(unpacked_dir) + + print(f"Created {dest} from {layout_file}") + print(f'Add to presentation.xml : ') + + +def duplicate_slide(unpacked_dir: Path, source: str) -> None: + slides_dir = unpacked_dir / "ppt" / "slides" + rels_dir = slides_dir / "_rels" + + source_slide = slides_dir / source + + if not source_slide.exists(): + print(f"Error: {source_slide} not found", file=sys.stderr) + sys.exit(1) + + next_num = get_next_slide_number(slides_dir) + dest = f"slide{next_num}.xml" + dest_slide = slides_dir / dest + + source_rels = rels_dir / f"{source}.rels" + dest_rels = rels_dir / f"{dest}.rels" + + shutil.copy2(source_slide, dest_slide) + + if source_rels.exists(): + shutil.copy2(source_rels, dest_rels) + + rels_content = dest_rels.read_text(encoding="utf-8") + rels_content = re.sub( + 
r'\s*]*Type="[^"]*notesSlide"[^>]*/>\s*', + "\n", + rels_content, + ) + dest_rels.write_text(rels_content, encoding="utf-8") + + _add_to_content_types(unpacked_dir, dest) + + rid = _add_to_presentation_rels(unpacked_dir, dest) + + next_slide_id = _get_next_slide_id(unpacked_dir) + + print(f"Created {dest} from {source}") + print(f'Add to presentation.xml : ') + + +def _add_to_content_types(unpacked_dir: Path, dest: str) -> None: + content_types_path = unpacked_dir / "[Content_Types].xml" + content_types = content_types_path.read_text(encoding="utf-8") + + new_override = f'' + + if f"/ppt/slides/{dest}" not in content_types: + content_types = content_types.replace("", f" {new_override}\n") + content_types_path.write_text(content_types, encoding="utf-8") + + +def _add_to_presentation_rels(unpacked_dir: Path, dest: str) -> str: + pres_rels_path = unpacked_dir / "ppt" / "_rels" / "presentation.xml.rels" + pres_rels = pres_rels_path.read_text(encoding="utf-8") + + rids = [int(m) for m in re.findall(r'Id="rId(\d+)"', pres_rels)] + next_rid = max(rids) + 1 if rids else 1 + rid = f"rId{next_rid}" + + new_rel = f'' + + if f"slides/{dest}" not in pres_rels: + pres_rels = pres_rels.replace("", f" {new_rel}\n") + pres_rels_path.write_text(pres_rels, encoding="utf-8") + + return rid + + +def _get_next_slide_id(unpacked_dir: Path) -> int: + pres_path = unpacked_dir / "ppt" / "presentation.xml" + pres_content = pres_path.read_text(encoding="utf-8") + slide_ids = [int(m) for m in re.findall(r']*id="(\d+)"', pres_content)] + return max(slide_ids) + 1 if slide_ids else 256 + + +def parse_source(source: str) -> tuple[str, str | None]: + if source.startswith("slideLayout") and source.endswith(".xml"): + return ("layout", source) + + return ("slide", None) + + +if __name__ == "__main__": + if len(sys.argv) != 3: + print("Usage: python add_slide.py ", file=sys.stderr) + print("", file=sys.stderr) + print("Source can be:", file=sys.stderr) + print(" slide2.xml - duplicate an existing 
slide", file=sys.stderr) + print(" slideLayout2.xml - create from a layout template", file=sys.stderr) + print("", file=sys.stderr) + print("To see available layouts: ls /ppt/slideLayouts/", file=sys.stderr) + sys.exit(1) + + unpacked_dir = Path(sys.argv[1]) + source = sys.argv[2] + + if not unpacked_dir.exists(): + print(f"Error: {unpacked_dir} not found", file=sys.stderr) + sys.exit(1) + + source_type, layout_file = parse_source(source) + + if source_type == "layout" and layout_file is not None: + create_slide_from_layout(unpacked_dir, layout_file) + else: + duplicate_slide(unpacked_dir, source) diff --git a/skills/pptx-run-report/scripts/clean.py b/skills/pptx-run-report/scripts/clean.py new file mode 100755 index 0000000..3d13994 --- /dev/null +++ b/skills/pptx-run-report/scripts/clean.py @@ -0,0 +1,286 @@ +"""Remove unreferenced files from an unpacked PPTX directory. + +Usage: python clean.py + +Example: + python clean.py unpacked/ + +This script removes: +- Orphaned slides (not in sldIdLst) and their relationships +- [trash] directory (unreferenced files) +- Orphaned .rels files for deleted resources +- Unreferenced media, embeddings, charts, diagrams, drawings, ink files +- Unreferenced theme files +- Unreferenced notes slides +- Content-Type overrides for deleted files +""" + +import sys +from pathlib import Path + +import defusedxml.minidom + + +import re + + +def get_slides_in_sldidlst(unpacked_dir: Path) -> set[str]: + pres_path = unpacked_dir / "ppt" / "presentation.xml" + pres_rels_path = unpacked_dir / "ppt" / "_rels" / "presentation.xml.rels" + + if not pres_path.exists() or not pres_rels_path.exists(): + return set() + + rels_dom = defusedxml.minidom.parse(str(pres_rels_path)) + rid_to_slide = {} + for rel in rels_dom.getElementsByTagName("Relationship"): + rid = rel.getAttribute("Id") + target = rel.getAttribute("Target") + rel_type = rel.getAttribute("Type") + if "slide" in rel_type and target.startswith("slides/"): + rid_to_slide[rid] = 
target.replace("slides/", "") + + pres_content = pres_path.read_text(encoding="utf-8") + referenced_rids = set(re.findall(r']*r:id="([^"]+)"', pres_content)) + + return {rid_to_slide[rid] for rid in referenced_rids if rid in rid_to_slide} + + +def remove_orphaned_slides(unpacked_dir: Path) -> list[str]: + slides_dir = unpacked_dir / "ppt" / "slides" + slides_rels_dir = slides_dir / "_rels" + pres_rels_path = unpacked_dir / "ppt" / "_rels" / "presentation.xml.rels" + + if not slides_dir.exists(): + return [] + + referenced_slides = get_slides_in_sldidlst(unpacked_dir) + removed = [] + + for slide_file in slides_dir.glob("slide*.xml"): + if slide_file.name not in referenced_slides: + rel_path = slide_file.relative_to(unpacked_dir) + slide_file.unlink() + removed.append(str(rel_path)) + + rels_file = slides_rels_dir / f"{slide_file.name}.rels" + if rels_file.exists(): + rels_file.unlink() + removed.append(str(rels_file.relative_to(unpacked_dir))) + + if removed and pres_rels_path.exists(): + rels_dom = defusedxml.minidom.parse(str(pres_rels_path)) + changed = False + + for rel in list(rels_dom.getElementsByTagName("Relationship")): + target = rel.getAttribute("Target") + if target.startswith("slides/"): + slide_name = target.replace("slides/", "") + if slide_name not in referenced_slides: + if rel.parentNode: + rel.parentNode.removeChild(rel) + changed = True + + if changed: + with open(pres_rels_path, "wb") as f: + f.write(rels_dom.toxml(encoding="utf-8")) + + return removed + + +def remove_trash_directory(unpacked_dir: Path) -> list[str]: + trash_dir = unpacked_dir / "[trash]" + removed = [] + + if trash_dir.exists() and trash_dir.is_dir(): + for file_path in trash_dir.iterdir(): + if file_path.is_file(): + rel_path = file_path.relative_to(unpacked_dir) + removed.append(str(rel_path)) + file_path.unlink() + trash_dir.rmdir() + + return removed + + +def get_slide_referenced_files(unpacked_dir: Path) -> set: + referenced = set() + slides_rels_dir = unpacked_dir / 
"ppt" / "slides" / "_rels" + + if not slides_rels_dir.exists(): + return referenced + + for rels_file in slides_rels_dir.glob("*.rels"): + dom = defusedxml.minidom.parse(str(rels_file)) + for rel in dom.getElementsByTagName("Relationship"): + target = rel.getAttribute("Target") + if not target: + continue + target_path = (rels_file.parent.parent / target).resolve() + try: + referenced.add(target_path.relative_to(unpacked_dir.resolve())) + except ValueError: + pass + + return referenced + + +def remove_orphaned_rels_files(unpacked_dir: Path) -> list[str]: + resource_dirs = ["charts", "diagrams", "drawings"] + removed = [] + slide_referenced = get_slide_referenced_files(unpacked_dir) + + for dir_name in resource_dirs: + rels_dir = unpacked_dir / "ppt" / dir_name / "_rels" + if not rels_dir.exists(): + continue + + for rels_file in rels_dir.glob("*.rels"): + resource_file = rels_dir.parent / rels_file.name.replace(".rels", "") + try: + resource_rel_path = resource_file.resolve().relative_to(unpacked_dir.resolve()) + except ValueError: + continue + + if not resource_file.exists() or resource_rel_path not in slide_referenced: + rels_file.unlink() + rel_path = rels_file.relative_to(unpacked_dir) + removed.append(str(rel_path)) + + return removed + + +def get_referenced_files(unpacked_dir: Path) -> set: + referenced = set() + + for rels_file in unpacked_dir.rglob("*.rels"): + dom = defusedxml.minidom.parse(str(rels_file)) + for rel in dom.getElementsByTagName("Relationship"): + target = rel.getAttribute("Target") + if not target: + continue + target_path = (rels_file.parent.parent / target).resolve() + try: + referenced.add(target_path.relative_to(unpacked_dir.resolve())) + except ValueError: + pass + + return referenced + + +def remove_orphaned_files(unpacked_dir: Path, referenced: set) -> list[str]: + resource_dirs = ["media", "embeddings", "charts", "diagrams", "tags", "drawings", "ink"] + removed = [] + + for dir_name in resource_dirs: + dir_path = unpacked_dir / 
"ppt" / dir_name + if not dir_path.exists(): + continue + + for file_path in dir_path.glob("*"): + if not file_path.is_file(): + continue + rel_path = file_path.relative_to(unpacked_dir) + if rel_path not in referenced: + file_path.unlink() + removed.append(str(rel_path)) + + theme_dir = unpacked_dir / "ppt" / "theme" + if theme_dir.exists(): + for file_path in theme_dir.glob("theme*.xml"): + rel_path = file_path.relative_to(unpacked_dir) + if rel_path not in referenced: + file_path.unlink() + removed.append(str(rel_path)) + theme_rels = theme_dir / "_rels" / f"{file_path.name}.rels" + if theme_rels.exists(): + theme_rels.unlink() + removed.append(str(theme_rels.relative_to(unpacked_dir))) + + notes_dir = unpacked_dir / "ppt" / "notesSlides" + if notes_dir.exists(): + for file_path in notes_dir.glob("*.xml"): + if not file_path.is_file(): + continue + rel_path = file_path.relative_to(unpacked_dir) + if rel_path not in referenced: + file_path.unlink() + removed.append(str(rel_path)) + + notes_rels_dir = notes_dir / "_rels" + if notes_rels_dir.exists(): + for file_path in notes_rels_dir.glob("*.rels"): + notes_file = notes_dir / file_path.name.replace(".rels", "") + if not notes_file.exists(): + file_path.unlink() + removed.append(str(file_path.relative_to(unpacked_dir))) + + return removed + + +def update_content_types(unpacked_dir: Path, removed_files: list[str]) -> None: + ct_path = unpacked_dir / "[Content_Types].xml" + if not ct_path.exists(): + return + + dom = defusedxml.minidom.parse(str(ct_path)) + changed = False + + for override in list(dom.getElementsByTagName("Override")): + part_name = override.getAttribute("PartName").lstrip("/") + if part_name in removed_files: + if override.parentNode: + override.parentNode.removeChild(override) + changed = True + + if changed: + with open(ct_path, "wb") as f: + f.write(dom.toxml(encoding="utf-8")) + + +def clean_unused_files(unpacked_dir: Path) -> list[str]: + all_removed = [] + + slides_removed = 
remove_orphaned_slides(unpacked_dir) + all_removed.extend(slides_removed) + + trash_removed = remove_trash_directory(unpacked_dir) + all_removed.extend(trash_removed) + + while True: + removed_rels = remove_orphaned_rels_files(unpacked_dir) + referenced = get_referenced_files(unpacked_dir) + removed_files = remove_orphaned_files(unpacked_dir, referenced) + + total_removed = removed_rels + removed_files + if not total_removed: + break + + all_removed.extend(total_removed) + + if all_removed: + update_content_types(unpacked_dir, all_removed) + + return all_removed + + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python clean.py ", file=sys.stderr) + print("Example: python clean.py unpacked/", file=sys.stderr) + sys.exit(1) + + unpacked_dir = Path(sys.argv[1]) + + if not unpacked_dir.exists(): + print(f"Error: {unpacked_dir} not found", file=sys.stderr) + sys.exit(1) + + removed = clean_unused_files(unpacked_dir) + + if removed: + print(f"Removed {len(removed)} unreferenced files:") + for f in removed: + print(f" {f}") + else: + print("No unreferenced files found") diff --git a/skills/pptx-run-report/scripts/office/helpers/__init__.py b/skills/pptx-run-report/scripts/office/helpers/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/skills/pptx-run-report/scripts/office/helpers/merge_runs.py b/skills/pptx-run-report/scripts/office/helpers/merge_runs.py new file mode 100644 index 0000000..ad7c25e --- /dev/null +++ b/skills/pptx-run-report/scripts/office/helpers/merge_runs.py @@ -0,0 +1,199 @@ +"""Merge adjacent runs with identical formatting in DOCX. + +Merges adjacent elements that have identical properties. +Works on runs in paragraphs and inside tracked changes (, ). 
+ +Also: +- Removes rsid attributes from runs (revision metadata that doesn't affect rendering) +- Removes proofErr elements (spell/grammar markers that block merging) +""" + +from pathlib import Path + +import defusedxml.minidom + + +def merge_runs(input_dir: str) -> tuple[int, str]: + doc_xml = Path(input_dir) / "word" / "document.xml" + + if not doc_xml.exists(): + return 0, f"Error: {doc_xml} not found" + + try: + dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8")) + root = dom.documentElement + + _remove_elements(root, "proofErr") + _strip_run_rsid_attrs(root) + + containers = {run.parentNode for run in _find_elements(root, "r")} + + merge_count = 0 + for container in containers: + merge_count += _merge_runs_in(container) + + doc_xml.write_bytes(dom.toxml(encoding="UTF-8")) + return merge_count, f"Merged {merge_count} runs" + + except Exception as e: + return 0, f"Error: {e}" + + + + +def _find_elements(root, tag: str) -> list: + results = [] + + def traverse(node): + if node.nodeType == node.ELEMENT_NODE: + name = node.localName or node.tagName + if name == tag or name.endswith(f":{tag}"): + results.append(node) + for child in node.childNodes: + traverse(child) + + traverse(root) + return results + + +def _get_child(parent, tag: str): + for child in parent.childNodes: + if child.nodeType == child.ELEMENT_NODE: + name = child.localName or child.tagName + if name == tag or name.endswith(f":{tag}"): + return child + return None + + +def _get_children(parent, tag: str) -> list: + results = [] + for child in parent.childNodes: + if child.nodeType == child.ELEMENT_NODE: + name = child.localName or child.tagName + if name == tag or name.endswith(f":{tag}"): + results.append(child) + return results + + +def _is_adjacent(elem1, elem2) -> bool: + node = elem1.nextSibling + while node: + if node == elem2: + return True + if node.nodeType == node.ELEMENT_NODE: + return False + if node.nodeType == node.TEXT_NODE and node.data.strip(): + return False 
+ node = node.nextSibling + return False + + + + +def _remove_elements(root, tag: str): + for elem in _find_elements(root, tag): + if elem.parentNode: + elem.parentNode.removeChild(elem) + + +def _strip_run_rsid_attrs(root): + for run in _find_elements(root, "r"): + for attr in list(run.attributes.values()): + if "rsid" in attr.name.lower(): + run.removeAttribute(attr.name) + + + + +def _merge_runs_in(container) -> int: + merge_count = 0 + run = _first_child_run(container) + + while run: + while True: + next_elem = _next_element_sibling(run) + if next_elem and _is_run(next_elem) and _can_merge(run, next_elem): + _merge_run_content(run, next_elem) + container.removeChild(next_elem) + merge_count += 1 + else: + break + + _consolidate_text(run) + run = _next_sibling_run(run) + + return merge_count + + +def _first_child_run(container): + for child in container.childNodes: + if child.nodeType == child.ELEMENT_NODE and _is_run(child): + return child + return None + + +def _next_element_sibling(node): + sibling = node.nextSibling + while sibling: + if sibling.nodeType == sibling.ELEMENT_NODE: + return sibling + sibling = sibling.nextSibling + return None + + +def _next_sibling_run(node): + sibling = node.nextSibling + while sibling: + if sibling.nodeType == sibling.ELEMENT_NODE: + if _is_run(sibling): + return sibling + sibling = sibling.nextSibling + return None + + +def _is_run(node) -> bool: + name = node.localName or node.tagName + return name == "r" or name.endswith(":r") + + +def _can_merge(run1, run2) -> bool: + rpr1 = _get_child(run1, "rPr") + rpr2 = _get_child(run2, "rPr") + + if (rpr1 is None) != (rpr2 is None): + return False + if rpr1 is None: + return True + return rpr1.toxml() == rpr2.toxml() + + +def _merge_run_content(target, source): + for child in list(source.childNodes): + if child.nodeType == child.ELEMENT_NODE: + name = child.localName or child.tagName + if name != "rPr" and not name.endswith(":rPr"): + target.appendChild(child) + + +def 
_consolidate_text(run): + t_elements = _get_children(run, "t") + + for i in range(len(t_elements) - 1, 0, -1): + curr, prev = t_elements[i], t_elements[i - 1] + + if _is_adjacent(prev, curr): + prev_text = prev.firstChild.data if prev.firstChild else "" + curr_text = curr.firstChild.data if curr.firstChild else "" + merged = prev_text + curr_text + + if prev.firstChild: + prev.firstChild.data = merged + else: + prev.appendChild(run.ownerDocument.createTextNode(merged)) + + if merged.startswith(" ") or merged.endswith(" "): + prev.setAttribute("xml:space", "preserve") + elif prev.hasAttribute("xml:space"): + prev.removeAttribute("xml:space") + + run.removeChild(curr) diff --git a/skills/pptx-run-report/scripts/office/helpers/simplify_redlines.py b/skills/pptx-run-report/scripts/office/helpers/simplify_redlines.py new file mode 100644 index 0000000..db963bb --- /dev/null +++ b/skills/pptx-run-report/scripts/office/helpers/simplify_redlines.py @@ -0,0 +1,197 @@ +"""Simplify tracked changes by merging adjacent w:ins or w:del elements. + +Merges adjacent w:ins elements from the same author into a single element. +Same for w:del elements. This makes heavily-redlined documents easier to +work with by reducing the number of tracked change wrappers.
+ +Rules: +- Only merges w:ins with w:ins, w:del with w:del (same element type) +- Only merges if same author (ignores timestamp differences) +- Only merges if truly adjacent (only whitespace between them) +""" + +import xml.etree.ElementTree as ET +import zipfile +from pathlib import Path + +import defusedxml.minidom + +WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main" + + +def simplify_redlines(input_dir: str) -> tuple[int, str]: + doc_xml = Path(input_dir) / "word" / "document.xml" + + if not doc_xml.exists(): + return 0, f"Error: {doc_xml} not found" + + try: + dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8")) + root = dom.documentElement + + merge_count = 0 + + containers = _find_elements(root, "p") + _find_elements(root, "tc") + + for container in containers: + merge_count += _merge_tracked_changes_in(container, "ins") + merge_count += _merge_tracked_changes_in(container, "del") + + doc_xml.write_bytes(dom.toxml(encoding="UTF-8")) + return merge_count, f"Simplified {merge_count} tracked changes" + + except Exception as e: + return 0, f"Error: {e}" + + +def _merge_tracked_changes_in(container, tag: str) -> int: + merge_count = 0 + + tracked = [ + child + for child in container.childNodes + if child.nodeType == child.ELEMENT_NODE and _is_element(child, tag) + ] + + if len(tracked) < 2: + return 0 + + i = 0 + while i < len(tracked) - 1: + curr = tracked[i] + next_elem = tracked[i + 1] + + if _can_merge_tracked(curr, next_elem): + _merge_tracked_content(curr, next_elem) + container.removeChild(next_elem) + tracked.pop(i + 1) + merge_count += 1 + else: + i += 1 + + return merge_count + + +def _is_element(node, tag: str) -> bool: + name = node.localName or node.tagName + return name == tag or name.endswith(f":{tag}") + + +def _get_author(elem) -> str: + author = elem.getAttribute("w:author") + if not author: + for attr in elem.attributes.values(): + if attr.localName == "author" or attr.name.endswith(":author"): + 
return attr.value + return author + + +def _can_merge_tracked(elem1, elem2) -> bool: + if _get_author(elem1) != _get_author(elem2): + return False + + node = elem1.nextSibling + while node and node != elem2: + if node.nodeType == node.ELEMENT_NODE: + return False + if node.nodeType == node.TEXT_NODE and node.data.strip(): + return False + node = node.nextSibling + + return True + + +def _merge_tracked_content(target, source): + while source.firstChild: + child = source.firstChild + source.removeChild(child) + target.appendChild(child) + + +def _find_elements(root, tag: str) -> list: + results = [] + + def traverse(node): + if node.nodeType == node.ELEMENT_NODE: + name = node.localName or node.tagName + if name == tag or name.endswith(f":{tag}"): + results.append(node) + for child in node.childNodes: + traverse(child) + + traverse(root) + return results + + +def get_tracked_change_authors(doc_xml_path: Path) -> dict[str, int]: + if not doc_xml_path.exists(): + return {} + + try: + tree = ET.parse(doc_xml_path) + root = tree.getroot() + except ET.ParseError: + return {} + + namespaces = {"w": WORD_NS} + author_attr = f"{{{WORD_NS}}}author" + + authors: dict[str, int] = {} + for tag in ["ins", "del"]: + for elem in root.findall(f".//w:{tag}", namespaces): + author = elem.get(author_attr) + if author: + authors[author] = authors.get(author, 0) + 1 + + return authors + + +def _get_authors_from_docx(docx_path: Path) -> dict[str, int]: + try: + with zipfile.ZipFile(docx_path, "r") as zf: + if "word/document.xml" not in zf.namelist(): + return {} + with zf.open("word/document.xml") as f: + tree = ET.parse(f) + root = tree.getroot() + + namespaces = {"w": WORD_NS} + author_attr = f"{{{WORD_NS}}}author" + + authors: dict[str, int] = {} + for tag in ["ins", "del"]: + for elem in root.findall(f".//w:{tag}", namespaces): + author = elem.get(author_attr) + if author: + authors[author] = authors.get(author, 0) + 1 + return authors + except (zipfile.BadZipFile, ET.ParseError): + 
return {} + + +def infer_author(modified_dir: Path, original_docx: Path, default: str = "Claude") -> str: + modified_xml = modified_dir / "word" / "document.xml" + modified_authors = get_tracked_change_authors(modified_xml) + + if not modified_authors: + return default + + original_authors = _get_authors_from_docx(original_docx) + + new_changes: dict[str, int] = {} + for author, count in modified_authors.items(): + original_count = original_authors.get(author, 0) + diff = count - original_count + if diff > 0: + new_changes[author] = diff + + if not new_changes: + return default + + if len(new_changes) == 1: + return next(iter(new_changes)) + + raise ValueError( + f"Multiple authors added new changes: {new_changes}. " + "Cannot infer which author to validate." + ) diff --git a/skills/pptx-run-report/scripts/office/pack.py b/skills/pptx-run-report/scripts/office/pack.py new file mode 100755 index 0000000..db29ed8 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/pack.py @@ -0,0 +1,159 @@ +"""Pack a directory into a DOCX, PPTX, or XLSX file. + +Validates with auto-repair, condenses XML formatting, and creates the Office file. 
+ +Usage: + python pack.py <input_directory> <output_file> [--original <original_file>] [--validate true|false] + +Examples: + python pack.py unpacked/ output.docx --original input.docx + python pack.py unpacked/ output.pptx --validate false +""" + +import argparse +import sys +import shutil +import tempfile +import zipfile +from pathlib import Path + +import defusedxml.minidom + +from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator + +def pack( + input_directory: str, + output_file: str, + original_file: str | None = None, + validate: bool = True, + infer_author_func=None, +) -> tuple[None, str]: + input_dir = Path(input_directory) + output_path = Path(output_file) + suffix = output_path.suffix.lower() + + if not input_dir.is_dir(): + return None, f"Error: {input_dir} is not a directory" + + if suffix not in {".docx", ".pptx", ".xlsx"}: + return None, f"Error: {output_file} must be a .docx, .pptx, or .xlsx file" + + if validate and original_file: + original_path = Path(original_file) + if original_path.exists(): + success, output = _run_validation( + input_dir, original_path, suffix, infer_author_func + ) + if output: + print(output) + if not success: + return None, f"Error: Validation failed for {input_dir}" + + with tempfile.TemporaryDirectory() as temp_dir: + temp_content_dir = Path(temp_dir) / "content" + shutil.copytree(input_dir, temp_content_dir) + + for pattern in ["*.xml", "*.rels"]: + for xml_file in temp_content_dir.rglob(pattern): + _condense_xml(xml_file) + + output_path.parent.mkdir(parents=True, exist_ok=True) + with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf: + for f in temp_content_dir.rglob("*"): + if f.is_file(): + zf.write(f, f.relative_to(temp_content_dir)) + + return None, f"Successfully packed {input_dir} to {output_file}" + + +def _run_validation( + unpacked_dir: Path, + original_file: Path, + suffix: str, + infer_author_func=None, +) -> tuple[bool, str | None]: + output_lines = [] + validators = [] + + if suffix == ".docx": + author =
"Claude" + if infer_author_func: + try: + author = infer_author_func(unpacked_dir, original_file) + except ValueError as e: + print(f"Warning: {e} Using default author 'Claude'.", file=sys.stderr) + + validators = [ + DOCXSchemaValidator(unpacked_dir, original_file), + RedliningValidator(unpacked_dir, original_file, author=author), + ] + elif suffix == ".pptx": + validators = [PPTXSchemaValidator(unpacked_dir, original_file)] + + if not validators: + return True, None + + total_repairs = sum(v.repair() for v in validators) + if total_repairs: + output_lines.append(f"Auto-repaired {total_repairs} issue(s)") + + success = all(v.validate() for v in validators) + + if success: + output_lines.append("All validations PASSED!") + + return success, "\n".join(output_lines) if output_lines else None + + +def _condense_xml(xml_file: Path) -> None: + try: + with open(xml_file, encoding="utf-8") as f: + dom = defusedxml.minidom.parse(f) + + for element in dom.getElementsByTagName("*"): + if element.tagName.endswith(":t"): + continue + + for child in list(element.childNodes): + if ( + child.nodeType == child.TEXT_NODE + and child.nodeValue + and child.nodeValue.strip() == "" + ) or child.nodeType == child.COMMENT_NODE: + element.removeChild(child) + + xml_file.write_bytes(dom.toxml(encoding="UTF-8")) + except Exception as e: + print(f"ERROR: Failed to parse {xml_file.name}: {e}", file=sys.stderr) + raise + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Pack a directory into a DOCX, PPTX, or XLSX file" + ) + parser.add_argument("input_directory", help="Unpacked Office document directory") + parser.add_argument("output_file", help="Output Office file (.docx/.pptx/.xlsx)") + parser.add_argument( + "--original", + help="Original file for validation comparison", + ) + parser.add_argument( + "--validate", + type=lambda x: x.lower() == "true", + default=True, + metavar="true|false", + help="Run validation with auto-repair (default: true)", + ) + args = 
parser.parse_args() + + _, message = pack( + args.input_directory, + args.output_file, + original_file=args.original, + validate=args.validate, + ) + print(message) + + # Failure messages from pack() start with "Error:"; a substring test would also match paths that contain "Error". + if message.startswith("Error"): + sys.exit(1)
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd new file mode 100644 index 0000000..6454ef9 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd @@ -0,0 +1,1499 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd new file mode 100644 index 0000000..afa4f46 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd @@ -0,0 +1,146 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd new file mode 100644 index 0000000..64e66b8 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd @@ -0,0 +1,1085 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd new file mode 100644 index 0000000..687eea8 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd @@ -0,0 +1,11 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd new file mode 100644 index 0000000..6ac81b0 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd @@ -0,0 +1,3081 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd new file mode 100644 index 0000000..1dbf051 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd @@ -0,0 +1,23 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd
b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd new file mode 100644 index 0000000..f1af17d --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd @@ -0,0 +1,185 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd new file mode 100644 index 0000000..0a185ab --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd @@ -0,0 +1,287 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd new file mode 100644 index 0000000..14ef488 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd @@ -0,0 +1,1676 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd new file mode 100644 index 0000000..c20f3bf --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd @@ -0,0 +1,28 @@
diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd new file mode 100644 index 0000000..ac60252 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd @@ -0,0 +1,144 @@
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd new file mode 100644 index 0000000..424b8ba --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd @@ -0,0 +1,174 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd new file mode 100644 index 0000000..2bddce2 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd @@ -0,0 +1,25 @@ + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd new file mode 100644 index 0000000..8a8c18b --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd 
b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd new file mode 100644 index 0000000..5c42706 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd @@ -0,0 +1,59 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd new file mode 100644 index 0000000..853c341 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd @@ -0,0 +1,56 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd new file mode 100644 index 0000000..da835ee --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd @@ -0,0 +1,195 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd new file mode 100644 index 0000000..87ad265 --- /dev/null +++ 
b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd @@ -0,0 +1,582 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd new file mode 100644 index 0000000..9e86f1b --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd @@ -0,0 +1,25 @@ + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd new file mode 100644 index 0000000..d0be42e --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd @@ -0,0 +1,4439 
@@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd new file mode 100644 index 0000000..8821dd1 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd @@ -0,0 +1,570 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd new file mode 100644 index 0000000..ca2575c --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd @@ -0,0 +1,509 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd new file mode 100644 index 0000000..dd079e6 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd @@ -0,0 +1,12 @@ + + + + + + + + + diff 
--git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd new file mode 100644 index 0000000..3dd6cf6 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd @@ -0,0 +1,108 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd new file mode 100644 index 0000000..f1041e3 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd @@ -0,0 +1,96 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd new file mode 100644 index 0000000..9c5b7a6 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd @@ -0,0 +1,3646 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd new file mode 100644 index 0000000..0f13678 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd @@ -0,0 +1,116 @@ + + + + + + See http://www.w3.org/XML/1998/namespace.html and + http://www.w3.org/TR/REC-xml for information about this namespace. + + This schema document describes the XML namespace, in a form + suitable for import by other schema documents. + + Note that local names in this namespace are intended to be defined + only by the World Wide Web Consortium or its subgroups. The + following names are currently defined in this namespace and should + not be used with conflicting semantics by any Working Group, + specification, or document instance: + + base (as an attribute name): denotes an attribute whose value + provides a URI to be used as the base for interpreting any + relative URIs in the scope of the element on which it + appears; its value is inherited. This name is reserved + by virtue of its definition in the XML Base specification. 
+
+ lang (as an attribute name): denotes an attribute whose value
+ is a language code for the natural language of the content of
+ any element; its value is inherited. This name is reserved
+ by virtue of its definition in the XML specification.
+
+ space (as an attribute name): denotes an attribute whose
+ value is a keyword indicating what whitespace processing
+ discipline is intended for the content of the element; its
+ value is inherited. This name is reserved by virtue of its
+ definition in the XML specification.
+
+ Father (in any context at all): denotes Jon Bosak, the chair of
+ the original XML Working Group. This name is reserved by
+ the following decision of the W3C XML Plenary and
+ XML Coordination groups:
+
+ In appreciation for his vision, leadership and dedication
+ the W3C XML Plenary on this 10th day of February, 2000
+ reserves for Jon Bosak in perpetuity the XML name
+ xml:Father
+
+
+
+
+ This schema defines attributes and an attribute group
+ suitable for use by
+ schemas wishing to allow xml:base, xml:lang or xml:space attributes
+ on elements they define.
+
+ To enable this, such a schema must import this schema
+ for the XML namespace, e.g. as follows:
+ <schema . . .>
+ . . .
+ <import namespace="http://www.w3.org/XML/1998/namespace"
+ schemaLocation="http://www.w3.org/2001/03/xml.xsd"/>
+
+ Subsequently, qualified reference to any of the attributes
+ or the group defined below will have the desired effect, e.g.
+
+ <type . . .>
+ . . .
+ <attributeGroup ref="xml:specialAttrs"/>
+
+ will define a type which will schema-validate an instance
+ element with any of those attributes
+
+
+
+ In keeping with the XML Schema WG's standard versioning
+ policy, this schema document will persist at
+ http://www.w3.org/2001/03/xml.xsd.
+ At the date of issue it can also be found at
+ http://www.w3.org/2001/xml.xsd.
+ The schema document at that URI may however change in the future, + in order to remain compatible with the latest version of XML Schema + itself. In other words, if the XML Schema namespace changes, the version + of this document at + http://www.w3.org/2001/xml.xsd will change + accordingly; the version at + http://www.w3.org/2001/03/xml.xsd will not change. + + + + + + In due course, we should install the relevant ISO 2- and 3-letter + codes as the enumerated possible values . . . + + + + + + + + + + + + + + + See http://www.w3.org/TR/xmlbase/ for + information about this attribute. + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd b/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd new file mode 100644 index 0000000..a6de9d2 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd @@ -0,0 +1,42 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd b/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd new file mode 100644 index 0000000..10e978b --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd @@ -0,0 +1,50 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd b/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd new file mode 100644 index 0000000..4248bf7 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd @@ -0,0 +1,49 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git 
a/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd b/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd new file mode 100644 index 0000000..5649746 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd @@ -0,0 +1,33 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/mce/mc.xsd b/skills/pptx-run-report/scripts/office/schemas/mce/mc.xsd new file mode 100644 index 0000000..ef72545 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/mce/mc.xsd @@ -0,0 +1,75 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2010.xsd b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2010.xsd new file mode 100644 index 0000000..f65f777 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2010.xsd @@ -0,0 +1,560 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2012.xsd b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2012.xsd new file mode 100644 index 0000000..6b00755 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2012.xsd @@ -0,0 +1,67 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2018.xsd b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2018.xsd new file mode 100644 index 0000000..f321d33 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-2018.xsd @@ -0,0 +1,14 @@ + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-cex-2018.xsd b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-cex-2018.xsd new file mode 100644 index 0000000..364c6a9 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-cex-2018.xsd @@ -0,0 +1,20 @@ + + + + + + + + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-cid-2016.xsd b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-cid-2016.xsd new file mode 100644 index 0000000..fed9d15 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-cid-2016.xsd @@ -0,0 +1,13 @@ + + + + + + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd new file mode 100644 index 0000000..680cf15 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd @@ -0,0 +1,4 @@ + + + + 
diff --git a/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-symex-2015.xsd b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-symex-2015.xsd new file mode 100644 index 0000000..89ada90 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/schemas/microsoft/wml-symex-2015.xsd @@ -0,0 +1,8 @@ + + + + + + + + diff --git a/skills/pptx-run-report/scripts/office/soffice.py b/skills/pptx-run-report/scripts/office/soffice.py new file mode 100644 index 0000000..c7f7e32 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/soffice.py @@ -0,0 +1,183 @@ +""" +Helper for running LibreOffice (soffice) in environments where AF_UNIX +sockets may be blocked (e.g., sandboxed VMs). Detects the restriction +at runtime and applies an LD_PRELOAD shim if needed. + +Usage: + from office.soffice import run_soffice, get_soffice_env + + # Option 1 – run soffice directly + result = run_soffice(["--headless", "--convert-to", "pdf", "input.docx"]) + + # Option 2 – get env dict for your own subprocess calls + env = get_soffice_env() + subprocess.run(["soffice", ...], env=env) +""" + +import os +import socket +import subprocess +import tempfile +from pathlib import Path + + +def get_soffice_env() -> dict: + env = os.environ.copy() + env["SAL_USE_VCLPLUGIN"] = "svp" + + if _needs_shim(): + shim = _ensure_shim() + env["LD_PRELOAD"] = str(shim) + + return env + + +def run_soffice(args: list[str], **kwargs) -> subprocess.CompletedProcess: + env = get_soffice_env() + return subprocess.run(["soffice"] + args, env=env, **kwargs) + + + +_SHIM_SO = Path(tempfile.gettempdir()) / "lo_socket_shim.so" + + +def _needs_shim() -> bool: + try: + s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) + s.close() + return False + except OSError: + return True + + +def _ensure_shim() -> Path: + if _SHIM_SO.exists(): + return _SHIM_SO + + src = Path(tempfile.gettempdir()) / "lo_socket_shim.c" + src.write_text(_SHIM_SOURCE) + subprocess.run( + ["gcc", "-shared", "-fPIC", "-o", 
str(_SHIM_SO), str(src), "-ldl"], + check=True, + capture_output=True, + ) + src.unlink() + return _SHIM_SO + + + +_SHIM_SOURCE = r""" +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include + +static int (*real_socket)(int, int, int); +static int (*real_socketpair)(int, int, int, int[2]); +static int (*real_listen)(int, int); +static int (*real_accept)(int, struct sockaddr *, socklen_t *); +static int (*real_close)(int); +static int (*real_read)(int, void *, size_t); + +/* Per-FD bookkeeping (FDs >= 1024 are passed through unshimmed). */ +static int is_shimmed[1024]; +static int peer_of[1024]; +static int wake_r[1024]; /* accept() blocks reading this */ +static int wake_w[1024]; /* close() writes to this */ +static int listener_fd = -1; /* FD that received listen() */ + +__attribute__((constructor)) +static void init(void) { + real_socket = dlsym(RTLD_NEXT, "socket"); + real_socketpair = dlsym(RTLD_NEXT, "socketpair"); + real_listen = dlsym(RTLD_NEXT, "listen"); + real_accept = dlsym(RTLD_NEXT, "accept"); + real_close = dlsym(RTLD_NEXT, "close"); + real_read = dlsym(RTLD_NEXT, "read"); + for (int i = 0; i < 1024; i++) { + peer_of[i] = -1; + wake_r[i] = -1; + wake_w[i] = -1; + } +} + +/* ---- socket ---------------------------------------------------------- */ +int socket(int domain, int type, int protocol) { + if (domain == AF_UNIX) { + int fd = real_socket(domain, type, protocol); + if (fd >= 0) return fd; + /* socket(AF_UNIX) blocked – fall back to socketpair(). 
*/ + int sv[2]; + if (real_socketpair(domain, type, protocol, sv) == 0) { + if (sv[0] >= 0 && sv[0] < 1024) { + is_shimmed[sv[0]] = 1; + peer_of[sv[0]] = sv[1]; + int wp[2]; + if (pipe(wp) == 0) { + wake_r[sv[0]] = wp[0]; + wake_w[sv[0]] = wp[1]; + } + } + return sv[0]; + } + errno = EPERM; + return -1; + } + return real_socket(domain, type, protocol); +} + +/* ---- listen ---------------------------------------------------------- */ +int listen(int sockfd, int backlog) { + if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) { + listener_fd = sockfd; + return 0; + } + return real_listen(sockfd, backlog); +} + +/* ---- accept ---------------------------------------------------------- */ +int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen) { + if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) { + /* Block until close() writes to the wake pipe. */ + if (wake_r[sockfd] >= 0) { + char buf; + real_read(wake_r[sockfd], &buf, 1); + } + errno = ECONNABORTED; + return -1; + } + return real_accept(sockfd, addr, addrlen); +} + +/* ---- close ----------------------------------------------------------- */ +int close(int fd) { + if (fd >= 0 && fd < 1024 && is_shimmed[fd]) { + int was_listener = (fd == listener_fd); + is_shimmed[fd] = 0; + + if (wake_w[fd] >= 0) { /* unblock accept() */ + char c = 0; + write(wake_w[fd], &c, 1); + real_close(wake_w[fd]); + wake_w[fd] = -1; + } + if (wake_r[fd] >= 0) { real_close(wake_r[fd]); wake_r[fd] = -1; } + if (peer_of[fd] >= 0) { real_close(peer_of[fd]); peer_of[fd] = -1; } + + if (was_listener) + _exit(0); /* conversion done – exit */ + } + return real_close(fd); +} +""" + + + +if __name__ == "__main__": + import sys + result = run_soffice(sys.argv[1:]) + sys.exit(result.returncode) diff --git a/skills/pptx-run-report/scripts/office/unpack.py b/skills/pptx-run-report/scripts/office/unpack.py new file mode 100755 index 0000000..0015253 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/unpack.py @@ -0,0 +1,132 
@@ +"""Unpack Office files (DOCX, PPTX, XLSX) for editing. + +Extracts the ZIP archive, pretty-prints XML files, and optionally: +- Merges adjacent runs with identical formatting (DOCX only) +- Simplifies adjacent tracked changes from same author (DOCX only) + +Usage: + python unpack.py <input_file> <output_directory> [options] + +Examples: + python unpack.py document.docx unpacked/ + python unpack.py presentation.pptx unpacked/ + python unpack.py document.docx unpacked/ --merge-runs false +""" + +import argparse +import sys +import zipfile +from pathlib import Path + +import defusedxml.minidom + +from helpers.merge_runs import merge_runs as do_merge_runs +from helpers.simplify_redlines import simplify_redlines as do_simplify_redlines + +SMART_QUOTE_REPLACEMENTS = { + "\u201c": "&#x201C;", + "\u201d": "&#x201D;", + "\u2018": "&#x2018;", + "\u2019": "&#x2019;", +} + + +def unpack( + input_file: str, + output_directory: str, + merge_runs: bool = True, + simplify_redlines: bool = True, +) -> tuple[None, str]: + input_path = Path(input_file) + output_path = Path(output_directory) + suffix = input_path.suffix.lower() + + if not input_path.exists(): + return None, f"Error: {input_file} does not exist" + + if suffix not in {".docx", ".pptx", ".xlsx"}: + return None, f"Error: {input_file} must be a .docx, .pptx, or .xlsx file" + + try: + output_path.mkdir(parents=True, exist_ok=True) + + with zipfile.ZipFile(input_path, "r") as zf: + zf.extractall(output_path) + + xml_files = list(output_path.rglob("*.xml")) + list(output_path.rglob("*.rels")) + for xml_file in xml_files: + _pretty_print_xml(xml_file) + + message = f"Unpacked {input_file} ({len(xml_files)} XML files)" + + if suffix == ".docx": + if simplify_redlines: + simplify_count, _ = do_simplify_redlines(str(output_path)) + message += f", simplified {simplify_count} tracked changes" + + if merge_runs: + merge_count, _ = do_merge_runs(str(output_path)) + message += f", merged {merge_count} runs" + + for xml_file in xml_files: + _escape_smart_quotes(xml_file) + + return None, 
message + + except zipfile.BadZipFile: + return None, f"Error: {input_file} is not a valid Office file" + except Exception as e: + return None, f"Error unpacking: {e}" + + +def _pretty_print_xml(xml_file: Path) -> None: + try: + content = xml_file.read_text(encoding="utf-8") + dom = defusedxml.minidom.parseString(content) + xml_file.write_bytes(dom.toprettyxml(indent=" ", encoding="utf-8")) + except Exception: + pass + + +def _escape_smart_quotes(xml_file: Path) -> None: + try: + content = xml_file.read_text(encoding="utf-8") + for char, entity in SMART_QUOTE_REPLACEMENTS.items(): + content = content.replace(char, entity) + xml_file.write_text(content, encoding="utf-8") + except Exception: + pass + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Unpack an Office file (DOCX, PPTX, XLSX) for editing" + ) + parser.add_argument("input_file", help="Office file to unpack") + parser.add_argument("output_directory", help="Output directory") + parser.add_argument( + "--merge-runs", + type=lambda x: x.lower() == "true", + default=True, + metavar="true|false", + help="Merge adjacent runs with identical formatting (DOCX only, default: true)", + ) + parser.add_argument( + "--simplify-redlines", + type=lambda x: x.lower() == "true", + default=True, + metavar="true|false", + help="Merge adjacent tracked changes from same author (DOCX only, default: true)", + ) + args = parser.parse_args() + + _, message = unpack( + args.input_file, + args.output_directory, + merge_runs=args.merge_runs, + simplify_redlines=args.simplify_redlines, + ) + print(message) + + if "Error" in message: + sys.exit(1) diff --git a/skills/pptx-run-report/scripts/office/validate.py b/skills/pptx-run-report/scripts/office/validate.py new file mode 100755 index 0000000..03b01f6 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/validate.py @@ -0,0 +1,111 @@ +""" +Command line tool to validate Office document XML files against XSD schemas and tracked changes. 
+ +Usage: + python validate.py <path> [--original <original_file>] [--auto-repair] [--author NAME] + +The first argument can be either: +- An unpacked directory containing the Office document XML files +- A packed Office file (.docx/.pptx/.xlsx) which will be unpacked to a temp directory + +Auto-repair fixes: +- paraId/durableId values that exceed OOXML limits +- Missing xml:space="preserve" on w:t elements with whitespace +""" + +import argparse +import sys +import tempfile +import zipfile +from pathlib import Path + +from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator + + +def main(): + parser = argparse.ArgumentParser(description="Validate Office document XML files") + parser.add_argument( + "path", + help="Path to unpacked directory or packed Office file (.docx/.pptx/.xlsx)", + ) + parser.add_argument( + "--original", + required=False, + default=None, + help="Path to original file (.docx/.pptx/.xlsx). If omitted, all XSD errors are reported and redlining validation is skipped.", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Enable verbose output", + ) + parser.add_argument( + "--auto-repair", + action="store_true", + help="Automatically repair common issues (hex IDs, whitespace preservation)", + ) + parser.add_argument( + "--author", + default="Claude", + help="Author name for redlining validation (default: Claude)", + ) + args = parser.parse_args() + + path = Path(args.path) + assert path.exists(), f"Error: {path} does not exist" + + original_file = None + if args.original: + original_file = Path(args.original) + assert original_file.is_file(), f"Error: {original_file} is not a file" + assert original_file.suffix.lower() in [".docx", ".pptx", ".xlsx"], ( + f"Error: {original_file} must be a .docx, .pptx, or .xlsx file" + ) + + file_extension = (original_file or path).suffix.lower() + assert file_extension in [".docx", ".pptx", ".xlsx"], ( + f"Error: Cannot determine file type from {path}. 
Use --original or provide a .docx/.pptx/.xlsx file." + ) + + if path.is_file() and path.suffix.lower() in [".docx", ".pptx", ".xlsx"]: + temp_dir = tempfile.mkdtemp() + with zipfile.ZipFile(path, "r") as zf: + zf.extractall(temp_dir) + unpacked_dir = Path(temp_dir) + else: + assert path.is_dir(), f"Error: {path} is not a directory or Office file" + unpacked_dir = path + + match file_extension: + case ".docx": + validators = [ + DOCXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose), + ] + if original_file: + validators.append( + RedliningValidator(unpacked_dir, original_file, verbose=args.verbose, author=args.author) + ) + case ".pptx": + validators = [ + PPTXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose), + ] + case _: + print(f"Error: Validation not supported for file type {file_extension}") + sys.exit(1) + + if args.auto_repair: + total_repairs = sum(v.repair() for v in validators) + if total_repairs: + print(f"Auto-repaired {total_repairs} issue(s)") + + success = all(v.validate() for v in validators) + + if success: + print("All validations PASSED!") + + sys.exit(0 if success else 1) + + +if __name__ == "__main__": + main() diff --git a/skills/pptx-run-report/scripts/office/validators/__init__.py b/skills/pptx-run-report/scripts/office/validators/__init__.py new file mode 100644 index 0000000..db092ec --- /dev/null +++ b/skills/pptx-run-report/scripts/office/validators/__init__.py @@ -0,0 +1,15 @@ +""" +Validation modules for Word document processing. 
+""" + +from .base import BaseSchemaValidator +from .docx import DOCXSchemaValidator +from .pptx import PPTXSchemaValidator +from .redlining import RedliningValidator + +__all__ = [ + "BaseSchemaValidator", + "DOCXSchemaValidator", + "PPTXSchemaValidator", + "RedliningValidator", +] diff --git a/skills/pptx-run-report/scripts/office/validators/base.py b/skills/pptx-run-report/scripts/office/validators/base.py new file mode 100644 index 0000000..db4a06a --- /dev/null +++ b/skills/pptx-run-report/scripts/office/validators/base.py @@ -0,0 +1,847 @@ +""" +Base validator with common validation logic for document files. +""" + +import re +from pathlib import Path + +import defusedxml.minidom +import lxml.etree + + +class BaseSchemaValidator: + + IGNORED_VALIDATION_ERRORS = [ + "hyphenationZone", + "purl.org/dc/terms", + ] + + UNIQUE_ID_REQUIREMENTS = { + "comment": ("id", "file"), + "commentrangestart": ("id", "file"), + "commentrangeend": ("id", "file"), + "bookmarkstart": ("id", "file"), + "bookmarkend": ("id", "file"), + "sldid": ("id", "file"), + "sldmasterid": ("id", "global"), + "sldlayoutid": ("id", "global"), + "cm": ("authorid", "file"), + "sheet": ("sheetid", "file"), + "definedname": ("id", "file"), + "cxnsp": ("id", "file"), + "sp": ("id", "file"), + "pic": ("id", "file"), + "grpsp": ("id", "file"), + } + + EXCLUDED_ID_CONTAINERS = { + "sectionlst", + } + + ELEMENT_RELATIONSHIP_TYPES = {} + + SCHEMA_MAPPINGS = { + "word": "ISO-IEC29500-4_2016/wml.xsd", + "ppt": "ISO-IEC29500-4_2016/pml.xsd", + "xl": "ISO-IEC29500-4_2016/sml.xsd", + "[Content_Types].xml": "ecma/fouth-edition/opc-contentTypes.xsd", + "app.xml": "ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd", + "core.xml": "ecma/fouth-edition/opc-coreProperties.xsd", + "custom.xml": "ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd", + ".rels": "ecma/fouth-edition/opc-relationships.xsd", + "people.xml": "microsoft/wml-2012.xsd", + "commentsIds.xml": "microsoft/wml-cid-2016.xsd", + 
"commentsExtensible.xml": "microsoft/wml-cex-2018.xsd", + "commentsExtended.xml": "microsoft/wml-2012.xsd", + "chart": "ISO-IEC29500-4_2016/dml-chart.xsd", + "theme": "ISO-IEC29500-4_2016/dml-main.xsd", + "drawing": "ISO-IEC29500-4_2016/dml-main.xsd", + } + + MC_NAMESPACE = "http://schemas.openxmlformats.org/markup-compatibility/2006" + XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace" + + PACKAGE_RELATIONSHIPS_NAMESPACE = ( + "http://schemas.openxmlformats.org/package/2006/relationships" + ) + OFFICE_RELATIONSHIPS_NAMESPACE = ( + "http://schemas.openxmlformats.org/officeDocument/2006/relationships" + ) + CONTENT_TYPES_NAMESPACE = ( + "http://schemas.openxmlformats.org/package/2006/content-types" + ) + + MAIN_CONTENT_FOLDERS = {"word", "ppt", "xl"} + + OOXML_NAMESPACES = { + "http://schemas.openxmlformats.org/officeDocument/2006/math", + "http://schemas.openxmlformats.org/officeDocument/2006/relationships", + "http://schemas.openxmlformats.org/schemaLibrary/2006/main", + "http://schemas.openxmlformats.org/drawingml/2006/main", + "http://schemas.openxmlformats.org/drawingml/2006/chart", + "http://schemas.openxmlformats.org/drawingml/2006/chartDrawing", + "http://schemas.openxmlformats.org/drawingml/2006/diagram", + "http://schemas.openxmlformats.org/drawingml/2006/picture", + "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing", + "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", + "http://schemas.openxmlformats.org/wordprocessingml/2006/main", + "http://schemas.openxmlformats.org/presentationml/2006/main", + "http://schemas.openxmlformats.org/spreadsheetml/2006/main", + "http://schemas.openxmlformats.org/officeDocument/2006/sharedTypes", + "http://www.w3.org/XML/1998/namespace", + } + + def __init__(self, unpacked_dir, original_file=None, verbose=False): + self.unpacked_dir = Path(unpacked_dir).resolve() + self.original_file = Path(original_file) if original_file else None + self.verbose = verbose + + self.schemas_dir = 
Path(__file__).parent.parent / "schemas" + + patterns = ["*.xml", "*.rels"] + self.xml_files = [ + f for pattern in patterns for f in self.unpacked_dir.rglob(pattern) + ] + + if not self.xml_files: + print(f"Warning: No XML files found in {self.unpacked_dir}") + + def validate(self): + raise NotImplementedError("Subclasses must implement the validate method") + + def repair(self) -> int: + return self.repair_whitespace_preservation() + + def repair_whitespace_preservation(self) -> int: + repairs = 0 + + for xml_file in self.xml_files: + try: + content = xml_file.read_text(encoding="utf-8") + dom = defusedxml.minidom.parseString(content) + modified = False + + for elem in dom.getElementsByTagName("*"): + if elem.tagName.endswith(":t") and elem.firstChild: + text = elem.firstChild.nodeValue + if text and (text.startswith((' ', '\t')) or text.endswith((' ', '\t'))): + if elem.getAttribute("xml:space") != "preserve": + elem.setAttribute("xml:space", "preserve") + text_preview = repr(text[:30]) + "..." 
if len(text) > 30 else repr(text) + print(f" Repaired: {xml_file.name}: Added xml:space='preserve' to {elem.tagName}: {text_preview}") + repairs += 1 + modified = True + + if modified: + xml_file.write_bytes(dom.toxml(encoding="UTF-8")) + + except Exception: + pass + + return repairs + + def validate_xml(self): + errors = [] + + for xml_file in self.xml_files: + try: + lxml.etree.parse(str(xml_file)) + except lxml.etree.XMLSyntaxError as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {e.lineno}: {e.msg}" + ) + except Exception as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Unexpected error: {str(e)}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} XML violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All XML files are well-formed") + return True + + def validate_namespaces(self): + errors = [] + + for xml_file in self.xml_files: + try: + root = lxml.etree.parse(str(xml_file)).getroot() + declared = set(root.nsmap.keys()) - {None} + + for attr_val in [ + v for k, v in root.attrib.items() if k.endswith("Ignorable") + ]: + undeclared = set(attr_val.split()) - declared + errors.extend( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Namespace '{ns}' in Ignorable but not declared" + for ns in undeclared + ) + except lxml.etree.XMLSyntaxError: + continue + + if errors: + print(f"FAILED - {len(errors)} namespace issues:") + for error in errors: + print(error) + return False + if self.verbose: + print("PASSED - All namespace prefixes properly declared") + return True + + def validate_unique_ids(self): + errors = [] + global_ids = {} + + for xml_file in self.xml_files: + try: + root = lxml.etree.parse(str(xml_file)).getroot() + file_ids = {} + + mc_elements = root.xpath( + ".//mc:AlternateContent", namespaces={"mc": self.MC_NAMESPACE} + ) + for elem in mc_elements: + elem.getparent().remove(elem) + + for elem in root.iter(): + tag = ( + 
elem.tag.split("}")[-1].lower() + if "}" in elem.tag + else elem.tag.lower() + ) + + if tag in self.UNIQUE_ID_REQUIREMENTS: + in_excluded_container = any( + ancestor.tag.split("}")[-1].lower() in self.EXCLUDED_ID_CONTAINERS + for ancestor in elem.iterancestors() + ) + if in_excluded_container: + continue + + attr_name, scope = self.UNIQUE_ID_REQUIREMENTS[tag] + + id_value = None + for attr, value in elem.attrib.items(): + attr_local = ( + attr.split("}")[-1].lower() + if "}" in attr + else attr.lower() + ) + if attr_local == attr_name: + id_value = value + break + + if id_value is not None: + if scope == "global": + if id_value in global_ids: + prev_file, prev_line, prev_tag = global_ids[ + id_value + ] + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: Global ID '{id_value}' in <{tag}> " + f"already used in {prev_file} at line {prev_line} in <{prev_tag}>" + ) + else: + global_ids[id_value] = ( + xml_file.relative_to(self.unpacked_dir), + elem.sourceline, + tag, + ) + elif scope == "file": + key = (tag, attr_name) + if key not in file_ids: + file_ids[key] = {} + + if id_value in file_ids[key]: + prev_line = file_ids[key][id_value] + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: Duplicate {attr_name}='{id_value}' in <{tag}> " + f"(first occurrence at line {prev_line})" + ) + else: + file_ids[key][id_value] = elem.sourceline + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} ID uniqueness violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All required IDs are unique") + return True + + def validate_file_references(self): + errors = [] + + rels_files = list(self.unpacked_dir.rglob("*.rels")) + + if not rels_files: + if self.verbose: + print("PASSED - No .rels files found") + return 
True + + all_files = [] + for file_path in self.unpacked_dir.rglob("*"): + if ( + file_path.is_file() + and file_path.name != "[Content_Types].xml" + and not file_path.name.endswith(".rels") + ): + all_files.append(file_path.resolve()) + + all_referenced_files = set() + + if self.verbose: + print( + f"Found {len(rels_files)} .rels files and {len(all_files)} target files" + ) + + for rels_file in rels_files: + try: + rels_root = lxml.etree.parse(str(rels_file)).getroot() + + rels_dir = rels_file.parent + + referenced_files = set() + broken_refs = [] + + for rel in rels_root.findall( + ".//ns:Relationship", + namespaces={"ns": self.PACKAGE_RELATIONSHIPS_NAMESPACE}, + ): + target = rel.get("Target") + if target and not target.startswith( + ("http", "mailto:") + ): + if target.startswith("/"): + target_path = self.unpacked_dir / target.lstrip("/") + elif rels_file.name == ".rels": + target_path = self.unpacked_dir / target + else: + base_dir = rels_dir.parent + target_path = base_dir / target + + try: + target_path = target_path.resolve() + if target_path.exists() and target_path.is_file(): + referenced_files.add(target_path) + all_referenced_files.add(target_path) + else: + broken_refs.append((target, rel.sourceline)) + except (OSError, ValueError): + broken_refs.append((target, rel.sourceline)) + + if broken_refs: + rel_path = rels_file.relative_to(self.unpacked_dir) + for broken_ref, line_num in broken_refs: + errors.append( + f" {rel_path}: Line {line_num}: Broken reference to {broken_ref}" + ) + + except Exception as e: + rel_path = rels_file.relative_to(self.unpacked_dir) + errors.append(f" Error parsing {rel_path}: {e}") + + unreferenced_files = set(all_files) - all_referenced_files + + if unreferenced_files: + for unref_file in sorted(unreferenced_files): + unref_rel_path = unref_file.relative_to(self.unpacked_dir) + errors.append(f" Unreferenced file: {unref_rel_path}") + + if errors: + print(f"FAILED - Found {len(errors)} relationship validation errors:") + 
for error in errors: + print(error) + print( + "CRITICAL: These errors will cause the document to appear corrupt. " + + "Broken references MUST be fixed, " + + "and unreferenced files MUST be referenced or removed." + ) + return False + else: + if self.verbose: + print( + "PASSED - All references are valid and all files are properly referenced" + ) + return True + + def validate_all_relationship_ids(self): + import lxml.etree + + errors = [] + + for xml_file in self.xml_files: + if xml_file.suffix == ".rels": + continue + + rels_dir = xml_file.parent / "_rels" + rels_file = rels_dir / f"{xml_file.name}.rels" + + if not rels_file.exists(): + continue + + try: + rels_root = lxml.etree.parse(str(rels_file)).getroot() + rid_to_type = {} + + for rel in rels_root.findall( + f".//{{{self.PACKAGE_RELATIONSHIPS_NAMESPACE}}}Relationship" + ): + rid = rel.get("Id") + rel_type = rel.get("Type", "") + if rid: + if rid in rid_to_type: + rels_rel_path = rels_file.relative_to(self.unpacked_dir) + errors.append( + f" {rels_rel_path}: Line {rel.sourceline}: " + f"Duplicate relationship ID '{rid}' (IDs must be unique)" + ) + type_name = ( + rel_type.split("/")[-1] if "/" in rel_type else rel_type + ) + rid_to_type[rid] = type_name + + xml_root = lxml.etree.parse(str(xml_file)).getroot() + + r_ns = self.OFFICE_RELATIONSHIPS_NAMESPACE + rid_attrs_to_check = ["id", "embed", "link"] + for elem in xml_root.iter(): + for attr_name in rid_attrs_to_check: + rid_attr = elem.get(f"{{{r_ns}}}{attr_name}") + if not rid_attr: + continue + xml_rel_path = xml_file.relative_to(self.unpacked_dir) + elem_name = ( + elem.tag.split("}")[-1] if "}" in elem.tag else elem.tag + ) + + if rid_attr not in rid_to_type: + errors.append( + f" {xml_rel_path}: Line {elem.sourceline}: " + f"<{elem_name}> r:{attr_name} references non-existent relationship '{rid_attr}' " + f"(valid IDs: {', '.join(sorted(rid_to_type.keys())[:5])}{'...' 
if len(rid_to_type) > 5 else ''})" + ) + elif attr_name == "id" and self.ELEMENT_RELATIONSHIP_TYPES: + expected_type = self._get_expected_relationship_type( + elem_name + ) + if expected_type: + actual_type = rid_to_type[rid_attr] + if expected_type not in actual_type.lower(): + errors.append( + f" {xml_rel_path}: Line {elem.sourceline}: " + f"<{elem_name}> references '{rid_attr}' which points to '{actual_type}' " + f"but should point to a '{expected_type}' relationship" + ) + + except Exception as e: + xml_rel_path = xml_file.relative_to(self.unpacked_dir) + errors.append(f" Error processing {xml_rel_path}: {e}") + + if errors: + print(f"FAILED - Found {len(errors)} relationship ID reference errors:") + for error in errors: + print(error) + print("\nThese ID mismatches will cause the document to appear corrupt!") + return False + else: + if self.verbose: + print("PASSED - All relationship ID references are valid") + return True + + def _get_expected_relationship_type(self, element_name): + elem_lower = element_name.lower() + + if elem_lower in self.ELEMENT_RELATIONSHIP_TYPES: + return self.ELEMENT_RELATIONSHIP_TYPES[elem_lower] + + if elem_lower.endswith("id") and len(elem_lower) > 2: + prefix = elem_lower[:-2] + if prefix.endswith("master"): + return prefix.lower() + elif prefix.endswith("layout"): + return prefix.lower() + else: + if prefix == "sld": + return "slide" + return prefix.lower() + + if elem_lower.endswith("reference") and len(elem_lower) > 9: + prefix = elem_lower[:-9] + return prefix.lower() + + return None + + def validate_content_types(self): + errors = [] + + content_types_file = self.unpacked_dir / "[Content_Types].xml" + if not content_types_file.exists(): + print("FAILED - [Content_Types].xml file not found") + return False + + try: + root = lxml.etree.parse(str(content_types_file)).getroot() + declared_parts = set() + declared_extensions = set() + + for override in root.findall( + f".//{{{self.CONTENT_TYPES_NAMESPACE}}}Override" + ): + 
part_name = override.get("PartName") + if part_name is not None: + declared_parts.add(part_name.lstrip("/")) + + for default in root.findall( + f".//{{{self.CONTENT_TYPES_NAMESPACE}}}Default" + ): + extension = default.get("Extension") + if extension is not None: + declared_extensions.add(extension.lower()) + + declarable_roots = { + "sld", + "sldLayout", + "sldMaster", + "presentation", + "document", + "workbook", + "worksheet", + "theme", + } + + media_extensions = { + "png": "image/png", + "jpg": "image/jpeg", + "jpeg": "image/jpeg", + "gif": "image/gif", + "bmp": "image/bmp", + "tiff": "image/tiff", + "wmf": "image/x-wmf", + "emf": "image/x-emf", + } + + all_files = list(self.unpacked_dir.rglob("*")) + all_files = [f for f in all_files if f.is_file()] + + for xml_file in self.xml_files: + path_str = str(xml_file.relative_to(self.unpacked_dir)).replace( + "\\", "/" + ) + + if any( + skip in path_str + for skip in [".rels", "[Content_Types]", "docProps/", "_rels/"] + ): + continue + + try: + root_tag = lxml.etree.parse(str(xml_file)).getroot().tag + root_name = root_tag.split("}")[-1] if "}" in root_tag else root_tag + + if root_name in declarable_roots and path_str not in declared_parts: + errors.append( + f" {path_str}: File with <{root_name}> root not declared in [Content_Types].xml" + ) + + except Exception: + continue + + for file_path in all_files: + if file_path.suffix.lower() in {".xml", ".rels"}: + continue + if file_path.name == "[Content_Types].xml": + continue + if "_rels" in file_path.parts or "docProps" in file_path.parts: + continue + + extension = file_path.suffix.lstrip(".").lower() + if extension and extension not in declared_extensions: + if extension in media_extensions: + relative_path = file_path.relative_to(self.unpacked_dir) + errors.append( + f' {relative_path}: File with extension \'{extension}\' not declared in [Content_Types].xml - should add: <Default Extension="{extension}" ContentType="{media_extensions[extension]}"/>' + ) + + except Exception as e: + errors.append(f" Error parsing [Content_Types].xml: {e}") 
+ + if errors: + print(f"FAILED - Found {len(errors)} content type declaration errors:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print( + "PASSED - All content files are properly declared in [Content_Types].xml" + ) + return True + + def validate_file_against_xsd(self, xml_file, verbose=False): + xml_file = Path(xml_file).resolve() + unpacked_dir = self.unpacked_dir.resolve() + + is_valid, current_errors = self._validate_single_file_xsd( + xml_file, unpacked_dir + ) + + if is_valid is None: + return None, set() + elif is_valid: + return True, set() + + original_errors = self._get_original_file_errors(xml_file) + + assert current_errors is not None + new_errors = current_errors - original_errors + + new_errors = { + e for e in new_errors + if not any(pattern in e for pattern in self.IGNORED_VALIDATION_ERRORS) + } + + if new_errors: + if verbose: + relative_path = xml_file.relative_to(unpacked_dir) + print(f"FAILED - {relative_path}: {len(new_errors)} new error(s)") + for error in list(new_errors)[:3]: + truncated = error[:250] + "..." if len(error) > 250 else error + print(f" - {truncated}") + return False, new_errors + else: + if verbose: + print( + f"PASSED - No new errors (original had {len(current_errors)} errors)" + ) + return True, set() + + def validate_against_xsd(self): + new_errors = [] + original_error_count = 0 + valid_count = 0 + skipped_count = 0 + + for xml_file in self.xml_files: + relative_path = str(xml_file.relative_to(self.unpacked_dir)) + is_valid, new_file_errors = self.validate_file_against_xsd( + xml_file, verbose=False + ) + + if is_valid is None: + skipped_count += 1 + continue + elif is_valid and not new_file_errors: + valid_count += 1 + continue + elif is_valid: + original_error_count += 1 + valid_count += 1 + continue + + new_errors.append(f" {relative_path}: {len(new_file_errors)} new error(s)") + for error in list(new_file_errors)[:3]: + new_errors.append( + f" - {error[:250]}..." 
if len(error) > 250 else f" - {error}" + ) + + if self.verbose: + print(f"Validated {len(self.xml_files)} files:") + print(f" - Valid: {valid_count}") + print(f" - Skipped (no schema): {skipped_count}") + if original_error_count: + print(f" - With original errors (ignored): {original_error_count}") + print( + f" - With NEW errors: {len(new_errors) > 0 and len([e for e in new_errors if not e.startswith(' ')]) or 0}" + ) + + if new_errors: + print("\nFAILED - Found NEW validation errors:") + for error in new_errors: + print(error) + return False + else: + if self.verbose: + print("\nPASSED - No new XSD validation errors introduced") + return True + + def _get_schema_path(self, xml_file): + if xml_file.name in self.SCHEMA_MAPPINGS: + return self.schemas_dir / self.SCHEMA_MAPPINGS[xml_file.name] + + if xml_file.suffix == ".rels": + return self.schemas_dir / self.SCHEMA_MAPPINGS[".rels"] + + if "charts/" in str(xml_file) and xml_file.name.startswith("chart"): + return self.schemas_dir / self.SCHEMA_MAPPINGS["chart"] + + if "theme/" in str(xml_file) and xml_file.name.startswith("theme"): + return self.schemas_dir / self.SCHEMA_MAPPINGS["theme"] + + if xml_file.parent.name in self.MAIN_CONTENT_FOLDERS: + return self.schemas_dir / self.SCHEMA_MAPPINGS[xml_file.parent.name] + + return None + + def _clean_ignorable_namespaces(self, xml_doc): + xml_string = lxml.etree.tostring(xml_doc, encoding="unicode") + xml_copy = lxml.etree.fromstring(xml_string) + + for elem in xml_copy.iter(): + attrs_to_remove = [] + + for attr in elem.attrib: + if "{" in attr: + ns = attr.split("}")[0][1:] + if ns not in self.OOXML_NAMESPACES: + attrs_to_remove.append(attr) + + for attr in attrs_to_remove: + del elem.attrib[attr] + + self._remove_ignorable_elements(xml_copy) + + return lxml.etree.ElementTree(xml_copy) + + def _remove_ignorable_elements(self, root): + elements_to_remove = [] + + for elem in list(root): + if not hasattr(elem, "tag") or callable(elem.tag): + continue + + tag_str = 
str(elem.tag) + if tag_str.startswith("{"): + ns = tag_str.split("}")[0][1:] + if ns not in self.OOXML_NAMESPACES: + elements_to_remove.append(elem) + continue + + self._remove_ignorable_elements(elem) + + for elem in elements_to_remove: + root.remove(elem) + + def _preprocess_for_mc_ignorable(self, xml_doc): + root = xml_doc.getroot() + + if f"{{{self.MC_NAMESPACE}}}Ignorable" in root.attrib: + del root.attrib[f"{{{self.MC_NAMESPACE}}}Ignorable"] + + return xml_doc + + def _validate_single_file_xsd(self, xml_file, base_path): + schema_path = self._get_schema_path(xml_file) + if not schema_path: + return None, None + + try: + with open(schema_path, "rb") as xsd_file: + parser = lxml.etree.XMLParser() + xsd_doc = lxml.etree.parse( + xsd_file, parser=parser, base_url=str(schema_path) + ) + schema = lxml.etree.XMLSchema(xsd_doc) + + with open(xml_file, "r") as f: + xml_doc = lxml.etree.parse(f) + + xml_doc, _ = self._remove_template_tags_from_text_nodes(xml_doc) + xml_doc = self._preprocess_for_mc_ignorable(xml_doc) + + relative_path = xml_file.relative_to(base_path) + if ( + relative_path.parts + and relative_path.parts[0] in self.MAIN_CONTENT_FOLDERS + ): + xml_doc = self._clean_ignorable_namespaces(xml_doc) + + if schema.validate(xml_doc): + return True, set() + else: + errors = set() + for error in schema.error_log: + errors.add(error.message) + return False, errors + + except Exception as e: + return False, {str(e)} + + def _get_original_file_errors(self, xml_file): + if self.original_file is None: + return set() + + import tempfile + import zipfile + + xml_file = Path(xml_file).resolve() + unpacked_dir = self.unpacked_dir.resolve() + relative_path = xml_file.relative_to(unpacked_dir) + + with tempfile.TemporaryDirectory() as temp_dir: + temp_path = Path(temp_dir) + + with zipfile.ZipFile(self.original_file, "r") as zip_ref: + zip_ref.extractall(temp_path) + + original_xml_file = temp_path / relative_path + + if not original_xml_file.exists(): + return set() + + 
is_valid, errors = self._validate_single_file_xsd( + original_xml_file, temp_path + ) + return errors if errors else set() + + def _remove_template_tags_from_text_nodes(self, xml_doc): + warnings = [] + template_pattern = re.compile(r"\{\{[^}]*\}\}") + + xml_string = lxml.etree.tostring(xml_doc, encoding="unicode") + xml_copy = lxml.etree.fromstring(xml_string) + + def process_text_content(text, content_type): + if not text: + return text + matches = list(template_pattern.finditer(text)) + if matches: + for match in matches: + warnings.append( + f"Found template tag in {content_type}: {match.group()}" + ) + return template_pattern.sub("", text) + return text + + for elem in xml_copy.iter(): + if not hasattr(elem, "tag") or callable(elem.tag): + continue + tag_str = str(elem.tag) + if tag_str.endswith("}t") or tag_str == "t": + continue + + elem.text = process_text_content(elem.text, "text content") + elem.tail = process_text_content(elem.tail, "tail content") + + return lxml.etree.ElementTree(xml_copy), warnings + + +if __name__ == "__main__": + raise RuntimeError("This module should not be run directly.") diff --git a/skills/pptx-run-report/scripts/office/validators/docx.py b/skills/pptx-run-report/scripts/office/validators/docx.py new file mode 100644 index 0000000..fec405e --- /dev/null +++ b/skills/pptx-run-report/scripts/office/validators/docx.py @@ -0,0 +1,446 @@ +""" +Validator for Word document XML files against XSD schemas. 
+""" + +import random +import re +import tempfile +import zipfile + +import defusedxml.minidom +import lxml.etree + +from .base import BaseSchemaValidator + + +class DOCXSchemaValidator(BaseSchemaValidator): + + WORD_2006_NAMESPACE = "http://schemas.openxmlformats.org/wordprocessingml/2006/main" + W14_NAMESPACE = "http://schemas.microsoft.com/office/word/2010/wordml" + W16CID_NAMESPACE = "http://schemas.microsoft.com/office/word/2016/wordml/cid" + + ELEMENT_RELATIONSHIP_TYPES = {} + + def validate(self): + if not self.validate_xml(): + return False + + all_valid = True + if not self.validate_namespaces(): + all_valid = False + + if not self.validate_unique_ids(): + all_valid = False + + if not self.validate_file_references(): + all_valid = False + + if not self.validate_content_types(): + all_valid = False + + if not self.validate_against_xsd(): + all_valid = False + + if not self.validate_whitespace_preservation(): + all_valid = False + + if not self.validate_deletions(): + all_valid = False + + if not self.validate_insertions(): + all_valid = False + + if not self.validate_all_relationship_ids(): + all_valid = False + + if not self.validate_id_constraints(): + all_valid = False + + if not self.validate_comment_markers(): + all_valid = False + + self.compare_paragraph_counts() + + return all_valid + + def validate_whitespace_preservation(self): + errors = [] + + for xml_file in self.xml_files: + if xml_file.name != "document.xml": + continue + + try: + root = lxml.etree.parse(str(xml_file)).getroot() + + for elem in root.iter(f"{{{self.WORD_2006_NAMESPACE}}}t"): + if elem.text: + text = elem.text + if re.search(r"^[ \t\n\r]", text) or re.search( + r"[ \t\n\r]$", text + ): + xml_space_attr = f"{{{self.XML_NAMESPACE}}}space" + if ( + xml_space_attr not in elem.attrib + or elem.attrib[xml_space_attr] != "preserve" + ): + text_preview = ( + repr(text)[:50] + "..." 
+ if len(repr(text)) > 50 + else repr(text) + ) + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: w:t element with whitespace missing xml:space='preserve': {text_preview}" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} whitespace preservation violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All whitespace is properly preserved") + return True + + def validate_deletions(self): + errors = [] + + for xml_file in self.xml_files: + if xml_file.name != "document.xml": + continue + + try: + root = lxml.etree.parse(str(xml_file)).getroot() + namespaces = {"w": self.WORD_2006_NAMESPACE} + + for t_elem in root.xpath(".//w:del//w:t", namespaces=namespaces): + if t_elem.text: + text_preview = ( + repr(t_elem.text)[:50] + "..." + if len(repr(t_elem.text)) > 50 + else repr(t_elem.text) + ) + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {t_elem.sourceline}: <w:t> found within <w:del>: {text_preview}" + ) + + for instr_elem in root.xpath( + ".//w:del//w:instrText", namespaces=namespaces + ): + text_preview = ( + repr(instr_elem.text or "")[:50] + "..." 
+ if len(repr(instr_elem.text or "")) > 50 + else repr(instr_elem.text or "") + ) + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {instr_elem.sourceline}: <w:instrText> found within <w:del> (use <w:delInstrText>): {text_preview}" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} deletion validation violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - No w:t elements found within w:del elements") + return True + + def count_paragraphs_in_unpacked(self): + count = 0 + + for xml_file in self.xml_files: + if xml_file.name != "document.xml": + continue + + try: + root = lxml.etree.parse(str(xml_file)).getroot() + paragraphs = root.findall(f".//{{{self.WORD_2006_NAMESPACE}}}p") + count = len(paragraphs) + except Exception as e: + print(f"Error counting paragraphs in unpacked document: {e}") + + return count + + def count_paragraphs_in_original(self): + original = self.original_file + if original is None: + return 0 + + count = 0 + + try: + with tempfile.TemporaryDirectory() as temp_dir: + with zipfile.ZipFile(original, "r") as zip_ref: + zip_ref.extractall(temp_dir) + + doc_xml_path = temp_dir + "/word/document.xml" + root = lxml.etree.parse(doc_xml_path).getroot() + + paragraphs = root.findall(f".//{{{self.WORD_2006_NAMESPACE}}}p") + count = len(paragraphs) + + except Exception as e: + print(f"Error counting paragraphs in original document: {e}") + + return count + + def validate_insertions(self): + errors = [] + + for xml_file in self.xml_files: + if xml_file.name != "document.xml": + continue + + try: + root = lxml.etree.parse(str(xml_file)).getroot() + namespaces = {"w": self.WORD_2006_NAMESPACE} + + invalid_elements = root.xpath( + ".//w:ins//w:delText[not(ancestor::w:del)]", namespaces=namespaces + ) + + for elem in invalid_elements: + text_preview = ( + repr(elem.text or 
"")[:50] + "..." + if len(repr(elem.text or "")) > 50 + else repr(elem.text or "") + ) + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: <w:delText> within <w:ins>: {text_preview}" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} insertion validation violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - No w:delText elements within w:ins elements") + return True + + def compare_paragraph_counts(self): + original_count = self.count_paragraphs_in_original() + new_count = self.count_paragraphs_in_unpacked() + + diff = new_count - original_count + diff_str = f"+{diff}" if diff > 0 else str(diff) + print(f"\nParagraphs: {original_count} → {new_count} ({diff_str})") + + def _parse_id_value(self, val: str, base: int = 16) -> int: + return int(val, base) + + def validate_id_constraints(self): + errors = [] + para_id_attr = f"{{{self.W14_NAMESPACE}}}paraId" + durable_id_attr = f"{{{self.W16CID_NAMESPACE}}}durableId" + + for xml_file in self.xml_files: + try: + for elem in lxml.etree.parse(str(xml_file)).iter(): + if val := elem.get(para_id_attr): + if self._parse_id_value(val, base=16) >= 0x80000000: + errors.append( + f" {xml_file.name}:{elem.sourceline}: paraId={val} >= 0x80000000" + ) + + if val := elem.get(durable_id_attr): + if xml_file.name == "numbering.xml": + try: + if self._parse_id_value(val, base=10) >= 0x7FFFFFFF: + errors.append( + f" {xml_file.name}:{elem.sourceline}: " + f"durableId={val} >= 0x7FFFFFFF" + ) + except ValueError: + errors.append( + f" {xml_file.name}:{elem.sourceline}: " + f"durableId={val} must be decimal in numbering.xml" + ) + else: + if self._parse_id_value(val, base=16) >= 0x7FFFFFFF: + errors.append( + f" {xml_file.name}:{elem.sourceline}: " + f"durableId={val} >= 0x7FFFFFFF" + ) + except Exception: + pass + + 
if errors: + print(f"FAILED - {len(errors)} ID constraint violations:") + for e in errors: + print(e) + elif self.verbose: + print("PASSED - All paraId/durableId values within constraints") + return not errors + + def validate_comment_markers(self): + errors = [] + + document_xml = None + comments_xml = None + for xml_file in self.xml_files: + if xml_file.name == "document.xml" and "word" in str(xml_file): + document_xml = xml_file + elif xml_file.name == "comments.xml": + comments_xml = xml_file + + if not document_xml: + if self.verbose: + print("PASSED - No document.xml found (skipping comment validation)") + return True + + try: + doc_root = lxml.etree.parse(str(document_xml)).getroot() + namespaces = {"w": self.WORD_2006_NAMESPACE} + + range_starts = { + elem.get(f"{{{self.WORD_2006_NAMESPACE}}}id") + for elem in doc_root.xpath( + ".//w:commentRangeStart", namespaces=namespaces + ) + } + range_ends = { + elem.get(f"{{{self.WORD_2006_NAMESPACE}}}id") + for elem in doc_root.xpath( + ".//w:commentRangeEnd", namespaces=namespaces + ) + } + references = { + elem.get(f"{{{self.WORD_2006_NAMESPACE}}}id") + for elem in doc_root.xpath( + ".//w:commentReference", namespaces=namespaces + ) + } + + orphaned_ends = range_ends - range_starts + for comment_id in sorted( + orphaned_ends, key=lambda x: int(x) if x and x.isdigit() else 0 + ): + errors.append( + f' document.xml: commentRangeEnd id="{comment_id}" has no matching commentRangeStart' + ) + + orphaned_starts = range_starts - range_ends + for comment_id in sorted( + orphaned_starts, key=lambda x: int(x) if x and x.isdigit() else 0 + ): + errors.append( + f' document.xml: commentRangeStart id="{comment_id}" has no matching commentRangeEnd' + ) + + comment_ids = set() + if comments_xml and comments_xml.exists(): + comments_root = lxml.etree.parse(str(comments_xml)).getroot() + comment_ids = { + elem.get(f"{{{self.WORD_2006_NAMESPACE}}}id") + for elem in comments_root.xpath( + ".//w:comment", namespaces=namespaces + ) + 
} + + marker_ids = range_starts | range_ends | references + invalid_refs = marker_ids - comment_ids + for comment_id in sorted( + invalid_refs, key=lambda x: int(x) if x and x.isdigit() else 0 + ): + if comment_id: + errors.append( + f' document.xml: marker id="{comment_id}" references non-existent comment' + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append(f" Error parsing XML: {e}") + + if errors: + print(f"FAILED - {len(errors)} comment marker violations:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All comment markers properly paired") + return True + + def repair(self) -> int: + repairs = super().repair() + repairs += self.repair_durableId() + return repairs + + def repair_durableId(self) -> int: + repairs = 0 + + for xml_file in self.xml_files: + try: + content = xml_file.read_text(encoding="utf-8") + dom = defusedxml.minidom.parseString(content) + modified = False + + for elem in dom.getElementsByTagName("*"): + if not elem.hasAttribute("w16cid:durableId"): + continue + + durable_id = elem.getAttribute("w16cid:durableId") + needs_repair = False + + if xml_file.name == "numbering.xml": + try: + needs_repair = ( + self._parse_id_value(durable_id, base=10) >= 0x7FFFFFFF + ) + except ValueError: + needs_repair = True + else: + try: + needs_repair = ( + self._parse_id_value(durable_id, base=16) >= 0x7FFFFFFF + ) + except ValueError: + needs_repair = True + + if needs_repair: + value = random.randint(1, 0x7FFFFFFE) + if xml_file.name == "numbering.xml": + new_id = str(value) + else: + new_id = f"{value:08X}" + + elem.setAttribute("w16cid:durableId", new_id) + print( + f" Repaired: {xml_file.name}: durableId {durable_id} → {new_id}" + ) + repairs += 1 + modified = True + + if modified: + xml_file.write_bytes(dom.toxml(encoding="UTF-8")) + + except Exception: + pass + + return repairs + + +if __name__ == "__main__": + raise RuntimeError("This module should not be run directly.") diff 
--git a/skills/pptx-run-report/scripts/office/validators/pptx.py b/skills/pptx-run-report/scripts/office/validators/pptx.py new file mode 100644 index 0000000..09842aa --- /dev/null +++ b/skills/pptx-run-report/scripts/office/validators/pptx.py @@ -0,0 +1,275 @@ +""" +Validator for PowerPoint presentation XML files against XSD schemas. +""" + +import re + +from .base import BaseSchemaValidator + + +class PPTXSchemaValidator(BaseSchemaValidator): + + PRESENTATIONML_NAMESPACE = ( + "http://schemas.openxmlformats.org/presentationml/2006/main" + ) + + ELEMENT_RELATIONSHIP_TYPES = { + "sldid": "slide", + "sldmasterid": "slidemaster", + "notesmasterid": "notesmaster", + "sldlayoutid": "slidelayout", + "themeid": "theme", + "tablestyleid": "tablestyles", + } + + def validate(self): + if not self.validate_xml(): + return False + + all_valid = True + if not self.validate_namespaces(): + all_valid = False + + if not self.validate_unique_ids(): + all_valid = False + + if not self.validate_uuid_ids(): + all_valid = False + + if not self.validate_file_references(): + all_valid = False + + if not self.validate_slide_layout_ids(): + all_valid = False + + if not self.validate_content_types(): + all_valid = False + + if not self.validate_against_xsd(): + all_valid = False + + if not self.validate_notes_slide_references(): + all_valid = False + + if not self.validate_all_relationship_ids(): + all_valid = False + + if not self.validate_no_duplicate_slide_layouts(): + all_valid = False + + return all_valid + + def validate_uuid_ids(self): + import lxml.etree + + errors = [] + uuid_pattern = re.compile( + r"^[\{\(]?[0-9A-Fa-f]{8}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{12}[\}\)]?$" + ) + + for xml_file in self.xml_files: + try: + root = lxml.etree.parse(str(xml_file)).getroot() + + for elem in root.iter(): + for attr, value in elem.attrib.items(): + attr_name = attr.split("}")[-1].lower() + if attr_name == "id" or attr_name.endswith("id"): + if 
self._looks_like_uuid(value): + if not uuid_pattern.match(value): + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: " + f"Line {elem.sourceline}: ID '{value}' appears to be a UUID but contains invalid hex characters" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {xml_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} UUID ID validation errors:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All UUID-like IDs contain valid hex values") + return True + + def _looks_like_uuid(self, value): + clean_value = value.strip("{}()").replace("-", "") + return len(clean_value) == 32 and all(c.isalnum() for c in clean_value) + + def validate_slide_layout_ids(self): + import lxml.etree + + errors = [] + + slide_masters = list(self.unpacked_dir.glob("ppt/slideMasters/*.xml")) + + if not slide_masters: + if self.verbose: + print("PASSED - No slide masters found") + return True + + for slide_master in slide_masters: + try: + root = lxml.etree.parse(str(slide_master)).getroot() + + rels_file = slide_master.parent / "_rels" / f"{slide_master.name}.rels" + + if not rels_file.exists(): + errors.append( + f" {slide_master.relative_to(self.unpacked_dir)}: " + f"Missing relationships file: {rels_file.relative_to(self.unpacked_dir)}" + ) + continue + + rels_root = lxml.etree.parse(str(rels_file)).getroot() + + valid_layout_rids = set() + for rel in rels_root.findall( + f".//{{{self.PACKAGE_RELATIONSHIPS_NAMESPACE}}}Relationship" + ): + rel_type = rel.get("Type", "") + if "slideLayout" in rel_type: + valid_layout_rids.add(rel.get("Id")) + + for sld_layout_id in root.findall( + f".//{{{self.PRESENTATIONML_NAMESPACE}}}sldLayoutId" + ): + r_id = sld_layout_id.get( + f"{{{self.OFFICE_RELATIONSHIPS_NAMESPACE}}}id" + ) + layout_id = sld_layout_id.get("id") + + if r_id and r_id not in valid_layout_rids: + errors.append( + f" 
{slide_master.relative_to(self.unpacked_dir)}: " + f"Line {sld_layout_id.sourceline}: sldLayoutId with id='{layout_id}' " + f"references r:id='{r_id}' which is not found in slide layout relationships" + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {slide_master.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print(f"FAILED - Found {len(errors)} slide layout ID validation errors:") + for error in errors: + print(error) + print( + "Remove invalid references or add missing slide layouts to the relationships file." + ) + return False + else: + if self.verbose: + print("PASSED - All slide layout IDs reference valid slide layouts") + return True + + def validate_no_duplicate_slide_layouts(self): + import lxml.etree + + errors = [] + slide_rels_files = list(self.unpacked_dir.glob("ppt/slides/_rels/*.xml.rels")) + + for rels_file in slide_rels_files: + try: + root = lxml.etree.parse(str(rels_file)).getroot() + + layout_rels = [ + rel + for rel in root.findall( + f".//{{{self.PACKAGE_RELATIONSHIPS_NAMESPACE}}}Relationship" + ) + if "slideLayout" in rel.get("Type", "") + ] + + if len(layout_rels) > 1: + errors.append( + f" {rels_file.relative_to(self.unpacked_dir)}: has {len(layout_rels)} slideLayout references" + ) + + except Exception as e: + errors.append( + f" {rels_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + if errors: + print("FAILED - Found slides with duplicate slideLayout references:") + for error in errors: + print(error) + return False + else: + if self.verbose: + print("PASSED - All slides have exactly one slideLayout reference") + return True + + def validate_notes_slide_references(self): + import lxml.etree + + errors = [] + notes_slide_references = {} + + slide_rels_files = list(self.unpacked_dir.glob("ppt/slides/_rels/*.xml.rels")) + + if not slide_rels_files: + if self.verbose: + print("PASSED - No slide relationship files found") + return True + + for rels_file in slide_rels_files: + try: + 
root = lxml.etree.parse(str(rels_file)).getroot() + + for rel in root.findall( + f".//{{{self.PACKAGE_RELATIONSHIPS_NAMESPACE}}}Relationship" + ): + rel_type = rel.get("Type", "") + if "notesSlide" in rel_type: + target = rel.get("Target", "") + if target: + normalized_target = target.replace("../", "") + + slide_name = rels_file.stem.replace( + ".xml", "" + ) + + if normalized_target not in notes_slide_references: + notes_slide_references[normalized_target] = [] + notes_slide_references[normalized_target].append( + (slide_name, rels_file) + ) + + except (lxml.etree.XMLSyntaxError, Exception) as e: + errors.append( + f" {rels_file.relative_to(self.unpacked_dir)}: Error: {e}" + ) + + for target, references in notes_slide_references.items(): + if len(references) > 1: + slide_names = [ref[0] for ref in references] + errors.append( + f" Notes slide '{target}' is referenced by multiple slides: {', '.join(slide_names)}" + ) + for slide_name, rels_file in references: + errors.append(f" - {rels_file.relative_to(self.unpacked_dir)}") + + if errors: + print( + f"FAILED - Found {len([e for e in errors if not e.startswith(' ')])} notes slide reference validation errors:" + ) + for error in errors: + print(error) + print("Each slide may optionally have its own notes slide file.") + return False + else: + if self.verbose: + print("PASSED - All notes slide references are unique") + return True + + +if __name__ == "__main__": + raise RuntimeError("This module should not be run directly.") diff --git a/skills/pptx-run-report/scripts/office/validators/redlining.py b/skills/pptx-run-report/scripts/office/validators/redlining.py new file mode 100644 index 0000000..71c81b6 --- /dev/null +++ b/skills/pptx-run-report/scripts/office/validators/redlining.py @@ -0,0 +1,247 @@ +""" +Validator for tracked changes in Word documents. 
+""" + +import subprocess +import tempfile +import zipfile +from pathlib import Path + + +class RedliningValidator: + + def __init__(self, unpacked_dir, original_docx, verbose=False, author="Claude"): + self.unpacked_dir = Path(unpacked_dir) + self.original_docx = Path(original_docx) + self.verbose = verbose + self.author = author + self.namespaces = { + "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main" + } + + def repair(self) -> int: + return 0 + + def validate(self): + modified_file = self.unpacked_dir / "word" / "document.xml" + if not modified_file.exists(): + print(f"FAILED - Modified document.xml not found at {modified_file}") + return False + + try: + import xml.etree.ElementTree as ET + + tree = ET.parse(modified_file) + root = tree.getroot() + + del_elements = root.findall(".//w:del", self.namespaces) + ins_elements = root.findall(".//w:ins", self.namespaces) + + author_del_elements = [ + elem + for elem in del_elements + if elem.get(f"{{{self.namespaces['w']}}}author") == self.author + ] + author_ins_elements = [ + elem + for elem in ins_elements + if elem.get(f"{{{self.namespaces['w']}}}author") == self.author + ] + + if not author_del_elements and not author_ins_elements: + if self.verbose: + print(f"PASSED - No tracked changes by {self.author} found.") + return True + + except Exception: + pass + + with tempfile.TemporaryDirectory() as temp_dir: + temp_path = Path(temp_dir) + + try: + with zipfile.ZipFile(self.original_docx, "r") as zip_ref: + zip_ref.extractall(temp_path) + except Exception as e: + print(f"FAILED - Error unpacking original docx: {e}") + return False + + original_file = temp_path / "word" / "document.xml" + if not original_file.exists(): + print( + f"FAILED - Original document.xml not found in {self.original_docx}" + ) + return False + + try: + import xml.etree.ElementTree as ET + + modified_tree = ET.parse(modified_file) + modified_root = modified_tree.getroot() + original_tree = ET.parse(original_file) + 
original_root = original_tree.getroot() + except ET.ParseError as e: + print(f"FAILED - Error parsing XML files: {e}") + return False + + self._remove_author_tracked_changes(original_root) + self._remove_author_tracked_changes(modified_root) + + modified_text = self._extract_text_content(modified_root) + original_text = self._extract_text_content(original_root) + + if modified_text != original_text: + error_message = self._generate_detailed_diff( + original_text, modified_text + ) + print(error_message) + return False + + if self.verbose: + print(f"PASSED - All changes by {self.author} are properly tracked") + return True + + def _generate_detailed_diff(self, original_text, modified_text): + error_parts = [ + f"FAILED - Document text doesn't match after removing {self.author}'s tracked changes", + "", + "Likely causes:", + " 1. Modified text inside another author's <w:ins> or <w:del> tags", + " 2. Made edits without proper tracked changes", + " 3. Didn't nest <w:del> inside <w:ins> when deleting another's insertion", + "", + "For pre-redlined documents, use correct patterns:", + " - To reject another's INSERTION: Nest <w:del> inside their <w:ins>", + " - To restore another's DELETION: Add new <w:ins> AFTER their <w:del>", + "", + ] + + git_diff = self._get_git_word_diff(original_text, modified_text) + if git_diff: + error_parts.extend(["Differences:", "============", git_diff]) + else: + error_parts.append("Unable to generate word diff (git not available)") + + return "\n".join(error_parts) + + def _get_git_word_diff(self, original_text, modified_text): + try: + with tempfile.TemporaryDirectory() as temp_dir: + temp_path = Path(temp_dir) + + original_file = temp_path / "original.txt" + modified_file = temp_path / "modified.txt" + + original_file.write_text(original_text, encoding="utf-8") + modified_file.write_text(modified_text, encoding="utf-8") + + result = subprocess.run( + [ + "git", + "diff", + "--word-diff=plain", + "--word-diff-regex=.", + "-U0", + "--no-index", + str(original_file), + str(modified_file), + ], + 
capture_output=True, + text=True, + ) + + if result.stdout.strip(): + lines = result.stdout.split("\n") + content_lines = [] + in_content = False + for line in lines: + if line.startswith("@@"): + in_content = True + continue + if in_content and line.strip(): + content_lines.append(line) + + if content_lines: + return "\n".join(content_lines) + + result = subprocess.run( + [ + "git", + "diff", + "--word-diff=plain", + "-U0", + "--no-index", + str(original_file), + str(modified_file), + ], + capture_output=True, + text=True, + ) + + if result.stdout.strip(): + lines = result.stdout.split("\n") + content_lines = [] + in_content = False + for line in lines: + if line.startswith("@@"): + in_content = True + continue + if in_content and line.strip(): + content_lines.append(line) + return "\n".join(content_lines) + + except (subprocess.CalledProcessError, FileNotFoundError, Exception): + pass + + return None + + def _remove_author_tracked_changes(self, root): + ins_tag = f"{{{self.namespaces['w']}}}ins" + del_tag = f"{{{self.namespaces['w']}}}del" + author_attr = f"{{{self.namespaces['w']}}}author" + + for parent in root.iter(): + to_remove = [] + for child in parent: + if child.tag == ins_tag and child.get(author_attr) == self.author: + to_remove.append(child) + for elem in to_remove: + parent.remove(elem) + + deltext_tag = f"{{{self.namespaces['w']}}}delText" + t_tag = f"{{{self.namespaces['w']}}}t" + + for parent in root.iter(): + to_process = [] + for child in parent: + if child.tag == del_tag and child.get(author_attr) == self.author: + to_process.append((child, list(parent).index(child))) + + for del_elem, del_index in reversed(to_process): + for elem in del_elem.iter(): + if elem.tag == deltext_tag: + elem.tag = t_tag + + for child in reversed(list(del_elem)): + parent.insert(del_index, child) + parent.remove(del_elem) + + def _extract_text_content(self, root): + p_tag = f"{{{self.namespaces['w']}}}p" + t_tag = f"{{{self.namespaces['w']}}}t" + + paragraphs = [] + 
for p_elem in root.findall(f".//{p_tag}"): + text_parts = [] + for t_elem in p_elem.findall(f".//{t_tag}"): + if t_elem.text: + text_parts.append(t_elem.text) + paragraph_text = "".join(text_parts) + if paragraph_text: + paragraphs.append(paragraph_text) + + return "\n".join(paragraphs) + + +if __name__ == "__main__": + raise RuntimeError("This module should not be run directly.") diff --git a/skills/pptx-run-report/scripts/thumbnail.py b/skills/pptx-run-report/scripts/thumbnail.py new file mode 100755 index 0000000..edcbdc0 --- /dev/null +++ b/skills/pptx-run-report/scripts/thumbnail.py @@ -0,0 +1,289 @@ +"""Create thumbnail grids from PowerPoint presentation slides. + +Creates a grid layout of slide thumbnails for quick visual analysis. +Labels each thumbnail with its XML filename (e.g., slide1.xml). +Hidden slides are shown with a placeholder pattern. + +Usage: + python thumbnail.py input.pptx [output_prefix] [--cols N] + +Examples: + python thumbnail.py presentation.pptx + # Creates: thumbnails.jpg + + python thumbnail.py template.pptx grid --cols 4 + # Creates: grid.jpg (or grid-1.jpg, grid-2.jpg for large decks) +""" + +import argparse +import subprocess +import sys +import tempfile +import zipfile +from pathlib import Path + +import defusedxml.minidom +from office.soffice import get_soffice_env +from PIL import Image, ImageDraw, ImageFont + +THUMBNAIL_WIDTH = 300 +CONVERSION_DPI = 100 +MAX_COLS = 6 +DEFAULT_COLS = 3 +JPEG_QUALITY = 95 +GRID_PADDING = 20 +BORDER_WIDTH = 2 +FONT_SIZE_RATIO = 0.10 +LABEL_PADDING_RATIO = 0.4 + + +def main(): + parser = argparse.ArgumentParser( + description="Create thumbnail grids from PowerPoint slides." 
+ ) + parser.add_argument("input", help="Input PowerPoint file (.pptx)") + parser.add_argument( + "output_prefix", + nargs="?", + default="thumbnails", + help="Output prefix for image files (default: thumbnails)", + ) + parser.add_argument( + "--cols", + type=int, + default=DEFAULT_COLS, + help=f"Number of columns (default: {DEFAULT_COLS}, max: {MAX_COLS})", + ) + + args = parser.parse_args() + + cols = min(args.cols, MAX_COLS) + if args.cols > MAX_COLS: + print(f"Warning: Columns limited to {MAX_COLS}") + + input_path = Path(args.input) + if not input_path.exists() or input_path.suffix.lower() != ".pptx": + print(f"Error: Invalid PowerPoint file: {args.input}", file=sys.stderr) + sys.exit(1) + + output_path = Path(f"{args.output_prefix}.jpg") + + try: + slide_info = get_slide_info(input_path) + + with tempfile.TemporaryDirectory() as temp_dir: + temp_path = Path(temp_dir) + visible_images = convert_to_images(input_path, temp_path) + + if not visible_images and not any(s["hidden"] for s in slide_info): + print("Error: No slides found", file=sys.stderr) + sys.exit(1) + + slides = build_slide_list(slide_info, visible_images, temp_path) + + grid_files = create_grids(slides, cols, THUMBNAIL_WIDTH, output_path) + + print(f"Created {len(grid_files)} grid(s):") + for grid_file in grid_files: + print(f" {grid_file}") + + except Exception as e: + print(f"Error: {e}", file=sys.stderr) + sys.exit(1) + + +def get_slide_info(pptx_path: Path) -> list[dict]: + with zipfile.ZipFile(pptx_path, "r") as zf: + rels_content = zf.read("ppt/_rels/presentation.xml.rels").decode("utf-8") + rels_dom = defusedxml.minidom.parseString(rels_content) + + rid_to_slide = {} + for rel in rels_dom.getElementsByTagName("Relationship"): + rid = rel.getAttribute("Id") + target = rel.getAttribute("Target") + rel_type = rel.getAttribute("Type") + if "slide" in rel_type and target.startswith("slides/"): + rid_to_slide[rid] = target.replace("slides/", "") + + pres_content = 
zf.read("ppt/presentation.xml").decode("utf-8") + pres_dom = defusedxml.minidom.parseString(pres_content) + + slides = [] + for sld_id in pres_dom.getElementsByTagName("p:sldId"): + rid = sld_id.getAttribute("r:id") + if rid in rid_to_slide: + hidden = sld_id.getAttribute("show") == "0" + slides.append({"name": rid_to_slide[rid], "hidden": hidden}) + + return slides + + +def build_slide_list( + slide_info: list[dict], + visible_images: list[Path], + temp_dir: Path, +) -> list[tuple[Path, str]]: + if visible_images: + with Image.open(visible_images[0]) as img: + placeholder_size = img.size + else: + placeholder_size = (1920, 1080) + + slides = [] + visible_idx = 0 + + for info in slide_info: + if info["hidden"]: + placeholder_path = temp_dir / f"hidden-{info['name']}.jpg" + placeholder_img = create_hidden_placeholder(placeholder_size) + placeholder_img.save(placeholder_path, "JPEG") + slides.append((placeholder_path, f"{info['name']} (hidden)")) + else: + if visible_idx < len(visible_images): + slides.append((visible_images[visible_idx], info["name"])) + visible_idx += 1 + + return slides + + +def create_hidden_placeholder(size: tuple[int, int]) -> Image.Image: + img = Image.new("RGB", size, color="#F0F0F0") + draw = ImageDraw.Draw(img) + line_width = max(5, min(size) // 100) + draw.line([(0, 0), size], fill="#CCCCCC", width=line_width) + draw.line([(size[0], 0), (0, size[1])], fill="#CCCCCC", width=line_width) + return img + + +def convert_to_images(pptx_path: Path, temp_dir: Path) -> list[Path]: + pdf_path = temp_dir / f"{pptx_path.stem}.pdf" + + result = subprocess.run( + [ + "soffice", + "--headless", + "--convert-to", + "pdf", + "--outdir", + str(temp_dir), + str(pptx_path), + ], + capture_output=True, + text=True, + env=get_soffice_env(), + ) + if result.returncode != 0 or not pdf_path.exists(): + raise RuntimeError("PDF conversion failed") + + result = subprocess.run( + [ + "pdftoppm", + "-jpeg", + "-r", + str(CONVERSION_DPI), + str(pdf_path), + str(temp_dir 
/ "slide"), + ], + capture_output=True, + text=True, + ) + if result.returncode != 0: + raise RuntimeError("Image conversion failed") + + return sorted(temp_dir.glob("slide-*.jpg")) + + +def create_grids( + slides: list[tuple[Path, str]], + cols: int, + width: int, + output_path: Path, +) -> list[str]: + max_per_grid = cols * (cols + 1) + grid_files = [] + + for chunk_idx, start_idx in enumerate(range(0, len(slides), max_per_grid)): + end_idx = min(start_idx + max_per_grid, len(slides)) + chunk_slides = slides[start_idx:end_idx] + + grid = create_grid(chunk_slides, cols, width) + + if len(slides) <= max_per_grid: + grid_filename = output_path + else: + stem = output_path.stem + suffix = output_path.suffix + grid_filename = output_path.parent / f"{stem}-{chunk_idx + 1}{suffix}" + + grid_filename.parent.mkdir(parents=True, exist_ok=True) + grid.save(str(grid_filename), quality=JPEG_QUALITY) + grid_files.append(str(grid_filename)) + + return grid_files + + +def create_grid( + slides: list[tuple[Path, str]], + cols: int, + width: int, +) -> Image.Image: + font_size = int(width * FONT_SIZE_RATIO) + label_padding = int(font_size * LABEL_PADDING_RATIO) + + with Image.open(slides[0][0]) as img: + aspect = img.height / img.width + height = int(width * aspect) + + rows = (len(slides) + cols - 1) // cols + grid_w = cols * width + (cols + 1) * GRID_PADDING + grid_h = rows * (height + font_size + label_padding * 2) + (rows + 1) * GRID_PADDING + + grid = Image.new("RGB", (grid_w, grid_h), "white") + draw = ImageDraw.Draw(grid) + + try: + font = ImageFont.load_default(size=font_size) + except Exception: + font = ImageFont.load_default() + + for i, (img_path, slide_name) in enumerate(slides): + row, col = i // cols, i % cols + x = col * width + (col + 1) * GRID_PADDING + y_base = ( + row * (height + font_size + label_padding * 2) + (row + 1) * GRID_PADDING + ) + + label = slide_name + bbox = draw.textbbox((0, 0), label, font=font) + text_w = bbox[2] - bbox[0] + draw.text( + (x + 
(width - text_w) // 2, y_base + label_padding), + label, + fill="black", + font=font, + ) + + y_thumbnail = y_base + label_padding + font_size + label_padding + + with Image.open(img_path) as img: + img.thumbnail((width, height), Image.Resampling.LANCZOS) + w, h = img.size + tx = x + (width - w) // 2 + ty = y_thumbnail + (height - h) // 2 + grid.paste(img, (tx, ty)) + + if BORDER_WIDTH > 0: + draw.rectangle( + [ + (tx - BORDER_WIDTH, ty - BORDER_WIDTH), + (tx + w + BORDER_WIDTH - 1, ty + h + BORDER_WIDTH - 1), + ], + outline="gray", + width=BORDER_WIDTH, + ) + + return grid + + +if __name__ == "__main__": + main() diff --git a/tests/test_cli.py b/tests/test_cli.py new file mode 100644 index 0000000..3b33a55 --- /dev/null +++ b/tests/test_cli.py @@ -0,0 +1,195 @@ +import shutil +import zipfile +from pathlib import Path +from types import SimpleNamespace + +import pytest + +from codex_autoloop import cli +from codex_autoloop.apps import cli_app +from codex_autoloop.models import CodexRunResult, ReviewDecision + + +class _DummyBtwAgent: + def __init__(self, *args, **kwargs) -> None: # type: ignore[no-untyped-def] + return + + def start_async(self, *args, **kwargs) -> bool: # type: ignore[no-untyped-def] + return False + + +class _DummyLoopEngine: + def __init__(self, **kwargs) -> None: # type: ignore[no-untyped-def] + self.state_store = kwargs["state_store"] + self.config = kwargs["config"] + + def run(self): # type: ignore[no-untyped-def] + pptx_path = Path(self.state_store.pptx_report_path()) + pptx_path.parent.mkdir(parents=True, exist_ok=True) + pptx_path.write_bytes(b"pptx-smoke") + self.state_store.record_pptx_report(str(pptx_path)) + return SimpleNamespace( + success=True, + session_id="thread-1", + stop_reason="smoke test complete", + rounds=[], + ) + + +class _PptxE2ERunner: + def __init__(self) -> None: + self.calls: list[dict[str, object]] = [] + + def run_exec(self, **kwargs): # type: ignore[no-untyped-def] + self.calls.append(kwargs) + run_label = 
kwargs.get("run_label") + if run_label == "main": + return CodexRunResult( + command=["codex", "exec"], + exit_code=0, + thread_id="thread-1", + agent_messages=["DONE:\n- implemented\nREMAINING:\n- none\nBLOCKERS:\n- none"], + turn_completed=True, + turn_failed=False, + fatal_error=None, + ) + if run_label == "main-final-report": + return CodexRunResult( + command=["codex", "exec", "resume"], + exit_code=1, + thread_id="thread-1", + agent_messages=["report not written in test runner"], + turn_completed=False, + turn_failed=True, + fatal_error="report write skipped in test runner", + ) + if run_label == "main-pptx-report": + # Agent attempt fails; fallback JS script will generate the PPTX + return CodexRunResult( + command=["codex", "exec", "resume"], + exit_code=1, + thread_id="thread-1", + agent_messages=["pptx not written in test runner"], + turn_completed=False, + turn_failed=True, + fatal_error="pptx write skipped in test runner", + ) + raise AssertionError(f"unexpected run label: {run_label}") + + +class _DoneReviewer: + def evaluate(self, **kwargs): # type: ignore[no-untyped-def] + return ReviewDecision( + status="done", + confidence=1.0, + reason="complete", + next_action="stop", + round_summary_markdown="## Round Summary\n- done\n", + completion_summary_markdown="## Completion\n- complete\n", + ) + + +def test_build_parser_accepts_pptx_report_file_option() -> None: + args = cli.build_parser().parse_args( + [ + "--pptx-report-file", + "/tmp/run-report.pptx", + "开始工作", + ] + ) + + assert args.pptx_report_file == "/tmp/run-report.pptx" + + +def test_build_parser_help_mentions_pptx_report_file() -> None: + help_text = cli.build_parser().format_help() + + assert "--pptx-report-file PPTX_REPORT_FILE" in help_text + assert "auto-generated PPTX run report" in help_text + + +def test_run_cli_smoke_returns_pptx_report_payload(tmp_path, monkeypatch) -> None: + state_file = tmp_path / "state.json" + pptx_report = tmp_path / "artifacts" / "run-report.pptx" + + 
monkeypatch.setattr(cli_app, "build_codex_runner", lambda **kwargs: object()) + monkeypatch.setattr(cli_app, "BtwAgent", _DummyBtwAgent) + monkeypatch.setattr(cli_app, "Reviewer", lambda runner: object()) + monkeypatch.setattr(cli_app, "Planner", lambda runner: object()) + monkeypatch.setattr(cli_app, "LoopEngine", _DummyLoopEngine) + + args = cli.build_parser().parse_args( + [ + "--state-file", + str(state_file), + "--pptx-report-file", + str(pptx_report), + "开始工作", + ] + ) + if not args.planner: + args.plan_mode = "off" + if args.main_prompt_file is None: + args.main_prompt_file = cli.resolve_main_prompt_file( + state_file=args.state_file, + control_file=args.control_file, + ) + + payload, exit_code = cli_app.run_cli(args) + + assert exit_code == 0 + assert payload["success"] is True + assert payload["session_id"] == "thread-1" + assert payload["pptx_report_file"] == str(pptx_report) + assert payload["pptx_report_ready"] is True + assert pptx_report.exists() + + +def test_run_cli_generates_default_pptx_report_artifact_end_to_end(tmp_path, monkeypatch) -> None: + project_root = Path(__file__).resolve().parents[1] + if shutil.which("node") is None: + pytest.skip("node is required for real PPTX generation") + if not (project_root / "node_modules" / "pptxgenjs").exists(): + pytest.skip("pptxgenjs runtime is not installed in this workspace") + + state_file = tmp_path / "state.json" + default_pptx_report = tmp_path / "run-report.pptx" + runner = _PptxE2ERunner() + + monkeypatch.setattr(cli_app, "build_codex_runner", lambda **kwargs: runner) + monkeypatch.setattr(cli_app, "BtwAgent", _DummyBtwAgent) + monkeypatch.setattr(cli_app, "Reviewer", lambda runner: _DoneReviewer()) + + args = cli.build_parser().parse_args( + [ + "--state-file", + str(state_file), + "--no-planner", + "开始工作", + ] + ) + if not args.planner: + args.plan_mode = "off" + if args.main_prompt_file is None: + args.main_prompt_file = cli.resolve_main_prompt_file( + state_file=args.state_file, + 
control_file=args.control_file, + ) + + payload, exit_code = cli_app.run_cli(args) + + assert exit_code == 0 + assert payload["success"] is True + assert payload["session_id"] == "thread-1" + assert payload["pptx_report_file"] == str(default_pptx_report) + assert payload["pptx_report_ready"] is True + assert default_pptx_report.exists() + assert default_pptx_report.stat().st_size > 0 + assert any(call.get("run_label") == "main" for call in runner.calls) + assert any(call.get("run_label") == "main-final-report" for call in runner.calls) + assert any(call.get("run_label") == "main-pptx-report" for call in runner.calls) + + with zipfile.ZipFile(default_pptx_report) as archive: + names = set(archive.namelist()) + assert "[Content_Types].xml" in names + assert "ppt/presentation.xml" in names diff --git a/tests/test_control_state.py b/tests/test_control_state.py index 90c33a2..9477447 100644 --- a/tests/test_control_state.py +++ b/tests/test_control_state.py @@ -1,3 +1,5 @@ +import json + from codex_autoloop.control_state import LoopControlState from codex_autoloop.core.state_store import LoopStateStore from codex_autoloop.models import PlanDecision, ReviewDecision, RoundSummary @@ -137,6 +139,28 @@ def test_state_store_writes_plan_and_review_docs(tmp_path) -> None: assert state.final_report_path() == str(final_report) +def test_state_store_records_pptx_report_in_runtime_and_state_file(tmp_path) -> None: + state_file = tmp_path / "state.json" + pptx_report = tmp_path / "run-report.pptx" + state = LoopStateStore( + objective="ship feature", + state_file=str(state_file), + pptx_report_file=str(pptx_report), + plan_mode="off", + ) + + assert state.has_pptx_report() is False + + state.record_pptx_report(str(pptx_report)) + + assert state.has_pptx_report() is True + assert state.pptx_report_path() == str(pptx_report) + + payload = json.loads(state_file.read_text(encoding="utf-8")) + assert payload["pptx_report_file"] == str(pptx_report) + assert payload["pptx_report_ready"] 
is True + + def test_state_store_renders_plan_and_review_context(tmp_path) -> None: state = LoopStateStore( objective="ship feature", diff --git a/tests/test_dashboard.py b/tests/test_dashboard.py index cb40276..5775777 100644 --- a/tests/test_dashboard.py +++ b/tests/test_dashboard.py @@ -1,10 +1,12 @@ from codex_autoloop.dashboard import DashboardStore from codex_autoloop.cli import ( + format_control_status, parse_telegram_events, resolve_final_report_file, resolve_operator_messages_file, resolve_plan_report_file, resolve_plan_todo_file, + resolve_pptx_report_file, ) @@ -75,3 +77,35 @@ def test_resolve_final_report_file_uses_review_dir() -> None: state_file=None, ) assert out == "/tmp/reviews/final-task-report.md" + + +def test_resolve_pptx_report_file_uses_operator_messages_dir() -> None: + out = resolve_pptx_report_file( + explicit_path=None, + operator_messages_file="/tmp/operator_messages.md", + control_file=None, + state_file=None, + ) + assert out == "/tmp/run-report.pptx" + + +def test_format_control_status_includes_pptx_report_details() -> None: + rendered = format_control_status( + { + "status": "completed", + "round": 3, + "session_id": "thread-1", + "success": True, + "stop_reason": "done", + "plan_mode": "auto", + "final_report_file": "/tmp/final-task-report.md", + "final_report_ready": True, + "pptx_report_file": "/tmp/run-report.pptx", + "pptx_report_ready": True, + } + ) + + assert "final_report_file=/tmp/final-task-report.md" in rendered + assert "final_report_ready=True" in rendered + assert "pptx_report_file=/tmp/run-report.pptx" in rendered + assert "pptx_report_ready=True" in rendered diff --git a/tests/test_event_sinks.py b/tests/test_event_sinks.py index 9010b63..723109c 100644 --- a/tests/test_event_sinks.py +++ b/tests/test_event_sinks.py @@ -83,6 +83,18 @@ def test_feishu_event_sink_sends_final_report_immediately(tmp_path: Path) -> Non assert notifier.files == [(str(report), "ArgusBot final task report")] +def 
test_feishu_event_sink_sends_pptx_report_immediately(tmp_path: Path) -> None: + notifier = _FakeFeishuNotifier() + report = tmp_path / "run-report.pptx" + report.write_bytes(b"pptx") + sink = FeishuEventSink(notifier=notifier, live_updates=False, live_interval_seconds=30) + + sink.handle_event({"type": "pptx.report.ready", "path": str(report)}) + + assert notifier.messages == [] + assert notifier.files == [(str(report), "ArgusBot run report (PPTX)")] + + def test_telegram_event_sink_sends_final_report_immediately(tmp_path: Path) -> None: notifier = _FakeFeishuNotifier() report = tmp_path / "final-task-report.md" @@ -96,6 +108,18 @@ def test_telegram_event_sink_sends_final_report_immediately(tmp_path: Path) -> N assert notifier.files == [(str(report), "ArgusBot final task report")] +def test_telegram_event_sink_sends_pptx_report_immediately(tmp_path: Path) -> None: + notifier = _FakeFeishuNotifier() + report = tmp_path / "run-report.pptx" + report.write_bytes(b"pptx") + sink = TelegramEventSink(notifier=notifier, live_updates=False, live_interval_seconds=30) + + sink.handle_event({"type": "pptx.report.ready", "path": str(report)}) + + assert notifier.messages == [] + assert notifier.files == [(str(report), "ArgusBot run report (PPTX)")] + + def test_telegram_event_sink_discards_live_update_backlog_after_final_report(tmp_path: Path) -> None: notifier = _FakeFeishuNotifier() report = tmp_path / "final-task-report.md" diff --git a/tests/test_loop_engine.py b/tests/test_loop_engine.py index 50b08a7..9ecdafa 100644 --- a/tests/test_loop_engine.py +++ b/tests/test_loop_engine.py @@ -55,6 +55,20 @@ def test_loop_engine_stops_immediately_on_quota_exhaustion() -> None: fatal_error="You exceeded your current quota, please check your plan and billing details.", ) ] + + +class _CapturingEventSink: + def __init__(self) -> None: + self.events: list[dict[str, object]] = [] + + def handle_event(self, event: dict[str, object]) -> None: + self.events.append(event) + + def 
handle_stream_line(self, stream: str, line: str) -> None: + return + + def close(self) -> None: + return ) engine = LoopEngine( runner=runner, # type: ignore[arg-type] @@ -480,3 +494,84 @@ def test_loop_engine_fully_plan_continues_follow_up_but_emits_final_report_once( assert len(result.rounds) == 2 assert [call.get("run_label") for call in runner.calls].count("main-final-report") == 1 assert [event.get("type") for event in event_sink.events].count("final.report.ready") == 1 + + +def test_loop_engine_generates_pptx_report_on_completion(tmp_path: Path, monkeypatch) -> None: + report_path = tmp_path / "final-task-report.md" + pptx_path = tmp_path / "run-report.pptx" + runner = _ReportWritingRunner( + outputs=[ + CodexRunResult( + command=["codex", "exec"], + exit_code=0, + thread_id="thread-1", + agent_messages=["DONE:\n- implemented\nREMAINING:\n- none\nBLOCKERS:\n- none"], + turn_completed=True, + turn_failed=False, + fatal_error=None, + ), + CodexRunResult( + command=["codex", "exec", "resume"], + exit_code=0, + thread_id="thread-1", + agent_messages=[f"REPORT_PATH: {report_path}\nREPORT_STATUS: written"], + turn_completed=True, + turn_failed=False, + fatal_error=None, + ), + # Third call: main agent PPTX attempt (will "fail" so fallback kicks in) + CodexRunResult( + command=["codex", "exec", "resume"], + exit_code=0, + thread_id="thread-1", + agent_messages=["PPTX_REPORT_STATUS: failed"], + turn_completed=True, + turn_failed=False, + fatal_error=None, + ), + ], + report_path=report_path, + ) + state_store = LoopStateStore( + objective="完成实验", + final_report_file=str(report_path), + pptx_report_file=str(pptx_path), + plan_mode="off", + ) + state_store.record_message( + text="operator requested a PPTX run report", + source="operator", + kind="initial-objective", + ) + event_sink = _CapturingEventSink() + observed: dict[str, object] = {} + + def fake_generate_pptx_fallback(**kwargs): # type: ignore[no-untyped-def] + observed.update(kwargs) + output_path = 
Path(str(kwargs["output_path"])) + output_path.write_bytes(b"pptx") + return str(output_path) + + monkeypatch.setattr("codex_autoloop.core.engine.generate_pptx_report_fallback", fake_generate_pptx_fallback) + + engine = LoopEngine( + runner=runner, # type: ignore[arg-type] + reviewer=_DoneReviewer(), # type: ignore[arg-type] + planner=None, + state_store=state_store, + event_sink=event_sink, # type: ignore[arg-type] + config=LoopConfig( + objective="完成实验", + max_rounds=3, + ), + ) + + result = engine.run() + + assert result.success is True + assert pptx_path.exists() + assert state_store.has_pptx_report() is True + assert observed["objective"] == "完成实验" + assert observed["output_path"] == str(pptx_path) + assert observed["plan_mode"] == "off" + assert any(item["type"] == "pptx.report.ready" for item in event_sink.events)