microsoft · jdrhyne · Feb 15, 2026 · Copilot · Feb 15, 2026
diff --git a/.github/skills/nutrient-document-processing/SKILL.md b/.github/skills/nutrient-document-processing/SKILL.md
@@ -0,0 +1,296 @@
+---
+name: nutrient-document-processing
+description: |
+  Nutrient Document Web Services (DWS) REST API for document processing. Convert between formats (DOCX/XLSX/PPTX ↔ PDF ↔ images), extract text/tables/key-value pairs, apply OCR to scanned documents, redact sensitive information with pattern matching or AI, add digital signatures and watermarks, and fill PDF forms. Language-agnostic REST API. Triggers: "convert to PDF", "PDF to Word", "extract text from PDF", "OCR", "redact PII", "redact SSN", "watermark PDF", "sign PDF", "document processing", "Nutrient".
+---
+
+# Nutrient Document Web Services (DWS) API
+
+Full PDF lifecycle processing: convert, extract, OCR, redact, sign, and watermark documents via REST API.
+
+## API Endpoint
+
+```
+https://api.nutrient.io
+```
+
+## Environment Variables
+
+```bash
+NUTRIENT_API_KEY=<your-api-key>
+```
+
+Get an API key at [nutrient.io/api](https://www.nutrient.io/api/).
+
+## Authentication
+
+All requests use Bearer token authentication:
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{ ... }' \
+  -F document=@input.pdf \
+  -o output.pdf
+```
+
+## API Structure
+
+The API uses a single `/build` endpoint that accepts an `instructions` JSON payload. The `parts` array lists the input files, and the `actions` array lists the operations applied to them in order.
+
+```json
+{
+  "parts": [
+    {
+      "file": "document"
+    }
+  ],
+  "actions": [
+    {
+      "type": "<action-type>",
+      ...action-specific-options
+    }
+  ]
+}
+```
+
+## Core Workflows
+
+### 1. Convert DOCX/XLSX/PPTX to PDF
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{ "parts": [{ "file": "document" }] }' \
+  -F document=@report.docx \
+  -o report.pdf
+```
+
+### 2. Convert PDF to Images (PNG/JPEG/WebP)
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{
+      "type": "renderPages",
+      "outputFormat": { "type": "png", "dpi": 150 },
+      "pages": { "start": 0, "end": 0 }
+    }]
+  }' \
+  -F document=@input.pdf \
+  -o page.png
+```
+
+### 3. Convert PDF to Office (DOCX/XLSX/PPTX)
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{ "type": "office", "format": "docx" }]
+  }' \
+  -F document=@input.pdf \
+  -o output.docx
+```
+
+### 4. Extract Text
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{ "type": "text", "outputFormat": "plain" }]
+  }' \
+  -F document=@input.pdf \
+  -o extracted.txt
+```
+
+### 5. Extract Tables
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{ "type": "tables" }]
+  }' \
+  -F document=@input.pdf \
+  -o tables.json
+```
+
+### 6. OCR Scanned Documents
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{
+      "type": "ocr",
+      "language": "english"
+    }]
+  }' \
+  -F document=@scanned.pdf \
+  -o searchable.pdf
+```
+
+### 7. Redact with Pattern Matching
+
+Preset patterns: `credit-card-number`, `date`, `email-address`, `international-phone-number`, `ipv4`, `ipv6`, `mac-address`, `north-american-phone-number`, `social-security-number`, `time`, `url`, `us-zip-code`, `vin`.
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{
+      "type": "redact",
+      "strategy": "preset",
+      "preset": "social-security-number"
+    }]
+  }' \
+  -F document=@input.pdf \
+  -o redacted.pdf
+```
+
+**Custom regex redaction:**
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{
+      "type": "redact",
+      "strategy": "regex",
+      "regex": "\\b[A-Z]{2}\\d{6}\\b"
+    }]
+  }' \
+  -F document=@input.pdf \
+  -o redacted.pdf
+```
+
+### 8. AI-Powered Redaction
+
+Natural language criteria for detecting and redacting sensitive information:
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{
+      "type": "aiRedact",
+      "criteria": "All personally identifiable information"
+    }]
+  }' \
+  -F document=@input.pdf \
+  -o redacted.pdf
+```
+
+### 9. Add Watermark
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{
+      "type": "watermark",
+      "watermarkType": "text",
+      "text": "CONFIDENTIAL",
+      "fontSize": 48,
+      "fontColor": "#FF0000",
+      "opacity": 0.3,
+      "rotation": 45,
+      "width": "50%",
+      "height": "50%"
+    }]
+  }' \
+  -F document=@input.pdf \
+  -o watermarked.pdf
+```
+
+### 10. Digital Signature
+
+```bash
+curl -X POST https://api.nutrient.io/build \
+  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
+  -F instructions='{
+    "parts": [{ "file": "document" }],
+    "actions": [{
+      "type": "sign",
+      "signatureType": "cms",
+      "signerName": "Jane Smith",
+      "reason": "Document approval",
+      "location": "New York"
+    }]
+  }' \
+  -F document=@contract.pdf \
+  -o signed.pdf
+```
+
+## Action Types Reference
+
+| Action | Description |
+|--------|-------------|
+| `renderPages` | Convert PDF pages to PNG, JPEG, or WebP images |
+| `office` | Convert PDF to DOCX, XLSX, or PPTX |
+| `text` | Extract plain text from documents |
+| `tables` | Extract tabular data as JSON |
+| `keyValues` | Extract key-value pairs (phone numbers, emails, dates) |
+| `ocr` | Apply OCR to scanned PDFs or images |
+| `redact` | Redact content via preset patterns, regex, or exact text |
+| `aiRedact` | AI-powered PII detection and redaction |
+| `watermark` | Add text or image watermarks |
+| `sign` | Add CMS or CAdES digital signatures |
+| `flatten` | Flatten annotations and form fields |
+
+## Supported Input Formats
+
+| Format | Extensions |
+|--------|------------|
+| PDF | `.pdf` |
+| Microsoft Office | `.docx`, `.xlsx`, `.pptx` |
+| Images | `.jpg`, `.png`, `.gif`, `.webp`, `.tiff` |
+| HTML | `.html` |
+
+## MCP Server Integration
+
+Nutrient provides an MCP server for direct agent integration:
+
+```json
+{
+  "mcpServers": {
+    "nutrient": {
+      "command": "npx",
+      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
+      "env": {
+        "NUTRIENT_API_KEY": "<your-api-key>"
+      }
+    }
+  }
+}
+```
+
+## Best Practices
+
+1. **Chain actions** — Multiple actions execute sequentially in one request (e.g., OCR then redact)
+2. **Use preset redaction patterns** for standard PII types — more reliable than regex for known formats
+3. **Use AI redaction** for complex or context-dependent PII that presets can't cover
+4. **Set DPI appropriately** — 150 DPI for screen, 300 DPI for print when rendering pages
+5. **Check credit usage** — Each API call consumes credits based on document size and action type
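
Chaining, the first practice above, amounts to placing several objects in the `actions` array of one `instructions` payload. The following is a minimal sketch, assuming the payload shape shown under API Structure. `build_instructions` is a hypothetical helper, not part of any Nutrient tooling, and it does not validate the JSON it is given.

```bash
# Hypothetical helper: join one or more action objects (each passed as a
# JSON string) into a single instructions payload, keeping argument order.
build_instructions() {
  actions=""
  for action in "$@"; do
    if [ -z "$actions" ]; then
      actions="$action"
    else
      actions="$actions, $action"
    fi
  done
  printf '{ "parts": [{ "file": "document" }], "actions": [%s] }\n' "$actions"
}

# OCR comes first so that the redaction step has text to match against.
INSTRUCTIONS=$(build_instructions \
  '{ "type": "ocr", "language": "english" }' \
  '{ "type": "redact", "strategy": "preset", "preset": "social-security-number" }')

printf '%s\n' "$INSTRUCTIONS"
```

The result can be passed to the same `curl` call used throughout this document as `-F instructions="$INSTRUCTIONS"`.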
+
+## Reference Links
+
+| Resource | URL |
+|----------|-----|
+| API Documentation | https://www.nutrient.io/guides/document-engine/api/api-reference/ |
+| Getting Started | https://www.nutrient.io/getting-started/web-services/ |
+| npm MCP Server | https://www.npmjs.com/package/@nutrient-sdk/dws-mcp-server |
+| OpenClaw Plugin | https://www.npmjs.com/package/@nutrient-sdk/nutrient-openclaw |
+| GitHub | https://github.com/nicegoodthings/nutrient-dws-examples |
+
+## Troubleshooting & Error Handling
+
+When calls to the Nutrient DWS API fail, agents should surface concise, actionable errors and suggest next steps.
+
+Common HTTP status codes:
+
+- **400 Bad Request** — Invalid parameters (e.g., unsupported file type, invalid action configuration).
+  - **Agent response:** Re-check requested actions and formats; confirm the source file type and requested target format are compatible.
+- **401 Unauthorized / 403 Forbidden** — Missing or invalid `NUTRIENT_API_KEY`.
+  - **Agent response:** Ask the user (or operator) to verify that the API key is configured in the environment and has access to the DWS API.
+- **413 Payload Too Large** — File exceeds size or page limits for the configured plan.
+  - **Agent response:** Propose splitting the document, downsampling images, or processing only selected pages.
+- **429 Too Many Requests** — Rate limiting or credit exhaustion.
+  - **Agent response:** Back off, wait, and retry; consider batching requests or reducing concurrency.
+- **5xx Server Errors** — Temporary service issues.
+  - **Agent response:** Retry with exponential backoff; if persistent, suggest manual retry later.
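
The status-code guidance above can be sketched as two small shell helpers. This is an illustration under stated assumptions: the category names (`fix-request`, `check-api-key`, and so on) and both function names are hypothetical, and the 60-second cap is an arbitrary default rather than a documented Nutrient limit.

```bash
# Map an HTTP status code to a coarse next step for the agent.
# The category names are illustrative, not part of the Nutrient API.
classify_status() {
  case "$1" in
    2??)     echo "ok" ;;
    400)     echo "fix-request" ;;
    401|403) echo "check-api-key" ;;
    413)     echo "reduce-payload" ;;
    429|5??) echo "retry" ;;
    *)       echo "fail" ;;
  esac
}

# Seconds to wait before retry number $1 (1-based): base * 2^(attempt - 1),
# capped at $3 (default 60). $2 is the base delay in seconds (default 1).
backoff_delay() {
  attempt=$1
  delay=${2:-1}
  cap=${3:-60}
  while [ "$attempt" -gt 1 ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    attempt=$((attempt - 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}

# Usage sketch (not executed here): capture the status with curl's
# --write-out option, then decide whether to wait and retry.
#   status=$(curl -s -o output.pdf -w '%{http_code}' -X POST ...)
#   if [ "$(classify_status "$status")" = "retry" ]; then
#     sleep "$(backoff_delay "$attempt")"
#   fi
```

Because a 429 may signal credit exhaustion rather than rate limiting, a retry loop built on this should stop after a few attempts and report the failure instead of waiting indefinitely.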
+
+For OCR and redaction:
+
+- If **OCR output is low quality**, suggest:
+  - Increasing DPI when rendering pages to images.
+  - Requesting grayscale instead of color to improve contrast.
+  - Ensuring the input is not overly compressed or blurred.
+- If **PII remains visible after redaction**, suggest:
+  - Combining preset redaction patterns with AI redaction.
+  - Expanding the redaction scope (e.g., include names, addresses, or custom patterns).
+  - Re-running redaction on specific pages or regions of interest.
+
+When chaining multiple actions (e.g., convert → OCR → redact), validate each step’s output before proceeding and log the step where failure occurs.
+
+## Performance & Limits
+
+While exact limits depend on the user’s Nutrient plan, agents should be conservative about:
+
+- **Document size and page count** — Prefer processing only the required pages (e.g., first N pages, or a specified range) when the user’s intent is narrow.
+- **Number of simultaneous requests** — Batch related operations rather than firing many single-page requests in parallel.
+- **Credit consumption** — Some actions (especially AI-based redaction or heavy OCR) can be more expensive than simple conversions.
+
+Recommended patterns:
+
+1. **Scope-first** — Ask the user which pages or sections matter before processing entire documents.
+2. **Combine actions** — Use multi-step workflows in a single API call where available (e.g., OCR + redact) to reduce round-trips.
+3. **Cache intermediate results** — If allowed by the use case, reuse extracted text or OCR results for subsequent analysis rather than reprocessing the same file.
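
For the scope-first pattern, a user's page range has to be translated into the `pages` object seen in the `renderPages` example. The sketch below assumes page indices are 0-based and inclusive, which the `{ "start": 0, "end": 0 }` single-page example suggests but does not state. `pages_object` is a hypothetical helper name.

```bash
# Hypothetical helper: turn a 1-based, inclusive page range as a user would
# state it ("pages 1 to 3") into a "pages" object.
# Assumption: the API counts pages from 0.
pages_object() {
  first=$1
  last=${2:-$1}
  # Reject empty or non-numeric input before doing arithmetic.
  case "$first$last" in
    ''|*[!0-9]*)
      echo "page numbers must be positive integers" >&2
      return 1
      ;;
  esac
  if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
    echo "invalid page range: $first-$last" >&2
    return 1
  fi
  printf '{ "start": %d, "end": %d }\n' "$((first - 1))" "$((last - 1))"
}

pages_object 1 3   # prints { "start": 0, "end": 2 }
```

Which actions accept a `pages` option beyond `renderPages` is not stated in this document, so check the API reference before adding it elsewhere.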
+
+## Security & Privacy Considerations
+
+This skill is often used with sensitive documents (contracts, IDs, financial records, medical records).
+
+- **Never log raw secrets or full documents** in agent logs or chat transcripts when not strictly necessary.
+- Prefer **redacting PII before sharing** any document (or page images) back into other tools or with collaborators.
+- Make it clear to the user when:
+  - A document has been successfully redacted (no underlying text remains).
+  - Only visual redaction has been applied (e.g., overlays), and underlying text may still be present.
+- When filling forms or adding signatures, confirm:
+  - The user’s intent (e.g., “Sign as John Doe on page 3, bottom right”).
+  - That the document is the correct and final version before applying a digital signature.
+
+If the user expresses compliance or regulatory constraints (HIPAA, GDPR, etc.), bias towards more aggressive redaction scopes and minimize data retention.
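
One way to follow the "never log raw secrets" rule is to mask the API key before any command line or error message reaches a log. The sketch below uses `awk` for a literal (non-regex) replacement. `mask_secret` and the `[REDACTED]` marker are hypothetical choices, not part of the Nutrient tooling.

```bash
# Hypothetical helper: replace every literal occurrence of a secret ($2)
# in a piece of text ($1) with [REDACTED]. Values are passed through the
# environment so that awk does not interpret backslashes in them.
mask_secret() {
  TEXT="$1" SECRET="$2" awk 'BEGIN {
    text = ENVIRON["TEXT"]
    secret = ENVIRON["SECRET"]
    if (secret == "") { print text; exit }
    out = ""
    while ((i = index(text, secret)) > 0) {
      out = out substr(text, 1, i - 1) "[REDACTED]"
      text = substr(text, i + length(secret))
    }
    print out text
  }'
}

# Example with a dummy key; in practice the second argument would be
# "$NUTRIENT_API_KEY".
mask_secret "Authorization: Bearer abc123" "abc123"
```

Masking the key covers only the credential. Document contents and extracted text need their own handling, as described in the list above.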
+
+## Example End-to-End Flow
+
+A typical agent-driven flow for handling a scanned contract with PII:
+
+1. **User request:** “Take this scanned contract, extract all text, redact SSNs and email addresses, and give me a clean searchable PDF plus a text summary.”
+2. **Agent plan:**
+   - Upload the scanned PDF to Nutrient.
+   - Run OCR with sufficient DPI for legibility.
+   - Apply preset redaction patterns for SSNs and emails, plus AI redaction for any remaining PII.
+   - Export a searchable redacted PDF.
+   - Extract text and generate a summary for the user.
+3. **Nutrient actions (conceptual):**
+   - Action 1: `ocr` on the uploaded file.
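+   - Action 2: `redact` with the `social-security-number` and `email-address` presets, then `aiRedact` for any remaining PII.
+   - Output: the response body is the searchable, redacted PDF; a separate `text` request on that file yields the plain text for the summary.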
+4. **Agent response:**
+   - Provide a link or attachment for the redacted PDF (if the environment supports file return).
+   - Return a textual summary of the contract.
+   - Optionally include a brief log of actions performed (OCR → redact → export) for transparency.
+
+Use this pattern as a template for other workflows (e.g., “PDF to editable Word,” “extract tables only,” “sign and watermark before distribution”).
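
The flow above could be expressed as a single request along the following lines. This is a sketch, not a verified pipeline: it assumes that `ocr`, several `redact` actions, and `aiRedact` can be combined in one `actions` array, and `redact_scanned_contract` is a hypothetical wrapper name. The action shapes and preset names are copied from the earlier sections of this document.

```bash
# Instructions for OCR followed by preset and AI redaction in one request.
INSTRUCTIONS=$(cat <<'JSON'
{
  "parts": [{ "file": "document" }],
  "actions": [
    { "type": "ocr", "language": "english" },
    { "type": "redact", "strategy": "preset", "preset": "social-security-number" },
    { "type": "redact", "strategy": "preset", "preset": "email-address" },
    { "type": "aiRedact", "criteria": "All personally identifiable information" }
  ]
}
JSON
)

# Hypothetical wrapper: $1 = scanned input PDF, $2 = redacted output PDF.
# --fail makes curl exit non-zero on HTTP errors instead of saving the
# error body as if it were the PDF.
redact_scanned_contract() {
  curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F instructions="$INSTRUCTIONS" \
    -F document=@"$1" \
    -o "$2"
}

# Usage: redact_scanned_contract scanned.pdf redacted.pdf
```

A follow-up request with the `text` action on `redacted.pdf` would then supply the plain text for the summary.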