feat: AWS Comprehend PII redactor utility #480

Open
WeJaWi wants to merge 8 commits into arakoodev:ts from WeJaWi:feat/aws-comprehend-pii-redactor

WeJaWi commented May 11, 2026

Summary

Implements the AWS Comprehend PII redactor utility requested in #290.

What's included

New package: JS/edgechains/arakoodev/src/pii-redactor/

  • AWSComprehendPIIRedactor class with three methods:
    • detectPII(text) — returns detected PII entities with offsets, types, and confidence scores
    • redactPII(text, options) — replaces PII with labeled placeholders [NAME], [EMAIL], etc. or mask characters
    • sanitize(text) — convenience wrapper that returns only the cleaned string, ideal for chaining

Key features:

  • Labeled redaction mode (default): "My name is John Doe" → "My name is [NAME]"
  • Mask mode: replace with repeated character, e.g. "****"
  • Filter by entity type: only redact EMAIL, leave NAME intact
  • Configurable language code (en/es/fr/de/it/pt/ar/hi/ja/ko/zh)
  • AWS credentials from constructor args or environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
  • Entities sorted descending by offset before replacement — prevents offset shifting when multiple PII spans are replaced
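The replacement step described in the features above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `PiiEntity` and `RedactOptions` shapes, the option names (`mode`, `maskChar`, `entityTypes`), and the `applyRedactions` helper are all hypothetical, and the entities are hard-coded instead of coming from AWS Comprehend. It shows why replacing from the highest offset downward keeps the remaining offsets valid.

```typescript
// Hypothetical shapes for illustration; the PR's actual types may differ.
interface PiiEntity {
  type: string;        // e.g. "NAME", "EMAIL"
  beginOffset: number; // inclusive start index into the text
  endOffset: number;   // exclusive end index into the text
  score: number;       // confidence in [0, 1]
}

interface RedactOptions {
  mode?: "label" | "mask"; // assumed option name
  maskChar?: string;       // assumed option name
  entityTypes?: string[];  // when set, only these types are redacted
}

function applyRedactions(
  text: string,
  entities: PiiEntity[],
  options: RedactOptions = {}
): string {
  const { mode = "label", maskChar = "*", entityTypes } = options;

  const selected = entities
    .filter((e) => !entityTypes || entityTypes.includes(e.type))
    // Work backwards through the string so earlier offsets are not shifted
    // by replacements that change the length of later spans.
    .sort((a, b) => b.beginOffset - a.beginOffset);

  let result = text;
  for (const e of selected) {
    const replacement =
      mode === "mask"
        ? maskChar.repeat(e.endOffset - e.beginOffset)
        : `[${e.type}]`;
    result =
      result.slice(0, e.beginOffset) + replacement + result.slice(e.endOffset);
  }
  return result;
}

// Hard-coded entities standing in for a detection result.
const sample = "My name is John Doe, email john@example.com";
const found: PiiEntity[] = [
  { type: "NAME", beginOffset: 11, endOffset: 19, score: 0.99 },
  { type: "EMAIL", beginOffset: 27, endOffset: 43, score: 0.99 },
];

console.log(applyRedactions(sample, found));
// prints "My name is [NAME], email [EMAIL]"
console.log(applyRedactions(sample, found, { entityTypes: ["EMAIL"] }));
// prints "My name is John Doe, email [EMAIL]"
```

If the spans were replaced in ascending order instead, swapping "John Doe" (8 characters) for "[NAME]" (6 characters) would move the email span two positions left, and its recorded offsets would no longer match.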

Test suite: 9 unit tests using Vitest with fully mocked AWS SDK (no real AWS calls needed to run tests)
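To illustrate how such tests can run without real AWS calls, here is a rough sketch that uses a hand-written fake detector instead of Vitest's mocking facilities. The `Detector` type and the `StubbableRedactor` class are hypothetical stand-ins written for this example; the PR's class and its tests may be structured differently.

```typescript
// Hypothetical detector signature; the real class would call AWS Comprehend here.
type DetectedSpan = { type: string; beginOffset: number; endOffset: number };
type Detector = (text: string) => Promise<DetectedSpan[]>;

// Minimal stand-in for the redactor that accepts an injected detector
// (constructor injection is an assumption made for this sketch).
class StubbableRedactor {
  private readonly detect: Detector;

  constructor(detect: Detector) {
    this.detect = detect;
  }

  async sanitize(text: string): Promise<string> {
    const spans = await this.detect(text);
    // Replace from the last span to the first so offsets stay valid.
    const ordered = [...spans].sort((a, b) => b.beginOffset - a.beginOffset);
    let result = text;
    for (const s of ordered) {
      result =
        result.slice(0, s.beginOffset) + `[${s.type}]` + result.slice(s.endOffset);
    }
    return result;
  }
}

// Fake detector: records its inputs and returns a canned response,
// so the test is deterministic and needs no network access or credentials.
const calls: string[] = [];
const fakeDetect: Detector = async (text) => {
  calls.push(text);
  return [{ type: "NAME", beginOffset: 11, endOffset: 19 }];
};

const stubbed = new StubbableRedactor(fakeDetect);
const pending = stubbed.sanitize("My name is John Doe");
pending.then((clean) => console.log(clean)); // prints "My name is [NAME]"
```

Recording the detector's inputs in `calls` lets a test also assert that the detector was invoked exactly once with the original text.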

Working example: JS/edgechains/examples/pii-redaction-example/

  • Hono server with GET /redact?text=... — full chain: Comprehend → sanitize → OpenAI
  • GET /redact/detect?text=... — detection only, no LLM call

Usage

import { AWSComprehendPIIRedactor } from "@arakoodev/edgechains.js/pii-redactor";

const redactor = new AWSComprehendPIIRedactor();

// Chain directly with any LLM endpoint
const safePrompt = await redactor.sanitize(userInput);
const response = await openai.chat({ prompt: safePrompt });

Closes #290

@github-actions

CLA Assistant Lite bot: Thank you for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the Arakoo Contributor License Agreement. You can sign the CLA by adding a new comment to this pull request and pasting exactly the following text.


I have read the Arakoo CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting "recheck" in this pull request.

Development

Successfully merging this pull request may close these issues.

BOUNTY: integrate AWS Comprehend as a utility to redact data
