namespace-guard Reference

This file contains the full reference documentation (adapters, migrations, CLI, API, and framework examples) moved from the top-level README for easier onboarding.

For the fast-start overview, see ../README.md.

Live Demo - try it in your browser | Blog Post - why this exists

The world's first library that detects confusable characters across non-Latin scripts. Slug claimability, Unicode anti-spoofing, and LLM Denial of Spend defence in one zero-dependency package.

Built for multi-tenant apps where usernames, organisation slugs, and reserved routes share one URL namespace, and for LLM pipelines where confusable substitution can inflate token costs by up to 5.2x.

The Problem

You have a URL structure like yourapp.com/:slug that could be:

A user profile (/sarah)
An organization (/acme-corp)
A reserved route (/settings, /admin, /api)

When someone signs up or creates an org, you need to check that their chosen slug:

Isn't already taken by another user
Isn't already taken by an organization
Isn't a reserved system route
Follows your naming rules
Isn't confusable with protected names (anti-impersonation)

This library handles all of that in one guard call.

Installation

npm install namespace-guard

Quick Start

import { createNamespaceGuardWithProfile } from "namespace-guard";
import { createPrismaAdapter } from "namespace-guard/adapters/prisma";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Guard built from the "consumer-handle" profile, with practical defaults
const guard = createNamespaceGuardWithProfile(
  "consumer-handle",
  {
    reserved: ["admin", "api", "settings", "dashboard", "login", "signup"],
    sources: [
      { name: "user", column: "handle", scopeKey: "id" },
      { name: "organization", column: "slug", scopeKey: "id" },
    ],
  },
  createPrismaAdapter(prisma)
);

// One-liner: format/reserved/taken + anti-spoofing policy
await guard.assertClaimable("acme-corp");
// throws on failure, otherwise safe to create

Why namespace-guard?

Feature	namespace-guard	DIY Solution
Multi-table uniqueness	One call	Multiple queries
Reserved name blocking	Built-in with categories	Manual list checking
Ownership scoping	No false positives on self-update	Easy to forget
Format validation	Configurable regex	Scattered validation
Conflict suggestions	Auto-suggest alternatives	Not built
Async validators	Custom hooks (profanity, etc.)	Manual wiring
Batch checking	`checkMany()`	Loop it yourself
ORM agnostic	Prisma, Drizzle, Kysely, Knex, TypeORM, MikroORM, Sequelize, Mongoose, raw SQL	Tied to your ORM
CLI	`check`, `risk`, `attack-gen`, `audit-canonical`, `calibrate`, `recommend`, `drift`	None

Adapters

Prisma

import { PrismaClient } from "@prisma/client";
import { createPrismaAdapter } from "namespace-guard/adapters/prisma";

const prisma = new PrismaClient();
const adapter = createPrismaAdapter(prisma);

Drizzle

Note: The Drizzle adapter uses db.query (the relational query API). Make sure your Drizzle client is set up with drizzle(client, { schema }) so that db.query.<tableName> is available.

import { eq } from "drizzle-orm";
import { createDrizzleAdapter } from "namespace-guard/adapters/drizzle";
import { db } from "./db";
import { users, organizations } from "./schema";

// Pass eq directly, or use { eq, ilike } for case-insensitive support
const adapter = createDrizzleAdapter(db, { users, organizations }, eq);

Kysely

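import { Kysely, PostgresDialect } from "kysely";
import { Pool } from "pg";
import { createKyselyAdapter } from "namespace-guard/adapters/kysely";
import type { Database } from "./db-types"; // your Kysely table typings (path is an example)

const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const db = new Kysely<Database>({ dialect: new PostgresDialect({ pool }) });
const adapter = createKyselyAdapter(db);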

Knex

import Knex from "knex";
import { createKnexAdapter } from "namespace-guard/adapters/knex";

const knex = Knex({ client: "pg", connection: process.env.DATABASE_URL });
const adapter = createKnexAdapter(knex);

TypeORM

import { DataSource } from "typeorm";
import { createTypeORMAdapter } from "namespace-guard/adapters/typeorm";
import { User, Organization } from "./entities";

const dataSource = new DataSource({ /* ... */ });
const adapter = createTypeORMAdapter(dataSource, { user: User, organization: Organization });

MikroORM

import { MikroORM } from "@mikro-orm/core";
import { createMikroORMAdapter } from "namespace-guard/adapters/mikro-orm";
import { User, Organization } from "./entities";

import config from "./mikro-orm.config"; // your MikroORM options (path is an example)

const orm = await MikroORM.init(config);
const adapter = createMikroORMAdapter(orm.em, { user: User, organization: Organization });

Sequelize

import { createSequelizeAdapter } from "namespace-guard/adapters/sequelize";
import { User, Organization } from "./models";

const adapter = createSequelizeAdapter({ user: User, organization: Organization });

Mongoose

import { createMongooseAdapter } from "namespace-guard/adapters/mongoose";
import { User, Organization } from "./models";

// Note: Mongoose sources typically use idColumn: "_id"
const adapter = createMongooseAdapter({ user: User, organization: Organization });

Raw SQL (pg, mysql2, better-sqlite3, etc.)

The raw adapter generates PostgreSQL-style SQL ($1 placeholders, double-quoted identifiers). For pg this works directly. For MySQL or SQLite, translate the parameter syntax in your executor wrapper.

import { Pool } from "pg";
import { createRawAdapter } from "namespace-guard/adapters/raw";

const pool = new Pool();
const adapter = createRawAdapter((sql, params) => pool.query(sql, params));

MySQL2 wrapper (translates $1 to ? and "col" to `col`):

import mysql from "mysql2/promise";
import { createRawAdapter } from "namespace-guard/adapters/raw";

const pool = mysql.createPool({ uri: process.env.DATABASE_URL });
const adapter = createRawAdapter(async (sql, params) => {
  // Assumes each $n placeholder appears once, in ascending order ("?" binds by position)
  const mysqlSql = sql.replace(/\$\d+/g, "?").replace(/"/g, "`");
  const [rows] = await pool.execute(mysqlSql, params);
  return { rows: rows as Record<string, unknown>[] };
});

better-sqlite3 wrapper (translates $1 to ? and strips identifier quotes):

import Database from "better-sqlite3";
import { createRawAdapter } from "namespace-guard/adapters/raw";

const db = new Database("app.db");
const adapter = createRawAdapter(async (sql, params) => {
  // Assumes each $n placeholder appears once, in ascending order ("?" binds by position)
  const sqliteSql = sql.replace(/\$\d+/g, "?").replace(/"/g, "");
  const rows = db.prepare(sqliteSql).all(...params);
  return { rows: rows as Record<string, unknown>[] };
});
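The two regex wrappers above bind `?` placeholders by position, so they are only correct when each `$n` appears once and in ascending order. If you cannot guarantee that for every query, the sketch below reorders (and repeats) parameters to match. `toPositional` is a hypothetical helper, not part of namespace-guard.

```typescript
// Hypothetical helper: converts PostgreSQL-style $1..$n placeholders into
// positional "?" placeholders, reordering (and duplicating) params so that
// each "?" receives the value its original $n referred to.
function toPositional(
  sql: string,
  params: readonly unknown[]
): { sql: string; params: unknown[] } {
  const ordered: unknown[] = [];
  const out = sql.replace(/\$(\d+)/g, (_match: string, index: string) => {
    const i = Number(index) - 1; // $1 refers to params[0]
    if (i < 0 || i >= params.length) {
      throw new RangeError(`Placeholder $${index} has no matching parameter`);
    }
    ordered.push(params[i]);
    return "?";
  });
  return { sql: out, params: ordered };
}

// Usage inside either wrapper:
// const { sql: positionalSql, params: positionalParams } = toPositional(sql, params);
```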

Canonical Uniqueness Migration (Per Adapter)

To fully close Unicode/canonicalization edge cases and race windows, enforce uniqueness on a canonical column in your database.

Use this rollout plan:

1. Add a canonical column (nullable first).
2. Dual-write canonical values on every create/update (normalize(raw)).
3. Backfill existing rows in batches.
4. Resolve duplicates found after backfill.
5. Add a unique index/constraint and make the canonical column non-null.
6. Point namespace-guard sources[*].column at the canonical column.
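The backfill step of this plan can be sketched as an adapter-agnostic loop using keyset pagination, so each batch resumes after the last id seen. Assumptions: `normalizeStandIn` approximates namespace-guard's `normalize()` as described under Unicode Normalization (NFKC, then lowercase); import the real `normalize` in production code. `UserStore` and `memoryStore` are hypothetical stand-ins for your ORM calls.

```typescript
// Stand-in for namespace-guard's normalize() (assumption: NFKC + lowercase).
const normalizeStandIn = (raw: string): string => raw.normalize("NFKC").toLowerCase();

interface UserRow {
  id: string;
  handle: string;
  handleCanonical: string | null;
}

// Hypothetical storage interface; map these methods to your ORM/driver.
interface UserStore {
  // Rows with id > afterId that still lack handleCanonical, ordered by id.
  fetchBatch(afterId: string, limit: number): Promise<UserRow[]>;
  setCanonical(id: string, canonical: string): Promise<void>;
}

async function backfillCanonical(store: UserStore, batchSize = 500): Promise<number> {
  let cursor = ""; // keyset cursor: every non-empty id sorts after ""
  let updated = 0;
  for (;;) {
    const batch = await store.fetchBatch(cursor, batchSize);
    if (batch.length === 0) return updated;
    for (const row of batch) {
      await store.setCanonical(row.id, normalizeStandIn(row.handle));
      updated++;
    }
    cursor = batch[batch.length - 1]!.id; // resume after the last id seen
  }
}

// In-memory store, only to demonstrate the loop.
function memoryStore(rows: UserRow[]): UserStore {
  return {
    async fetchBatch(afterId, limit) {
      return rows
        .filter((r) => r.id > afterId && r.handleCanonical === null)
        .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
        .slice(0, limit);
    },
    async setCanonical(id, canonical) {
      const row = rows.find((r) => r.id === id);
      if (row) row.handleCanonical = canonical;
    },
  };
}
```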

Example dual-write pattern:

import { normalize } from "namespace-guard";

const raw = input.handle;
const canonical = normalize(raw);

await guard.assertClaimable(raw);
await db.user.create({
  data: {
    handle: raw,                 // display value
    handleCanonical: canonical,  // canonical key
  },
});

After cutover, configure sources to check canonical columns:

sources: [{ name: "user", column: "handleCanonical", scopeKey: "id" }]

Prisma

schema.prisma:

model User {
  id              String  @id @default(cuid())
  handle          String
  handleCanonical String? @unique
}

Create/apply migration:

npx prisma migrate dev --name add_handle_canonical

Then backfill via Prisma client in batches, resolve duplicates, and finally make handleCanonical required.

Drizzle

schema.ts (Postgres example):

import { pgTable, text, uniqueIndex } from "drizzle-orm/pg-core";

export const users = pgTable(
  "users",
  {
    id: text("id").primaryKey(),
    handle: text("handle").notNull(),
    handleCanonical: text("handle_canonical"),
  },
  (table) => [uniqueIndex("users_handle_canonical_uidx").on(table.handleCanonical)]
);

Generate/apply migration:

npx drizzle-kit generate --name add-handle-canonical
npx drizzle-kit migrate

Kysely

Migration (up):

import { Kysely } from "kysely";

export async function up(db: Kysely<any>): Promise<void> {
  await db.schema
    .alterTable("users")
    .addColumn("handle_canonical", "varchar(255)")
    .execute();

  await db.schema
    .createIndex("users_handle_canonical_uidx")
    .on("users")
    .column("handle_canonical")
    .unique()
    .execute();
}

Backfill in app code, dedupe, then add NOT NULL in a follow-up migration.

Knex

Migration:

export async function up(knex) {
  await knex.schema.alterTable("users", (table) => {
    table.string("handle_canonical", 255);
  });

  await knex.schema.alterTable("users", (table) => {
    table.unique(["handle_canonical"], "users_handle_canonical_uidx");
  });
}

TypeORM

Entity:

import { Column, Entity, Index, PrimaryGeneratedColumn } from "typeorm";

@Entity("user")
export class User {
  @PrimaryGeneratedColumn("uuid")
  id!: string;

  @Column({ type: "varchar", length: 255 })
  handle!: string;

  @Index("IDX_user_handle_canonical_unique", { unique: true })
  @Column({ name: "handle_canonical", type: "varchar", length: 255, nullable: true })
  handleCanonical!: string | null;
}

Migration (up):

import { MigrationInterface, QueryRunner, TableColumn, TableIndex } from "typeorm";

export class AddHandleCanonical1730000000000 implements MigrationInterface {
  async up(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.addColumn(
      "user",
      new TableColumn({
        name: "handle_canonical",
        type: "varchar",
        length: "255",
        isNullable: true,
      })
    );

    await queryRunner.createIndex(
      "user",
      new TableIndex({
        name: "IDX_user_handle_canonical_unique",
        columnNames: ["handle_canonical"],
        isUnique: true,
      })
    );
  }
}

MikroORM

Entity:

import { Entity, PrimaryKey, Property } from "@mikro-orm/core";

@Entity()
export class User {
  @PrimaryKey()
  id!: string;

  @Property()
  handle!: string;

  @Property({
    fieldName: "handle_canonical",
    length: 255,
    nullable: true,
    unique: true,
  })
  handleCanonical?: string;
}

Generate/apply migration:

npx mikro-orm migration:create
npx mikro-orm migration:up

Sequelize

Migration:

export async function up(queryInterface, Sequelize) {
  await queryInterface.addColumn("Users", "handleCanonical", {
    type: Sequelize.DataTypes.STRING(255),
    allowNull: true,
  });

  await queryInterface.addIndex("Users", ["handleCanonical"], {
    name: "users_handle_canonical_uidx",
    unique: true,
  });
}

If you use model definitions directly, keep handleCanonical persisted and indexed there too.

Mongoose

Schema:

import { Schema, model } from "mongoose";

const userSchema = new Schema({
  handle: { type: String, required: true },
  handleCanonical: { type: String, required: true, unique: true, index: true },
});

export const User = model("User", userSchema);

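Backfill handleCanonical on existing documents before relying on this schema: MongoDB indexes a missing field as null, so a unique index fails as soon as two documents lack the field, and required: true rejects saves of documents that are still missing it. Once the backfill is done, build indexes after deploying the schema change: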

await User.init();

Raw SQL (PostgreSQL)

ALTER TABLE users ADD COLUMN handle_canonical text;
-- backfill from application code using namespace-guard normalize()
CREATE UNIQUE INDEX users_handle_canonical_uidx ON users (handle_canonical);
ALTER TABLE users ALTER COLUMN handle_canonical SET NOT NULL;

Backfill + duplicate check

Before adding UNIQUE/NOT NULL, run a duplicate report:

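SELECT handle_canonical, COUNT(*) AS row_count
FROM users
WHERE handle_canonical IS NOT NULL  -- NULLs do not conflict in a PostgreSQL unique index
GROUP BY handle_canonical
HAVING COUNT(*) > 1;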

Resolve these rows first, then enforce the constraint.
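For stores without GROUP BY (for example MongoDB), the same duplicate report can be built in application code. This sketch groups rows by canonical key and skips rows that are not backfilled yet (null) instead of reporting them as one large group; `findCanonicalDuplicates` and `CanonicalRow` are hypothetical names. Deciding which row keeps a contested name is a product decision the sketch leaves to you.

```typescript
interface CanonicalRow {
  id: string;
  handleCanonical: string | null;
}

// Returns canonical keys held by more than one row, mapped to the row ids.
function findCanonicalDuplicates(rows: Iterable<CanonicalRow>): Map<string, string[]> {
  const byKey = new Map<string, string[]>();
  for (const row of rows) {
    if (row.handleCanonical === null) continue; // not backfilled yet
    const ids = byKey.get(row.handleCanonical) ?? [];
    ids.push(row.id);
    byKey.set(row.handleCanonical, ids);
  }
  // Keep only keys claimed by two or more rows (deleting during iteration is safe for Map).
  for (const [key, ids] of byKey) {
    if (ids.length < 2) byKey.delete(key);
  }
  return byKey;
}
```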

Official docs used for adapter syntax

Prisma schema unique fields: https://www.prisma.io/docs/orm/reference/prisma-schema-reference#unique
Prisma migrations (migrate dev): https://www.prisma.io/docs/concepts/components/prisma-migrate/migrate-development-production
Drizzle column/index uniqueness and migrations: https://orm.drizzle.team/docs/indexes-constraints and https://orm.drizzle.team/docs/drizzle-kit-migrate
Kysely migrations and index builder: https://www.kysely.dev/docs/migrations and https://kysely-org.github.io/kysely-apidoc/classes/CreateIndexBuilder.html
Knex table.unique(...): https://knexjs.org/guide/schema-builder.html#unique
TypeORM @Column({ unique: true }) / @Index({ unique: true }): https://typeorm.io/docs/entity/entities and https://typeorm.io/docs/help/decorator-reference/
MikroORM @Property({ unique: true }) and migrations: https://mikro-orm.io/docs/decorators and https://mikro-orm.io/docs/migrations
Sequelize QueryInterface migrations (addColumn, addIndex): https://sequelize.org/api/v6/class/src/dialects/abstract/query-interface.js~queryinterface
Mongoose indexes and unique: https://mongoosejs.com/docs/guide.html#indexes and https://mongoosejs.com/docs/schematypes.html
PostgreSQL unique indexes: https://www.postgresql.org/docs/current/indexes-unique.html

Configuration

const guard = createNamespaceGuard({
  // Reserved names - flat list, Set, or categorized
  reserved: new Set([
    "admin",
    "api",
    "settings",
    "dashboard",
    "login",
    "signup",
    "help",
    "support",
    "billing",
  ]),

  // Data sources to check for collisions
  // Queries run in parallel for speed
  sources: [
    {
      name: "user",           // Prisma model / Drizzle table / SQL table name
      column: "handle",       // Column containing the slug/handle
      idColumn: "id",         // Primary key column (default: "id")
      scopeKey: "id",         // Key for ownership checks (see below)
    },
    {
      name: "organization",
      column: "slug",
      scopeKey: "id",
    },
    {
      name: "team",
      column: "slug",
      scopeKey: "id",
    },
  ],

  // Validation pattern (default: /^[a-z0-9][a-z0-9-]{1,29}$/)
  // The default allows 2-30 chars: lowercase letters, digits, and hyphens, not starting with a hyphen.
  // This example widens the range to 3-40 chars.
  pattern: /^[a-z0-9][a-z0-9-]{2,39}$/,

  // Custom error messages
  messages: {
    invalid: "Use 3-40 lowercase letters, numbers, or hyphens.",
    reserved: "That name is reserved. Please choose another.",
    taken: (sourceName) => "That name is already taken.", // sourceName is the colliding source, e.g. "user"
  },
}, adapter);
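The first character class matches exactly one character and the `{min,max}` quantifier counts the rest, so `{1,29}` means 2-30 characters in total and `{2,39}` means 3-40. A plain-regex check of both patterns (no library calls):

```typescript
const DEFAULT_PATTERN = /^[a-z0-9][a-z0-9-]{1,29}$/; // documented default: 2-30 chars
const EXAMPLE_PATTERN = /^[a-z0-9][a-z0-9-]{2,39}$/; // pattern from the config above: 3-40 chars

// Report the length and whether each pattern accepts the candidate.
function describePattern(candidate: string): string {
  return `${candidate.length} chars: default=${DEFAULT_PATTERN.test(candidate)}, example=${EXAMPLE_PATTERN.test(candidate)}`;
}

console.log(describePattern("ab"));           // 2 chars: default=true, example=false
console.log(describePattern("acme-corp"));    // 9 chars: default=true, example=true
console.log(describePattern("-acme"));        // 5 chars: default=false, example=false
console.log(describePattern("a".repeat(31))); // 31 chars: default=false, example=true
console.log(describePattern("acme--corp-"));  // 11 chars: default=true, example=true
```

Both patterns also accept trailing and consecutive hyphens (`acme--corp-`). If you want to forbid those, a pattern such as `/^[a-z0-9]+(?:-[a-z0-9]+)*$/` combined with a separate length check is one option.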

Reserved Name Categories

Group reserved names by category with different error messages:

const guard = createNamespaceGuard({
  reserved: {
    system: ["admin", "api", "settings", "dashboard"],
    brand: ["oncor", "bandcamp"],
    offensive: ["..."],
  },
  sources: [/* ... */],
  messages: {
    reserved: {
      system: "That's a system route.",
      brand: "That's a protected brand name.",
      offensive: "That name is not allowed.",
    },
  },
}, adapter);

const result = await guard.check("admin");
// { available: false, reason: "reserved", category: "system", message: "That's a system route." }

You can also pass a single string to use one message for every category, or mix the two: categories without a specific message fall back to the default reserved message.
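The fallback rule can be pictured as a lookup over the categories. This is a minimal sketch of the behavior described above, not namespace-guard's internals; `findReserved` and the `DEFAULT_RESERVED_MESSAGE` text are hypothetical.

```typescript
type ReservedConfig = Record<string, readonly string[]>;
type ReservedMessages = string | Partial<Record<string, string>>;

const DEFAULT_RESERVED_MESSAGE = "That name is reserved."; // hypothetical default text

function findReserved(
  identifier: string,
  reserved: ReservedConfig,
  messages?: ReservedMessages
): { category: string; message: string } | null {
  for (const [category, names] of Object.entries(reserved)) {
    if (!names.includes(identifier)) continue;
    const message =
      typeof messages === "string"
        ? messages // one message for every category
        : messages?.[category] ?? DEFAULT_RESERVED_MESSAGE; // per-category, else default
    return { category, message };
  }
  return null; // not reserved
}
```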

Async Validators

Add custom async checks that run after format/reserved validation but before database queries:

const guard = createNamespaceGuard({
  sources: [/* ... */],
  validators: [
    async (identifier) => {
      if (await isProfane(identifier)) {
        return { available: false, message: "That name is not allowed." };
      }
      return null; // pass
    },
    async (identifier) => {
      if (await isTrademarkViolation(identifier)) {
        return { available: false, message: "That name is trademarked." };
      }
      return null;
    },
  ],
}, adapter);

Validators run sequentially and stop at the first rejection. They receive the normalized identifier.
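The ordering guarantee can be sketched as a simple loop. The result shape mirrors the validator examples above (`{ available: false, message }` or `null`), but `runValidators` itself is a hypothetical helper, not the library's code.

```typescript
type ValidatorResult = { available: false; message: string } | null;
type Validator = (identifier: string) => Promise<ValidatorResult>;

// Run validators one at a time and stop at the first rejection.
async function runValidators(
  identifier: string,
  validators: readonly Validator[]
): Promise<ValidatorResult> {
  for (const validate of validators) {
    const result = await validate(identifier);
    if (result) return result; // later validators never run
  }
  return null; // every validator passed
}
```

Because execution is sequential, putting cheap local checks before validators that call paid or rate-limited services means the expensive call is skipped whenever an earlier validator already rejected the name.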

Zero-Dependency External Filters

namespace-guard stays zero-dependency and does not bundle external profanity/moderation libraries.
Use createPredicateValidator to wrap any third-party boolean check in one line:

import { createNamespaceGuard, createPredicateValidator } from "namespace-guard";
import { Profanity } from "@2toad/profanity";

const profanity = new Profanity();

const guard = createNamespaceGuard({
  sources: [/* ... */],
  validators: [
    createPredicateValidator((identifier) => profanity.exists(identifier), {
      message: "Please choose an appropriate name.",
      transform: (identifier) => identifier.normalize("NFKC"),
    }),
  ],
}, adapter);

Common optional libraries people pair with namespace-guard:

obscenity (strong normalization/evasion-focused APIs)
@2toad/profanity (simple boolean API)
bad-words (basic list-based filter)

Built-in Profanity Validator

Use createProfanityValidator for a turnkey profanity filter - supply your own word list:

import { createNamespaceGuard, createProfanityValidator } from "namespace-guard";

const guard = createNamespaceGuard({
  sources: [/* ... */],
  validators: [
    createProfanityValidator(["badword", "offensive", "slur"], {
      message: "Please choose an appropriate name.", // optional custom message
      checkSubstrings: true,                         // default: true
      mode: "evasion",                               // default: "evasion"
      variantProfile: "balanced",                    // default: "balanced"
      minSubstringLength: 4,                         // default: 4 (exact matches always checked)
    }),
  ],
}, adapter);

mode: "evasion" is designed for profanity bypasses: it folds Unicode confusables plus common ASCII substitutes (for example sh1t, 5h1t, mixed-script variants, and separator insertion like s-h.i_t). Use mode: "basic" if you want strict lowercase matching only.

variantProfile: "balanced" is precision-first and avoids more ambiguous substitutions; switch to "aggressive" only when you need broader catch coverage.
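A rough sketch of what evasion-style folding involves: normalize, lowercase, strip separators, then undo common ASCII substitutions before matching. The substitution table is a small hypothetical sample, not namespace-guard's, and the sketch skips the Unicode confusable step. It also shows the precision trade-off: mapping `1` only to `i` misses `l` substitutions, the kind of ambiguity a precision-first profile has to decide on. The `minSubstringLength` handling (short words are matched exactly only) is an assumption based on the option comment above.

```typescript
// Hypothetical sample of ASCII look-alike substitutions (not the library's table).
const ASCII_SUBSTITUTES: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s",
};

function foldForEvasion(input: string): string {
  return input
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[-._\s]+/g, "") // drop separators inserted between letters
    .replace(/[013457@$]/g, (ch) => ASCII_SUBSTITUTES[ch] ?? ch);
}

function containsBlockedWord(
  input: string,
  words: readonly string[],
  minSubstringLength = 4
): boolean {
  const folded = foldForEvasion(input);
  return words.some(
    (word) =>
      folded === word || // exact matches are always checked
      (word.length >= minSubstringLength && folded.includes(word))
  );
}
```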

Curated English Default List (Optional Subpath)

If you want a ready-to-use default list without wiring your own dataset first:

import { createNamespaceGuard } from "namespace-guard";
import { createEnglishProfanityValidator } from "namespace-guard/profanity-en";

const guard = createNamespaceGuard({
  sources: [/* ... */],
  validators: [
    createEnglishProfanityValidator({
      mode: "evasion",            // default
      variantProfile: "balanced", // default
      minSubstringLength: 4,      // default
    }),
  ],
}, adapter);

namespace-guard/profanity-en also exports:

PROFANITY_WORDS_EN (curated word list)
PROFANITY_WORDS_EN_COUNT
PROFANITY_WORDS_EN_SOURCE
PROFANITY_WORDS_EN_LICENSE

Core namespace-guard remains zero-dependency and does not force any external moderation package.

Built-in Homoglyph Validator

Prevent spoofing attacks where visually similar characters from any Unicode script are substituted for Latin letters (e.g., Cyrillic "а" for Latin "a" in "admin"). Note: with the default ASCII-only pattern ([a-z0-9-]), non-Latin characters are already rejected by the format check. The homoglyph validator is most useful when your pattern allows Unicode characters, or as defense-in-depth alongside rejectMixedScript:

import { createNamespaceGuard, createHomoglyphValidator } from "namespace-guard";

const guard = createNamespaceGuard({
  sources: [/* ... */],
  validators: [
    createHomoglyphValidator(),
  ],
}, adapter);

Options:

createHomoglyphValidator({
  message: "Custom rejection message.",       // optional
  additionalMappings: { "\u0261": "g" },      // extend the built-in map
  rejectMixedScript: true,                    // also reject Latin + non-Latin script mixing
})

The built-in CONFUSABLE_MAP contains 613 character pairs generated from Unicode TR39 confusables.txt plus supplemental Latin small capitals. It covers Cyrillic, Greek, Armenian, Cherokee, IPA, Coptic, Lisu, Canadian Syllabics, Georgian, and 20+ other scripts. The map is exported for inspection or extension, and is regenerable for new Unicode versions with npx tsx scripts/generate-confusables.ts.

Built-in Invisible Character Validator

Reject zero-width/default-ignorable and bidi direction-control characters in the claim path.
Optionally reject combining marks for stricter anti-evasion policies.

import {
  createNamespaceGuard,
  createInvisibleCharacterValidator,
} from "namespace-guard";

const guard = createNamespaceGuard({
  sources: [/* ... */],
  validators: [createInvisibleCharacterValidator()],
}, adapter);

Options:

createInvisibleCharacterValidator({
  message: "That name contains hidden Unicode controls.",
  rejectDefaultIgnorables: true, // default
  rejectBidiControls: true,      // default
  rejectCombiningMarks: false,   // default (set true for strict anti-evasion mode)
})

rejectCombiningMarks is intentionally opt-in because many legitimate identifiers in Unicode-enabled products use combining marks naturally.
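The three option categories correspond to standard Unicode properties, which JavaScript regexes can test directly with property escapes. A sketch under that assumption; namespace-guard's exact character classes may differ, and `findHiddenCharacters` is hypothetical. Bidi controls are checked first because some of them (for example U+200E LEFT-TO-RIGHT MARK) are also default-ignorable. The combining-mark test runs on the raw input: NFKC composes e + U+0301 into the precomposed é, which would hide the mark.

```typescript
interface InvisibleOptions {
  rejectDefaultIgnorables?: boolean;
  rejectBidiControls?: boolean;
  rejectCombiningMarks?: boolean;
}

const BIDI_CONTROL = /\p{Bidi_Control}/u;
const DEFAULT_IGNORABLE = /\p{Default_Ignorable_Code_Point}/u;
const COMBINING_MARK = /\p{M}/u;

// Returns the first rule the raw input violates, or null if it is clean.
function findHiddenCharacters(input: string, opts: InvisibleOptions = {}): string | null {
  const {
    rejectDefaultIgnorables = true,
    rejectBidiControls = true,
    rejectCombiningMarks = false, // opt-in, as in the options above
  } = opts;
  if (rejectBidiControls && BIDI_CONTROL.test(input)) return "bidi-control";
  if (rejectDefaultIgnorables && DEFAULT_IGNORABLE.test(input)) return "default-ignorable";
  if (rejectCombiningMarks && COMBINING_MARK.test(input)) return "combining-mark";
  return null;
}
```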

CONFUSABLE_MAP_FULL

For standalone use without NFKC normalization, CONFUSABLE_MAP_FULL (~1,400 entries) includes every single-character-to-Latin mapping from TR39 with no NFKC filtering. This is the right map when your pipeline does not run NFKC before confusable detection, which is the case for most real-world systems: TR39's skeleton algorithm uses NFD, Chromium's IDN spoof checker uses NFD, Rust's confusable_idents lint runs on NFC, and django-registration applies the confusable map to raw input with no normalization at all.

import { CONFUSABLE_MAP_FULL } from "namespace-guard";

// Contains everything in CONFUSABLE_MAP, plus:
// - ~766 entries where NFKC agrees with TR39 (mathematical alphanumerics, fullwidth forms)
// - 31 entries where TR39 and NFKC disagree on the target letter
CONFUSABLE_MAP_FULL["\u017f"]; // "f" (Long S: TR39 visual mapping)
CONFUSABLE_MAP_FULL["\u{1D41A}"]; // "a" (Mathematical Bold Small A)

`skeleton()` and `areConfusable()`

The TR39 Section 4 skeleton algorithm computes a normalized form of a string for confusable comparison. Two strings whose differences are all covered by the confusable map produce the same skeleton; look-alikes missing from the map do not. This is the same algorithm used by ICU's SpoofChecker, Chromium's IDN spoof checker, and the Rust compiler's confusable_idents lint.

import { skeleton, areConfusable, CONFUSABLE_MAP } from "namespace-guard";

// Compute skeletons for comparison
skeleton("paypal");           // "paypal"
skeleton("\u0440\u0430ypal"); // "paypal" (Cyrillic р and а)
skeleton("pay\u200Bpal");     // "paypal" (zero-width space stripped)
skeleton("\u017f");            // "f"      (Long S via TR39 visual mapping)

// Compare two strings directly
areConfusable("paypal", "\u0440\u0430ypal"); // true
areConfusable("google", "g\u043e\u043egle"); // true  (Cyrillic о)
areConfusable("hello", "world");             // false

// Use CONFUSABLE_MAP for NFKC-first pipelines
skeleton("\u017f", { map: CONFUSABLE_MAP }); // "\u017f" (Long S not in filtered map)

By default, skeleton() uses CONFUSABLE_MAP_FULL (the complete TR39 map), which matches the NFD-based pipeline specified by TR39. Pass { map: CONFUSABLE_MAP } if your pipeline runs NFKC normalization before calling skeleton().

How the anti-spoofing pipeline works

Most confusable-detection libraries apply a character map in isolation. namespace-guard uses a three-stage pipeline where each stage is aware of the others:

Input  →  NFKC normalize  →  Confusable map  →  Mixed-script reject
           (stage 1)          (stage 2)           (stage 3)

Stage 1: NFKC normalization collapses full-width characters (Ｉ → I), ligatures (ﬁ → fi), superscripts, and other Unicode compatibility forms to their canonical equivalents. This runs first, before any confusable check.

Stage 2: Confusable map catches characters that survive NFKC but visually mimic Latin letters - Cyrillic а for a, Greek ο for o, Cherokee Ꭺ for A, and 600+ others from the Unicode Consortium's confusables.txt.

Stage 3: Mixed-script rejection (rejectMixedScript: true) blocks identifiers that mix Latin with non-Latin scripts (Hebrew, Arabic, Devanagari, Thai, Georgian, Ethiopic, etc.) even if the specific characters aren't in the confusable map. This catches novel homoglyphs that the map doesn't cover.
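The three stages can be sketched in a few lines. `SAMPLE_CONFUSABLES` is a five-entry hypothetical subset standing in for CONFUSABLE_MAP, and `checkIdentifier` only illustrates the ordering (NFKC first, then the map, then the script check); it is not the library's validator.

```typescript
// Hypothetical five-entry subset standing in for CONFUSABLE_MAP.
const SAMPLE_CONFUSABLES: Record<string, string> = {
  "\u0430": "a", // Cyrillic small a
  "\u0435": "e", // Cyrillic small ie
  "\u043E": "o", // Cyrillic small o
  "\u0440": "p", // Cyrillic small er
  "\u03BF": "o", // Greek small omicron
};

const LATIN_LETTER = /\p{Script=Latin}/u;
const NON_LATIN_LETTER = /(?!\p{Script=Latin})\p{L}/u; // any letter outside Latin

type Verdict =
  | { ok: true; value: string }
  | { ok: false; stage: 2 | 3; reason: string };

function checkIdentifier(input: string): Verdict {
  // Stage 1: NFKC folds compatibility forms (fullwidth letters, ligatures) first.
  const value = input.normalize("NFKC").toLowerCase();

  // Stage 2: reject characters the confusable map knows about.
  for (const ch of value) {
    const latin = SAMPLE_CONFUSABLES[ch];
    if (latin !== undefined) {
      return { ok: false, stage: 2, reason: `confusable with "${latin}"` };
    }
  }

  // Stage 3: reject Latin mixed with letters from any other script,
  // including letters the map does not cover.
  if (LATIN_LETTER.test(value) && NON_LATIN_LETTER.test(value)) {
    return { ok: false, stage: 3, reason: "mixed-script" };
  }
  return { ok: true, value };
}
```

In this sketch, stage 3 is what catches an identifier that mixes Latin with a Hebrew letter absent from the sample map, while a purely Hebrew identifier passes.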

Why NFKC-aware filtering matters

The key insight: TR39's confusables.txt and NFKC normalization sometimes disagree. For example, Unicode says capital I (U+0049) is confusable with lowercase l - visually true in many fonts. But NFKC maps Mathematical Bold 𝐈 (U+1D408) to I, not l. If you naively ship the TR39 mapping (𝐈 → l), the confusable check will never see that character - NFKC already converted it to I in stage 1.

We found 31 entries where this happens, including:

Character	TR39 says	NFKC (+ lowercase) says	Winner in an NFKC-first pipeline
`ſ` Long S (U+017F)	`f`	`s`	NFKC (`s` is correct)
`Ⅰ` Roman Numeral I (U+2160)	`l`	`i`	NFKC (`i` is correct)
`Ｉ` Fullwidth I (U+FF29)	`l`	`i`	NFKC (`i` is correct)
`𝟎` Math Bold 0 (U+1D7CE)	`o`	`0`	NFKC (`0` is correct)
11 Mathematical I variants	`l`	`i`	NFKC
12 Mathematical 0/1 variants	`o`/`l`	`0`/`1`	NFKC

These entries are unreachable in any pipeline that runs NFKC first - NFKC has already transformed the character before the confusable map sees it. In a non-NFKC pipeline (which is what TR39 specifies), these entries are correct visual judgments. The generate script (scripts/generate-confusables.ts) produces both CONFUSABLE_MAP (NFKC-filtered) and CONFUSABLE_MAP_FULL (unfiltered) so you can match the map to your normalization strategy.
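The unreachability argument can be checked with the built-in `String.prototype.normalize`: each character in the table is rewritten by NFKC (and lowercasing) before a map lookup could see it. The filter function sketches the idea behind an NFKC-filtered map, keeping only keys that NFKC leaves unchanged; it is an illustration, not the code in scripts/generate-confusables.ts.

```typescript
// Characters from the divergence table and what NFKC + lowercase turns them into.
const divergent: Array<[string, string]> = [
  ["\u017F", "Long S"],
  ["\u2160", "Roman Numeral One"],
  ["\uFF29", "Fullwidth I"],
  ["\u{1D7CE}", "Mathematical Bold Digit Zero"],
  ["\u{1D408}", "Mathematical Bold Capital I"],
];
for (const [ch, name] of divergent) {
  console.log(name, JSON.stringify(ch.normalize("NFKC").toLowerCase()));
}
// Prints "s", "i", "i", "0", "i": none of these is the TR39 target.

// Sketch: keep only entries whose key survives NFKC unchanged,
// i.e. the entries an NFKC-first pipeline can actually reach.
function filterForNfkcPipeline(map: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, target] of Object.entries(map)) {
    if (key.normalize("NFKC") === key) out[key] = target;
  }
  return out;
}
```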

Composability regression suite artifact

For reproducible cross-library testing, import the named vector suite directly:

import {
  COMPOSABILITY_VECTOR_SUITE,
  COMPOSABILITY_VECTORS,
  COMPOSABILITY_VECTORS_COUNT,
} from "namespace-guard/composability-vectors";

This suite is an alias of NFKC_TR39_DIVERGENCE_VECTORS and represents TR39-full vs NFKC-lowercase divergence vectors.

COMPOSABILITY_VECTOR_SUITE: stable suite id ("nfkc-tr39-divergence-v1")
COMPOSABILITY_VECTORS: vector rows ({ char, codePoint, tr39, nfkc })
COMPOSABILITY_VECTORS_COUNT: row count

JSON artifact for tooling/pipelines is included at:

docs/data/composability-vectors.json

Confusable benchmark corpus artifact

For cross-tool evaluation and repeatable security testing, use:

docs/data/confusable-bench.v1.json

This dataset includes labeled malicious and benign rows across:

NFKC/TR39 divergence vectors
confusable substitutions and mixed-script variants
default-ignorable and bidi control insertions
combining-mark evasion rows
benign controls (including precomposed and decomposed accent forms)

Generation/source notes:

docs/data/confusable-bench.v1.SOURCE.md

LLM Pipeline Preprocessing

Confusable characters are visually identical to Latin letters but encode as multi-byte BPE tokens. A 95-line contract inflates from 881 to 4,567 tokens when flooded with confusables: 5.2x the API bill. We tested this across 4 models, 8 attack types, and 130+ API calls. The model reads through every substitution. The billing attack succeeds. We call it Denial of Spend. Full research: The new DDoS: Unicode confusables can't fool LLMs, but they can 5x your API bill.

Use namespace-guard as a deterministic preprocessing stage before sending text to your model:

Document ingestion
       |
       v
+----------------+
| namespace-     |  <-- Detect mixed-script confusable substitution
| guard          |  <-- Canonicalise to Latin equivalents
| (microseconds) |  <-- Flag suspicious patterns for review
+----------------+
       |
       v
+----------------+
| LLM API        |  <-- Any model/provider
| (GPT/Claude/   |  <-- Receives canonicalised text
| Llama/etc)     |
+----------------+
       |
       v
   Analysis output

`canonicalise(text, options?)`

Rewrites confusable characters to Latin equivalents. By default (strategy: "mixed"), only rewrites characters inside tokens that already contain Latin letters. Standalone non-Latin words are preserved to reduce false positives in multilingual text.

With strategy: "all", rewrites every confusable character regardless of surrounding context. Use this when the document is known to be Latin-script (e.g. an English contract) and you need to catch words where every character was substituted.

import { canonicalise } from "namespace-guard";

canonicalise("The seller аssumes аll liаbility.");
// "The seller assumes all liability."

canonicalise("Москва is the capital");
// "Москва is the capital" (standalone Cyrillic word preserved)

// Strategy "all": rewrites even standalone non-Latin confusable words
canonicalise("поп-refundable", { strategy: "all" });
// "non-refundable" (all Cyrillic п/о/п replaced)

canonicalise("ԝаіⅴеѕ any right", { strategy: "all" });
// "waives any right" (every confusable character replaced)
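The difference between the two strategies can be sketched at the token level. Assumptions: tokens are runs of letters, marks, and digits (so `поп-refundable` splits at the hyphen), and `SAMPLE_MAP` is a six-entry hypothetical subset; the sketch ignores `threshold`, visual scores, and whatever tokenization rules `canonicalise` actually uses.

```typescript
// Hypothetical subset of Cyrillic-to-Latin confusables.
const SAMPLE_MAP: Record<string, string> = {
  "\u0430": "a", "\u0435": "e", "\u043E": "o", "\u0440": "p", "\u0441": "c", "\u043F": "n",
};
const HAS_LATIN = /\p{Script=Latin}/u;

function canonicaliseSketch(text: string, strategy: "mixed" | "all" = "mixed"): string {
  return text.replace(/[\p{L}\p{M}\p{N}]+/gu, (token) => {
    // "mixed": leave tokens with no Latin letters alone (standalone non-Latin words).
    if (strategy === "mixed" && !HAS_LATIN.test(token)) return token;
    // Otherwise rewrite every mapped character, code point by code point.
    return Array.from(token, (ch) => SAMPLE_MAP[ch] ?? ch).join("");
  });
}
```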

`scan(text, options?)`

Returns structured findings (codepoint, script, latinEquivalent, visualScore, source, position, word context) plus a summary risk level (none | low | medium | high).

import { scan } from "namespace-guard";

const report = scan("The seller liаbility clause applies.");
// report.hasConfusables === true
// report.summary.riskLevel is "medium" or "high" (context-dependent)

`isClean(text, options?)`

Fast boolean gate for confusable substitution. Short-circuits on first suspicious match.

With the default strategy: "mixed", only mixed-script confusables fail the gate. With strategy: "all", any confusable character fails the gate.

import { isClean } from "namespace-guard";

isClean("The seller assumes all liability."); // true
isClean("The seller liаbility clause applies."); // false

// Strategy "all": any confusable fails, even standalone non-Latin
isClean("поп-refundable", { strategy: "all" }); // false

Options (`canonicalise`, `scan`, `isClean`)

strategy -- "mixed" (default) only acts on tokens containing both Latin and non-Latin characters; "all" acts on every confusable character regardless of context. Use "all" for known-Latin documents.
threshold -- minimum visual similarity score for replacement/detection (default: 0.7)
includeNovel -- include confusable-vision novel mappings in addition to TR39 baseline (default: true)
scripts -- optional allowlist of source scripts (case-insensitive)
riskTerms (scan/isClean) -- optional list of high-value terms used by risk heuristics

Unicode Normalization

By default, normalize() applies NFKC normalization before lowercasing. This collapses full-width characters, ligatures, superscripts, and other Unicode compatibility forms to their canonical equivalents:

normalize("ｈｅｌｌｏ");  // "hello" (full-width → ASCII)
normalize("\ufb01nance"); // "finance" (ﬁ ligature → fi)

NFKC is a no-op for ASCII input and matches what ENS, GitHub, and Unicode IDNA standards mandate. To opt out:

const guard = createNamespaceGuard({
  sources: [/* ... */],
  normalizeUnicode: false,
}, adapter);

Rejecting Purely Numeric Identifiers

Twitter/X blocks purely numeric handles. Enable this with allowPurelyNumeric: false:

const guard = createNamespaceGuard({
  sources: [/* ... */],
  allowPurelyNumeric: false,
  messages: {
    purelyNumeric: "Handles cannot be all numbers.", // optional custom message
  },
}, adapter);

await guard.check("123456"); // { available: false, reason: "invalid", message: "Handles cannot be all numbers." }
await guard.check("abc123"); // available (has letters)
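Where the numeric check runs matters: compatibility digits such as fullwidth `１２３` or superscript `¹²³` become `123` under NFKC. A sketch assuming the check is applied to the NFKC-normalized identifier (not confirmed by this document); `isPurelyNumeric` is hypothetical.

```typescript
// Hypothetical helper: true when the identifier is only ASCII digits after
// NFKC, so fullwidth and superscript digits are treated as numbers too.
function isPurelyNumeric(identifier: string): boolean {
  return /^[0-9]+$/.test(identifier.normalize("NFKC"));
}
```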

Conflict Suggestions

When a slug is taken, automatically suggest available alternatives using pluggable strategies:

const guard = createNamespaceGuard({
  sources: [/* ... */],
  suggest: {
    // Named strategy (default: ["sequential", "random-digits"])
    strategy: "suffix-words",
    // Max suggestions to return (default: 3)
    max: 3,
  },
}, adapter);

const result = await guard.check("acme-corp");
// {
//   available: false,
//   reason: "taken",
//   message: "That name is already in use.",
//   source: "organization",
//   suggestions: ["acme-corp-dev", "acme-corp-io", "acme-corp-app"]
// }

Built-in Strategies

Strategy	Example Output	Description
`"sequential"`	`sarah-1`, `sarah1`, `sarah-2`	Hyphenated and compact numeric suffixes
`"random-digits"`	`sarah-4821`, `sarah-1037`	Random 3-4 digit suffixes
`"suffix-words"`	`sarah-dev`, `sarah-hq`, `sarah-app`	Common word suffixes
`"short-random"`	`sarah-x7k`, `sarah-m2p`	Short 3-char alphanumeric suffixes
`"scramble"`	`asrah`, `sarha`	Adjacent character transpositions
`"similar"`	`sara`, `darah`, `thesarah`	Edit-distance-1 mutations (deletions, keyboard-adjacent substitutions, prefix/suffix)
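The `"scramble"` row describes adjacent character transpositions. A minimal sketch of that idea follows. `scrambleCandidates` is a hypothetical helper, and skipping swaps of identical neighbours (which would just reproduce the input) is an assumption about how the library avoids no-op candidates.

```typescript
// Hypothetical "scramble"-style generator: swap each adjacent pair of characters.
function scrambleCandidates(identifier: string): string[] {
  const chars = Array.from(identifier);
  const results: string[] = [];
  for (let i = 0; i < chars.length - 1; i++) {
    if (chars[i] === chars[i + 1]) continue; // swapping equal chars changes nothing
    const swapped = chars.slice();
    [swapped[i], swapped[i + 1]] = [swapped[i + 1], swapped[i]];
    const candidate = swapped.join("");
    if (results.indexOf(candidate) === -1) results.push(candidate);
  }
  return results;
}

console.log(scrambleCandidates("sarah")); // ["asrah", "sraah", "saarh", "sarha"]
```

Candidates like these still have to pass format, reserved-name, and database checks before they are offered.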

Composing Strategies

Combine multiple strategies; candidates are interleaved round-robin:

suggest: {
  strategy: ["random-digits", "suffix-words"],
  max: 4,
}
// → ["sarah-4821", "sarah-dev", "sarah-1037", "sarah-io"]
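Round-robin interleaving takes the first candidate from each strategy, then the second from each, and so on. The sketch below shows that ordering with de-duplication and a `max` cap. `interleaveCandidates` is a hypothetical name. In the library, candidates are also verified before the cap applies (see the verification note in this section); this sketch only models the ordering.

```typescript
// Hypothetical sketch: merge candidate lists round-robin, dropping duplicates.
function interleaveCandidates(lists: string[][], max: number): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  const longest = Math.max(0, ...lists.map((list) => list.length));
  for (let i = 0; i < longest && out.length < max; i++) {
    for (const list of lists) {
      if (i >= list.length || out.length >= max) continue;
      const candidate = list[i];
      if (seen.has(candidate)) continue; // two strategies proposed the same slug
      seen.add(candidate);
      out.push(candidate);
    }
  }
  return out;
}

const merged = interleaveCandidates(
  [
    ["sarah-4821", "sarah-1037"],           // e.g. "random-digits" output
    ["sarah-dev", "sarah-io", "sarah-app"], // e.g. "suffix-words" output
  ],
  4,
);
console.log(merged); // ["sarah-4821", "sarah-dev", "sarah-1037", "sarah-io"]
```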

Custom Strategy Function

Pass a function that returns candidate slugs:

suggest: {
  strategy: (identifier) => [
    `${identifier}-io`,
    `${identifier}-app`,
    `the-real-${identifier}`,
  ],
}

Suggestions are verified against format, reserved names, validators, and database collisions using a progressive batched pipeline. Only available suggestions are returned.
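A progressive batched pipeline can be pictured as: verify a small batch of candidates concurrently, keep the ones that pass, and stop once enough are found. The sketch assumes a hypothetical async `isAvailable` callback standing in for the format, reserved-name, validator, and database checks; the batch size of 3 is arbitrary and not the library's setting.

```typescript
// Hypothetical sketch of progressive batched verification.
async function verifySuggestions(
  candidates: string[],
  isAvailable: (candidate: string) => Promise<boolean>,
  max: number,
  batchSize = 3,
): Promise<string[]> {
  const accepted: string[] = [];
  for (let start = 0; start < candidates.length && accepted.length < max; start += batchSize) {
    const batch = candidates.slice(start, start + batchSize);
    // One batch runs concurrently; later batches are only queried if still needed
    const results = await Promise.all(batch.map((candidate) => isAvailable(candidate)));
    batch.forEach((candidate, i) => {
      if (results[i] && accepted.length < max) accepted.push(candidate);
    });
  }
  return accepted;
}

// Usage with an in-memory stand-in for the database
const taken = new Set(["acme-corp-dev", "acme-corp-io"]);
verifySuggestions(
  ["acme-corp-dev", "acme-corp-io", "acme-corp-app", "acme-corp-hq", "acme-corp-1"],
  async (candidate) => !taken.has(candidate),
  2,
).then((suggestions) => console.log(suggestions)); // ["acme-corp-app", "acme-corp-hq"]
```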

Batch Checking

Check multiple identifiers at once:

const results = await guard.checkMany(["sarah", "admin", "acme-corp"]);
// {
//   sarah: { available: true },
//   admin: { available: false, reason: "reserved", ... },
//   "acme-corp": { available: false, reason: "taken", ... }
// }

All checks run in parallel. Accepts an optional scope parameter.
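Running checks concurrently and keying the results by input can be sketched with `Promise.all`. The `check` callback is a hypothetical stand-in for `guard.check`, and keying by the raw input string (rather than the normalized form) is an assumption.

```typescript
type CheckResultSketch = { available: boolean; reason?: string };
type ScopeSketch = Record<string, string | undefined>;

// Hypothetical sketch: start every check at once, then key results by input.
async function checkManySketch(
  identifiers: string[],
  check: (identifier: string, scope?: ScopeSketch) => Promise<CheckResultSketch>,
  scope?: ScopeSketch,
): Promise<Record<string, CheckResultSketch>> {
  const results = await Promise.all(identifiers.map((id) => check(id, scope)));
  const byIdentifier: Record<string, CheckResultSketch> = {};
  identifiers.forEach((id, i) => {
    byIdentifier[id] = results[i];
  });
  return byIdentifier;
}

// Usage with a stub check: "admin" is reserved, "acme-corp" is taken
checkManySketch(["sarah", "admin", "acme-corp"], async (id) => {
  if (id === "admin") return { available: false, reason: "reserved" };
  if (id === "acme-corp") return { available: false, reason: "taken" };
  return { available: true };
}).then((results) => console.log(results));
```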

Ownership Scoping

When users update their own slug, you don't want a false "already taken" error:

// User "user_123" owns "sarah" and re-submits a settings form without changing it.
// Without scoping, check("sarah") reports reason: "taken" because it finds their own record.

// Pass their ID to exclude their own record from collision detection
const result = await guard.check("sarah", { id: "user_123" });
// Available (unless another source, such as an organization, holds it)

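Each source's scopeKey names the scope property that holds the owner ID for that source. Give sources distinct scopeKeys to exclude several ownership types at once:

// Assumes sources configured with scopeKey: "userId" (users) and scopeKey: "orgId" (organizations).
// The current user's record and the current org's record are both ignored.
const result = await guard.check("acme", {
  userId: currentUser.id,
  orgId: currentOrg.id,
});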
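The exclusion rule amounts to: a matching record is a collision unless its ID equals the value the scope supplies for that source's scopeKey. The sketch models this with in-memory rows. The `SourceSketch`/`RowSketch` shapes and `findCollision` are hypothetical, not the adapter contract.

```typescript
interface SourceSketch { name: string; scopeKey?: string }
interface RowSketch { source: string; slug: string; id: string }

// Hypothetical sketch: return the first source holding `slug`, ignoring rows
// whose id equals scope[source.scopeKey].
function findCollision(
  slug: string,
  sources: SourceSketch[],
  rows: RowSketch[],
  scope: Record<string, string | undefined> = {},
): string | null {
  for (const source of sources) {
    const ownId = source.scopeKey ? scope[source.scopeKey] : undefined;
    const hit = rows.find(
      (row) => row.source === source.name && row.slug === slug && row.id !== ownId,
    );
    if (hit) return source.name; // would be reported as `source` on a "taken" result
  }
  return null;
}

const sources = [
  { name: "user", scopeKey: "userId" },
  { name: "organization", scopeKey: "orgId" },
];
const rows = [
  { source: "user", slug: "sarah", id: "user_123" },
  { source: "organization", slug: "acme", id: "org_9" },
];

console.log(findCollision("sarah", sources, rows));                         // "user"
console.log(findCollision("sarah", sources, rows, { userId: "user_123" })); // null
console.log(findCollision("acme", sources, rows, { orgId: "org_9" }));      // null
```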

CLI

Validate slugs from the command line:

# Format + reserved name checking (no database needed)
npx namespace-guard check acme-corp
# ✓ acme-corp is available

npx namespace-guard check admin
# ✗ admin - That name is reserved. Try another one.

npx namespace-guard check "a"
# ✗ a - Use 2-30 lowercase letters, numbers, or hyphens.

# Risk scoring against protected targets
npx namespace-guard risk paуpal --protect paypal
# ⛔ paуpal — risk 100/100 (block)

With a config file

Create namespace-guard.config.json:

{
  "reserved": ["admin", "api", "settings", "dashboard"],
  "pattern": "^[a-z0-9][a-z0-9-]{2,39}$",
  "sources": [
    { "name": "users", "column": "handle" },
    { "name": "organizations", "column": "slug" }
  ]
}

Or with categorized reserved names:

{
  "reserved": {
    "system": ["admin", "api", "settings"],
    "brand": ["oncor"]
  }
}

npx namespace-guard check sarah --config ./my-config.json

With database checking

npx namespace-guard check sarah --database-url postgres://localhost/mydb

Requires pg to be installed (npm install pg).

Exit code 0 = available, 1 = unavailable.

Risk command options

# Warn/block thresholds (0-100)
npx namespace-guard risk paypa1 --protect paypal --warn-threshold 45 --block-threshold 80

# Fail CI on warn or block (default fail mode is block only)
npx namespace-guard risk paypa1 --protect paypal --fail-on warn

# JSON output for automation
npx namespace-guard risk paуpal --protect paypal --json

Attack-gen command

Generate confusable attack candidates for red-team testing and policy tuning:

# Default mode: evasion (confusables + substitutions)
npx namespace-guard attack-gen paypal
npx namespace-guard attack-gen paypal --json

# Impersonation-focused generation (Unicode/NFKC/NFC confusables)
npx namespace-guard attack-gen paypal --mode impersonation

# Profanity-evasion generation (adds ASCII/lookalike substitutions)
npx namespace-guard attack-gen shit --mode evasion --json

# Explore deeper substitutions and wider candidate pools
npx namespace-guard attack-gen github --max-edits 2 --max-candidates 50

Default mode is evasion for practical coverage. Use --mode impersonation when you want Unicode-only spoof analysis without substitution noise.

Useful for finding:

high-risk variants your policy already blocks
non-blocking variants (allow or warn) that still pass format checks (useful for tightening thresholds/protect lists)

Audit-canonical command

Audit an exported dataset for canonical collisions before enforcing a DB unique constraint:

# Dataset must be a JSON array. Identifier can be in: identifier/raw/handle/slug/username/value
npx namespace-guard audit-canonical ./users-export.json
npx namespace-guard audit-canonical ./users-export.json --json

Accepted optional stored-canonical fields:

canonical
normalized
handleCanonical
slugCanonical
handle_canonical
slug_canonical

Exit code:

0 when no collisions/mismatches are found
1 when collisions or canonical mismatches are found
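The audit amounts to grouping rows by canonical form, flagging any group with more than one member, and flagging rows whose stored canonical disagrees with the recomputed one. A hypothetical sketch: using trim + NFKC + lowercase as the canonical form, and taking the first matching field from each list, are assumptions rather than the command's documented behaviour.

```typescript
type RowRecord = Record<string, unknown>;
const ID_FIELDS = ["identifier", "raw", "handle", "slug", "username", "value"];
const CANONICAL_FIELDS = ["canonical", "normalized", "handleCanonical", "slugCanonical", "handle_canonical", "slug_canonical"];

function firstString(row: RowRecord, fields: string[]): string | undefined {
  for (const field of fields) {
    const v = row[field];
    if (typeof v === "string") return v;
  }
  return undefined;
}

// Assumed canonical form for this sketch
const canonicalOf = (raw: string) => raw.trim().normalize("NFKC").toLowerCase();

function auditCanonicalSketch(rows: RowRecord[]) {
  const groups = new Map<string, string[]>();
  const mismatches: { raw: string; stored: string; expected: string }[] = [];
  for (const row of rows) {
    const raw = firstString(row, ID_FIELDS);
    if (raw === undefined) continue;
    const expected = canonicalOf(raw);
    const stored = firstString(row, CANONICAL_FIELDS);
    if (stored !== undefined && stored !== expected) mismatches.push({ raw, stored, expected });
    const members = groups.get(expected) ?? [];
    members.push(raw);
    groups.set(expected, members);
  }
  const collisions: { canonical: string; identifiers: string[] }[] = [];
  groups.forEach((identifiers, canonical) => {
    if (identifiers.length > 1) collisions.push({ canonical, identifiers });
  });
  return { collisions, mismatches, exitCode: collisions.length || mismatches.length ? 1 : 0 };
}

const report = auditCanonicalSketch([
  { handle: "Sarah" },
  { handle: "sarah" },
  { handle: "ｓａｒａｈ" },                       // full-width, same canonical form
  { handle: "acme", handle_canonical: "Acme" }, // stored canonical is stale
]);
console.log(report.collisions, report.mismatches, report.exitCode);
```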

Calibrate command

Use labeled examples to recommend warn/block thresholds for your namespace:

[
  { "identifier": "paуpal", "label": "malicious", "target": "paypal" },
  { "identifier": "teamspace", "label": "benign", "target": "paypal" }
]

npx namespace-guard calibrate ./risk-dataset.json
npx namespace-guard calibrate ./risk-dataset.json --json

# Cost-aware calibration (optimize expected harm, not just F1)
npx namespace-guard calibrate ./risk-dataset.json \
  --cost-block-benign 8 \
  --cost-warn-benign 1 \
  --cost-allow-malicious 12 \
  --cost-warn-malicious 3 \
  --malicious-prior 0.05
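One way to read "optimize expected harm" is to score every (warn, block) pair by prior-weighted misclassification cost and keep the cheapest. The sketch below grid-searches thresholds using the same cost names as the flags above. The objective, the 5-point grid, and the `>=` threshold comparisons are all assumptions, not the command's actual algorithm.

```typescript
type LabelSketch = "benign" | "malicious";
type ActionSketch = "allow" | "warn" | "block";
interface LabeledScore { score: number; label: LabelSketch }
interface CostModel {
  blockBenign: number;    // --cost-block-benign
  warnBenign: number;     // --cost-warn-benign
  allowMalicious: number; // --cost-allow-malicious
  warnMalicious: number;  // --cost-warn-malicious
  maliciousPrior: number; // --malicious-prior
}

// Assumption: reaching a threshold triggers that action
function actionFor(score: number, warn: number, block: number): ActionSketch {
  if (score >= block) return "block";
  if (score >= warn) return "warn";
  return "allow";
}

// Average cost per class, weighted by the prior probability of each class
function expectedCost(data: LabeledScore[], warn: number, block: number, c: CostModel): number {
  let benignCost = 0, benignCount = 0, maliciousCost = 0, maliciousCount = 0;
  for (const { score, label } of data) {
    const action = actionFor(score, warn, block);
    if (label === "benign") {
      benignCount++;
      if (action === "block") benignCost += c.blockBenign;
      else if (action === "warn") benignCost += c.warnBenign;
    } else {
      maliciousCount++;
      if (action === "allow") maliciousCost += c.allowMalicious;
      else if (action === "warn") maliciousCost += c.warnMalicious;
    }
  }
  const benignRate = benignCount ? benignCost / benignCount : 0;
  const maliciousRate = maliciousCount ? maliciousCost / maliciousCount : 0;
  return (1 - c.maliciousPrior) * benignRate + c.maliciousPrior * maliciousRate;
}

// Grid search over warn <= block on the 0-100 risk scale
function calibrateSketch(data: LabeledScore[], c: CostModel, step = 5) {
  let best = { warn: 0, block: 0, cost: Infinity };
  for (let warn = 0; warn <= 100; warn += step) {
    for (let block = warn; block <= 100; block += step) {
      const cost = expectedCost(data, warn, block, c);
      if (cost < best.cost) best = { warn, block, cost };
    }
  }
  return best;
}

const costs: CostModel = { blockBenign: 8, warnBenign: 1, allowMalicious: 12, warnMalicious: 3, maliciousPrior: 0.05 };
const data: LabeledScore[] = [
  { score: 10, label: "benign" }, { score: 20, label: "benign" },
  { score: 90, label: "malicious" }, { score: 100, label: "malicious" },
];
console.log(calibrateSketch(data, costs)); // { warn: 25, block: 25, cost: 0 }
```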

Recommend command

Run calibration + drift together and get a ready-to-paste risk config plus CI gate command:

npx namespace-guard recommend ./risk-dataset.json
npx namespace-guard recommend ./risk-dataset.json --json

This is the fastest onboarding path when you already have labeled examples.
It calibrates thresholds from your dataset, then derives CI gate budgets from the built-in composability vector suite.

Drift command

Quantify composability drift between TR39-full mapping (CONFUSABLE_MAP_FULL) and NFKC-filtered mapping (CONFUSABLE_MAP):

# Built-in composability vector suite
npx namespace-guard drift

# Your own dataset (same shape as calibrate)
npx namespace-guard drift ./risk-dataset.json --json

CI drift gate

Use the included drift gate script to fail CI if drift metrics exceed your budget:

# Build first so dist/cli.js exists
npm run build

# Fail when drift exceeds these limits
npm run ci:drift-gate -- \
  --max-action-flips 29 \
  --max-average-score-delta 95 \
  --max-abs-score-delta 100

GitHub Actions workflow is included at .github/workflows/drift-gate.yml.

API Reference

`createNamespaceGuard(config, adapter)`

Creates a guard instance with your configuration and database adapter.

Returns: NamespaceGuard instance

`createNamespaceGuardWithProfile(profile, config, adapter)`

Create a guard with practical profile defaults, then apply your explicit config overrides.

Built-in profiles:

consumer-handle
org-slug
developer-id

Each profile sets defaults for:

pattern
normalizeUnicode
allowPurelyNumeric
risk thresholds (warnThreshold, blockThreshold, etc.)

`guard.check(identifier, scope?)`

Check if an identifier is available.

Parameters:

identifier - The slug/handle to check
scope - Optional ownership scope to exclude own records

Returns:

// Available
{ available: true }

// Not available
{
  available: false,
  reason: "invalid" | "reserved" | "taken",
  message: string,
  source?: string,       // Which table caused the collision (reason: "taken")
  category?: string,     // Reserved name category (reason: "reserved")
  suggestions?: string[] // Available alternatives (reason: "taken", requires suggest config)
}

`guard.checkMany(identifiers, scope?, options?)`

Check multiple identifiers in parallel. Suggestions are skipped by default for performance.

Parameters:

identifiers - Array of slugs/handles to check
scope - Optional ownership scope applied to all checks
options - Optional { skipSuggestions?: boolean } (default: true)

Pass { skipSuggestions: false } to include suggestions for taken identifiers.

Returns: Record<string, CheckResult>

`guard.checkRisk(identifier, options?)`

Score spoofing/confusability risk against protected targets using weighted confusable distance + chain depth.

Parameters:

identifier - Candidate slug/handle to assess
options - Optional:
- protect?: string[] additional high-value targets to compare against
- includeReserved?: boolean include configured reserved names as protected targets (default: true)
- warnThreshold?: number threshold for action: "warn" (default: 45)
- blockThreshold?: number threshold for action: "block" (default: 70)
- maxMatches?: number number of top matches to return (default: 3)
- map?: Record<string, string> custom confusable map

Returns: { score, level, action, reasons, matches, ... }

`guard.enforceRisk(identifier, options?)`

Apply a deny policy on top of risk scoring.

Options:

All checkRisk options
failOn?: "block" | "warn" ("block" default)
messages?: { warn?: string; block?: string }

If protect is omitted, enforceRisk() uses config.risk.protect, then falls back to DEFAULT_PROTECTED_TOKENS.

Returns: { allowed, action, message?, risk }
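The relationship between score, thresholds, and `failOn` can be sketched as two small functions. The `>=` comparisons and the `allowed` rule are assumptions inferred from the option descriptions above, not taken from the library source; the defaults mirror the documented 45/70 thresholds.

```typescript
type RiskActionSketch = "allow" | "warn" | "block";

// Assumption: reaching a threshold triggers its action
function riskAction(score: number, warnThreshold = 45, blockThreshold = 70): RiskActionSketch {
  if (score >= blockThreshold) return "block";
  if (score >= warnThreshold) return "warn";
  return "allow";
}

// failOn "block" denies only blocks; failOn "warn" denies warns as well
function isAllowed(action: RiskActionSketch, failOn: "block" | "warn" = "block"): boolean {
  if (action === "block") return false;
  if (action === "warn") return failOn !== "warn";
  return true;
}

console.log(riskAction(100));                   // "block"
console.log(isAllowed(riskAction(50)));         // true (warn passes by default)
console.log(isAllowed(riskAction(50), "warn")); // false
```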

`guard.assertAvailable(identifier, scope?)`

Same as check(), but throws an Error if not available.

`guard.assertClaimable(identifier, scope?, options?)`

One-liner guard for production claim checks.

Runs:

check() (format/reserved/validators/database)
enforceRisk() (confusable risk policy)

Throws an Error if the identifier should not be claimed.

`guard.claim(identifier, write, options?)`

Race-safe claim helper. Runs claimability checks, then executes your write callback with the normalized identifier.

If a duplicate-key/unique-constraint race happens, it returns an unavailable result instead of throwing.

const result = await guard.claim(input.handle, async (canonical) => {
  return prisma.user.create({
    data: {
      handle: input.handle,
      handleCanonical: canonical,
    },
  });
});

if (!result.claimed) {
  // show result.message to user
}

Options:

All assertClaimable options
scope?: OwnershipScope
isUniqueViolation?: (error) => boolean custom duplicate detector
takenMessage?: string custom duplicate-race message
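The race-safety idea is: run the checks, attempt the write, and turn a unique-constraint error into an unavailable result instead of an exception. A minimal sketch with hypothetical `checkClaimable` and `write` callbacks; it assumes normalization has already happened, and the `ClaimOutcome` shape beyond `claimed` and `message` is an assumption.

```typescript
interface ClaimOutcome<T> { claimed: boolean; message?: string; value?: T }

// Hypothetical sketch of a check-then-write claim that tolerates races.
async function claimSketch<T>(
  canonical: string,
  checkClaimable: (canonical: string) => Promise<string | null>, // error message or null
  write: (canonical: string) => Promise<T>,
  isUniqueViolation: (error: unknown) => boolean,
  takenMessage = "That name is already in use.",
): Promise<ClaimOutcome<T>> {
  const problem = await checkClaimable(canonical);
  if (problem) return { claimed: false, message: problem };
  try {
    return { claimed: true, value: await write(canonical) };
  } catch (error) {
    // Another request claimed the same name between our check and our write
    if (isUniqueViolation(error)) return { claimed: false, message: takenMessage };
    throw error; // unrelated failures still propagate
  }
}

// In-memory "table" with a unique constraint
const claimedNames = new Set<string>();
const insert = async (canonical: string) => {
  if (claimedNames.has(canonical)) throw Object.assign(new Error("duplicate"), { code: "23505" });
  claimedNames.add(canonical);
  return canonical;
};
const alwaysOk = async () => null;
const isDup = (e: unknown) => (e as { code?: string }).code === "23505";

// Both checks pass before either write lands, simulating a race
Promise.all([
  claimSketch("sarah", alwaysOk, insert, isDup),
  claimSketch("sarah", alwaysOk, insert, isDup),
]).then((outcomes) => console.log(outcomes.map((o) => o.claimed))); // [true, false]
```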

`guard.validateFormat(identifier)`

Validate format, purely-numeric restriction, and reserved name status without querying the database.

Returns: Error message string if invalid or reserved, null if OK.

`guard.validateFormatOnly(identifier)`

Validate only the identifier's format and purely-numeric restriction. Does not check reserved names or query the database. Useful for instant client-side feedback on input shape.

Returns: Error message string if the format is invalid, null if OK.
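For instant client-side feedback, the format-only step amounts to a pattern test plus the optional purely-numeric rule. A hypothetical sketch: the default pattern (`^[a-z0-9][a-z0-9-]{1,29}$`) and the messages are assumptions loosely based on the CLI's "2-30 lowercase letters, numbers, or hyphens" message, not the library's actual defaults.

```typescript
interface FormatOptionsSketch { pattern?: RegExp; allowPurelyNumeric?: boolean }

// Hypothetical sketch: returns an error message, or null when the shape is fine.
function validateFormatOnlySketch(identifier: string, options: FormatOptionsSketch = {}): string | null {
  const pattern = options.pattern ?? /^[a-z0-9][a-z0-9-]{1,29}$/; // assumed 2-30 chars
  const allowPurelyNumeric = options.allowPurelyNumeric ?? true;
  if (!pattern.test(identifier)) {
    return "Use 2-30 lowercase letters, numbers, or hyphens.";
  }
  if (!allowPurelyNumeric && /^[0-9]+$/.test(identifier)) {
    return "Handles cannot be all numbers.";
  }
  return null; // reserved names and the database are not consulted here
}

console.log(validateFormatOnlySketch("a"));                                    // format message
console.log(validateFormatOnlySketch("123456", { allowPurelyNumeric: false })); // numeric message
console.log(validateFormatOnlySketch("acme-corp"));                            // null
```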

`guard.normalize(identifier)`

Convenience re-export of the standalone normalize() function. Note: always applies NFKC normalization regardless of the guard's normalizeUnicode setting. Use normalize(id, { unicode: false }) directly if you need to skip NFKC.

`guard.clearCache()`

Clear the in-memory cache and reset hit/miss counters. No-op if caching is not enabled.

`guard.cacheStats()`

Get cache performance statistics.

Returns: { size: number; hits: number; misses: number }

`normalize(identifier, options?)`

Utility function to normalize identifiers. Trims whitespace, applies NFKC Unicode normalization (by default), lowercases, and strips leading @ symbols. Pass { unicode: false } to skip NFKC.

import { normalize } from "namespace-guard";

normalize("  @Sarah  "); // "sarah"
normalize("ACME-Corp"); // "acme-corp"

`isLikelyUniqueViolationError(error)`

Best-effort detector for duplicate-key / unique-constraint write errors across common stacks (Postgres 23505, Prisma P2002, MySQL ER_DUP_ENTRY, SQLite constraint errors, Mongo E11000).

Useful when you want custom race handling outside guard.claim().
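A detector along these lines inspects the error's `code` and `message` for the vendor-specific duplicate-key markers listed above. This is a hypothetical sketch (`looksLikeUniqueViolation`), not the library's implementation; real drivers sometimes wrap errors, which this version does not unwrap.

```typescript
// Hypothetical best-effort duplicate-key detector.
function looksLikeUniqueViolation(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { code, message } = error as { code?: unknown; message?: unknown };
  const codeText = typeof code === "string" || typeof code === "number" ? String(code) : "";
  const messageText = typeof message === "string" ? message : "";
  return (
    codeText === "23505" ||                             // PostgreSQL unique_violation
    codeText === "P2002" ||                             // Prisma unique constraint failed
    codeText === "ER_DUP_ENTRY" ||                      // MySQL / mysql2 duplicate entry
    codeText === "SQLITE_CONSTRAINT_UNIQUE" ||          // better-sqlite3
    messageText.includes("UNIQUE constraint failed") || // SQLite message text
    codeText === "11000" ||                             // MongoDB duplicate key code
    messageText.includes("E11000")                      // MongoDB duplicate key message
  );
}

console.log(looksLikeUniqueViolation({ code: "P2002" }));   // true
console.log(looksLikeUniqueViolation(new Error("timeout"))); // false
```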

`canonicalise(text, options?)`

LLM preprocessing helper that rewrites confusable characters to Latin equivalents in Latin-containing tokens.

Returns: string

Options:

threshold?: number (default 0.7)
includeNovel?: boolean (default true)
scripts?: string[] (optional script allowlist)
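To make "Latin-containing tokens" concrete: the sketch splits on whitespace, leaves tokens with no ASCII Latin letters untouched, and rewrites mapped characters in the rest. The six-entry map, the `canonicaliseSketch` name, and the ASCII-only Latin test are simplifying assumptions. The real helper uses `LLM_CONFUSABLE_MAP` with visual scores, `threshold`, and script filtering, none of which are modelled here.

```typescript
// Tiny illustrative map (Cyrillic lookalikes to Latin); not the shipped data.
const SKETCH_MAP: Record<string, string> = {
  "\u0430": "a", // Cyrillic а
  "\u0435": "e", // Cyrillic е
  "\u043E": "o", // Cyrillic о
  "\u0440": "p", // Cyrillic р
  "\u0441": "c", // Cyrillic с
  "\u0443": "y", // Cyrillic у
};

function canonicaliseSketch(text: string): string {
  // The capturing group keeps whitespace runs so spacing is preserved
  return text
    .split(/(\s+)/)
    .map((token) => {
      if (!/[A-Za-z]/.test(token)) return token; // no Latin letters: leave as-is
      return Array.from(token, (ch) => SKETCH_MAP[ch] ?? ch).join("");
    })
    .join("");
}

console.log(canonicaliseSketch("pa\u0443pal login"));                    // "paypal login"
console.log(canonicaliseSketch("\u043F\u0440\u0438\u0432\u0435\u0442")); // "привет" (unchanged)
```

Leaving all-Cyrillic words alone is what keeps legitimate non-Latin text intact.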

`scan(text, options?)`

LLM preprocessing scanner that returns structured confusable findings and summary risk metadata.

Returns: ScanResult

ScanResult:

hasConfusables: boolean
count: number
findings: ScanFinding[]
summary: { distinctChars, wordsAffected, scriptsDetected, riskLevel }

ScanFinding:

char, codepoint, script, latinEquivalent
visualScore, source ("tr39" | "novel")
index, word, mixedScript

Options:

threshold?: number (default 0.7)
includeNovel?: boolean (default true)
scripts?: string[] (optional script allowlist)
riskTerms?: string[] (optional targeted-term hints for risk scoring)

`isClean(text, options?)`

Fast boolean gate for mixed-script confusable substitutions. Short-circuits on first suspicious finding.

Returns: boolean

Options: same as scan().
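A finding, as described for `scan()`, is a mapped character inside a word that also contains Latin letters. The sketch below produces simplified findings and an early-exit `isClean`. The field subset (`char`, `codepoint`, `latinEquivalent`, `index`, `word`) mirrors `ScanFinding`, but the lookup table, the ASCII-only Latin test, and all function names are illustrative assumptions.

```typescript
interface FindingSketch { char: string; codepoint: string; latinEquivalent: string; index: number; word: string }

// Illustrative Cyrillic-to-Latin lookalikes; not the shipped LLM_CONFUSABLE_MAP.
const LOOKALIKES: Record<string, string> = {
  "\u0430": "a", "\u0435": "e", "\u043E": "o", "\u0440": "p", "\u0441": "c", "\u0443": "y",
};

// Visit confusables in Latin-containing words; stop as soon as `visit` returns false.
function forEachFinding(text: string, visit: (f: FindingSketch) => boolean): void {
  const wordPattern = /\S+/g;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(text)) !== null) {
    const word = match[0];
    if (!/[A-Za-z]/.test(word)) continue; // only mixed-script words are suspicious
    for (let i = 0; i < word.length; i++) {
      const latin = LOOKALIKES[word[i]];
      if (latin === undefined) continue;
      const hex = word.charCodeAt(i).toString(16).toUpperCase();
      const finding = { char: word[i], codepoint: "U+" + ("0000" + hex).slice(-4), latinEquivalent: latin, index: match.index + i, word };
      if (!visit(finding)) return;
    }
  }
}

function scanSketch(text: string): FindingSketch[] {
  const findings: FindingSketch[] = [];
  forEachFinding(text, (f) => { findings.push(f); return true; });
  return findings;
}

function isCleanSketch(text: string): boolean {
  let clean = true;
  forEachFinding(text, () => { clean = false; return false; }); // short-circuit on first finding
  return clean;
}

console.log(scanSketch("Log in to pa\u0443pal").map((f) => `${f.codepoint} at ${f.index}`)); // ["U+0443 at 12"]
console.log(isCleanSketch("hello world")); // true
```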

`LLM_CONFUSABLE_MAP`, `LLM_CONFUSABLE_MAP_PAIR_COUNT`, `LLM_CONFUSABLE_MAP_CHAR_COUNT`, `LLM_CONFUSABLE_MAP_SOURCE_COUNTS`

Static lookup data powering the LLM preprocessing helpers.

LLM_CONFUSABLE_MAP: source char -> candidate Latin mappings with visual similarity score and source metadata
LLM_CONFUSABLE_MAP_PAIR_COUNT: total mapping rows
LLM_CONFUSABLE_MAP_CHAR_COUNT: number of source characters
LLM_CONFUSABLE_MAP_SOURCE_COUNTS: split by { tr39, novel }

Regenerate from source artifacts with:

npm run build:llm-confusable-map

Advanced Security Primitives

Use these when you need explicit pairwise spoof checks, custom scoring pipelines, or explainable risk details outside guard.check() / guard.checkRisk().

`skeleton(input, options?)`

Generate a TR39-style skeleton string for visual-comparison workflows.

Default map: CONFUSABLE_MAP_FULL (TR39/raw-input compatible)
Optional map override: { map: CONFUSABLE_MAP } for NFKC-first pipelines
Removes default-ignorable characters before map replacement

import { skeleton, CONFUSABLE_MAP } from "namespace-guard";

skeleton("pa\u0443pal"); // "paypal"
skeleton("pay\u200Bpal"); // "paypal"
skeleton("pa\u0443pal", { map: CONFUSABLE_MAP }); // explicit NFKC-first map mode
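The skeleton procedure in Unicode TR39 is: decompose (NFD), drop default-ignorable characters, replace each character with its confusable prototype, and decompose again. A toy version with a three-entry map and a handful of ignorables is below. `skeletonSketch` and `areConfusableSketch` are hypothetical; the real functions use the full data in `CONFUSABLE_MAP_FULL`.

```typescript
// Toy prototype map; the real CONFUSABLE_MAP_FULL has thousands of entries.
const PROTOTYPES: Record<string, string> = {
  "\u0430": "a", // Cyrillic а
  "\u0440": "p", // Cyrillic р
  "\u0443": "y", // Cyrillic у
};

// A few default-ignorable code points: soft hyphen, ZWSP, ZWNJ, ZWJ, word joiner, BOM
const IGNORABLE = /[\u00AD\u200B-\u200D\u2060\uFEFF]/g;

function skeletonSketch(input: string): string {
  const stripped = input.normalize("NFD").replace(IGNORABLE, "");
  const mapped = Array.from(stripped, (ch) => PROTOTYPES[ch] ?? ch).join("");
  return mapped.normalize("NFD");
}

// Skeleton equality is the weights-free confusability test
function areConfusableSketch(a: string, b: string): boolean {
  return skeletonSketch(a) === skeletonSketch(b);
}

console.log(skeletonSketch("pa\u0443pal"));                // "paypal"
console.log(skeletonSketch("pay\u200Bpal"));               // "paypal"
console.log(areConfusableSketch("paypal", "\u0440aypal")); // true
```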

`areConfusable(a, b, options?)`

Boolean helper. Without weights, uses skeleton() equality (TR39 coverage only). With weights, also checks character-level visual similarity from confusable-vision's measured data (v2: RaySpace), including cross-script pairs.

import { areConfusable } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";

areConfusable("paypal", "pa\u0443pal"); // true (skeleton match)
areConfusable("hello", "world"); // false

// With weights: catches cross-script pairs that skeleton() misses
areConfusable("\u1175", "\u4E28", { weights: CONFUSABLE_WEIGHTS }); // true (Hangul/Han)
areConfusable("\u0406", "\u0399", { weights: CONFUSABLE_WEIGHTS }); // true (Cyrillic/Greek)
areConfusable("\u1175", "\u4E28"); // false (no weights = skeleton only)

`detectCrossScriptRisk(identifier, options?)`

Analyse an identifier for cross-script confusable risk. Returns the scripts present, any confusable pairs found between different scripts, and a risk level.

import { detectCrossScriptRisk } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";

detectCrossScriptRisk("hello"); // { riskLevel: "none", scripts: ["latin"], crossScriptPairs: [] }

detectCrossScriptRisk("\u1175\u4E28", { weights: CONFUSABLE_WEIGHTS });
// { riskLevel: "high", scripts: ["hangul", "han"], crossScriptPairs: [{ a: { char: "ᅵ", script: "hangul" }, b: { char: "丨", script: "han" }, score: 0.996 }] }

Options:

weights -- optional visually-scored weights for cross-script pair lookup

`confusableDistance(a, b, options?)`

Compute weighted confusable distance between two strings.

Outputs:

distance (lower means closer)
similarity (0..1)
chainDepth (number of non-trivial edit steps)
crossScriptCount, ignorableCount, divergenceCount
steps (explainable shortest-path operations)
skeletonEqual / normalizedEqual

import { confusableDistance } from "namespace-guard";

const score = confusableDistance("paypal", "pa\u0443pal");
// score.similarity, score.chainDepth, score.steps, etc.

Options:

map -- confusable character map (default: CONFUSABLE_MAP_FULL)
weights -- optional measured visual weights from confusable-vision scoring; when provided, TR39 pairs use measured cost instead of hardcoded 0.35, and novel pairs (not in TR39 map) use their visual-weight cost
context -- filter weights by deployment context: 'identifier' (XID_Continue only), 'domain' (IDNA PVALID only), or 'all' (default)

Weighted usage:

import { confusableDistance } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";

const result = confusableDistance("paypal", "pa\u0443pal", {
  weights: CONFUSABLE_WEIGHTS,
  context: "identifier",
});
// Steps with reason: "visual-weight" indicate novel pairs scored via RaySpace
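The weighted distance can be pictured as an edit distance where substituting a known confusable pair is cheap. The sketch is a plain weighted Levenshtein: insert and delete cost 1, identical characters cost 0, a listed pair costs its weight (0.35 here, echoing the hardcoded TR39 cost mentioned above), and any other substitution costs 1. The similarity formula `1 - distance / maxLength` is an assumption, and the library's `steps`, `chainDepth`, and skeleton handling are not modelled.

```typescript
// Pair costs keyed "a|b"; 0.35 mirrors the documented default TR39 pair cost.
const PAIR_COST: Record<string, number> = { "\u0443|y": 0.35, "y|\u0443": 0.35 };

function substitutionCost(a: string, b: string): number {
  if (a === b) return 0;
  return PAIR_COST[`${a}|${b}`] ?? 1;
}

function weightedDistance(a: string, b: string): { distance: number; similarity: number } {
  const x = Array.from(a);
  const y = Array.from(b);
  // dp[i][j] = cheapest way to turn the first i chars of x into the first j chars of y
  const dp: number[][] = Array.from({ length: x.length + 1 }, () => new Array<number>(y.length + 1).fill(0));
  for (let i = 0; i <= x.length; i++) dp[i][0] = i;
  for (let j = 0; j <= y.length; j++) dp[0][j] = j;
  for (let i = 1; i <= x.length; i++) {
    for (let j = 1; j <= y.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                        // delete
        dp[i][j - 1] + 1,                                        // insert
        dp[i - 1][j - 1] + substitutionCost(x[i - 1], y[j - 1]), // keep or substitute
      );
    }
  }
  const distance = dp[x.length][y.length];
  const longest = Math.max(x.length, y.length, 1);
  return { distance, similarity: 1 - distance / longest };
}

console.log(weightedDistance("paypal", "pa\u0443pal").distance); // 0.35
```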

`isDomainSpoof(label, target, options?)`

Check whether a domain label is a realistic spoof of a target label. Unlike areConfusable(), this function only flags threats that could produce registrable domain names under ICANN IDN rules. Mixed-script labels are excluded because registries generally refuse to register them.

import { isDomainSpoof } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";

// Full-Cyrillic lookalike of "paypal" — realistic, registrable spoof
isDomainSpoof("\u0440\u0430\u0443\u0440\u0430\u04CF", "paypal", { weights: CONFUSABLE_WEIGHTS });
// { spoof: true, script: "cyrillic", danger: 0.91, substitutions: [...] }

// Mixed-script — cannot be registered, not a spoof
isDomainSpoof("\u0440aypal", "paypal", { weights: CONFUSABLE_WEIGHTS });
// { spoof: false }

// Known-legitimate non-Latin domain — skip via allowlist
isDomainSpoof("\u0430\u0441\u0435", "ace", {
  weights: CONFUSABLE_WEIGHTS,
  allowlist: ["\u0430\u0441\u0435"],
});
// { spoof: false }

Options:

map -- confusable character map (default: CONFUSABLE_MAP_FULL)
weights -- measured visual weights for similarity scoring
minDanger -- minimum average danger for spoof to be true (default: 0.5). The danger score is always returned regardless, so callers can apply their own threshold
allowlist -- known-legitimate non-Latin labels to skip (checked after NFKC + lowercase normalisation)

Returns DomainSpoofResult:

spoof -- opinionated verdict (true when danger >= minDanger)
script -- script of the spoofing label (always set when a match is found, even below threshold)
danger -- average visual similarity across substitutions (0–1)
substitutions -- per-character details: { index, from, to, similarity }
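The decision described above can be sketched as: normalise, skip allowlisted labels, reject mixed-script labels, then average the per-substitution similarity and compare it with `minDanger`. Everything here is a simplified assumption: the lookup table and its 0.9 similarities are placeholders (not measured weights), script detection covers only Latin and Cyrillic, and `domainSpoofSketch` is a hypothetical name.

```typescript
interface Lookalike { latin: string; similarity: number }

// Placeholder Cyrillic-to-Latin lookalikes with made-up similarity values
const CYRILLIC_LOOKALIKES: Record<string, Lookalike> = {
  "\u0430": { latin: "a", similarity: 0.9 },
  "\u0440": { latin: "p", similarity: 0.9 },
  "\u0443": { latin: "y", similarity: 0.9 },
  "\u04CF": { latin: "l", similarity: 0.9 },
};

function domainSpoofSketch(label: string, target: string, minDanger = 0.5, allowlist: string[] = []) {
  const norm = (s: string) => s.normalize("NFKC").toLowerCase();
  const candidate = norm(label);
  if (allowlist.map(norm).indexOf(candidate) !== -1) return { spoof: false };
  const chars = Array.from(candidate);
  const hasLatin = chars.some((ch) => /[a-z]/.test(ch));
  const hasCyrillic = chars.some((ch) => /[\u0400-\u04FF]/.test(ch));
  if (hasLatin && hasCyrillic) return { spoof: false }; // mixed script: not registrable
  if (!hasCyrillic) return { spoof: false };            // sketch only models Cyrillic spoofs
  const targetChars = Array.from(norm(target));
  if (chars.length !== targetChars.length) return { spoof: false };
  const similarities: number[] = [];
  for (let i = 0; i < chars.length; i++) {
    const entry = CYRILLIC_LOOKALIKES[chars[i]];
    if (!entry || entry.latin !== targetChars[i]) return { spoof: false };
    similarities.push(entry.similarity);
  }
  const danger = similarities.reduce((sum, s) => sum + s, 0) / similarities.length;
  return { spoof: danger >= minDanger, script: "cyrillic", danger };
}

console.log(domainSpoofSketch("\u0440\u0430\u0443\u0440\u0430\u04CF", "paypal").spoof); // true
console.log(domainSpoofSketch("\u0440aypal", "paypal").spoof);                          // false
```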

`deriveNfkcTr39DivergenceVectors(map?)`

Derive the composability regression corpus: characters where TR39 mapping and NFKC lowercase disagree.
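One reading of "TR39 mapping and NFKC lowercase disagree" is a character that NFKC-plus-lowercase rewrites to something other than its TR39 target, so an NFKC-first pipeline never reaches the TR39 mapping. The sketch uses a hypothetical three-entry map, and treating characters that NFKC leaves unchanged as non-divergent is an assumption, not the library's exact rule. The long s `ſ` (U+017F) is a real case: TR39 lists it as confusable with `f`, while NFKC folds it to `s`.

```typescript
interface DivergenceSketch { char: string; tr39: string; nfkc: string }

function divergenceVectorsSketch(map: Record<string, string>): DivergenceSketch[] {
  const vectors: DivergenceSketch[] = [];
  for (const char of Object.keys(map)) {
    const nfkc = char.normalize("NFKC").toLowerCase();
    // NFKC leaves the character alone, so the TR39 mapping still applies later
    if (nfkc === char) continue;
    // NFKC rewrites it to something other than the TR39 target: divergence
    if (nfkc !== map[char]) vectors.push({ char, tr39: map[char], nfkc });
  }
  return vectors;
}

const vectors = divergenceVectorsSketch({
  "\u017F": "f", // ſ long s: TR39 says "f", NFKC says "s"
  "\u0443": "y", // Cyrillic у: NFKC leaves it unchanged
  "\uFF41": "a", // full-width ａ: NFKC already gives "a"
});
console.log(vectors); // one vector, for ſ (TR39 "f" vs NFKC "s")
```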

`NFKC_TR39_DIVERGENCE_VECTORS`

Built-in divergence vectors derived from CONFUSABLE_MAP_FULL, useful for drift and pipeline regression tests.

`COMPOSABILITY_VECTOR_SUITE`, `COMPOSABILITY_VECTORS`, `COMPOSABILITY_VECTORS_COUNT`

Named aliases for the same corpus, packaged for cross-library regression imports.

Case-Insensitive Matching

By default, slug lookups are case-sensitive. Enable case-insensitive matching to catch collisions regardless of stored casing:

const guard = createNamespaceGuard({
  sources: [/* ... */],
  caseInsensitive: true,
}, adapter);

Each adapter handles this differently:

Prisma: Uses mode: "insensitive" on the where clause
Drizzle: Uses ilike instead of eq (pass ilike to the adapter: createDrizzleAdapter(db, tables, { eq, ilike }))
Kysely: Uses ilike operator
Knex: Uses LOWER() in a raw where clause
TypeORM: Uses ILike (pass it to the adapter: createTypeORMAdapter(dataSource, entities, ILike))
MikroORM: Uses $ilike operator
Sequelize: Uses LOWER() via Sequelize helpers (pass { where: Sequelize.where, fn: Sequelize.fn, col: Sequelize.col })
Mongoose: Uses collation { locale: "en", strength: 2 }
Raw SQL: Wraps both sides in LOWER()

Caching

Enable in-memory caching to reduce database calls during rapid checks (e.g., live form validation, suggestion generation):

const guard = createNamespaceGuard({
  sources: [/* ... */],
  cache: {
    ttl: 5000,     // milliseconds (default: 5000)
    maxSize: 1000, // max cached entries before LRU eviction (default: 1000)
  },
}, adapter);

// Manually clear the cache after writes
guard.clearCache();

// Monitor cache performance
const stats = guard.cacheStats();
// { size: 12, hits: 48, misses: 12 }
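A TTL + LRU cache of this kind can be built on `Map`, whose iteration order is insertion order: re-inserting on read moves an entry to the most-recent end, and the first key is the eviction victim. This is a hypothetical sketch (`TtlLruCacheSketch`) with an injectable clock for testing, not the library's cache.

```typescript
class TtlLruCacheSketch<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private hits = 0;
  private misses = 0;
  private readonly ttl: number;
  private readonly maxSize: number;
  private readonly now: () => number;

  constructor(ttl = 5000, maxSize = 1000, now: () => number = () => Date.now()) {
    this.ttl = ttl;
    this.maxSize = maxSize;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      if (entry) this.entries.delete(key); // drop the expired entry
      this.misses++;
      return undefined;
    }
    // Refresh recency: delete + re-insert moves the key to the end
    this.entries.delete(key);
    this.entries.set(key, entry);
    this.hits++;
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value; // least recently used
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
  }

  clear(): void {
    this.entries.clear();
    this.hits = 0;
    this.misses = 0;
  }

  stats() {
    return { size: this.entries.size, hits: this.hits, misses: this.misses };
  }
}

let clock = 0;
const cache = new TtlLruCacheSketch<boolean>(5000, 2, () => clock);
cache.set("sarah", true);
cache.set("acme", false);
cache.get("sarah");              // hit; "sarah" becomes most recent
cache.set("admin", false);       // evicts "acme", the least recently used
console.log(cache.get("acme"));  // undefined (evicted)
clock = 6000;
console.log(cache.get("sarah")); // undefined (expired after 5000 ms)
console.log(cache.stats());      // { size: 1, hits: 1, misses: 2 }
```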

Framework Integration

Next.js (Server Actions)

// lib/guard.ts
import { createNamespaceGuard } from "namespace-guard";
import { createPrismaAdapter } from "namespace-guard/adapters/prisma";
import { prisma } from "./db";

export const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "user", column: "handle", scopeKey: "id" },
    { name: "organization", column: "slug", scopeKey: "id" },
  ],
  suggest: {},
}, createPrismaAdapter(prisma));

// app/signup/actions.ts
"use server";

import { guard } from "@/lib/guard";
import { prisma } from "@/lib/db"; // the same client lib/guard.ts uses

export async function checkHandle(handle: string) {
  return guard.check(handle);
}

export async function createUser(handle: string, email: string) {
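  // check() covers format, reserved names, validators, and the database.
  // For risk enforcement and race safety, prefer guard.claim().
  const result = await guard.check(handle);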
  if (!result.available) return { error: result.message };

  const user = await prisma.user.create({
    data: { handle: guard.normalize(handle), email },
  });
  return { user };
}

Express Middleware

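import express from "express";
import { guard } from "./lib/guard";
import { db } from "./lib/db"; // your database client (assumed to exist)

const app = express();
app.use(express.json()); // parse JSON bodies into req.body

// Reusable middleware (assumes an auth middleware has already set req.user)
async function validateSlug(req, res, next) {
  const slug = req.body?.handle || req.body?.slug;
  if (!slug) return res.status(400).json({ error: "Slug is required" });

  try {
    const result = await guard.check(slug, { id: req.user?.id });
    if (!result.available) return res.status(409).json(result);
    req.normalizedSlug = guard.normalize(slug);
    next();
  } catch (err) {
    next(err); // hand adapter/database errors to Express's error handler
  }
}

app.post("/api/users", validateSlug, async (req, res) => {
  const user = await db.user.create({ handle: req.normalizedSlug /* , ...other fields */ });
  res.json({ user });
});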

tRPC

import { z } from "zod";
import { router, protectedProcedure } from "./trpc";
import { guard } from "./lib/guard";

export const namespaceRouter = router({
  check: protectedProcedure
    .input(z.object({ slug: z.string() }))
    .query(async ({ input, ctx }) => {
      return guard.check(input.slug, { id: ctx.user.id });
    }),

  claim: protectedProcedure
    .input(z.object({ slug: z.string() }))
    .mutation(async ({ input, ctx }) => {
      await guard.assertClaimable(input.slug, { id: ctx.user.id });
      return ctx.db.user.update({
        where: { id: ctx.user.id },
        data: { handle: guard.normalize(input.slug) },
      });
    }),
});

TypeScript

Full TypeScript support with exported types:

import {
  createNamespaceGuard,
  createNamespaceGuardWithProfile,
  createPredicateValidator,
  createProfanityValidator,
  createHomoglyphValidator,
  createInvisibleCharacterValidator,
  skeleton,
  areConfusable,
  confusableDistance,
  deriveNfkcTr39DivergenceVectors,
  isLikelyUniqueViolationError,
  NAMESPACE_PROFILES,
  DEFAULT_PROTECTED_TOKENS,
  NFKC_TR39_DIVERGENCE_VECTORS,
  COMPOSABILITY_VECTOR_SUITE,
  COMPOSABILITY_VECTORS,
  COMPOSABILITY_VECTORS_COUNT,
  CONFUSABLE_MAP,
  CONFUSABLE_MAP_FULL,
  normalize,
  type NamespaceConfig,
  type NamespaceSource,
  type NamespaceAdapter,
  type NamespaceGuard,
  type CheckResult,
  type FindOneOptions,
  type OwnershipScope,
  type NamespaceValidator,
  type NamespaceValidatorResult,
  type PredicateValidatorOptions,
  type SuggestStrategyName,
  type ProfanityValidationMode,
  type ProfanityVariantProfile,
  type ProfanityValidatorOptions,
  type InvisibleCharacterValidatorOptions,
  type SkeletonOptions,
  type CheckManyOptions,
  type CheckRiskOptions,
  type RiskCheckResult,
  type AssertClaimableOptions,
  type ClaimOptions,
  type ClaimResult,
  type EnforceRiskOptions,
  type EnforceRiskResult,
  type UniqueViolationDetector,
  type RiskReason,
  type RiskMatch,
  type RiskLevel,
  type RiskAction,
  type NamespaceProfileName,
  type NamespaceProfilePreset,
  type ConfusableDistanceOptions,
  type ConfusableDistanceResult,
  type ConfusableDistanceStep,
  type ConfusableWeight,
  type ConfusableWeights,
  type NfkcTr39DivergenceVector,
  type ComposabilityVector,
} from "namespace-guard";

Optional curated profanity subpath export:

import {
  createEnglishProfanityValidator,
  PROFANITY_WORDS_EN,
  PROFANITY_WORDS_EN_COUNT,
  PROFANITY_WORDS_EN_SOURCE,
  PROFANITY_WORDS_EN_LICENSE,
} from "namespace-guard/profanity-en";

Composability vectors subpath export:

import {
  COMPOSABILITY_VECTOR_SUITE,
  COMPOSABILITY_VECTORS,
  COMPOSABILITY_VECTORS_COUNT,
  type ComposabilityVector,
} from "namespace-guard/composability-vectors";

Confusable weights subpath export:

import { confusableDistance } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";

// 4,174 visually-scored pairs (3,111 TR39 + 1,063 novel discoveries)
// Each pair has a `danger` score (0–1) measuring geometric similarity across 245 fonts.
// The shipped dataset uses a 0.5 floor. For higher precision, filter at danger > 0.7 (574 pairs).
// Pass to confusableDistance() for measured visual costs
const result = confusableDistance("paypal", "pa\u0443pal", {
  weights: CONFUSABLE_WEIGHTS,
});

Support

If you find this useful, consider supporting the project:

Contributing

Contributions welcome! Please open an issue first to discuss what you'd like to change.
