Merged
34 commits
60fae32
chore: bump version to 0.1.5-canary.4 in package.json and version-emb…
dev-pi2pie Mar 25, 2026
1d4f757
docs(research): resolve detector policy and inspector surface decisions
dev-pi2pie Mar 25, 2026
1560e6d
docs(research): refine detector inspect contract and schema direction
dev-pi2pie Mar 25, 2026
0c6f41e
docs(research): clarify inspect v1 limits and output contracts
dev-pi2pie Mar 25, 2026
7f8200b
docs(plans): add detector policy refactor and inspect command documen…
dev-pi2pie Mar 25, 2026
2f79b47
feat(detector): refactor wasm policy and add Hani inspector groundwork
dev-pi2pie Mar 25, 2026
3b0c9e5
feat(inspect): add detector inspector API and CLI command
dev-pi2pie Mar 25, 2026
617b4c8
fix(cli): add inspect short aliases for path and format
dev-pi2pie Mar 25, 2026
c6f4789
docs(research): add follow-up studies for content-gate config and ins…
dev-pi2pie Mar 25, 2026
c96a452
docs(research): solidify inspect batch path and json contract
dev-pi2pie Mar 25, 2026
6a5515c
docs(research): finalize inspect batch CLI contract
dev-pi2pie Mar 25, 2026
dd05f37
docs(plan): add inspect batch implementation plan
dev-pi2pie Mar 25, 2026
4cf2fda
feat(cli): implement inspect batch path mode and json pretty output
dev-pi2pie Mar 25, 2026
b0af585
fix(inspect): align batch skip handling and docs contract
dev-pi2pie Mar 25, 2026
9f76c87
docs(plans): add phased checklist to ts modularization plan
dev-pi2pie Mar 25, 2026
b2e50f1
refactor(cli): split inspect command and command test suites
dev-pi2pie Mar 25, 2026
7638f88
refactor(cli): split path resolution and batch aggregation helpers
dev-pi2pie Mar 25, 2026
67ce024
refactor(detector): complete wasm modularization and split word count…
dev-pi2pie Mar 25, 2026
3f0d8e0
test(detector): add regression coverage for inspect fallback paths
dev-pi2pie Mar 25, 2026
7e8b882
docs(detector): add language detection support guide
dev-pi2pie Mar 25, 2026
e80b541
docs(research): revise configurable content gate direction
dev-pi2pie Mar 25, 2026
fcf7a95
docs(plans): add initial plan for configurable content gate behavior
dev-pi2pie Mar 25, 2026
f13c7b5
chore(config): add oxlint and oxfmt configuration files with ignore p…
dev-pi2pie Mar 25, 2026
2f2c59d
chore(lint): add oxlint config and fix noisy warnings
dev-pi2pie Mar 25, 2026
8033890
style(format): run oxfmt across source and tests
dev-pi2pie Mar 25, 2026
11e269e
docs(plans): update validation steps and add lint/format checks for c…
dev-pi2pie Mar 25, 2026
cd5ace4
feat(detector): add configurable content gate modes
dev-pi2pie Mar 25, 2026
2eee2c6
feat(inspect): disclose content gate mode in diagnostics
dev-pi2pie Mar 25, 2026
f317480
docs(content-gate): revise mode design and reopen implementation plan
dev-pi2pie Mar 25, 2026
0625231
feat(detector): couple content gate modes to latin eligibility
dev-pi2pie Mar 25, 2026
b00bc65
docs(content-gate): define planned hani mode calibration
dev-pi2pie Mar 25, 2026
d7f3174
feat(detector): add hani mode-aware eligibility thresholds
dev-pi2pie Mar 25, 2026
c9b61c1
docs(content-gate): finalize plan closure and detector docs
dev-pi2pie Mar 25, 2026
fe04316
fix(detector): default inspect API to wasm pipeline
dev-pi2pie Mar 25, 2026
9 changes: 9 additions & 0 deletions .oxfmtrc.json
@@ -0,0 +1,9 @@
{
  "ignorePatterns": [
    "coverage/**",
    "dist/**",
    "generated/**",
    "node_modules/**",
    "examples/playground/**"
  ]
}
12 changes: 12 additions & 0 deletions .oxlintrc.json
@@ -0,0 +1,12 @@
{
  "env": {
    "builtin": true
  },
  "ignorePatterns": [
    "coverage/**",
    "dist/**",
    "generated/**",
    "node_modules/**",
    "examples/playground/**"
  ]
}
77 changes: 77 additions & 0 deletions README.md
@@ -101,18 +101,94 @@ Enable the optional WASM detector for ambiguous Latin and Han routes:
```bash
word-counter --detector wasm "This sentence should clearly be detected as English for the wasm detector path."
word-counter --detector wasm "漢字測試需要更多內容才能觸發偵測"
word-counter --detector wasm --content-gate strict "Internationalization documentation remains understandable."
word-counter --detector wasm --content-gate loose "四字成語"
word-counter --detector wasm --content-gate off "mode: debug\ntee: true\npath: logs\nUse this for testing."
```

Inspect detector behavior without count output:

```bash
word-counter inspect "こんにちは、世界!これはテストです。"
word-counter inspect --view engine "This sentence should clearly be detected as English for the wasm detector path."
word-counter inspect --detector regex -f json "こんにちは、世界!これはテストです。"
word-counter inspect --detector regex -f json --pretty "こんにちは、世界!これはテストです。"
word-counter inspect --detector wasm --content-gate off "mode: debug\ntee: true\npath: logs\nUse this for testing."
word-counter inspect -p ./examples/yaml-basic.md
word-counter inspect -p ./examples/test-case-multi-files-support
word-counter inspect -p ./examples/test-case-multi-files-support --section content -f json --pretty
```

Detector mode notes:

- `--detector regex` is the default behavior.
- `--detector wasm` only runs for ambiguous `und-Latn` and `und-Hani` chunks.
- `--content-gate default|strict|loose|off` configures the shared detector policy mode used by the WASM detector path.
  - `default`: current fixture-backed project policy
  - `strict`: raises detector eligibility thresholds and makes more borderline windows fall back
  - `loose`: lowers detector eligibility thresholds and makes more borderline windows eligible or upgradable
  - `off`: bypasses `contentGate` evaluation only
- mode behavior differs by route:
  - `und-Latn`: `default|strict|loose` affect both eligibility and the Latin prose-style `contentGate`
  - `und-Hani`: `default|strict|loose` affect eligibility only, while `contentGate` still reports `policy=none`
- current Hani behavior:
  - `default`: keeps the current Hani diagnostic-sample threshold
  - `strict`: raises the Hani diagnostic-sample threshold
  - `loose`: uses a short-window Han-focused threshold so idiom-length samples such as `四字成語` can become eligible
  - `off`: keeps the same Hani eligibility thresholds as `default`
- `--detector regex` keeps the original script/regex chunk-first detection path.
- `--detector wasm` uses a detector-oriented ambiguous-window scoring pass before accepted tags are projected back onto the counting chunks.
- In `--detector wasm` mode, Latin hint rules and explicit Latin hint flags are deferred until after detector evaluation and only relabel unresolved `und-Latn` output.
- Very short chunks stay on the original `und-*` fallback.
- Low-confidence or unsupported detector results fall back to `und-*`.
- Technical-noise-heavy Latin windows stay conservative and may remain `und-Latn` even when the detector produces a wrong-but-confident language guess.
- inspect/debug disclosure uses `contentGate` as the canonical gate field.
- legacy debug/evidence payloads still emit `qualityGate` as a compatibility alias derived from `contentGate.passed`.
- for practical verification, use `inspect` to compare direct mode outcomes across `default`, `strict`, `loose`, and `off`; use `--debug --detector-evidence` when you specifically need counting-flow event details or legacy `qualityGate` compatibility
- `word-counter inspect` supports:
  - positional text input
  - one direct `-p, --path <file>` input
  - repeated `-p, --path` inputs for batch inspect
  - directory inputs in default `--path-mode auto`
  - literal file-only path handling in `--path-mode manual`
  - `--section all|frontmatter|content`
- batch inspect keeps counting-style path acquisition but not counting aggregation:
  - no inspect `--merged`
  - no inspect `--per-file`
  - no inspect `--jobs`
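
The mode and route notes above can be summarized as a small lookup. The following is only an illustrative sketch, not the package's implementation: `describeModeEffect`, `ModeEffect`, and the `"baseline" | "raised" | "lowered"` labels are hypothetical names introduced here, and it assumes that `off` leaves Latin eligibility at the `default` level and that `und-Hani` keeps reporting `policy=none` under `off` as well.

```ts
// Hypothetical summary of the documented mode notes; none of these names
// come from @dev-pi2pie/word-counter.
type ContentGateMode = "default" | "strict" | "loose" | "off";
type Route = "und-Latn" | "und-Hani";
type Eligibility = "baseline" | "raised" | "lowered";

interface ModeEffect {
  // How the mode shifts detector eligibility thresholds relative to `default`.
  eligibility: Eligibility;
  // Whether the Latin prose-style contentGate is evaluated, bypassed, or absent.
  contentGate: "evaluated" | "bypassed" | "none";
}

function describeModeEffect(route: Route, mode: ContentGateMode): ModeEffect {
  // `strict` raises thresholds, `loose` lowers them.
  // Assumption: `default` and `off` share the same eligibility thresholds.
  const eligibility: Eligibility =
    mode === "strict" ? "raised" : mode === "loose" ? "lowered" : "baseline";

  if (route === "und-Hani") {
    // Hani modes affect eligibility only; contentGate reports policy=none.
    return { eligibility, contentGate: "none" };
  }

  // Latin: `off` bypasses contentGate evaluation only.
  return { eligibility, contentGate: mode === "off" ? "bypassed" : "evaluated" };
}

console.log(describeModeEffect("und-Latn", "strict"));
console.log(describeModeEffect("und-Hani", "loose"));
```

Actual outcomes for a given sample are best checked with `word-counter inspect`, since the real thresholds are not modeled here.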

### Detector Subpath (`@dev-pi2pie/word-counter/detector`)

Use the detector subpath when you need async detector-aware APIs directly in library code.

```ts
import {
  inspectTextWithDetector,
  segmentTextByLocaleWithDetector,
  wordCounterWithDetector,
} from "@dev-pi2pie/word-counter/detector";

const inspectResult = await inspectTextWithDetector("こんにちは、世界!これはテストです。", {
  detector: "wasm",
  view: "pipeline",
});
const countResult = await wordCounterWithDetector(
  "Internationalization documentation remains understandable.",
  {
    detector: "wasm",
    contentGate: { mode: "strict" },
  },
);
```

Detector subpath notes:

- detector entrypoints are async
- use the root package for normal counting when you do not need detector-specific control
- detector-subpath APIs that execute detector policy also accept:
  - `contentGate: { mode: "default" | "strict" | "loose" | "off" }`
- use `detectorDebug` for counting-flow runtime diagnostics
- use `inspectTextWithDetector()` for direct detector diagnosis as structured data
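
One way to compare modes from library code is to run the same text through an inspect function once per `contentGate` mode. The sketch below is an assumption-laden helper, not part of the package: `inspectAcrossModes` and `InspectFn` are hypothetical names, the inspect function is passed in so the helper stays self-contained, and the result type is left generic because the inspect result shape is not described here.

```ts
type ContentGateMode = "default" | "strict" | "loose" | "off";

// Hypothetical minimal signature for an inspect function. It is meant to be
// compatible with a call such as inspectTextWithDetector(text, options),
// but the real option and result types may be richer.
type InspectFn<R> = (
  text: string,
  options: { detector: "wasm"; contentGate: { mode: ContentGateMode } },
) => Promise<R>;

const MODES: readonly ContentGateMode[] = ["default", "strict", "loose", "off"];

// Runs the same text through every mode and keys the results by mode name.
async function inspectAcrossModes<R>(
  inspect: InspectFn<R>,
  text: string,
): Promise<Record<ContentGateMode, R>> {
  const results = {} as Record<ContentGateMode, R>;
  for (const mode of MODES) {
    results[mode] = await inspect(text, { detector: "wasm", contentGate: { mode } });
  }
  return results;
}

// Demonstration with a stub; real code would pass the library's inspect
// function in place of `stubInspect`.
const stubInspect: InspectFn<string> = async (_text, options) =>
  `mode=${options.contentGate.mode}`;

inspectAcrossModes(stubInspect, "四字成語").then((byMode) => {
  console.log(byMode);
});
```

Injecting the inspect function keeps the helper testable without loading the WASM detector.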

Collect non-words (emoji/symbols/punctuation):

@@ -500,6 +576,7 @@ Import from `@dev-pi2pie/word-counter/detector` for the explicit detector-enable
| `wordCounterWithDetector` | function | Async detector-aware counting entrypoint. |
| `segmentTextByLocaleWithDetector` | function | Async detector-aware locale segmentation. |
| `countSectionsWithDetector` | function | Async detector-aware section counting. |
| `inspectTextWithDetector` | function | Async detector-aware inspect entrypoint. |
| `DEFAULT_DETECTOR_MODE` | value | Current default detector mode (`regex`). |
| `DETECTOR_MODES` | value | Supported detector modes. |
