Summary
Once entity persistence is restored (see #464), the next gap is the generator itself:
- The per-label generators use small inline arrays of fake values, so collisions and obviously-fake outputs are common — we should expand the pools and produce more realistic data (e.g., richer name pools, locale-aware addresses, realistic-looking but invalid credit card / SSN / phone formats).
- The masking pipeline regenerates a fresh dummy for every detected entity on every request. We should look up the persisted mapping first so the same original PII always maps to the same dummy, avoiding duplicates and making conversations consistent across turns.
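As a rough illustration of "realistic-looking but invalid" in the first point, the sketch below produces values that have the right shape but cannot be real: an SSN with group number 00 (never assigned), a phone number in the 555-0100 to 555-0199 range reserved for fiction, and a 16-digit card number whose check digit is deliberately off by one so it fails the Luhn checksum. All function names and the areaCodes list are hypothetical and are not taken from pii_generators.go.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
)

// areaCodes is a tiny hypothetical pool; a real generator would use a larger one.
var areaCodes = []string{"212", "415", "617"}

// fakeSSN returns an SSN-shaped string whose group number is 00,
// which is never assigned, so the value cannot be a real SSN.
func fakeSSN() string {
	return fmt.Sprintf("%03d-00-%04d", 1+rand.Intn(665), 1+rand.Intn(9999))
}

// fakePhone returns a number in the 555-0100..555-0199 block,
// which is reserved for fictional use in North America.
func fakePhone() string {
	return fmt.Sprintf("(%s) 555-01%02d", areaCodes[rand.Intn(len(areaCodes))], rand.Intn(100))
}

// luhnValid reports whether a string of ASCII digits passes the Luhn checksum.
func luhnValid(digits string) bool {
	sum, double := 0, false
	for i := len(digits) - 1; i >= 0; i-- {
		d := int(digits[i] - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

// fakeCard returns a 16-digit card-shaped number that fails the Luhn check:
// it finds the one valid check digit for a random prefix and shifts it by one.
func fakeCard() string {
	prefix := "4"
	for len(prefix) < 15 {
		prefix += strconv.Itoa(rand.Intn(10))
	}
	for c := 0; c < 10; c++ {
		if luhnValid(prefix + strconv.Itoa(c)) {
			return prefix + strconv.Itoa((c+1)%10)
		}
	}
	return prefix + "0" // not reached: exactly one check digit is valid
}
```

Each helper can be called directly, for example fakeCard() from a generator's replacement method. Whether the existing generators already follow rules like these is something the audit should confirm.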
Code path to review
Generator dispatch and per-type generators:
src/backend/pii/generator_service.go:54 — GenerateReplacement(label, originalText) and the label → generator routing table at :60
src/backend/pii/generators/pii_generators.go — all per-type generators (~500 lines). Example: EmailGenerator at :29 uses ~50 first names and ~40 last names; PhoneGenerator at :73. Expand the inline pools and/or load from data files.
Masking pipeline that currently bypasses the existing mapping:
src/backend/pii/masking_service.go:89-94 — the loop calls s.generator.GenerateReplacement(...) unconditionally for every detected entity. It does not check the existing store before generating, so each request produces a new dummy and over-writes the previous mapping via StoreMapping's upsert.
Mapping lookup that already exists and should be wired in:
src/backend/pii/mapper.go:106 — PIIMapping.GetDummy(original) — cache-first, falls through to SQLite. This is exactly the dedupe check we need before calling the generator.
src/backend/pii/database.go:185 — StoreMapping upserts; once dedupe is in place, repeated entities should hit the cache/DB and skip the generator entirely.
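A minimal sketch of the lookup-before-generate flow described above, under stated assumptions: the issue does not show the real signatures of GetDummy, AddMapping, or GenerateReplacement, so the DummyLookup and Generator interfaces, the Entity type, and the replacementFor helper are all hypothetical. The in-memory map and counting generator exist only to exercise the sketch.

```go
package main

import "fmt"

// DummyLookup is a hypothetical narrow view of PIIMapping. The real
// GetDummy and AddMapping may have different signatures (e.g. return errors).
type DummyLookup interface {
	GetDummy(original string) (dummy string, ok bool)
	AddMapping(original, dummy string)
}

// Generator is a hypothetical stand-in for the generator service.
type Generator interface {
	GenerateReplacement(label, originalText string) string
}

// Entity is a hypothetical detected PII span.
type Entity struct {
	Label string
	Text  string
}

// replacementFor returns the persisted dummy when one exists and calls the
// generator only on a miss, recording the new mapping for later requests.
func replacementFor(m DummyLookup, g Generator, e Entity) string {
	if dummy, ok := m.GetDummy(e.Text); ok {
		return dummy
	}
	dummy := g.GenerateReplacement(e.Label, e.Text)
	m.AddMapping(e.Text, dummy)
	return dummy
}

// memoryLookup is an in-memory DummyLookup used only for illustration.
type memoryLookup map[string]string

func (m memoryLookup) GetDummy(original string) (string, bool) {
	d, ok := m[original]
	return d, ok
}

func (m memoryLookup) AddMapping(original, dummy string) { m[original] = dummy }

// countingGenerator returns a different value on every call, mimicking the
// current behavior of regenerating a dummy for each request.
type countingGenerator struct{ calls int }

func (c *countingGenerator) GenerateReplacement(label, originalText string) string {
	c.calls++
	return fmt.Sprintf("%s-dummy-%d", label, c.calls)
}
```

Calling replacementFor twice with the same Entity against one memoryLookup returns the same dummy and invokes the generator once. Depending on a small interface rather than the concrete PIIMapping keeps MaskingService easy to test with a fake like memoryLookup.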
Suggested next steps
- In masking_service.go:89, pass the PIIMapping (or a small lookup interface) into MaskingService and check mapper.GetDummy(originalText) before calling GenerateReplacement. Only generate if there's no existing mapping, then AddMapping the new one so future requests are consistent.
- Audit each generator in pii_generators.go for pool size and realism. Decide whether to keep the pools inline (and just expand them) or move them to embedded data files (embed.FS) so we can ship larger, locale-aware pools without bloating the source.
- Add tests in src/backend/pii/generators/pii_generators_test.go covering: (a) the generator never returns the original input, (b) repeated calls for the same (label, original) return the persisted dummy rather than a new one (this test will fail until the first step above lands).
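If the embedded-data option is chosen, one possible shape for the loader is sketched below. It reads one pool entry per line through the io/fs.FS interface, which embed.FS satisfies, so the same function could be pointed at a //go:embed directory. The loadPool and demoFS names, the one-entry-per-line format with '#' comments, and the data/en_US/first_names.txt path are assumptions for illustration. testing/fstest.MapFS stands in for the embedded files so the sketch is self-contained.

```go
package main

import (
	"bufio"
	"io/fs"
	"strings"
	"testing/fstest"
)

// loadPool reads one entry per line from name in fsys, skipping blank lines
// and lines that start with '#'. Any fs.FS works, including an embed.FS.
func loadPool(fsys fs.FS, name string) ([]string, error) {
	f, err := fsys.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var pool []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		pool = append(pool, line)
	}
	return pool, sc.Err()
}

// demoFS stands in for an embedded data directory; the path and the
// contents are hypothetical.
func demoFS() fs.FS {
	return fstest.MapFS{
		"data/en_US/first_names.txt": &fstest.MapFile{
			Data: []byte("# sample pool\nAda\nGrace\n\nLinus\n"),
		},
	}
}
```

A per-locale directory layout like this would let each generator pick a pool by locale, though the issue leaves the actual layout open.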
Depends on
- #464, so that StoreMapping is actually persisting again.