Add consistent pseudolocalization #1061

Merged: wadimw merged 3 commits into upstream-patched from consistent-pseudo on May 27, 2026

@wadimw wadimw commented May 21, 2026

This PR adds the ability to produce pseudolocalized strings consistently instead of picking replacement characters at random.

In some cases, it is undesirable to get a completely different pseudolocalized string after each run of the pseudo command. To avoid this, the PR adds a CLI option, --substitute, which selects whether replacement characters are picked at random (--substitute=RANDOM) or consistently (--substitute=CONSISTENT).

Consistent pseudo means that for a given input string, the pseudolocalized output is always the same. Note that different diacritic characters will generally still be inserted across the whole content (e.g. a G can still be one of ĜĞĠĢ): the choice in consistent mode is based on the number of previous occurrences of that character inside a given input string.
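As an illustration of the occurrence-based choice described above, here is a minimal sketch and not the code from this PR. The class name, method name, and replacement table are hypothetical; only the four G candidates come from the description. It also assumes that the selection wraps around once a character has occurred more often than it has candidates, which the description does not state. Non-ASCII characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.HashMap;
import java.util.Map;

public class ConsistentPseudoSketch {
    // Illustrative replacement table, not the project's real one.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        // G candidates from the PR description: G with circumflex, breve, dot above, cedilla.
        REPLACEMENTS.put('G', "\u011C\u011E\u0120\u0122");
        // Assumed entry for lowercase a: a with grave, acute, circumflex.
        REPLACEMENTS.put('a', "\u00E0\u00E1\u00E2");
    }

    // Replaces each mapped character with the candidate whose index equals the
    // number of earlier occurrences of that character in this input string.
    public static String convertConsistently(String input) {
        Map<Character, Integer> seen = new HashMap<>();
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String candidates = REPLACEMENTS.get(c);
            if (candidates == null) {
                // Unmapped characters pass through unchanged.
                out.append(c);
                continue;
            }
            int occurrences = seen.getOrDefault(c, 0);
            // Assumption: wrap around when occurrences exceed the candidate count.
            out.append(candidates.charAt(occurrences % candidates.length()));
            seen.put(c, occurrences + 1);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String first = convertConsistently("GGaG qQV");
        String second = convertConsistently("GGaG qQV");
        System.out.println(first);
        // The counter is local to each call, so repeated runs agree.
        System.out.println(first.equals(second));
    }
}
```

Under these assumptions, the input GGaG becomes ĜĞàĠ on every run, because the occurrence counter depends only on the input string and not on any random state.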

To preserve the original behaviour, the default value for this option is RANDOM.
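One way the option value could be mapped to a mode with RANDOM as the fallback is sketched below. This is a hypothetical illustration: the enum name SubstituteType and its two constants appear in this PR's tests, but the wrapper class, the parse method, the case-insensitive matching, and the handling of a missing value are assumptions, not the project's actual CLI code.

```java
import java.util.Locale;

public class SubstituteOptionSketch {
    // Mirrors the two modes named in the PR; redeclared here so the sketch is self-contained.
    public enum SubstituteType { RANDOM, CONSISTENT }

    // Hypothetical helper: turns the raw --substitute value into a mode.
    // A missing or blank value falls back to RANDOM, the stated default.
    public static SubstituteType parse(String optionValue) {
        if (optionValue == null || optionValue.trim().isEmpty()) {
            return SubstituteType.RANDOM;
        }
        // Assumption: accept any letter case; unknown values throw IllegalArgumentException.
        return SubstituteType.valueOf(optionValue.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(parse(null));
        System.out.println(parse("CONSISTENT"));
    }
}
```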

@wadimw wadimw changed the base branch from master to upstream-patched May 21, 2026 15:00
@wadimw wadimw requested a review from ehoogerbeets May 21, 2026 15:00

socket-security Bot commented May 21, 2026

No dependency changes detected.

@wadimw wadimw force-pushed the consistent-pseudo branch from f3a8d9d to 231a370 Compare May 21, 2026 15:04
@wadimw wadimw marked this pull request as ready for review May 21, 2026 15:14
@wadimw wadimw added the upstream-patched Experimental features ported from legacy branch label May 21, 2026
@wadimw wadimw changed the title Consistent pseudo Add consistent pseudolocalization May 21, 2026
String result = ps.convertAsciiToDiacritics("qQV", SubstituteType.CONSISTENT);
assertEquals(
"Unmapped chars should remain unchanged with consistent substitution", "qQV", result);
}
Contributor

Now we need a few tests with SubstituteType.RANDOM to verify that its functionality still works as before.

Author

Added in ff4834b

ehoogerbeets previously approved these changes May 22, 2026
Contributor

@ehoogerbeets ehoogerbeets left a comment

Approving so that after you add the requested unit tests, everything else is already good to go.

@ehoogerbeets
Contributor

The new tests look fine, but now there are seemingly unrelated unit test errors? Any ideas as to what is going on there?

@wadimw wadimw merged commit a8f491b into upstream-patched May 27, 2026
6 of 7 checks passed
@wadimw wadimw deleted the consistent-pseudo branch May 27, 2026 10:25