[LOW] ProfileSanitizer.stripUnsafe misses supplementary-plane and generic format (Cf) characters: invisible Unicode TAG block survives (#184)

## Severity: LOW

## Summary

`ProfileSanitizer.stripUnsafe` iterates the string one UTF-16 `Char` at a time and removes control characters (`Cc`) plus an enumerated deny list of specific BMP code points. It does **not** strip:

- **Supplementary-plane (non-BMP) format characters**, most notably the Unicode Tags block `U+E0000`–`U+E007F`, whose assigned code points (`U+E0001` and `U+E0020`–`U+E007F`) are all `Cf` and are a documented invisible-text / spoofing / steganography vector. They are encoded as surrogate pairs, and `Character.getType(Char)` on either half of a pair returns `SURROGATE`, never `FORMAT` or `CONTROL`, so both halves fall through to the `else -> append(char)` branch unchanged.
- **BMP format characters missing from the explicit list.** The code only catches an enumerated set, so any omitted or newly assigned `Cf` code point also survives.
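
The classification gap can be reproduced in isolation. The sketch below is illustrative only: `typeName`, `perCharTypes`, and `perCodePointTypes` are hypothetical helpers that are not part of `ProfileSanitizer`. It compares what `Character.getType` reports per `Char` with what it reports per code point for `U+E0001` (LANGUAGE TAG).

```kotlin
// Hypothetical helper: readable names for the categories relevant here.
fun typeName(type: Int): String = when (type) {
    Character.FORMAT.toInt() -> "FORMAT"
    Character.SURROGATE.toInt() -> "SURROGATE"
    Character.CONTROL.toInt() -> "CONTROL"
    else -> "other($type)"
}

// Classifies each UTF-16 unit separately, as the current loop does.
fun perCharTypes(s: String): List<String> =
    s.map { typeName(Character.getType(it)) }

// Classifies each full code point, so surrogate pairs are decoded first.
fun perCodePointTypes(s: String): List<String> =
    s.codePoints().toArray().map { typeName(Character.getType(it)) }

fun main() {
    // U+E0001 LANGUAGE TAG written as its surrogate pair.
    val tag = "\uDB40\uDC01"
    println(perCharTypes(tag))       // [SURROGATE, SURROGATE]
    println(perCodePointTypes(tag))  // [FORMAT]
}
```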

## Evidence

`app/src/main/java/dev/ipf/darkmatter/core/ProfileSanitizer.kt`
```kotlin
fun stripUnsafe(value: String): String =
    buildString(value.length) {
        value.forEach { char ->                                  // per UTF-16 Char
            when {
                char == '\n' || char == '\t' || char == '\r' -> append(char)
                Character.getType(char) == Character.CONTROL.toInt() -> Unit
                char.code == 0x200E || char.code == 0x200F -> Unit
                ...                                              // explicit BMP list only
                else -> append(char)                            // non-BMP Cf (e.g. U+E0001) lands here
            }
        }
    }
```

## Impact

Display names, about text, and message bodies are all routed through `stripUnsafe`, so each can carry invisible TAG-block characters and other supplementary-plane or unlisted format characters. That allows hidden text and strings that render identically but compare as different, which is the kind of spoofing the sanitizer is meant to prevent.
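
To make the impact concrete, here is a small sketch of a display name carrying hidden TAG characters. Both `toTagCharacters` and `formatCodePoints` are hypothetical helpers written for this illustration, and the sample name is made up. Each printable ASCII character has a TAG counterpart at its code point plus `0xE0000`.

```kotlin
// Hypothetical helper: maps printable ASCII to the matching TAG characters.
fun toTagCharacters(ascii: String): String =
    buildString { ascii.forEach { appendCodePoint(it.code + 0xE0000) } }

// Hypothetical helper: lists the Cf code points present in a string.
fun formatCodePoints(s: String): List<Int> =
    s.codePoints().toArray().filter { Character.getType(it) == Character.FORMAT.toInt() }

fun main() {
    // Renders as "Alice" in most UIs, but carries five extra invisible code points.
    val displayName = "Alice" + toTagCharacters("admin")
    println(displayName.length)                                  // 15
    println(displayName.codePointCount(0, displayName.length))   // 10
    println(formatCodePoints(displayName).map { "U+%X".format(it) })
    // [U+E0061, U+E0064, U+E006D, U+E0069, U+E006E]
}
```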

## Suggested fix

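Iterate by code point and classify by general category:
```kotlin
fun stripUnsafe(value: String): String =
    buildString(value.length) {
        value.codePoints().forEach { cp ->                       // per code point, not per Char
            val type = Character.getType(cp)
            when {
                cp == '\n'.code || cp == '\t'.code || cp == '\r'.code -> appendCodePoint(cp)
                type == Character.CONTROL.toInt() || type == Character.FORMAT.toInt() -> {
                    // keep only the deliberately allowed shaping joiners (ZWNJ, ZWJ)
                    if (cp == 0x200C || cp == 0x200D) appendCodePoint(cp)
                }
                else -> appendCodePoint(cp)
            }
        }
    }
```
This covers non-BMP format characters (the assigned TAG characters) and any unlisted `Cf` code point with one rule, while preserving ZWNJ/ZWJ. `Cf` also includes a few visible characters, such as ARABIC NUMBER SIGN (`U+0600`), so the allow list inside the `FORMAT` branch is the place for any further exceptions.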

## Validation

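Add tests asserting that `stripUnsafe("a\uDB40\uDC01b")` (`U+E0001` LANGUAGE TAG) and a longer TAG-block string return only the visible characters. Note that `U+E0101` is VARIATION SELECTOR-18, general category `Mn` rather than `Cf`, so the rule in the suggested fix leaves it in place; stripping variation selectors would need its own rule.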
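
A minimal sketch of such checks, under two assumptions: `stripUnsafeProposed` is a standalone copy of the suggested rule (hypothetical name; the real function is `ProfileSanitizer.stripUnsafe`), and plain `check` calls stand in for the project's test framework, which this issue does not name. The inputs use assigned TAG characters, which are `Cf`.

```kotlin
// Standalone copy of the suggested rule so the sketch runs on its own.
fun stripUnsafeProposed(value: String): String =
    buildString(value.length) {
        for (cp in value.codePoints().toArray()) {
            val type = Character.getType(cp)
            when {
                cp == '\n'.code || cp == '\t'.code || cp == '\r'.code -> appendCodePoint(cp)
                type == Character.CONTROL.toInt() || type == Character.FORMAT.toInt() -> {
                    // ZWNJ and ZWJ are the only format characters kept.
                    if (cp == 0x200C || cp == 0x200D) appendCodePoint(cp)
                }
                else -> appendCodePoint(cp)
            }
        }
    }

fun main() {
    // U+E0001 LANGUAGE TAG between two visible letters.
    check(stripUnsafeProposed("a\uDB40\uDC01b") == "ab")
    // Visible text followed by TAG letters U+E0041 and U+E0042.
    check(stripUnsafeProposed("ok\uDB40\uDC41\uDB40\uDC42") == "ok")
    // BMP format character (LRM) is removed; ZWJ and newline are preserved.
    check(stripUnsafeProposed("a\u200Eb") == "ab")
    check(stripUnsafeProposed("a\u200Db\nc") == "a\u200Db\nc")
    // U+E0101 is a variation selector (Mn), so this rule does not remove it.
    check(stripUnsafeProposed("a\uDB40\uDD01b") == "a\uDB40\uDD01b")
    println("all checks passed")
}
```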
