XMLEncode: illegal XML control characters not encoded #390

Description

@elharo

XMLEncode.xmlEncodeTextAsPCDATA() passes characters in the range U+0000-U+001F (excluding U+0009 TAB, U+000A LF, U+000D CR) through unencoded in the default branch of its switch statement. These characters are illegal in XML 1.0 and will cause XML parsers to reject the output.
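To illustrate the parser rejection described above, here is a minimal sketch using the JDK's built-in DOM parser. The class and method names (`ControlCharRepro`, `parses`) are hypothetical and are not part of plexus-utils. The sketch feeds hand-written XML strings to the parser instead of calling XMLEncode, so it only shows how a conforming parser reacts to a raw U+0001 in text content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not part of the library.
final class ControlCharRepro {
    private ControlCharRepro() {}

    /** Returns true if the JDK DOM parser accepts the given document. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException | IOException e) {
            // The parser reports the illegal character as a fatal error.
            return false;
        }
        catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // TAB (U+0009) is a legal XML 1.0 character.
        System.out.println(parses("<a>tab\there</a>"));
        // A raw U+0001 is outside the XML 1.0 Char production.
        System.out.println(parses("<a>bad\u0001char</a>"));
    }
}
```

The default error handler may also print a "[Fatal Error]" line to stderr for the second document.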

The default case at line 112 just does n.append(c); it should instead encode these characters as &#xHH; numeric character references. The explicitly handled cases (&, <, >, ", ', \r, \n) cover the markup characters and the line-break controls, but U+0000, U+0001-U+0008, U+000B, U+000C, and U+000E-U+001F all slip through.

This affects both attribute values (via PrettyPrintXMLWriter.addAttribute()) and text content (via PrettyPrintXMLWriter.writeText()).

Fix: add a check in the default case to encode illegal XML control characters.
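A rough sketch of what such a check could look like follows. It is not the actual XMLEncode code: the class `XmlControlChars` and its methods are hypothetical, and using a `StringBuilder` for the output buffer is an assumption. It follows the `&#xHH;` form proposed above. One caveat worth verifying before adopting it: XML 1.0 also disallows character references to these code points (only XML 1.1 permits them, and never for U+0000), so dropping or replacing the characters may be the safer choice for strict XML 1.0 consumers.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not the library's implementation.
final class XmlControlChars {
    private XmlControlChars() {}

    /** True for U+0000-U+001F except TAB, LF and CR. */
    static boolean isIllegalControl(char c) {
        return c < 0x20 && c != '\t' && c != '\n' && c != '\r';
    }

    /** Appends c, writing illegal control characters as &#xHH; references. */
    static void appendEncoded(StringBuilder out, char c) {
        if (isIllegalControl(c)) {
            // Two uppercase hex digits, e.g. U+0001 becomes &#x01;
            out.append(String.format(Locale.ROOT, "&#x%02X;", (int) c));
        } else {
            out.append(c);
        }
    }

    /** Applies appendEncoded to every char of the input. */
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            appendEncoded(out, text.charAt(i));
        }
        return out.toString();
    }
}
```

In the existing switch statement, the body of `appendEncoded` would stand in for the plain append in the default branch. The handling of &, <, >, quotes, \r and \n is untouched by this sketch.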

Labels

bug
