
How to handle undefined conversions? #12

@krepflap

I was wondering how to replace characters that are undefined in the destination encoding with a substitute character, e.g. when converting the euro sign (€) to Shift_JIS.

In Ruby, we can do this:

```ruby
"xx€xx".encode('SHIFT_JIS', 'UTF-8', undef: :replace)
=> "xx?xx"
```

The €, which cannot be converted, is replaced by a "?" character. This matters for text comparison; Unicode Technical Standard #36 (https://unicode.org/reports/tr36/#Text_Comparison) says:

> When converting charsets, never simply omit characters that cannot be converted; at least substitute U+FFFD (when converting to Unicode) or 0x1A (when converting to bytes) to reduce security problems.
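For reference, Ruby's `String#encode` also accepts a `replace:` option, so the substitute can be the 0x1A (SUB) byte that the quoted recommendation suggests instead of the default "?". This is a small Ruby sketch only; it says nothing about what the Elixir/Erlang iconv library supports.

```ruby
# Replace characters that have no Shift_JIS mapping with 0x1A (SUB)
# instead of Ruby's default "?" replacement.
input  = "xx€xx"
output = input.encode("SHIFT_JIS", "UTF-8", undef: :replace, replace: "\x1A")

p output.encoding # => #<Encoding:Shift_JIS>
p output.bytes    # => [120, 120, 26, 120, 120]
```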

Can we do this with the iconv library in Elixir/Erlang? Currently the undefined character is silently omitted. I guess I could do the conversion character by character and check whether each call returns an empty string, but I was hoping there is something more elegant. Is there?
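The character-by-character workaround mentioned above can be sketched as follows. This is written in Ruby rather than Elixir because the iconv library's exact API is not shown here; `lossy_convert` is a hypothetical stand-in that simulates a converter which silently drops unmappable characters, and `convert_with_substitute` is likewise a hypothetical helper name.

```ruby
# Hypothetical stand-in for a converter that silently drops characters it
# cannot map (the behaviour reported for iconv). Ruby's String#encode is
# used here only to simulate that behaviour.
def lossy_convert(str, to_encoding)
  str.encode(to_encoding, undef: :replace, replace: "")
end

# Convert one character at a time; an empty result means the character was
# dropped, so emit the substitute (default 0x1A, SUB) in its place.
def convert_with_substitute(str, to_encoding, substitute = "\x1A")
  sub = substitute.encode(to_encoding)
  str.each_char.map do |ch|
    converted = lossy_convert(ch, to_encoding)
    converted.empty? ? sub : converted
  end.join
end

result = convert_with_substitute("xx€xx", "SHIFT_JIS")
p result.bytes # => [120, 120, 26, 120, 120]
```

This costs one conversion call per character, and it assumes that converting a single character on its own gives the same bytes as converting it in context. That holds for Shift_JIS but not for stateful encodings such as ISO-2022-JP, which emit escape sequences.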