Incorrect unescape result. #53
Description
julia> XML.unescape("&amp;quot; &amp;")
"\" &"
I think the correct result should be &quot; &
The unescape function applies its replacements one after another, so each replacement runs on the output of the previous one.
So, in order, &amp;quot; => &quot; and then, in a second bite, &quot; => "
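The two-step behaviour described above can be reproduced in isolation. This is only a sketch: `sequential_unescape` and the two rule lists are hypothetical names for illustration, not the package's actual implementation, and only two entities are included to keep it short.

```julia
# Hypothetical helper: applies the replacements one after another,
# so each step runs on the output of the previous step.
function sequential_unescape(x::AbstractString, rules)
    result = x
    for (entity, char) in rules
        result = replace(result, entity => char)
    end
    return result
end

# "&amp;" handled first: the "&" it produces joins "quot;" and forms a new entity.
amp_first = ["&amp;" => "&", "&quot;" => "\""]
# "&amp;" handled last: no later step can mistake its output for an entity.
amp_last = ["&quot;" => "\"", "&amp;" => "&"]

input = "&amp;quot; &amp;"
wrong = sequential_unescape(input, amp_first)
right = sequential_unescape(input, amp_last)

println(repr(wrong))  # "\" &"
println(repr(right))  # "&quot; &"
```

With `&amp;` processed first the input is unescaped twice; with `&amp;` processed last it is unescaped exactly once.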
I think this is incorrect.
Claude suggests the following:
const escape_chars = ['&' => "&amp;", '<' => "&lt;", '>' => "&gt;", '"' => "&quot;", '\'' => "&apos;"]
function unescape(x::AbstractString)
    result = x
    for (char, entity) in reverse(escape_chars)
        result = replace(result, entity => char)
    end
    return result
end
Further, about the escape function, Claude says:
"This approach is clever but has a subtle bug — the regex r"&(?!amp;|quot;|apos;|gt;|lt;)" is intended to only escape & characters that aren't already part of an XML entity, but this means the function assumes the input may contain already-escaped XML entities and tries to preserve them. That's an unusual contract for an escape function, which normally treats its input as plain text and escapes everything unconditionally. If the lookahead behaviour is intentional, it's worth documenting clearly that the function is idempotent by design."
It therefore suggests this for escape:
function escape(x::AbstractString)
    result = replace(x, '&' => "&amp;")
    for (char, entity) in escape_chars[2:end]
        result = replace(result, char => entity)
    end
    return result
end
This also restores AbstractString from your original for generality.
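A quick round-trip check of the two suggested functions may help when reviewing them. The sketch below copies them under the hypothetical names `suggested_escape` and `suggested_unescape` so that it runs on its own. It also adds `single_pass_unescape`, an alternative that is not part of the suggestion: it assumes Julia 1.7 or later, where `replace` on strings accepts several pairs and applies them in a single left-to-right pass over the input.

```julia
escape_chars = ['&' => "&amp;", '<' => "&lt;", '>' => "&gt;", '"' => "&quot;", '\'' => "&apos;"]

# Copy of the suggested escape: '&' first, then the remaining characters.
function suggested_escape(x::AbstractString)
    result = replace(x, '&' => "&amp;")
    for (char, entity) in escape_chars[2:end]
        result = replace(result, char => entity)
    end
    return result
end

# Copy of the suggested unescape: reversed order, so "&amp;" is handled last.
function suggested_unescape(x::AbstractString)
    result = x
    for (char, entity) in reverse(escape_chars)
        result = replace(result, entity => char)
    end
    return result
end

# Alternative sketch (assumes Julia >= 1.7): all entities in one pass,
# so replacement output is never scanned again.
single_pass_unescape(x::AbstractString) = replace(x,
    "&amp;" => "&", "&lt;" => "<", "&gt;" => ">", "&quot;" => "\"", "&apos;" => "'")

samples = ["&quot; &", "a < b && c > d", "it's \"quoted\"", "&amp;amp;"]
for s in samples
    escaped = suggested_escape(s)
    @assert suggested_unescape(escaped) == s
    @assert single_pass_unescape(escaped) == s
end
```

Both unescape variants return `&quot; &` for the input from the report, `&amp;quot; &amp;`.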
As I said before, I don't have a view about the behaviour of escape but I do think the unescape behaviour is wrong and should be fixed.
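For anyone weighing the two escape contracts, the difference can be seen on a single input. This is a sketch only: the regex is the one quoted above, while `escape_amp_lookahead` and `escape_amp_plain` are hypothetical names that cover just the `&` step, not the whole escape function.

```julia
# Regex quoted in the issue: matches '&' only when it does not start a known entity.
unescaped_amp = r"&(?!amp;|quot;|apos;|gt;|lt;)"

# Hypothetical stand-in for the '&' step of a lookahead-based escape.
escape_amp_lookahead(x::AbstractString) = replace(x, unescaped_amp => "&amp;")

# Unconditional variant: every '&' is treated as plain text.
escape_amp_plain(x::AbstractString) = replace(x, '&' => "&amp;")

text = "AT&T &amp; co"
a = escape_amp_lookahead(text)
b = escape_amp_plain(text)

println(a)  # AT&amp;T &amp; co
println(b)  # AT&amp;T &amp;amp; co
```

The lookahead version is idempotent (escaping `a` again leaves it unchanged), but it cannot represent a literal `&amp;` in the input text. The unconditional version round-trips any input, but it is not idempotent.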