What steps will reproduce the problem?
1. Pass UniversalDetector a WINDOWS-1252 byte buffer containing a series of
degree symbols interleaved with letters and digits,
e.g. {91, -80, 52, -80, 48, -80, 84, -80, 67, -80, 67, -80, 48, -80, 67, -80, 84}
(as signed Java bytes, -80 is 0xB0, the degree sign in WINDOWS-1252).
2. Call UniversalDetector#getDetectedCharset(). It should return WINDOWS-1252,
but instead it returns GB18030.
See the attached unit test for a minimal reproduction.
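As a quick sanity check on the input itself, here is a small stdlib-only sketch that decodes the buffer from step 1 with the JDK's own windows-1252 charset. The class and method names (DegreeBytesDemo, decodeAsWindows1252, toHex) are hypothetical and not part of juniversalchardet. Each -80 is the signed form of byte 0xB0, which Windows-1252 maps to U+00B0 (degree sign), so the buffer reads as "[°4°0°T°C°C°0°C°T". The sketch does not exercise UniversalDetector; the attached unit test does that.

```java
import java.nio.charset.Charset;

public class DegreeBytesDemo {
    // The byte buffer from step 1, as signed Java bytes (-80 == 0xB0).
    static final byte[] SAMPLE = {91, -80, 52, -80, 48, -80, 84, -80, 67, -80, 67, -80, 48, -80, 67, -80, 84};

    // Decode the buffer as Windows-1252, where 0xB0 maps to U+00B0 DEGREE SIGN.
    static String decodeAsWindows1252(byte[] data) {
        return new String(data, Charset.forName("windows-1252"));
    }

    // Render each byte as unsigned two-digit hex, e.g. -80 -> "B0".
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(SAMPLE));
        System.out.println(decodeAsWindows1252(SAMPLE));
    }
}
```

The hex line makes the repeating 0xB0 byte easy to spot next to the ASCII letters and digits.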
What is the expected output? What do you see instead?
The expected output of UniversalDetector#getDetectedCharset() is "WINDOWS-1252",
but the actual output is "GB18030".
What version of the product are you using? On what operating system?
I'm using version 1.0.3 on 64-bit Ubuntu 11.04 (Natty) with the default kernel 2.6.38-10-generic. The JDK I'm currently running is 1.6.0_23-x64.
Original issue reported on code.google.com by icw...@gmail.com on 13 Jul 2011 at 4:34