
GB18030 false positive with WINDOWS-1252 data set #11

@GoogleCodeExporter

Description

What steps will reproduce the problem?
1. Pass UniversalDetector a WINDOWS-1252 byte buffer containing a series of
degree symbols interleaved with letters and digits,
 e.g. {91, -80, 52, -80, 48, -80, 84, -80, 67, -80, 67, -80, 48, -80, 67, -80, 84}
2. Call UniversalDetector#getDetectedCharset(). It should return WINDOWS-1252,
but instead returns GB18030.
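Since the attached unit test is not reproduced here, below is a minimal sketch that uses only the JDK (`java.nio.charset`) to confirm that the reported buffer is well-formed Windows-1252 text. It does not call UniversalDetector itself. The class name `Cp1252Check` and the helper `decodeStrict` are hypothetical names chosen for illustration. Java bytes are signed, so -80 is 0xB0, which Windows-1252 maps to the degree sign; every other byte is plain ASCII.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper: checks that the reported bytes decode cleanly as Windows-1252. */
public class Cp1252Check {

    // The byte buffer from the bug report (-80 == 0xB0 as an unsigned byte).
    static final byte[] REPORTED_BYTES = {
        91, -80, 52, -80, 48, -80, 84, -80, 67, -80, 67, -80, 48, -80, 67, -80, 84
    };

    /** Decodes strictly: malformed or unmappable input throws instead of becoming a replacement char. */
    static String decodeStrict(byte[] bytes, String charsetName) throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Succeeds without throwing; text is "[°4°0°T°C°C°0°C°T" (17 characters).
        String text = decodeStrict(REPORTED_BYTES, "windows-1252");
        System.out.println(text);
    }
}
```

To reproduce the detector result itself, the same array would be fed to juniversalchardet's `UniversalDetector#handleData(byte[], int, int)`, followed by `dataEnd()`, before calling `getDetectedCharset()`.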

See attached unit test for minimal reproduction test case.

What is the expected output? What do you see instead?
The expected output from UniversalDetector#getDetectedCharset() is "WINDOWS-1252",
but it is "GB18030" instead.

What version of the product are you using? On what operating system?
 I'm using version 1.0.3 on 64-bit Ubuntu 11.04 (Natty) with the default kernel 2.6.38-10-generic. The JDK I'm currently running is 1.6.0_23-x64.

Original issue reported on code.google.com by icw...@gmail.com on 13 Jul 2011 at 4:34

Attachments:
