Fails to Detect UTF-8 without BOM #13

@GoogleCodeExporter

What steps will reproduce the problem?
1. Save a file as UTF-8 without a BOM.
2. Try to detect its character encoding.

What is the expected output? What do you see instead?
I expect to see UTF-8 from the #getDetectedCharset() method. Instead I get null.

What version of the product are you using? On what operating system?
I am using juniversalchardet-1.0.3.jar on a Windows 7 System.


Please provide any additional information below.
When I save the file as UTF-8 with a BOM, detection works fine, but Java's 
UTF-8 decoder does not strip the BOM, so I get unwanted characters at the 
beginning of the file. Therefore I have been using UTF-8 without a BOM.
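
For anyone who needs to keep the BOM for detection, one possible workaround is to skip it manually before decoding. The following is a minimal sketch using only the Java standard library; the class and method names (`BomStripper`, `decodeUtf8SkippingBom`) are hypothetical and not part of juniversalchardet.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of juniversalchardet.
final class BomStripper {
    /** Decodes UTF-8 bytes, skipping a leading BOM (EF BB BF) if one is present. */
    static String decodeUtf8SkippingBom(byte[] data) {
        int offset = 0;
        // The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            offset = 3;
        }
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }
}
```

With this, a file saved with a BOM decodes to the same string as the same file saved without one.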

Perhaps I am not feeding the detector enough data from the file I am reading? 
I don't think that is the case, though, because I have extended the file to 
171390 characters with no difference.
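
One thing that might be worth checking, although it is not confirmed in this report, is whether the file contains any non-ASCII bytes at all. A file made up only of ASCII bytes is byte-for-byte identical in UTF-8 and in many other encodings, so its size alone gives a detector nothing to go on. The sketch below is a standard-library check that could serve as a fallback when `getDetectedCharset()` returns null. It is a strict UTF-8 validation, not the detector's own algorithm, and the names `Utf8Check`, `isValidUtf8` and `hasNonAscii` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical fallback helper, not part of juniversalchardet.
final class Utf8Check {
    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns true if at least one byte has its high bit set (value 0x80 or above). */
    static boolean hasNonAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return true;
            }
        }
        return false;
    }
}
```

If `isValidUtf8` returns true and `hasNonAscii` returns true, treating the file as UTF-8 is a reasonable guess. If `hasNonAscii` returns false, the content is plain ASCII and decoding it as UTF-8 is safe either way.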

Original issue reported on code.google.com by mgunnett...@gmail.com on 30 Sep 2011 at 10:20
