What steps will reproduce the problem?
1. Save a file as UTF-8 without a BOM.
2. Try to detect the character encoding.
What is the expected output? What do you see instead?
I expect to see UTF-8 from the #getDetectedCharset() method. Instead I get null.
What version of the product are you using? On what operating system?
I am using juniversalchardet-1.0.3.jar on a Windows 7 system.
Please provide any additional information below.
When I save the file as UTF-8 with a BOM, detection works fine, but Java does
not strip the BOM when decoding, so I get an unwanted character at the
beginning of the file. Therefore I have been using UTF-8 without a BOM.
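For reference, the BOM problem described above can be worked around on the reading side. The following is only a minimal sketch using the standard library: the class name `BomStripper` and its methods are hypothetical and not part of juniversalchardet. It skips the three BOM bytes (EF BB BF), if present, before decoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of juniversalchardet.
final class BomStripper {
    // The UTF-8 encoding of U+FEFF (byte order mark).
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the data starts with the three UTF-8 BOM bytes.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    // Decodes the data as UTF-8, skipping a leading BOM if there is one.
    static String decodeUtf8(byte[] data) {
        int offset = hasUtf8Bom(data) ? UTF8_BOM.length : 0;
        return new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
    }
}
```

With something like this, a file saved with a BOM could still be read without the stray leading character, although it does not address the detection problem for files without a BOM.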
Perhaps I am not feeding the detector enough data from the file I am reading?
I don't think that is the case, though, because I have extended the file to
171390 characters with no difference.
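As a possible stopgap while the detector returns null, the bytes can be checked for strict UTF-8 validity with the standard library. This is a sketch under assumptions, not how the detector works internally: `Utf8Check`, `isValidUtf8` and `charsetOrUtf8Fallback` are hypothetical names, and the `detected` argument stands for whatever `getDetectedCharset()` returned.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical fallback helper, not part of juniversalchardet.
final class Utf8Check {
    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Keeps the detector's answer when there is one; otherwise reports
    // "UTF-8" only if the bytes are valid UTF-8, and null if they are not.
    static String charsetOrUtf8Fallback(String detected, byte[] data) {
        if (detected != null) {
            return detected;
        }
        return isValidUtf8(data) ? "UTF-8" : null;
    }
}
```

Note that pure ASCII data also passes this check, so the fallback cannot distinguish ASCII from UTF-8; it only rules out byte sequences that are not legal UTF-8.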
Original issue reported on code.google.com by mgunnett...@gmail.com on 30 Sep 2011 at 10:20
Attachments: