When processing a logfile that contains binary parts, the following exception gets thrown:
Traceback (most recent call last):
  File "anonip.py", line 508, in <module>
    main()
  File "anonip.py", line 491, in main
    for line in anonip.run(input_file):
  File "anonip.py", line 161, in run
    line = input_file.readline()
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 1040: invalid start byte
Processing purely binary content obviously isn't the goal of this project, but this issue arose while anonymizing an nginx error.log that contained the following line:
2022/01/09 05:21:49 [info] 58271#58271: *55771 client sent invalid method while reading client request line, client: 192.0.2.0, server: foo.example.org, request: "<binary rubbish>"
Note that there's even an IP address in that line that needs to be anonymized!
So maybe the file shouldn't be read as UTF-8, or as a string at all for that matter, but as bytes?
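One possible middle ground, sketched below, is to keep working with strings but decode with the `surrogateescape` error handler, so undecodable bytes survive a round trip instead of raising `UnicodeDecodeError`. This is only an illustration, not anonip's actual code: `mask_ipv4` and `anonymize_stream` are hypothetical names, and the masking rule (zeroing the last IPv4 octet) is a simplified stand-in for the real anonymization logic.

```python
import io
import re

# Hypothetical, simplified stand-in for the real anonymization:
# it only zeroes the last octet of IPv4 addresses.
IPV4 = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")


def mask_ipv4(line):
    """Return the line with the last octet of every IPv4 address set to 0."""
    return IPV4.sub(r"\1.0", line)


def anonymize_stream(raw_in, raw_out):
    """Copy a binary stream line by line, masking IPs along the way.

    Bytes that are not valid UTF-8 are mapped to lone surrogates on input
    and mapped back to the same bytes on output (surrogateescape).
    """
    # newline="" keeps line endings untranslated in both directions.
    text_in = io.TextIOWrapper(
        raw_in, encoding="utf-8", errors="surrogateescape", newline=""
    )
    text_out = io.TextIOWrapper(
        raw_out, encoding="utf-8", errors="surrogateescape", newline=""
    )
    for line in text_in:
        text_out.write(mask_ipv4(line))
    text_out.flush()
    # Detach so the caller keeps ownership of the underlying binary streams.
    text_in.detach()
    text_out.detach()
```

For a file opened by path, passing `encoding="utf-8", errors="surrogateescape"` to the built-in `open()` for both the input and the output file should have the same effect. Reading as bytes would also avoid the exception, but then the IP matching would have to operate on `bytes` patterns throughout.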