Doesn't handle binary logfile content #63

@jplitza

When processing a logfile that contains binary parts, the following exception gets thrown:

Traceback (most recent call last):
  File "anonip.py", line 508, in <module>
    main()
  File "anonip.py", line 491, in main
    for line in anonip.run(input_file):
  File "anonip.py", line 161, in run
    line = input_file.readline()
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 1040: invalid start byte

Processing purely binary content obviously isn't the target of this project, but this issue arose while anonymizing an nginx error.log that contained the following line:

2022/01/09 05:21:49 [info] 58271#58271: *55771 client sent invalid method while reading client request line, client: 192.0.2.0, server: foo.example.org, request: "<binary rubbish>"

Note that there's even an IP address in that line that needs to be anonymized!

So maybe the file shouldn't be read as UTF-8, or as a string at all for that matter, but as bytes?
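
One possible direction, sketched under assumptions rather than taken from anonip's code: keep working with strings, but decode with Python's `surrogateescape` error handler so that invalid bytes survive a decode/encode round trip unchanged. The names `mask_ipv4` and `anonymize_stream` are hypothetical, and the regex is only a stand-in for the real anonymization logic (it simply zeroes the last IPv4 octet).

```python
import io
import re

# Hypothetical stand-in for the anonymization step; NOT anonip's actual logic.
IPV4_RE = re.compile(r"\b([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\b")


def mask_ipv4(line):
    """Zero the last octet of anything that looks like an IPv4 address."""
    return IPV4_RE.sub(r"\1.0", line)


def anonymize_stream(raw_in, raw_out):
    """Copy a binary stream line by line, masking IPs and preserving invalid bytes."""
    # surrogateescape maps each undecodable byte to a lone surrogate code point
    # instead of raising UnicodeDecodeError; newline="\n" disables newline translation.
    text_in = io.TextIOWrapper(
        raw_in, encoding="utf-8", errors="surrogateescape", newline="\n"
    )
    for line in text_in:
        # Encoding with the same handler turns the surrogates back into the
        # original bytes, so the binary part of the line is written unchanged.
        raw_out.write(mask_ipv4(line).encode("utf-8", errors="surrogateescape"))
    # Detach so the wrapper does not close the caller's stream.
    text_in.detach()
```

For command-line use, the same function could presumably be called with `sys.stdin.buffer` and `sys.stdout.buffer`. Reading as raw bytes and matching with a bytes regex would be an alternative, at the cost of converting all string handling to bytes.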
