AttributeError: 'WarcIndexer' object has no attribute 'records' #8

Description

@martinvahi
warc_librarian@acstorage3334:/media/pi/Sinine230GiBUSB/warc_librarian $ Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "./warcproxy.py", line 112, in run
    http_response = parse_http_response(record)
  File "./warcproxy.py", line 24, in parse_http_response
    remainder = message.feed(record.content[1])
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 576, in feed
    text = HTTPMessage.feed(self, text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 97, in feed
    text = self.feed_headers(text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 191, in feed_headers
    line, text = self.feed_line(text)
  File "/home/warc_librarian/m_local/bin_p/warc_proxy/v2016_11_03/hanzo/httptools/messaging.py", line 159, in feed_line
    text = str(self.buffer[pos:])
MemoryError

ERROR:tornado.application:Uncaught exception POST /load-warc (::1)
HTTPRequest(protocol='http', host='warc', method='POST', uri='/load-warc', version='HTTP/1.1', remote_ip='::1', headers={'Origin': 'http://warc', 'Content-Length': '102', 'Accept-Language': 'en-us;q=0.750', 'Accept-Encoding': 'gzip, deflate', 'Host': 'warc', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'User-Agent': 'Mozilla/5.0 (X11; Linux) AppleWebKit/538.15 (KHTML, like Gecko) Chrome/18.0.1025.133 Safari/538.15 Midori/0.5', 'Connection': 'Keep-Alive', 'X-Requested-With': 'XMLHttpRequest', 'Referer': 'http://warc/static/list.html', 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1346, in _when_complete
    callback()
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 1367, in _execute_method
    self._when_complete(method(*self.path_args, **self.path_kwargs),
  File "./warcproxy.py", line 344, in post
    index_status = self.warc_proxy.load_warc_file(path)
  File "./warcproxy.py", line 142, in load_warc_file
    self.indices[path] = indexer.records
AttributeError: 'WarcIndexer' object has no attribute 'records'
ERROR:tornado.access:500 POST /load-warc (::1) 30.11ms

The WARC file that was probably in use when this happened is about 560 MiB and may still be available from
http://temporary.softf1.com/2017/bugs/www.clausewitz.com-2017-02-09-8df72096-00000.warc.gz
The error might have occurred with a different WARC file, I am not entirely sure, but the referenced one also fails to load, for whatever reason.
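
One possible reading of the two tracebacks, offered as a guess and not a confirmed diagnosis: the thread that died with MemoryError may be the indexer itself, so it never assigned `records`, and the later `indexer.records` lookup in `load_warc_file` then raised the AttributeError. The sketch below reproduces that pattern with a hypothetical stand-in class (`SketchIndexer` and `load_index` are illustrative names, not part of warcproxy.py) and shows a defensive lookup that reports the failure without raising.

```python
import threading


class SketchIndexer(threading.Thread):
    """Hypothetical stand-in for WarcIndexer; NOT the real class."""

    def __init__(self, fail):
        threading.Thread.__init__(self)
        self.fail = fail
        self.error = None  # remembers why indexing failed, if it did

    def run(self):
        try:
            if self.fail:
                # Simulates the parser running out of memory on a record.
                raise MemoryError("simulated failure while parsing a record")
            # Only a successful run ever creates the `records` attribute.
            self.records = {"http://example.invalid/": 0}
        except Exception as exc:
            self.error = exc


def load_index(indexer):
    """Run the indexer and return (records, status) without raising."""
    indexer.start()
    indexer.join()
    # getattr with a default avoids AttributeError when run() failed early.
    records = getattr(indexer, "records", None)
    if records is None:
        return None, "indexing failed: %r" % (indexer.error,)
    return records, "ok"
```

With a guard like this, a handler such as the POST /load-warc one could answer with a readable error message in place of a 500, assuming the real indexer behaves like the stand-in.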
