Description
While trying to synthesize PDFs, I got the error message below. Even though the message appeared, I was still able to produce synthetic data. Some of my original PDFs were unreadable; I'm not sure whether that has anything to do with the error.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/root/synthetic/synthetic/pdf/parser.py", line 281, in parse_pdf
    synthesize_fn(pdf_file)
  File "/root/synthetic/synthetic/pdf/parser.py", line 243, in synthesize_pdf
    new_content_stream = parse_text(page, font_map, synthesizer)
  File "/root/synthetic/synthetic/pdf/parser.py", line 184, in parse_text
    text_block.set_unicode_text(text_id, modified_text)
  File "/root/synthetic/synthetic/pdf/parser.py", line 50, in set_unicode_text
    self.text_objects[text_id]['text'] = self._encode(text, font)
  File "/root/synthetic/synthetic/pdf/parser.py", line 88, in _encode
    return pikepdf.String(font.encode(s) if font else s.encode(errors=self.ERRORS))
  File "/root/synthetic/synthetic/pdf/utils.py", line 14, in encode
    _bytes += self.unicode_to_cid[c].to_bytes(length, 'big')
KeyError: 'F'
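The last frame indicates that the font's `unicode_to_cid` map has no entry for the character `'F'`. One plausible cause, which is not confirmed by the traceback, is that the PDF embeds a subsetted font containing only the glyphs used in the original text, so synthesized replacement text that introduces a new character cannot be encoded. The sketch below reproduces that failure mode. Only the attribute name `unicode_to_cid` and the `to_bytes(length, 'big')` call come from the traceback; the `CIDFont` class, its constructor, the fixed 2-byte code length, and the `missing_chars` helper are hypothetical.

```python
# Hypothetical stand-in for the font helper in synthetic/pdf/utils.py.
# Assumed: a fixed code length and a plain dict for the Unicode -> CID map.
class CIDFont:
    def __init__(self, unicode_to_cid: dict[str, int], length: int = 2):
        # Maps each Unicode character to a CID the embedded font can draw.
        self.unicode_to_cid = unicode_to_cid
        self.length = length

    def encode(self, s: str) -> bytes:
        # Mirrors the failing line: a character missing from the map
        # raises KeyError (e.g. KeyError: 'F').
        _bytes = b''
        for c in s:
            _bytes += self.unicode_to_cid[c].to_bytes(self.length, 'big')
        return _bytes

    def missing_chars(self, s: str) -> set[str]:
        # Hypothetical guard: characters in s the font cannot encode.
        return {c for c in s if c not in self.unicode_to_cid}


# A subset font that only knows the glyphs from the original text "abc".
font = CIDFont({'a': 1, 'b': 2, 'c': 3})
print(font.encode('cab'))  # b'\x00\x03\x00\x01\x00\x02'

replacement = 'Fab'  # synthesized text introduces 'F'
print(font.missing_chars(replacement))  # {'F'}
try:
    font.encode(replacement)
except KeyError as e:
    print('KeyError:', e)  # KeyError: 'F'
```

A check like `missing_chars` would let a caller skip or regenerate text the font cannot represent instead of failing on it; whether the tool does anything like this is unknown.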
I used Docker to produce the data:
docker run --network none -v /path/to/src_dir:/root/src_dir:ro -v /path/to/dst_dir:/root/dst_dir -it lucidtechai/synthetic pdf /root/src_dir /root/dst_dir
Python version: 3.9