Description
Summary and motivation
Docling is a widely used document information extraction library, and in production use cases docling-serve can be used to host Docling in a more performant and scalable fashion.
Detailed design
Follow the design of the existing docling-haystack integration and introduce a new DoclingServeConverter component that accepts a list of str/pathlib.Path and produces a dict[str, list[Document]], but sends requests to the Docling serve endpoints, following Docling serve's documentation, instead of running Docling locally. A run_async method would also be good to implement, as the requests to Docling serve will be blocking.
I considered extending the existing Docling integration, but it has large dependencies (e.g., PyTorch) that are unneeded when sending requests to Docling serve.
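A rough sketch of what the proposed component could look like, using only the standard library to keep it dependency-free. Several things here are assumptions rather than confirmed API: the default base URL and the `/v1/convert/source` path, the request payload shape (`sources` entries with a `kind` field), the `document.md_content` response field, and the `"documents"` output key. All of these should be checked against Docling serve's documentation. The `Document` dataclass is a stand-in for Haystack's `Document`, and the `post` parameter is a hypothetical hook for swapping the transport. A real integration would presumably decorate the class with Haystack's `@component` and might use an async HTTP client rather than a worker thread.

```python
import asyncio
import base64
import json
import urllib.request
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Optional, Union


@dataclass
class Document:
    """Stand-in for Haystack's Document so the sketch runs without Haystack."""

    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def _post_json(url: str, payload: dict[str, Any], timeout: float) -> dict[str, Any]:
    """Send a blocking JSON POST request and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class DoclingServeConverter:
    """Hypothetical converter that delegates conversion to a Docling serve instance."""

    def __init__(
        self,
        base_url: str = "http://localhost:5001",  # assumed default address
        endpoint: str = "/v1/convert/source",  # assumed path, verify in the docs
        timeout: float = 120.0,
        post: Optional[Callable[[str, dict[str, Any], float], dict[str, Any]]] = None,
    ) -> None:
        self._url = base_url.rstrip("/") + endpoint
        self._timeout = timeout
        self._post = post or _post_json

    def _build_payload(self, source: Union[str, Path]) -> dict[str, Any]:
        """Build the request body (assumed shape) for a URL or a local file."""
        text = str(source)
        if text.startswith(("http://", "https://")):
            item = {"kind": "http", "url": text}
        else:
            path = Path(text)
            encoded = base64.b64encode(path.read_bytes()).decode("ascii")
            item = {"kind": "file", "filename": path.name, "base64_string": encoded}
        return {"options": {"to_formats": ["md"]}, "sources": [item]}

    def run(self, sources: list[Union[str, Path]]) -> dict[str, list[Document]]:
        """Convert each source with one blocking request per source."""
        documents = []
        for source in sources:
            response = self._post(self._url, self._build_payload(source), self._timeout)
            # Assumed response field; missing content falls back to an empty string.
            content = response.get("document", {}).get("md_content") or ""
            documents.append(Document(content=content, meta={"source": str(source)}))
        return {"documents": documents}

    async def run_async(self, sources: list[Union[str, Path]]) -> dict[str, list[Document]]:
        """Run the blocking conversion in a worker thread so the event loop stays free."""
        return await asyncio.to_thread(self.run, sources)
```

Usage would be along the lines of `DoclingServeConverter(base_url=...).run(["https://example.com/report.pdf"])`, or `await converter.run_async([...])` inside an async pipeline. Sending one request per source keeps the mapping from source to `Document` simple, at the cost of not batching.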
Checklist
If the request is accepted, ensure the following checklist is complete before closing this issue.
Tasks
- The code is documented with docstrings and was merged in the main branch
- Docs are published at https://docs.haystack.deepset.ai/
- There is a GitHub workflow running the tests for the integration nightly and at every PR
- A new label named like integration:<your integration name> has been added to the list of labels for this repository
- The labeler.yml file has been updated
- The package has been released on PyPI
- An integration tile with a usage example has been added to https://github.com/deepset-ai/haystack-integrations
- The integration has been listed in the Inventory section of this repo README
- The feature was announced through social media