Add new DoclingServe Integration #2960

@maxdswain

Summary and motivation

Docling is an extremely popular document information extraction library, and in production use cases docling-serve can be used to host Docling in a more performant and scalable fashion.

Detailed design

Follow the design of the existing docling-haystack integration by introducing a new DoclingServeConverter component that accepts a list of str/pathlib.Path and produces a dict[str, list[Document]], but uses the Docling Serve endpoints instead, following Docling Serve's documentation. A run_async method would also be good to implement, as requests to Docling Serve will be blocking.

I considered extending the existing Docling integration, but it has large dependencies (e.g., PyTorch) that are unnecessary when only sending requests to Docling Serve.
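
To make the proposal concrete, here is a rough, dependency-free sketch of what such a component could look like. It is not a final design. The `Document` dataclass is a stand-in for Haystack's `Document`, and a real integration would presumably be decorated with Haystack's `@component`. The default `base_url`, the `/v1/convert/source` endpoint path, the request body built in `_to_source`, and the `document.md_content` response field are all assumptions that must be checked against Docling Serve's documentation. `post_json` and the injectable `post` parameter are hypothetical helpers added only so the sketch can be exercised without a running server.

```python
from __future__ import annotations

import asyncio
import base64
import json
import urllib.request
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable


@dataclass
class Document:
    """Stand-in for Haystack's Document, so the sketch has no dependencies."""

    content: str
    meta: dict[str, Any] = field(default_factory=dict)


def post_json(url: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Send a blocking JSON POST request using only the standard library."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


class DoclingServeConverter:
    def __init__(
        self,
        base_url: str = "http://localhost:5001",  # ASSUMPTION: server address
        endpoint: str = "/v1/convert/source",  # ASSUMPTION: verify in the Docling Serve docs
        post: Callable[[str, dict[str, Any]], dict[str, Any]] = post_json,
    ) -> None:
        self._url = base_url.rstrip("/") + endpoint
        self._post = post

    @staticmethod
    def _to_source(source: str | Path) -> dict[str, Any]:
        # ASSUMPTION: this request schema is illustrative, not confirmed.
        text = str(source)
        if text.startswith(("http://", "https://")):
            return {"kind": "http", "url": text}
        # Anything else is treated as a local file and sent base64-encoded.
        path = Path(source)
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return {"kind": "file", "base64_string": encoded, "filename": path.name}

    def run(self, sources: list[str | Path]) -> dict[str, list[Document]]:
        documents = []
        for source in sources:
            payload = {
                "sources": [self._to_source(source)],
                "options": {"to_formats": ["md"]},  # ASSUMPTION: option name
            }
            response = self._post(self._url, payload)
            # ASSUMPTION: converted Markdown lives at document.md_content.
            content = response["document"]["md_content"]
            documents.append(Document(content=content, meta={"source": str(source)}))
        return {"documents": documents}

    async def run_async(self, sources: list[str | Path]) -> dict[str, list[Document]]:
        # Offload the blocking run() to a worker thread (Python 3.9+).
        return await asyncio.to_thread(self.run, sources)
```

Wrapping the blocking call with `asyncio.to_thread` is the simplest way to get a `run_async`. A native async HTTP client would avoid the extra thread but would add a dependency.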

Checklist

If the request is accepted, ensure the following checklist is complete before closing this issue.

Tasks

  • The code is documented with docstrings and was merged in the main branch
  • Docs are published at https://docs.haystack.deepset.ai/
  • There is a GitHub workflow running the tests for the integration nightly and at every PR
  • A new label named like integration:<your integration name> has been added to the list of labels for this repository
  • The labeler.yml file has been updated
  • The package has been released on PyPI
  • An integration tile with a usage example has been added to https://github.com/deepset-ai/haystack-integrations
  • The integration has been listed in the Inventory section of this repo README
  • The feature was announced through social media

Labels: P2, new integration