Docket worker cannot serialize litellm exceptions after task failure #231

@bsbodden

Description

When a background task fails with a litellm exception (rate limit, timeout, connection error, etc.), the docket worker cannot serialize the exception for the result queue.

What happens

docket/worker.py calls cloudpickle.dumps(e) on any exception raised by a failed task. The dumps call succeeds, but cloudpickle.loads() later fails. By default, an exception is reconstructed by calling its class with only the saved args tuple (typically just the message), not with the keyword arguments it was originally built with. litellm exception classes require message, model, and llm_provider in __init__(), so reconstruction raises TypeError.

The worker can't store or report the error. The task silently disappears.
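The failure mode described above can be illustrated without litellm or cloudpickle installed. The sketch below uses FakeProviderError, a hypothetical stand-in that only mimics the constructor signature from the reproduction (it is not the real litellm class), and replays by hand the first step an unpickler performs with the default __reduce__ result.

```python
class FakeProviderError(Exception):
    """Hypothetical stand-in with a litellm-style signature (not the real class)."""

    def __init__(self, message, model, llm_provider):
        self.message = message
        self.model = model
        self.llm_provider = llm_provider
        # Only the message is forwarded, so self.args == (message,)
        super().__init__(message)


exc = FakeProviderError("rate limited", model="gpt-4", llm_provider="openai")

# Default BaseException.__reduce__ yields (class, args, instance __dict__)
cls, saved_args, saved_state = exc.__reduce__()

# Replay what the unpickler does first: call the class with the saved args only.
try:
    cls(*saved_args)
    replay_error = None
except TypeError as err:
    # model and llm_provider live in saved_state, which is applied too late
    replay_error = err

print(saved_args)
print(replay_error)
```

The attributes are present in the saved state, but that state is only applied after the constructor call, which is the call that fails.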

Affected classes

All litellm exception types: APIConnectionError, RateLimitError, Timeout, ServiceUnavailableError, BadRequestError, AuthenticationError, NotFoundError, ContentPolicyViolationError, InternalServerError, BadGatewayError, PermissionDeniedError, UnprocessableEntityError, APIError, APIResponseValidationError, ContextWindowExceededError.

Reproduction

import cloudpickle
import litellm

exc = litellm.exceptions.RateLimitError(
    message="rate limited", model="gpt-4", llm_provider="openai"
)
data = cloudpickle.dumps(exc)
cloudpickle.loads(data)  # TypeError: __init__() missing required positional arguments

Fix

Monkey-patch __reduce__ on litellm exception classes so cloudpickle reconstructs them via Exception.__new__() + __dict__ restoration, bypassing __init__. See agent_memory_server/litellm_pickle_compat.py.
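A minimal sketch of that kind of patch follows. It is not the contents of agent_memory_server/litellm_pickle_compat.py. The helper names (_rebuild_exception, _reduce_without_init, patch_exception_classes) and the FakeProviderError stand-in are assumptions made for illustration, and stdlib pickle is used in place of cloudpickle on the assumption that both honor __reduce__ in the same way.

```python
import pickle


class FakeProviderError(Exception):
    """Hypothetical stand-in with a litellm-style signature (not the real class)."""

    def __init__(self, message, model, llm_provider):
        self.message = message
        self.model = model
        self.llm_provider = llm_provider
        super().__init__(message)


def _rebuild_exception(cls, args, state):
    """Recreate the exception without calling cls.__init__."""
    exc = Exception.__new__(cls)  # allocates the instance, skips __init__
    exc.args = args
    exc.__dict__.update(state)
    return exc


def _reduce_without_init(self):
    # Tell pickle to call _rebuild_exception(cls, args, state) on load
    return (_rebuild_exception, (type(self), self.args, dict(self.__dict__)))


def patch_exception_classes(classes):
    """Install the custom __reduce__ on each given exception class."""
    for cls in classes:
        cls.__reduce__ = _reduce_without_init


# With litellm installed, the real exception classes would be passed here instead.
patch_exception_classes([FakeProviderError])

original = FakeProviderError("rate limited", model="gpt-4", llm_provider="openai")
restored = pickle.loads(pickle.dumps(original))
print(type(restored).__name__, restored.model, restored.llm_provider)
```

This approach only works if everything stored in the instance __dict__ is itself picklable. The traceback and any chained exception are not carried over in this sketch.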

The root cause is upstream in litellm: its exception classes don't implement the pickle protocol methods. This has been reported there separately.

Labels: bug
