Summary
We are seeing large latency outliers when making async Responses API calls with openai-java from a Java service.
Most OpenAI requests complete in about 1-3s, but some otherwise similar requests occasionally take >30s. These outliers cause our outer application timeout to fire even though the rest of our pipeline is fast.
This does not look like local CPU or executor blockage. In the slow cases, our service continues processing other work normally while the OpenAI call remains in flight.
Environment
- openai-java: 4.24.1
- Java: 21
- Framework: Quarkus 3.30.8
- Client: OpenAIOkHttpClientAsync
- Model: gpt-5.1
How we use the SDK
We use one shared OpenAIClientAsync and run an async pipeline with:
- one non-streaming classification call via client.responses().create(...)
- one streaming planner call via client.responses().createStreaming(...)
- several parallel non-streaming calls via client.responses().create(...)
- one final non-streaming call via client.responses().create(...)
All of this happens inside a single chat turn. In production-like traffic, multiple chat turns may also overlap in time.
Expected behavior
We expect occasional variance, but not very large outliers for otherwise small JSON-style requests.
A request pattern where most calls finish in 1-3s but some similar calls take ~30s makes it hard to set a reasonable application timeout.
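For reference, the outer bound we apply is just a plain CompletableFuture timeout, nothing SDK-specific. A minimal stand-in (the sleep here simulates the real client.responses().create(...) call; names are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class TimeoutSketch {
    public static void main(String[] args) {
        // Stand-in for an SDK call; in our service this is client.responses().create(...)
        CompletableFuture<String> call = CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(100); // typical case: ~1-3s in production
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return "ok";
        });

        // Outer application timeout: this is what fires on the ~30s outliers
        // even though the median call is 1-3s.
        String result = call.orTimeout(25, TimeUnit.SECONDS).join();
        System.out.println(result);
    }
}
```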
Actual behavior
We are seeing severe tail latency on individual OpenAI calls:
- in one case, the final non-streaming request took 27.439s
- in another case, the planner stream took 35.109s
Meanwhile, nearby requests in the same system complete much faster.
Why this does not seem like a local thread/executor problem
During the slow request:
- other requests in the same service continue to run
- local search calls complete in about 0.3-0.7s
- other OpenAI calls in nearby flows complete in about 1-3s
- completion logs for the slow request appear on the OkHttp / stream handler side, suggesting the service is waiting on the remote OpenAI call rather than being blocked locally
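The timings above come from a small wrapper around each call's future; since the completion callback runs on the SDK/OkHttp callback thread, a long gap between start and callback means the request itself was in flight, not that our executor was starved. A simplified sketch of that instrumentation (the sleeping future is a stand-in for the real SDK call):

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class LatencyLog {
    // Wrap any async call and log its wall-clock duration on completion.
    static <T> CompletableFuture<T> timed(String label, CompletableFuture<T> f) {
        long start = System.nanoTime();
        // whenComplete runs on the completing (callback) thread, so the measured
        // interval covers the remote round trip, not local queueing in our pool.
        return f.whenComplete((v, err) ->
            System.out.println(label + " took "
                + Duration.ofNanos(System.nanoTime() - start).toMillis() + "ms"));
    }

    public static void main(String[] args) {
        CompletableFuture<String> fake = CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return "done";
        });
        System.out.println(timed("final-call", fake).join());
    }
}
```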
Example timings observed
Typical successful runs:
- calls often complete in ~1-3s
- whole flow often completes in ~6-11s
Outlier runs:
- final call outlier:
  - planner completed in 3.542s
  - parallel calls completed in 2.932s
  - internal API call completed in 0.317s
  - final call completed in 27.439s
- planner stream outlier:
  - flow started normally
  - outer flow timed out at 25s
  - planner stream itself completed only after 35.109s
Important detail
The slow final call is not a large request. The static prompt is very small and the payload is compact; it shouldn't be more than 500 tokens. So this does not seem to be caused by an unusually large request on our side. The outputs are also small: each call returns a JSON payload of under 200 characters.
Question
Is this level of tail latency variance expected for responses().create(...) / responses().createStreaming(...) with gpt-5.1 under bursty concurrent usage?
If so:
- are there SDK-level recommendations for this traffic pattern?
- are there known best practices around concurrent async requests with one shared OpenAIClientAsync?
- is there a better model or request pattern for lower tail latency on small JSON-style calls?
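One mitigation we have been considering for the small JSON calls is a hedged-request pattern: fire a backup request if the primary has not completed within some budget, and take whichever finishes first. A minimal sketch, purely illustrative (the supplier stands in for client.responses().create(...); cancellation of the losing request is omitted for brevity, so the extra call is wasted tokens):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class Hedge {
    // Fire the primary call; if it has not completed within hedgeAfterMs,
    // fire a second identical call and take whichever completes first.
    static <T> CompletableFuture<T> hedged(Supplier<CompletableFuture<T>> call, long hedgeAfterMs) {
        CompletableFuture<T> primary = call.get();
        CompletableFuture<T> hedge = CompletableFuture
            .supplyAsync(() -> 0, CompletableFuture.delayedExecutor(hedgeAfterMs, TimeUnit.MILLISECONDS))
            .thenCompose(ignored -> call.get());
        // Note: the slower future is not cancelled here; real code should cancel it.
        return primary.applyToEither(hedge, v -> v);
    }

    public static void main(String[] args) {
        // Fast stand-in call: completes in 30ms, well before the 500ms hedge fires.
        CompletableFuture<String> result = hedged(() -> CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(30);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return "fast";
        }), 500);
        System.out.println(result.join());
    }
}
```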
Minimal reproduction shape
The exact business logic is not important; the pattern is:
- shared OpenAIClientAsync
- start one async non-streaming call
- start one streaming planner call
- start several parallel JSON calls
- start one more async JSON call
- observe that most requests complete quickly, but occasional outliers take much longer than the rest
Pseudo-flow (imports added for clarity; *Params values elided):

```java
import com.openai.client.OpenAIClientAsync;
import com.openai.client.okhttp.OpenAIOkHttpClientAsync;
import com.openai.core.http.AsyncStreamResponse;
import com.openai.models.responses.Response;
import com.openai.models.responses.ResponseStreamEvent;
import java.util.concurrent.CompletableFuture;

OpenAIClientAsync client = OpenAIOkHttpClientAsync.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .build(); // did not pass any OkHttp params here, could that be an issue?

// 1. non-streaming classification call
CompletableFuture<Response> classify =
        client.responses().create(classifyParams);

// 2. streaming planner call
AsyncStreamResponse<ResponseStreamEvent> plannerStream =
        client.responses().createStreaming(plannerParams);
CompletableFuture<Void> plannerDone = plannerStream.onCompleteFuture();

// 3. several parallel non-streaming calls, fanned out after the planner completes
CompletableFuture<Response> agent1 = plannerDone.thenCompose(v -> client.responses().create(agent1Params));
CompletableFuture<Response> agent2 = plannerDone.thenCompose(v -> client.responses().create(agent2Params));
CompletableFuture<Response> agent3 = plannerDone.thenCompose(v -> client.responses().create(agent3Params));

// 4. one final non-streaming call after the fan-out joins
CompletableFuture<Response> agent4 =
        CompletableFuture.allOf(agent1, agent2, agent3)
                .thenCompose(v -> client.responses().create(agent4Params));
```