Problem
The query method waits for the complete response before returning anything to the caller. For long-form answers this introduces noticeable latency: the caller receives nothing until generation is fully complete. Every major AI SDK (Anthropic, OpenAI, Gemini) exposes streaming as a first-class feature.
Proposed Behaviour
Add a query_stream method that yields response chunks as they arrive rather than waiting for the full response.
```python
async for chunk in client.query_stream("What is Python?"):
    print(chunk, end="", flush=True)
```

- Existing `query()` is unchanged
- `query_stream()` returns an async generator
- Citations and metadata are returned at the end of the stream
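The async-generator shape described above can be sketched as follows. This is a minimal illustration, not the actual brainus_ai implementation: the backend transport is stubbed out, and the helper name `_fetch_chunks` is hypothetical.

```python
import asyncio
from typing import AsyncIterator


class Client:
    async def _fetch_chunks(self, prompt: str) -> AsyncIterator[str]:
        # Hypothetical stand-in for the real streaming transport.
        for piece in ["Python is ", "a programming ", "language."]:
            await asyncio.sleep(0)  # yield control, as a network read would
            yield piece

    async def query_stream(self, prompt: str) -> AsyncIterator[str]:
        """Yield response text chunks progressively as they arrive."""
        async for chunk in self._fetch_chunks(prompt):
            yield chunk


async def main() -> str:
    client = Client()
    parts = []
    async for chunk in client.query_stream("What is Python?"):
        parts.append(chunk)
    return "".join(parts)


print(asyncio.run(main()))
```

Because `query_stream` is itself an async generator, callers consume it with `async for` exactly as shown in the usage snippet above, and no buffering of the full response is needed.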
Files to Modify
| File | Change |
|---|---|
| src/brainus_ai/client.py | Add `query_stream` async generator method |
| src/brainus_ai/models.py | Add streaming chunk model |
| src/brainus_ai/__init__.py | Export new types |
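One possible shape for the streaming chunk model mentioned for `models.py`, sketched as a plain dataclass. The field names (`text`, `is_final`, `citations`) are assumptions for illustration, chosen so that citations and metadata can ride on the final chunk as the issue proposes.

```python
from dataclasses import dataclass, field


@dataclass
class StreamChunk:
    """One piece of a streamed response (hypothetical model shape)."""

    text: str                 # the text fragment for this chunk
    is_final: bool = False    # True only on the last chunk of the stream
    # Citations and metadata are only populated on the final chunk.
    citations: list = field(default_factory=list)


chunk = StreamChunk(text="Python is ")
```

If the project already uses pydantic for its models, the same fields would map directly onto a `BaseModel` subclass instead.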
Acceptance Criteria
- `query_stream()` yields text chunks progressively as they arrive
- Existing `query()` behaviour is unchanged
- Citations accessible at end of stream
- Stream can be cancelled mid-way without errors
- Full type hints on all new methods and models