
feat: streaming support for query responses #2

@0xneobyte

Description

Problem

The query method waits for the complete response before returning anything to the caller. For long-form answers this introduces noticeable latency — the caller receives nothing until generation is fully complete. Every major AI SDK (Anthropic, OpenAI, Gemini) exposes streaming as a first-class feature.

Proposed Behaviour

Add a query_stream method that yields response chunks as they arrive rather than waiting for the full response.

```python
async for chunk in client.query_stream("What is Python?"):
    print(chunk, end="", flush=True)
```

  • Existing `query()` is unchanged
  • `query_stream()` returns an async generator
  • Citations and metadata are returned at the end of the stream
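A minimal sketch of the proposed shape (the class name `BrainusClient` and the internal `_stream_raw` helper are assumptions for illustration, not the real SDK internals):

```python
import asyncio
from typing import AsyncGenerator


class BrainusClient:
    async def _stream_raw(self, prompt: str) -> AsyncGenerator[str, None]:
        # Stand-in for the real transport layer; yields canned chunks
        # the way a streaming HTTP response would.
        for chunk in ["Python is ", "a programming ", "language."]:
            await asyncio.sleep(0)  # simulate network latency
            yield chunk

    async def query_stream(self, prompt: str) -> AsyncGenerator[str, None]:
        # Yield each text chunk to the caller as it arrives,
        # instead of buffering the full response like query() does.
        async for chunk in self._stream_raw(prompt):
            yield chunk


async def main() -> str:
    client = BrainusClient()
    parts = []
    async for chunk in client.query_stream("What is Python?"):
        parts.append(chunk)
    return "".join(parts)
```

Because `query_stream()` is an async generator, the caller drives it with `async for` and sees text incrementally, while `query()` can keep its existing buffered behaviour untouched.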

Files to Modify

| File | Change |
| --- | --- |
| src/brainus_ai/client.py | Add `query_stream` async generator method |
| src/brainus_ai/models.py | Add streaming chunk model |
| src/brainus_ai/__init__.py | Export new types |
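One possible shape for the chunk model in `models.py` (field names `text`, `is_final`, and `citations` are assumptions to be settled in review):

```python
from dataclasses import dataclass, field


@dataclass
class StreamChunk:
    """One increment of a streamed response."""

    text: str                                       # incremental text delta
    is_final: bool = False                          # True only on the terminal chunk
    citations: list = field(default_factory=list)   # populated only when is_final is True
```

Carrying citations on a flagged final chunk keeps the "citations at end of stream" behaviour explicit: intermediate chunks stay lightweight, and the caller checks `is_final` to know when metadata is available.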

Acceptance Criteria

  • query_stream() yields text chunks progressively as they arrive
  • Existing query() behaviour is unchanged
  • Citations accessible at end of stream
  • Stream can be cancelled mid-way without errors
  • Full type hints on all new methods and models
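The cancellation criterion can be exercised with a plain async generator: breaking out of `async for` and closing the generator must run its cleanup without raising. `fake_stream` below is a stand-in for the SDK stream, not real code:

```python
import asyncio

closed = {"flag": False}


async def fake_stream():
    try:
        for chunk in ["a", "b", "c", "d"]:
            await asyncio.sleep(0)
            yield chunk
    finally:
        # Cleanup (e.g. closing the HTTP connection) must also run
        # when the consumer abandons the stream mid-way.
        closed["flag"] = True


async def main():
    received = []
    gen = fake_stream()
    async for chunk in gen:
        received.append(chunk)
        if len(received) == 2:
            break  # cancel mid-stream
    await gen.aclose()  # explicit close; the finally block runs here
    return received
```

`aclose()` raises `GeneratorExit` inside the generator, so any `finally` (or async context manager) wrapping the transport gets a chance to release the connection, which is what "cancelled mid-way without errors" should mean in practice.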
