Skip to content

feat(cli): quantcpp client (SSE streaming) + serve discoverability#47

Merged
unamedkr merged 1 commit intomainfrom
feat/cli-client-streaming
Apr 11, 2026
Merged

feat(cli): quantcpp client (SSE streaming) + serve discoverability#47
unamedkr merged 1 commit intomainfrom
feat/cli-client-streaming

Conversation

@unamedkr
Copy link
Copy Markdown
Collaborator

The HTTP server already supported OpenAI-style SSE streaming via `"stream": true`, but it wasn't easy to discover or test from the CLI.

New: `quantcpp client PROMPT`

```bash
quantcpp serve llama3.2:1b -p 8080 # in one terminal
quantcpp client "What is gravity?" # in another — streams tokens via SSE
quantcpp client "Hi" --no-stream # single JSON response
quantcpp client "Hi" --url http://other:8081
```

  • Default: SSE streaming (tokens print as they arrive)
  • `--no-stream`: single JSON response
  • Stdlib only (no requests/httpx dependency)

Improved: `quantcpp serve` startup

Now prints all endpoints, curl examples for both streaming and non-streaming modes, and the OpenAI Python SDK snippet.

Verified

  • Server SSE chunks delivered token-by-token ✓
  • `quantcpp client` decodes `data: {...}\n\n` chunks ✓
  • `--no-stream` returns single JSON ✓

Version 0.12.0 → 0.12.1.

The HTTP server already supported OpenAI-compatible SSE streaming
(controlled by `"stream": true` in the request body) but it wasn't
discoverable from the CLI. This PR makes it explicit and easy to use.

New: `quantcpp client PROMPT [--url ...] [--no-stream]`
- Sends a chat completion to a running quantcpp serve endpoint
- Default mode is streaming (SSE) — tokens print as they arrive
- --no-stream falls back to a single JSON response
- Stdlib only (urllib) — no extra dependencies

Improved: `quantcpp serve` startup output
- Now prints all three endpoints (chat/completions, models, health)
- Shows curl examples for both streaming and non-streaming modes
- Shows OpenAI Python SDK snippet for drop-in usage

Verified end-to-end: server streams token-by-token; client decodes
SSE chunks correctly; --no-stream returns single JSON.

README (EN/KO) and guide CTA updated to mention `quantcpp client`
and the streaming/non-streaming choice.

Version: 0.12.0 → 0.12.1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@unamedkr unamedkr merged commit 9b8fc6e into main Apr 11, 2026
@unamedkr unamedkr deleted the feat/cli-client-streaming branch April 11, 2026 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant