From 9a6385a051d4a3f30d9463ddf97ca7309ba58155 Mon Sep 17 00:00:00 2001 From: MinglesAI Date: Tue, 31 Mar 2026 10:31:46 +0000 Subject: [PATCH 1/3] feat: port signing fix, tool emulation, retry, circuit breaker, UI + docs - gonka_client.py: add encode_with_low_s() standalone function, stream_read_timeout parameter, updated signing to hash payload before signing (Phase 3 signing) - tool_emulation.py: port tool emulation module (no DB deps); supports JSON and XML formats, streaming and non-streaming paths - retry.py: port retry utilities with exponential backoff - circuit_breaker.py: port circuit breaker (5-failure threshold, 60s recovery) - main.py: wire tool emulation, circuit breaker, retry into chat completions; GONKA_STREAM_READ_TIMEOUT setting; circuit breaker state in /health - app/static/index.html: updated UI from private repo, no mingles.ai references - test_tool_emulation.py: port test suite with adapted imports - README.md: clean self-hosted docs, all env vars, Docker example, removed gonka-gateway.mingles.ai link Closes #1 --- README.md | 199 ++- app/circuit_breaker.py | 181 ++ app/gonka_client.py | 170 +- app/main.py | 130 +- app/retry.py | 173 ++ app/static/index.html | 3611 +++++++++++++++++++++++++++++++++++----- app/tool_emulation.py | 530 ++++++ test_tool_emulation.py | 449 +++++ 8 files changed, 4854 insertions(+), 589 deletions(-) create mode 100644 app/circuit_breaker.py create mode 100644 app/retry.py create mode 100644 app/tool_emulation.py create mode 100644 test_tool_emulation.py diff --git a/README.md b/README.md index aa550a3..5199025 100644 --- a/README.md +++ b/README.md @@ -1,61 +1,54 @@ # Gonka OpenAI Proxy -OpenAI-compatible API proxy for Gonka that provides ChatGPT-like interface with API key authentication. - -👉 **Try it here:** -👉 https://gonka-gateway.mingles.ai/ +OpenAI-compatible API proxy for Gonka that provides a ChatGPT-like interface with API key authentication. 
Self-hosted, no database required — configured entirely via environment variables. ## Features -- **OpenAI-compatible API**: Compatible with OpenAI Python SDK and other OpenAI-compatible clients -- **API Key Authentication**: Secure access using API keys (like ChatGPT API) +- **OpenAI-compatible API**: Drop-in replacement for OpenAI Python SDK and other OpenAI-compatible clients +- **API Key Authentication**: Secure access using configurable API keys - **Streaming Support**: Supports both streaming and non-streaming responses +- **Tool Emulation**: Automatic prompt-based tool call emulation for models that don't support native tool calling +- **Circuit Breaker**: Prevents cascading failures when the Gonka backend is degraded +- **Retry with Backoff**: Automatic retry with exponential backoff on transient errors - **Web Interface**: Built-in web chat interface for testing -- **Automatic Model Loading**: Loads available models from Gonka API on startup - **Docker Support**: Ready-to-use Docker container -## Configuration +## Quick Start -Copy `.env.example` to `.env` and configure the following variables: +### Running Locally +1. Clone the repository: ```bash -# Gonka API Configuration -GONKA_PRIVATE_KEY=your_hex_private_key_here -GONKA_ADDRESS=your_gonka_address_bech32 -GONKA_ENDPOINT=https://host:port/v1 -GONKA_PROVIDER_ADDRESS=provider_gonka_address_bech32 - -# API Key for external access (like ChatGPT API) -API_KEY=sk-your-secret-api-key-here - -# Server Configuration (optional) -HOST=0.0.0.0 -PORT=8000 +git clone https://github.com/MinglesAI/gonka-proxy.git +cd gonka-proxy ``` -### Configuration Details - -#### GONKA_PROVIDER_ADDRESS - -**What is it?** `GONKA_PROVIDER_ADDRESS` is the provider (host) address in the Gonka network in bech32 format. It is used to sign requests to the Gonka API. - -**Where to get it?** - -1. 
**From provider documentation**: If you are using a specific Gonka provider, their address should be specified in their documentation or provider page. - -2. **From endpoint metadata**: The provider address is usually associated with the endpoint (`GONKA_ENDPOINT`). The provider should specify their Gonka address in the documentation or during registration. +2. Install dependencies: +```bash +pip install -r requirements.txt +``` -3. **Via Gonka Dashboard**: If you have access to the Gonka Dashboard, the provider address can be found in your connection information or node settings. +3. Create a `.env` file (see [Environment Variables](#environment-variables)): +```bash +cp .env.example .env +# Edit .env with your values +``` -4. **Contact the provider**: If you are using a public Gonka endpoint, contact the endpoint owner or Gonka support to get the provider address. +4. Run the server: +```bash +python -m app.main +``` -**Example**: The address usually looks like `gonka1...` (bech32 format), e.g., `gonka1abc123def456...` +Or with uvicorn directly: +```bash +uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload +``` -**Important**: The provider address is used in the cryptographic signature of each request, so it must be correct for successful authentication. +5. Open `http://localhost:8000/` in your browser to use the web chat interface. -## Running with Docker +### Running with Docker -1. Build the Docker image: +1. Build the image: ```bash docker build -t gonka-proxy . ``` @@ -65,36 +58,80 @@ docker build -t gonka-proxy . docker run -d \ --name gonka-proxy \ -p 8000:8000 \ - --env-file .env \ + -e GONKA_PRIVATE_KEY=your_hex_private_key \ + -e GONKA_ADDRESS=your_gonka_address_bech32 \ + -e GONKA_ENDPOINT=https://host:port/v1 \ + -e GONKA_PROVIDER_ADDRESS=provider_gonka_address_bech32 \ + -e API_KEY=sk-your-secret-api-key \ gonka-proxy ``` -## Running Locally - -1. 
Install dependencies: +Or with a `.env` file: ```bash -pip install -r requirements.txt +docker run -d \ + --name gonka-proxy \ + -p 8000:8000 \ + --env-file .env \ + gonka-proxy ``` -2. Set environment variables or create `.env` file +## Environment Variables -3. Run the server: -```bash -python -m app.main +| Variable | Required | Default | Description | +|---|---|---|---| +| `GONKA_PRIVATE_KEY` | ✅ | — | Your ECDSA private key in hex format (with or without `0x` prefix) | +| `GONKA_ADDRESS` | ✅ | — | Your Gonka address in bech32 format (e.g. `gonka1abc...`) | +| `GONKA_ENDPOINT` | ✅ | — | Gonka API base URL (e.g. `https://host:port/v1`) | +| `GONKA_PROVIDER_ADDRESS` | ✅ | — | Provider's Gonka address in bech32 format — used for request signing | +| `API_KEY` | ✅ | — | Secret key clients must send in the `Authorization` header | +| `HOST` | ❌ | `0.0.0.0` | Server bind address | +| `PORT` | ❌ | `8000` | Server port | +| `GONKA_STREAM_READ_TIMEOUT` | ❌ | `300.0` | Max seconds to wait for streaming data from backend | + +### Configuration Details + +#### GONKA_PRIVATE_KEY +Your ECDSA private key in hex format. Used to sign every request to the Gonka backend. +Example: `a1b2c3d4e5f6...` or `0xa1b2c3d4e5f6...` + +#### GONKA_ADDRESS +Your address in the Gonka network (bech32 format). Sent as the `X-Requester-Address` header. +Example: `gonka1qyqszqgpqyqszqgpqyqszqgp...` + +#### GONKA_ENDPOINT +The Gonka inference API endpoint. Must include the `/v1` path segment. +Example: `https://my-gonka-node.example.com/v1` + +#### GONKA_PROVIDER_ADDRESS +The **provider's** Gonka address (bech32 format). This is included in the cryptographic signature of every request and must match what the provider expects. Obtain this from your Gonka provider's documentation or contact page. +Example: `gonka1provideraddress...` + +#### API_KEY +The bearer token clients must include in requests to the proxy. 
+Example: `sk-my-secret-key-123` + +Clients send it as: +``` +Authorization: Bearer sk-my-secret-key-123 ``` -Or with uvicorn directly: +### Example `.env` file + ```bash -uvicorn app.main:app --host 0.0.0.0 --port 8000 +GONKA_PRIVATE_KEY=0xaabbccddeeff... +GONKA_ADDRESS=gonka1youraddress... +GONKA_ENDPOINT=https://my-gonka-node.example.com/v1 +GONKA_PROVIDER_ADDRESS=gonka1provideraddress... +API_KEY=sk-my-secret-api-key ``` ## Usage ### Web Interface -Access the web interface at `http://localhost:8000/` to test the API interactively. +Open `http://localhost:8000/` to access the built-in chat interface. -### Using OpenAI Python SDK +### OpenAI Python SDK ```python from openai import OpenAI @@ -114,20 +151,6 @@ response = client.chat.completions.create( print(response.choices[0].message.content) ``` -### Using curl - -```bash -curl http://localhost:8000/v1/chat/completions \ - -H "Content-Type: application/json" \ - -H "Authorization: Bearer sk-your-secret-key" \ - -d '{ - "model": "gonka-model", - "messages": [ - {"role": "user", "content": "Hello!"} - ] - }' -``` - ### Streaming ```python @@ -140,9 +163,7 @@ client = OpenAI( stream = client.chat.completions.create( model="gonka-model", - messages=[ - {"role": "user", "content": "Tell me a story"} - ], + messages=[{"role": "user", "content": "Tell me a story"}], stream=True ) @@ -151,27 +172,41 @@ for chunk in stream: print(chunk.choices[0].delta.content, end="") ``` +### curl + +```bash +curl http://localhost:8000/v1/chat/completions \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer sk-your-secret-key" \ + -d '{ + "model": "gonka-model", + "messages": [{"role": "user", "content": "Hello!"}] + }' +``` + ## API Endpoints -- `POST /v1/chat/completions` - Chat completions (OpenAI-compatible) -- `GET /v1/models` - List available models -- `GET /api/models` - Get available models (no auth, for web interface) -- `GET /health` - Health check endpoint (no auth required) -- `GET /` - Web chat interface 
+| Endpoint | Auth | Description | +|---|---|---| +| `POST /v1/chat/completions` | ✅ | Chat completions (OpenAI-compatible) | +| `GET /v1/models` | ✅ | List available models (OpenAI-compatible) | +| `GET /api/models` | ❌ | Get available models (for web interface) | +| `GET /health` | ❌ | Health check (includes circuit breaker state) | +| `GET /` | ❌ | Web chat interface | -## Authentication +## Architecture -All endpoints except `/health`, `/api/models`, and `/` require authentication using the `Authorization` header: +### Tool Emulation -``` -Authorization: Bearer sk-your-secret-key -``` +If the Gonka model doesn't support native tool calling (`tools` + `tool_choice`), the proxy automatically converts tool definitions into a system prompt and parses tool call JSON from the model's text response. This is transparent to the client — it still receives standard OpenAI-format `tool_calls` in the response. -Or simply: +### Circuit Breaker -``` -Authorization: sk-your-secret-key -``` +Wraps non-streaming Gonka backend calls. After 5 consecutive failures, the circuit opens and requests are rejected immediately with a `503` error for 60 seconds, then transitions to half-open to test recovery. + +### Retry + +Non-streaming requests are retried up to 2 times with exponential backoff on transient errors. 
## License diff --git a/app/circuit_breaker.py b/app/circuit_breaker.py new file mode 100644 index 0000000..3bdb261 --- /dev/null +++ b/app/circuit_breaker.py @@ -0,0 +1,181 @@ +""" +Circuit Breaker implementation for resilient backend communication +""" +import time +import logging +from enum import Enum +from typing import Callable, Any, Optional +from functools import wraps + +logger = logging.getLogger(__name__) + + +class CircuitState(Enum): + """Circuit breaker states""" + CLOSED = "closed" # Normal operation - requests pass through + OPEN = "open" # Failing - reject requests immediately + HALF_OPEN = "half_open" # Testing - allow limited requests to test recovery + + +class CircuitBreakerOpenError(Exception): + """Raised when circuit breaker is OPEN""" + pass + + +class CircuitBreaker: + """ + Circuit breaker pattern implementation + + Prevents cascading failures by stopping requests to failing services + and allowing them to recover. + + States: + - CLOSED: Normal operation, requests pass through + - OPEN: Service is failing, reject requests immediately + - HALF_OPEN: Testing if service recovered, allow limited requests + """ + + def __init__( + self, + name: str, + failure_threshold: int = 5, + recovery_timeout: float = 60.0, + success_threshold: int = 2, + expected_exception: type = Exception + ): + """ + Initialize circuit breaker + + Args: + name: Circuit breaker name (for logging) + failure_threshold: Number of failures before opening circuit + recovery_timeout: Seconds to wait before trying half-open + success_threshold: Number of successes in half-open to close circuit + expected_exception: Exception type that triggers failure + """ + self.name = name + self.failure_threshold = failure_threshold + self.recovery_timeout = recovery_timeout + self.success_threshold = success_threshold + self.expected_exception = expected_exception + + self.failure_count = 0 + self.success_count = 0 + self.last_failure_time: Optional[float] = None + self.state = 
CircuitState.CLOSED + + async def call(self, func: Callable, *args, **kwargs) -> Any: + """ + Execute function with circuit breaker protection + + Args: + func: Async function to execute + *args, **kwargs: Arguments for function + + Returns: + Function result + + Raises: + CircuitBreakerOpenError: If circuit is OPEN + Exception: Original exception from function + """ + # Check if we should transition from OPEN to HALF_OPEN + if self.state == CircuitState.OPEN: + if self.last_failure_time and \ + time.time() - self.last_failure_time > self.recovery_timeout: + logger.info(f"Circuit breaker {self.name}: OPEN -> HALF_OPEN (testing recovery)") + self.state = CircuitState.HALF_OPEN + self.success_count = 0 + else: + raise CircuitBreakerOpenError( + f"Circuit breaker {self.name} is OPEN. " + f"Last failure: {self.last_failure_time}" + ) + + # Execute function + try: + result = await func(*args, **kwargs) + self._on_success() + return result + except self.expected_exception as e: + self._on_failure() + raise + + def _on_success(self): + """Handle successful request""" + if self.state == CircuitState.HALF_OPEN: + self.success_count += 1 + if self.success_count >= self.success_threshold: + logger.info(f"Circuit breaker {self.name}: HALF_OPEN -> CLOSED (recovered)") + self.state = CircuitState.CLOSED + self.failure_count = 0 + self.success_count = 0 + elif self.state == CircuitState.CLOSED: + # Reset failure count on success (gradual recovery) + if self.failure_count > 0: + self.failure_count = max(0, self.failure_count - 1) + + def _on_failure(self): + """Handle failed request""" + self.failure_count += 1 + self.last_failure_time = time.time() + + if self.state == CircuitState.HALF_OPEN: + # Failure in half-open -> back to OPEN + logger.warning(f"Circuit breaker {self.name}: HALF_OPEN -> OPEN (still failing)") + self.state = CircuitState.OPEN + self.success_count = 0 + elif self.state == CircuitState.CLOSED: + if self.failure_count >= self.failure_threshold: + 
logger.warning(
+                    f"Circuit breaker {self.name}: CLOSED -> OPEN "
+                    f"({self.failure_count} failures >= {self.failure_threshold})"
+                )
+                self.state = CircuitState.OPEN
+
+    def get_state(self) -> dict:
+        """Get current circuit breaker state"""
+        return {
+            "name": self.name,
+            "state": self.state.value,
+            "failure_count": self.failure_count,
+            "last_failure_time": self.last_failure_time,
+            "failure_threshold": self.failure_threshold,
+            "recovery_timeout": self.recovery_timeout
+        }
+
+    def reset(self):
+        """Manually reset circuit breaker to CLOSED state"""
+        logger.info(f"Circuit breaker {self.name}: Manual reset")
+        self.state = CircuitState.CLOSED
+        self.failure_count = 0
+        self.success_count = 0
+        self.last_failure_time = None
+
+
+def circuit_breaker_decorator(
+    name: str,
+    failure_threshold: int = 5,
+    recovery_timeout: float = 60.0
+):
+    """
+    Decorator for circuit breaker pattern
+
+    Usage:
+        @circuit_breaker_decorator("gonka_api", failure_threshold=5)
+        async def call_gonka_api():
+            ...
+ """ + breaker = CircuitBreaker( + name=name, + failure_threshold=failure_threshold, + recovery_timeout=recovery_timeout + ) + + def decorator(func: Callable): + @wraps(func) + async def wrapper(*args, **kwargs): + return await breaker.call(func, *args, **kwargs) + wrapper.breaker = breaker # Attach breaker for access + return wrapper + return decorator diff --git a/app/gonka_client.py b/app/gonka_client.py index 212b3e7..49c525c 100644 --- a/app/gonka_client.py +++ b/app/gonka_client.py @@ -11,34 +11,49 @@ logger = logging.getLogger(__name__) +def encode_with_low_s(r: int, s: int, order: int) -> bytes: + """Encode ECDSA signature with low-S normalization""" + # Normalize s to low-S + if s > order // 2: + s = order - s + + # Convert to bytes (32 bytes each for r and s) + r_bytes = r.to_bytes(32, 'big') + s_bytes = s.to_bytes(32, 'big') + + return r_bytes + s_bytes + + class GonkaClient: """Client for making signed requests to Gonka API""" - + def __init__( self, private_key: str, address: str, endpoint: str, provider_address: str, - timeout: float = 60.0 + timeout: float = 60.0, + stream_read_timeout: float = 300.0, ): self.private_key = private_key self.address = address self.endpoint = endpoint.rstrip('/') self.provider_address = provider_address self.timeout = timeout - + self.stream_read_timeout = stream_read_timeout + # Initialize hybrid timestamp tracking self._wall_base = time.time_ns() self._perf_base = time.perf_counter_ns() - - # HTTP client + + # HTTP client: default timeout for non-streaming; streaming uses per-request timeout self.client = httpx.AsyncClient(timeout=timeout) - + def _hybrid_timestamp_ns(self) -> int: """Generate hybrid timestamp (monotonic + aligned to wall clock)""" return self._wall_base + (time.perf_counter_ns() - self._perf_base) - + def _sign_payload( self, payload_bytes: bytes, @@ -48,41 +63,45 @@ def _sign_payload( """Sign payload using ECDSA with SHA-256""" # Remove 0x prefix if present pk = self.private_key[2:] if 
self.private_key.startswith('0x') else self.private_key - sk = SigningKey.from_string(bytes.fromhex(pk), curve=SECP256k1) - - # Message bytes: payload || timestamp || provider_address - msg = payload_bytes + str(timestamp_ns).encode('utf-8') + provider_address.encode('utf-8') - - # Deterministic ECDSA over SHA-256 with low-S normalization - sig = sk.sign_deterministic(msg, hashfunc=hashlib.sha256) - r, s = sig[:32], sig[32:] - - order = SECP256k1.order - s_int = int.from_bytes(s, 'big') - if s_int > order // 2: - s_int = order - s_int - s = s_int.to_bytes(32, 'big') - - return base64.b64encode(r + s).decode('utf-8') - + signing_key = SigningKey.from_string(bytes.fromhex(pk), curve=SECP256k1) + + # Phase 3: Sign hash of payload instead of raw payload + payload_hash = hashlib.sha256(payload_bytes).hexdigest() + + # Build signature input: hash + timestamp + transfer_address + signature_input = payload_hash + signature_input += str(timestamp_ns) + signature_input += provider_address + + signature_bytes = signature_input.encode('utf-8') + + # Sign the message with deterministic ECDSA using low-S normalization + signature = signing_key.sign_deterministic( + signature_bytes, + hashfunc=hashlib.sha256, + sigencode=lambda r, s, order: encode_with_low_s(r, s, order) + ) + + return base64.b64encode(signature).decode('utf-8') + def _prepare_request(self, payload: Optional[dict]) -> Tuple[bytes, dict]: """Prepare request data (payload bytes, headers with signature)""" if payload is None: payload = {} - + payload_bytes = json.dumps(payload).encode('utf-8') timestamp_ns = self._hybrid_timestamp_ns() signature = self._sign_payload(payload_bytes, timestamp_ns, self.provider_address) - + headers = { "Content-Type": "application/json", "Authorization": signature, "X-Requester-Address": self.address, "X-Timestamp": str(timestamp_ns), } - + return payload_bytes, headers - + async def get_models(self) -> list: """Get available models from Gonka API""" try: @@ -94,7 +113,7 @@ async def 
get_models(self) -> list: except Exception as e: logger.warning(f"Failed to load models from Gonka API: {e}") return [] - + async def request( self, method: str, @@ -104,15 +123,8 @@ async def request( """Make a signed request to Gonka API (non-streaming)""" url = f"{self.endpoint}{path}" payload_bytes, headers = self._prepare_request(payload) - - # Log request body before sending - try: - request_body = json.loads(payload_bytes.decode('utf-8')) - logger.info(f"Gonka API Request: {method} {url}") - logger.info(f"Request body: {json.dumps(request_body, indent=2, ensure_ascii=False)}") - except Exception as e: - logger.warning(f"Failed to log request body: {e}") - + + logger.info(f"Gonka API Request: {method} {path}") try: response = await self.client.request( method, @@ -123,18 +135,12 @@ async def request( response.raise_for_status() return response.json() except httpx.HTTPStatusError as e: - # Log error response - try: - error_body = e.response.text - logger.error(f"Gonka API Error Response: {e.response.status_code}") - logger.error(f"Error response body: {error_body}") - except Exception: - logger.error(f"Gonka API Error Response: {e.response.status_code} (failed to read body)") + logger.error(f"Gonka API Error Response: {e.response.status_code}") raise except Exception as e: logger.error(f"Gonka API Request failed: {type(e).__name__}: {str(e)}") raise - + async def request_stream( self, method: str, @@ -144,49 +150,71 @@ async def request_stream( """Make a signed streaming request to Gonka API""" url = f"{self.endpoint}{path}" payload_bytes, headers = self._prepare_request(payload) - - # Log request body before sending - try: - request_body = json.loads(payload_bytes.decode('utf-8')) - logger.info(f"Gonka API Stream Request: {method} {url}") - logger.info(f"Request body: {json.dumps(request_body, indent=2, ensure_ascii=False)}") - except Exception as e: - logger.warning(f"Failed to log request body: {e}") - + + logger.info(f"Gonka API Stream Request: {method} 
{path}") try: + # Use longer read timeout for streaming so long generations don't get cut off + stream_timeout = httpx.Timeout(self.timeout, read=self.stream_read_timeout) async with self.client.stream( method, url, headers=headers, - content=payload_bytes + content=payload_bytes, + timeout=stream_timeout, ) as response: if response.status_code >= 400: - # Read error response body try: error_body = await response.aread() error_text = error_body.decode('utf-8', errors='replace') logger.error(f"Gonka API Stream Error Response: {response.status_code}") logger.error(f"Error response body: {error_text}") except Exception as read_err: - logger.error(f"Gonka API Stream Error Response: {response.status_code} (failed to read body: {read_err})") + logger.error( + f"Gonka API Stream Error Response: {response.status_code} " + f"(failed to read body: {read_err})" + ) response.raise_for_status() - - async for chunk in response.aiter_bytes(): - yield chunk + + total_bytes = 0 + chunk_count = 0 + completed_normally = False + try: + async for chunk in response.aiter_bytes(): + total_bytes += len(chunk) + chunk_count += 1 + yield chunk + completed_normally = True + logger.info( + f"Gonka API Stream completed: {method} {path} " + f"(chunks={chunk_count}, bytes={total_bytes})" + ) + finally: + if not completed_normally: + logger.info( + f"Gonka API Stream ended without full completion: {method} {path} " + f"(chunks={chunk_count}, bytes={total_bytes}) — client disconnect or stream closed" + ) except httpx.HTTPStatusError as e: - # Log error response (fallback for non-stream errors) - try: - error_body = e.response.text - logger.error(f"Gonka API Stream Error Response: {e.response.status_code}") - logger.error(f"Error response body: {error_body}") - except Exception: - logger.error(f"Gonka API Stream Error Response: {e.response.status_code} (failed to read body)") + logger.error(f"Gonka API Stream Error Response: {e.response.status_code}") + raise + except httpx.ReadTimeout: + 
logger.error( + "Gonka API Stream read timeout: backend did not send data within " + "stream_read_timeout; stream ended abruptly" + ) + raise + except httpx.ConnectError as e: + logger.error(f"Gonka API Stream connection error: {e}") + raise + except httpx.WriteTimeout: + logger.error("Gonka API Stream write timeout: request body send timed out") raise except Exception as e: - logger.error(f"Gonka API Stream Request failed: {type(e).__name__}: {str(e)}") + logger.error( + f"Gonka API Stream failed unexpectedly: {type(e).__name__}: {str(e)}" + ) raise - + async def close(self): """Close the HTTP client""" await self.client.aclose() - diff --git a/app/main.py b/app/main.py index 4a55df2..0070d0c 100644 --- a/app/main.py +++ b/app/main.py @@ -10,6 +10,13 @@ from app.gonka_client import GonkaClient from app.auth import verify_api_key +from app.tool_emulation import ( + emulate_tool_choice_auto, + process_response_with_tool_emulation, + process_stream_with_tool_emulation, +) +from app.circuit_breaker import CircuitBreaker, CircuitBreakerOpenError +from app.retry import retry_with_backoff, BACKEND_RETRY_CONFIG # Configure logging @@ -28,14 +35,17 @@ class Settings(BaseSettings): gonka_address: str = "" gonka_endpoint: str = "" gonka_provider_address: str = "" - + # API Key for external access api_key: str = "" - + # Server settings host: str = "0.0.0.0" port: int = 8000 - + + # Streaming read timeout (seconds); increase for slow/long generations + gonka_stream_read_timeout: float = 300.0 + class Config: env_file = ".env" case_sensitive = False @@ -48,16 +58,21 @@ class Config: gonka_client: Optional[GonkaClient] = None available_models: List[Dict] = [] +# Circuit breaker for Gonka backend +gonka_circuit_breaker = CircuitBreaker( + name="gonka_backend", + failure_threshold=5, + recovery_timeout=60.0, + success_threshold=2, +) + @asynccontextmanager async def lifespan(app: FastAPI): """Lifespan context manager for startup and shutdown events""" - # Startup global 
gonka_client, available_models - - # Initialize client and load models + try: - # Check if configuration is complete before loading models client = _create_gonka_client() if client: models = await client.get_models() @@ -69,10 +84,9 @@ async def lifespan(app: FastAPI): except Exception as e: logger.error(f"Failed to load models at startup: {e}") available_models = [] - + yield - - # Shutdown + if gonka_client: await gonka_client.close() @@ -92,7 +106,8 @@ def _create_gonka_client() -> Optional[GonkaClient]: private_key=settings.gonka_private_key, address=settings.gonka_address, endpoint=settings.gonka_endpoint, - provider_address=settings.gonka_provider_address + provider_address=settings.gonka_provider_address, + stream_read_timeout=settings.gonka_stream_read_timeout, ) return gonka_client @@ -109,8 +124,8 @@ def get_gonka_client() -> GonkaClient: if not settings.gonka_endpoint: missing.append("GONKA_ENDPOINT") if not settings.gonka_provider_address: - missing.append("GONKA_PROVIDER_ADDRESS (provider address in bech32 format, get it from the Gonka provider)") - + missing.append("GONKA_PROVIDER_ADDRESS") + raise HTTPException( status_code=500, detail=f"Gonka configuration incomplete. Missing: {', '.join(missing)}. 
" @@ -142,19 +157,17 @@ def get_gonka_client() -> GonkaClient: async def list_models(request: Request, api_key_valid: bool = Depends(verify_api_key)): """List available models (OpenAI-compatible endpoint)""" global available_models - - # Convert Gonka models format to OpenAI format + models_data = [] for model in available_models: model_id = model.get("id", "unknown") models_data.append({ "id": model_id, "object": "model", - "created": 1677610602, # Default timestamp + "created": 1677610602, "owned_by": "gonka" }) - - # If no models loaded, return default + if not models_data: models_data = [{ "id": "gonka-model", @@ -162,21 +175,19 @@ async def list_models(request: Request, api_key_valid: bool = Depends(verify_api "created": 1677610602, "owned_by": "gonka" }] - + return { "object": "list", "data": models_data } + # Models endpoint without auth (for web interface) @app.get("/api/models") async def get_models_no_auth(): """Get available models without authentication (for web interface)""" global available_models - - return { - "models": available_models - } + return {"models": available_models} # Chat completions endpoint @@ -187,34 +198,47 @@ async def chat_completions( ): """Chat completions endpoint (OpenAI-compatible)""" client = get_gonka_client() - + try: body = await request.json() - # Log incoming request body logger.info("Incoming chat completions request") - logger.info(f"Request body: {json.dumps(body, indent=2, ensure_ascii=False)}") except Exception as e: logger.error(f"Failed to parse request JSON: {e}") raise HTTPException(status_code=400, detail=f"Invalid JSON: {str(e)}") - + stream = body.get("stream", False) - + original_tools = body.get("tools") + + # Apply tool emulation if model doesn't support native tool calling + # (emulate_tool_choice_auto is a no-op when no tools are present) + emulated_body = emulate_tool_choice_auto(body) + tools_were_emulated = original_tools and "tools" not in emulated_body + try: if stream: - # Streaming response - 
proxy SSE from Gonka async def generate(): try: - async for chunk in client.request_stream( + raw_stream = client.request_stream( method="POST", path="/chat/completions", - payload=body - ): - # Yield chunk as-is (Gonka should return SSE format) - yield chunk + payload=emulated_body, + ) + if tools_were_emulated: + async for chunk in process_stream_with_tool_emulation( + raw_stream, original_tools + ): + yield chunk + else: + async for chunk in raw_stream: + yield chunk + except CircuitBreakerOpenError as e: + logger.error(f"Circuit breaker open: {e}") + error_payload = json.dumps({"error": {"message": str(e), "type": "service_unavailable"}}) + yield f"data: {error_payload}\n\ndata: [DONE]\n\n".encode() except Exception as e: logger.error(f"Streaming error: {type(e).__name__}: {str(e)}") raise - + return StreamingResponse( generate(), media_type="text/event-stream", @@ -225,13 +249,31 @@ async def generate(): } ) else: - # Non-streaming response - response = await client.request( - method="POST", - path="/chat/completions", - payload=body + # Non-streaming: wrap with circuit breaker + retry + async def do_request(): + return await gonka_circuit_breaker.call( + client.request, + method="POST", + path="/chat/completions", + payload=emulated_body, + ) + + response = await retry_with_backoff( + do_request, + max_retries=BACKEND_RETRY_CONFIG.max_retries, + initial_delay=BACKEND_RETRY_CONFIG.initial_delay, + max_delay=BACKEND_RETRY_CONFIG.max_delay, + exceptions=BACKEND_RETRY_CONFIG.exceptions, ) + + if tools_were_emulated: + response = process_response_with_tool_emulation(response, original_tools) + return response + + except CircuitBreakerOpenError as e: + logger.error(f"Circuit breaker open: {e}") + raise HTTPException(status_code=503, detail=f"Backend temporarily unavailable: {str(e)}") except Exception as e: logger.error(f"Chat completions error: {type(e).__name__}: {str(e)}") raise HTTPException( @@ -246,11 +288,16 @@ async def web_interface(): """Serve web chat 
interface""" return FileResponse("app/static/index.html") + # Health check endpoint (no auth required) @app.get("/health") async def health(): """Health check endpoint""" - return {"status": "ok"} + return { + "status": "ok", + "circuit_breaker": gonka_circuit_breaker.get_state(), + } + # Serve static files (must be last) app.mount("/static", StaticFiles(directory="app/static"), name="static") @@ -264,4 +311,3 @@ async def health(): port=settings.port, reload=False ) - diff --git a/app/retry.py b/app/retry.py new file mode 100644 index 0000000..ed5934a --- /dev/null +++ b/app/retry.py @@ -0,0 +1,173 @@ +""" +Retry utilities with exponential backoff +""" +import asyncio +import random +import logging +from typing import Callable, TypeVar, Optional, Tuple, Any + +logger = logging.getLogger(__name__) + +T = TypeVar('T') + + +async def retry_with_backoff( + func: Callable[[], T], + max_retries: int = 3, + initial_delay: float = 1.0, + max_delay: float = 60.0, + exponential_base: float = 2.0, + jitter: bool = True, + exceptions: Tuple[type, ...] = (Exception,), + on_retry: Optional[Callable[[int, Exception], None]] = None +) -> T: + """ + Retry function with exponential backoff + + Args: + func: Async function to retry (no arguments) + max_retries: Maximum number of retry attempts + initial_delay: Initial delay in seconds + max_delay: Maximum delay in seconds + exponential_base: Base for exponential backoff + jitter: Add random jitter to prevent thundering herd + exceptions: Tuple of exceptions to catch and retry + on_retry: Optional callback called on each retry (attempt, exception) + + Returns: + Function result + + Raises: + Last exception if all retries fail + """ + delay = initial_delay + last_exception = None + + for attempt in range(max_retries + 1): + try: + return await func() + except exceptions as e: + last_exception = e + + if attempt == max_retries: + logger.error( + f"Retry exhausted after {max_retries} attempts. 
" + f"Last error: {type(e).__name__}: {e}" + ) + raise + + # Calculate delay with exponential backoff + if jitter: + # Add random jitter: delay * base + random(0, 1) + delay = min( + delay * exponential_base + random.uniform(0, 1), + max_delay + ) + else: + delay = min(delay * exponential_base, max_delay) + + logger.warning( + f"Attempt {attempt + 1}/{max_retries} failed: {type(e).__name__}: {e}. " + f"Retrying in {delay:.2f}s" + ) + + if on_retry: + try: + on_retry(attempt + 1, e) + except Exception as callback_error: + logger.warning(f"Retry callback error: {callback_error}") + + await asyncio.sleep(delay) + + # Should never reach here, but just in case + if last_exception: + raise last_exception + raise RuntimeError("Retry failed without exception") + + +async def retry_with_backoff_args( + func: Callable[..., T], + *args, + max_retries: int = 3, + initial_delay: float = 1.0, + max_delay: float = 60.0, + exponential_base: float = 2.0, + jitter: bool = True, + exceptions: Tuple[type, ...] 
= (Exception,), + **kwargs +) -> T: + """ + Retry function with exponential backoff (supports arguments) + + Args: + func: Async function to retry + *args: Positional arguments for function + max_retries: Maximum number of retry attempts + initial_delay: Initial delay in seconds + max_delay: Maximum delay in seconds + exponential_base: Base for exponential backoff + jitter: Add random jitter + exceptions: Tuple of exceptions to catch and retry + **kwargs: Keyword arguments for function + + Returns: + Function result + + Raises: + Last exception if all retries fail + """ + async def wrapper(): + return await func(*args, **kwargs) + + return await retry_with_backoff( + wrapper, + max_retries=max_retries, + initial_delay=initial_delay, + max_delay=max_delay, + exponential_base=exponential_base, + jitter=jitter, + exceptions=exceptions + ) + + +class RetryConfig: + """Configuration for retry behavior""" + + def __init__( + self, + max_retries: int = 3, + initial_delay: float = 1.0, + max_delay: float = 60.0, + exponential_base: float = 2.0, + jitter: bool = True, + exceptions: Tuple[type, ...] 
= (Exception,) + ): + self.max_retries = max_retries + self.initial_delay = initial_delay + self.max_delay = max_delay + self.exponential_base = exponential_base + self.jitter = jitter + self.exceptions = exceptions + + +# Predefined retry configurations +HTTP_RETRY_CONFIG = RetryConfig( + max_retries=3, + initial_delay=1.0, + max_delay=30.0, + exceptions=(Exception,) # Catch all for HTTP errors +) + +DATABASE_RETRY_CONFIG = RetryConfig( + max_retries=3, + initial_delay=0.5, + max_delay=10.0, + exceptions=(Exception,) +) + +BACKEND_RETRY_CONFIG = RetryConfig( + max_retries=2, # Fewer retries for backend calls (circuit breaker handles failures) + initial_delay=1.0, + max_delay=20.0, + exceptions=(Exception,) +) diff --git a/app/static/index.html b/app/static/index.html index 78cc77f..55cacf8 100644 --- a/app/static/index.html +++ b/app/static/index.html @@ -3,7 +3,23 @@ - Gonka Chat - API Testing + Gonka AI Gateway - OpenAI-compatible LLM inference API with token-based billing + + + + + + + + - - -
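For intuition about what the retry configurations in `app/retry.py` actually do, here is a standalone sketch (not part of the module) of the delay schedule computed by `retry_with_backoff`, with jitter disabled so the numbers are deterministic:

```python
def backoff_delays(max_retries: int, initial_delay: float,
                   max_delay: float, base: float = 2.0) -> list:
    """Sleep durations between attempts, mirroring retry_with_backoff
    (jitter off): each delay is the previous one times `base`, capped."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        delay = min(delay * base, max_delay)
        delays.append(delay)
    return delays

# BACKEND_RETRY_CONFIG: max_retries=2, initial_delay=1.0, max_delay=20.0
print(backoff_delays(2, 1.0, 20.0))  # [2.0, 4.0]
```

Note that, as in the module, the first sleep is `initial_delay * base`, not `initial_delay` itself; the cap at `max_delay` kicks in once the doubling exceeds it.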
-
-

🤖 Gonka Chat

-

OpenAI-compatible API proxy testing

-
- -
-
- - -
-
- - -
-
- - -
-
- -
-
- - - -

Start a conversation by sending a message

-
-
-
-
- + h4 { + font-size: 16px; + font-weight: 600; + color: var(--text-primary); + margin-top: 20px; + margin-bottom: 10px; + } + + /* Documentation styles */ + .docs-section { + margin-bottom: 40px; + } + + .docs-section h3 { + padding-bottom: 10px; + border-bottom: 2px solid var(--primary-color); + } + + .docs-code-block { + background: #f8f9fa; + border: 1px solid var(--divider); + border-radius: 8px; + padding: 20px; + margin: 15px 0; + overflow-x: auto; + } + + .docs-code-block code { + font-family: 'Courier New', monospace; + font-size: 14px; + line-height: 1.6; + color: var(--text-primary); + } + + .docs-code-block pre { + margin: 0; + white-space: pre-wrap; + word-break: break-word; + overflow-x: auto; + } + + .docs-code-block pre code { + background: none; + padding: 0; + } + + .docs-code-block-with-copy { + position: relative; + padding-top: 44px; + } + + .docs-code-copy-btn { + position: absolute; + top: 10px; + right: 10px; + padding: 6px 14px; + font-size: 13px; + font-weight: 500; + color: var(--primary-color); + background: var(--surface); + border: 1px solid var(--divider); + border-radius: 6px; + cursor: pointer; + transition: background 0.2s, color 0.2s; + } + + .docs-code-copy-btn:hover { + background: rgba(25, 118, 210, 0.08); + color: var(--primary-dark); + } + + .docs-code-copy-btn.copied { + color: var(--success); + border-color: var(--success); + } + + .docs-endpoint { + background: #e7f3ff; + padding: 15px; + border-radius: 8px; + margin: 15px 0; + border-left: 4px solid var(--primary-color); + } + + .docs-method { + display: inline-block; + padding: 4px 8px; + border-radius: 4px; + font-weight: 600; + font-size: 12px; + margin-right: 10px; + } + + .docs-method-get { + background: #28a745; + color: white; + } + + .docs-method-post { + background: #007bff; + color: white; + } + + .docs-note { + background: #fff3cd; + border: 1px solid #ffc107; + border-radius: 8px; + padding: 15px; + margin: 15px 0; + } + + .docs-warning { + background: 
#f8d7da; + border: 1px solid #dc3545; + border-radius: 8px; + padding: 15px; + margin: 15px 0; + } + + .docs-section ol, + .docs-section ul { + margin-left: 20px; + margin-top: 10px; + } + + .docs-section li { + margin-bottom: 8px; + } + + .docs-section code { + background: rgba(0, 0, 0, 0.05); + padding: 2px 6px; + border-radius: 4px; + font-family: 'Courier New', monospace; + font-size: 13px; + } + + /* Model selector */ + #chatModel { + padding: 10px 16px; + border: 1px solid var(--divider); + border-radius: 8px; + font-size: 14px; + background: var(--surface); + color: var(--text-primary); + cursor: pointer; + transition: all 0.2s ease; + } + + #chatModel:focus { + outline: none; + border-color: var(--primary-color); + box-shadow: 0 0 0 3px rgba(25, 118, 210, 0.1); + } + + /* Scrollbar styling */ + .chat-messages::-webkit-scrollbar, + .transactions-list::-webkit-scrollbar, + .api-keys-list::-webkit-scrollbar { + width: 8px; + } + + .chat-messages::-webkit-scrollbar-track, + .transactions-list::-webkit-scrollbar-track, + .api-keys-list::-webkit-scrollbar-track { + background: transparent; + } + + .chat-messages::-webkit-scrollbar-thumb, + .transactions-list::-webkit-scrollbar-thumb, + .api-keys-list::-webkit-scrollbar-thumb { + background: var(--divider); + border-radius: 4px; + } + + .chat-messages::-webkit-scrollbar-thumb:hover, + .transactions-list::-webkit-scrollbar-thumb:hover, + .api-keys-list::-webkit-scrollbar-thumb:hover { + background: var(--text-secondary); + } + + /* Responsive design */ + @media (max-width: 768px) { + .main-content { + grid-template-columns: 1fr; + } + + .sidebar { + order: 2; + } + + .content-area { + order: 1; + } + + .header { + flex-direction: column; + gap: 16px; + align-items: flex-start; + } + + .user-info { + width: 100%; + flex-wrap: wrap; + } + + .stats-grid { + grid-template-columns: 1fr; + } + } + + /* Loading states */ + .loading { + opacity: 0.6; + pointer-events: none; + } + + /* Smooth transitions */ + * { + 
transition: background-color 0.2s ease, border-color 0.2s ease, color 0.2s ease; + } + + /* Footer */ + .footer { + margin-top: 48px; + padding: 32px 0; + border-top: 1px solid var(--divider); + text-align: center; + } + + .footer-content { + max-width: 1200px; + margin: 0 auto; + padding: 0 24px; + } + + .footer-links { + display: flex; + justify-content: center; + gap: 24px; + flex-wrap: wrap; + margin-bottom: 16px; + } + + .footer-links a { + color: var(--text-secondary); + text-decoration: none; + font-size: 14px; + transition: color 0.2s ease; + display: flex; + align-items: center; + gap: 6px; + } + + .footer-links a:hover { + color: var(--primary-color); + } + + .footer-links a svg { + width: 16px; + height: 16px; + } + + .footer-text { + color: var(--text-secondary); + font-size: 12px; + margin-top: 16px; + } + + /* Deposit confirmation modal */ + #depositConfirmationModal { + position: fixed; + top: 0; + left: 0; + width: 100%; + height: 100%; + background: rgba(0, 0, 0, 0.5); + z-index: 10000; + display: flex; + align-items: center; + justify-content: center; + backdrop-filter: blur(4px); + } + + #depositConfirmationModal > div { + background: var(--surface); + padding: 32px; + border-radius: 12px; + max-width: 500px; + width: 90%; + box-shadow: var(--shadow-lg); + border: 1px solid var(--divider); + } + + #depositConfirmationModal h2 { + margin-bottom: 20px; + color: var(--text-primary); + } + + #depositConfirmationModal code { + display: block; + padding: 12px; + background: var(--background); + border-radius: 8px; + margin-bottom: 16px; + word-break: break-all; + font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace; + font-size: 13px; + border: 1px solid var(--divider); + } + + #depositConfirmationModal input { + width: 100%; + padding: 12px; + border: 1px solid var(--divider); + border-radius: 8px; + font-size: 14px; + background: var(--surface); + color: var(--text-primary); + transition: all 0.2s ease; + } + + #depositConfirmationModal 
input:focus { + outline: none; + border-color: var(--primary-color); + box-shadow: 0 0 0 3px rgba(25, 118, 210, 0.1); + } + + #depositConfirmationModal .modal-buttons { + display: flex; + gap: 12px; + margin-top: 20px; + } + + #depositConfirmationModal .modal-buttons button { + flex: 1; + } + + + +
+
+
+

Gonka AI Gateway

+

OpenAI-compatible LLM inference API with token-based billing

+

GNK: $4.00

+
+ +
+ +
+ + +
+ +
+

Dashboard

+
+
+
+

Balance

+
0 GNK
+
+
+

API Keys

+
0
+
+
+

Total Spent

+
0 ngonka
+
+
+

Total Inferences

+
0
+
+
+
+
+ + +
+

Chat

+
+ + +
+
+
+
+

Start a conversation

+
+
+
+ + +
+
+
+ + +
+

API Keys

+ +
+
+
+ + +
+

Transaction History

+
+
+ + +
+

📚 API Documentation

+

Complete guide to using Gonka AI Gateway API

+ +
+

Getting Started

+

Gonka AI Gateway provides an OpenAI-compatible API with Web3 authentication and token-based billing.

+
+ Note: You need to authorize and create an API key before using the API. +
+
+ +
+

Authentication

+

All API requests require authentication using an API key in the Authorization header:

+
+ Authorization: Bearer sk-your-api-key-here +
+

Getting Your API Key

+
+  1. Authorize on the dashboard
+  2. Navigate to "API Keys" section
+  3. Click "Create New API Key"
+  4. Save the key immediately - it won't be shown again!
+
+ +
+

Base URL

+

In production the gateway is served over HTTPS (proxied via Traefik); in a local self-hosted setup, all API requests should be made to:

+
+ http://localhost:8000/v1 +
+
+ +
+

API Endpoints

+
+ POST + /chat/completions +

Create a chat completion (OpenAI-compatible)

+
+

Request

+
+ POST /chat/completions
+Content-Type: application/json
+Authorization: Bearer sk-your-api-key

+{
+  "model": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
+  "messages": [
+    {"role": "user", "content": "Hello!"}
+  ],
+  "stream": false
+}
+
+

Response

+
+ {
+  "id": "chatcmpl-123",
+  "object": "chat.completion",
+  "created": 1677652288,
+  "choices": [{
+    "index": 0,
+    "message": {
+      "role": "assistant",
+      "content": "Hello! How can I help you?"
+    },
+    "finish_reason": "stop"
+  }],
+  "usage": {
+    "prompt_tokens": 9,
+    "completion_tokens": 12,
+    "total_tokens": 21
+  }
+}
+
+

Streaming

+

To enable streaming, set "stream": true in the request:

+
+ {
+  "model": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
+  "messages": [
+    {"role": "user", "content": "Tell me a story"}
+  ],
+  "stream": true
+}
+
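Each streamed event is a `data:` line carrying a `chat.completion.chunk` object, with the next text fragment in `choices[0].delta.content` and a final `data: [DONE]` sentinel. A minimal client-side sketch of accumulating such a stream (it assumes the events are already buffered into a string; a real client should use an incremental SSE parser):

```python
import json

def accumulate_sse(raw: str) -> str:
    """Collect assistant text from a raw SSE body of 'data: {...}' events."""
    parts = []
    for event in raw.split("\n\n"):
        event = event.strip()
        if not event.startswith("data: "):
            continue
        payload = event[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

raw = (
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}\n\n'
    'data: {"choices": [{"index": 0, "delta": {"content": " world"}}]}\n\n'
    'data: [DONE]\n\n'
)
print(accumulate_sse(raw))  # Hello world
```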
+
+ GET + /models +

List available models

+
+

Request

+
+ GET /models
+Authorization: Bearer sk-your-api-key
+
+

Response

+
+ {
+  "object": "list",
+  "data": [
+    {
+      "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
+      "object": "model",
+      "created": 1677610602,
+      "owned_by": "gonka"
+    }
+  ]
+}
+
+
+ +
+

Using with OpenAI Python SDK

+

The API is fully compatible with the OpenAI Python SDK:

+
+ from openai import OpenAI

+client = OpenAI(
+  api_key="sk-your-api-key-here",
+  base_url="http://localhost:8000/v1"
+)

+response = client.chat.completions.create(
+  model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
+  messages=[
+    {"role": "user", "content": "Hello!"}
+  ]
+)

+print(response.choices[0].message.content)
+
+
+ +
+

Simple Python Example

+

Here's a simple example using Python's requests library:

+
+ import requests

+url = "http://localhost:8000/v1/chat/completions"
+headers = {
+  "Authorization": "Bearer sk-your-api-key-here",
+  "Content-Type": "application/json"
+}
+data = {
+  "model": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
+  "messages": [
+    {"role": "user", "content": "Hello! How are you?"}
+  ]
+}

+response = requests.post(url, headers=headers, json=data)
+result = response.json()
+print(result["choices"][0]["message"]["content"])
+
+
+ +
+

Using with curl

+
+ curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -d '{
+    "model": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
+    "messages": [
+      {"role": "user", "content": "Hello!"}
+    ]
+  }'
+
+
+ +
+

Using with n8n

+
+ n8n Workflow Example: Check out this ready-to-use workflow at https://n8n.io/workflows/12114 +
+
+ +
+

Token Billing

+

Each API request consumes GNK tokens from your balance. Token usage is read from the API response, and your balance is automatically deducted after each request.

+
+ Pricing Note: We are currently in a trial period on the Gonka network. During this time, inference prices are very low - just tiny fractions of GNK tokens. Final pricing will be announced after the trial period ends. +
+
+ Warning: If your balance is insufficient, the request will fail with a 402 Payment Required error. +
+
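As a rough sketch of how such a charge can be derived from the `usage` block of a completion response (the per-token price below is purely hypothetical; actual pricing is not final during the trial period):

```python
# Hypothetical illustration only: the real per-token price is set by the
# gateway and is not published while the trial period is ongoing.
PRICE_PER_TOKEN_NGONKA = 10  # assumed price, in ngonka per token

def cost_from_usage(usage: dict, price_per_token: int = PRICE_PER_TOKEN_NGONKA) -> int:
    """Charge based on total_tokens reported in the completion response."""
    return usage.get("total_tokens", 0) * price_per_token

usage = {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
print(cost_from_usage(usage))  # 210
```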

Checking Your Balance

+

You can check your balance on the dashboard or via the API:

+
+ GET /api/user/balance
+Authorization: Bearer sk-your-api-key
+
+
+ +
+

Rate Limits

+

Currently, there are no rate limits. However, please use the API responsibly.

+
+ +
+

Support

+

For issues or questions, please contact support or check the dashboard for your transaction history.

+
+
+ + +
+

Connect to OpenClaw

+

Use this guide to connect OpenClaw to the Gonka gateway: configure the provider on your OpenClaw node, then use the OpenClaw Telegram bot commands to switch to the Gonka model and check status.

+ +
+

1. Provider config (OpenClaw node)

+

Configure the Gonka provider on your OpenClaw node/gateway so the agent can use Gonka models. Edit openclaw.json or your models.json (wherever OpenClaw reads models.providers).

+

Replace:

+
+  • http://localhost:8000/v1 — with your gateway base URL if different.
+  • sk-.......... — with your API key (from registration or dashboard).
+
+ +
{
+  "models": {
+    "providers": {
+      "gonka": {
+        "baseUrl": "http://localhost:8000/v1",
+        "apiKey": "sk-..........",
+        "auth": "api-key",
+        "api": "openai-completions",
+        "authHeader": true,
+        "models": [
+          {
+            "id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
+            "name": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
+            "api": "openai-completions",
+            "reasoning": false,
+            "input": ["text"],
+            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+            "contextWindow": 200000,
+            "maxTokens": 8192
+          }
+        ]
+      }
+    }
+  }
+}
+
+

After editing: save the file and restart the OpenClaw gateway/node if it does not reload config automatically.

+
+ +
+

2. OpenClaw Telegram bot commands

+

These commands are used in the OpenClaw Telegram bot chat (not in the gateway config or API). They let you switch to the Gonka model and check that it is active.

+

/status

+

In the OpenClaw Telegram bot, send /status to see the current runtime state, including which model is in use.

+
+ +
🦞 OpenClaw 2026.2.15 (3fe22ea)
+🧠 Model: gonka/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 · 🔑 api-key sk-dvg…6GIAPE (models.json)
+🧮 Tokens: 18k in / 137 out
+📚 Context: 18k/200k (9%) · 🧹 Compactions: 0
+🧵 Session: agent:main:main • updated just now
+⚙️ Runtime: direct · Think: off · verbose
+🪢 Queue: collect (depth 0)
+
+

From this you can confirm that the Model line shows gonka/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 when the Gonka model is active.

+

/model <provider/model-id>

+

In the OpenClaw Telegram bot, send /model <provider/model-id> to switch the active model. To use the Gonka Qwen model, send:

+
+ +
/model gonka/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
+
+

Example: send /model gonka/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8, and the agent replies Model set to gonka/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8. Then send /status again to verify that the Model line shows the Gonka model.

+
+ +
+

3. Summary

+ + + + + + + + + + + + + + + + + + + + + + + + + +
WhatWhereAction
Add Gonka providerOpenClaw node — openclaw.json or models configPut the models.providers.gonka block from section 1 into your config. Set baseUrl and apiKey.
Switch to Gonka modelOpenClaw Telegram botSend: /model gonka/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
Check current modelOpenClaw Telegram botSend: /status — look at the Model line in the reply.
+

In short: the provider config lives on the node; /status and /model are sent in the OpenClaw Telegram bot chat.

+
+
-
+ + - diff --git a/app/tool_emulation.py b/app/tool_emulation.py new file mode 100644 index 0000000..8feeb8f --- /dev/null +++ b/app/tool_emulation.py @@ -0,0 +1,530 @@ +""" +Tool emulation module for models that don't support native tool calling. +Emulates tool_choice: "auto" by converting tools to prompts and parsing responses. +""" +import json +import re +import logging +import time +from typing import Dict, List, Optional, Any + +logger = logging.getLogger(__name__) + + +def format_tools_for_prompt(tools: List[Dict]) -> str: + """Convert tools list to human-readable prompt description""" + if not tools: + return "" + + descriptions = [] + for i, tool in enumerate(tools, 1): + func = tool.get("function", {}) + name = func.get("name", "") + desc = func.get("description", "") + params = func.get("parameters", {}) + + func_desc = f"{i}. Function: {name}" + if desc: + func_desc += f"\n Description: {desc}" + + if params: + props = params.get("properties", {}) + required = params.get("required", []) + if props: + func_desc += "\n Parameters:" + for param_name, param_info in props.items(): + param_type = param_info.get("type", "string") + param_desc = param_info.get("description", "") + is_required = param_name in required + req_marker = " (required)" if is_required else " (optional)" + func_desc += f"\n - {param_name} ({param_type}){req_marker}" + if param_desc: + func_desc += f": {param_desc}" + + descriptions.append(func_desc) + + return "\n\n".join(descriptions) + + +def create_tool_selection_prompt(tools: List[Dict]) -> str: + """Create system prompt for tool selection""" + tools_description = format_tools_for_prompt(tools) + + prompt = f"""You are a helpful assistant with access to the following functions: + +{tools_description} + +When the user's request requires calling a function, you MUST respond with a JSON object in this exact format: +{{ + "reasoning": "brief explanation of why you need to call a function", + "tool_calls": [ + {{ + "id": "call_abc123", + 
"type": "function", + "function": {{ + "name": "function_name", + "arguments": "{{\\"param1\\": \\"value1\\", \\"param2\\": \\"value2\\"}}" + }} + }} + ] +}} + +IMPORTANT RULES: +- The "arguments" field must be a valid JSON string (escaped) +- If multiple functions are needed, include all in the "tool_calls" array +- If no function is needed, respond normally with regular text (not JSON) +- Always use the exact function names from the list above +- Include all required parameters in the arguments + +Example for a single function call: +{{ + "reasoning": "User wants weather information", + "tool_calls": [ + {{ + "id": "call_123", + "type": "function", + "function": {{ + "name": "get_weather", + "arguments": "{{\\"location\\": \\"Moscow\\", \\"units\\": \\"celsius\\"}}" + }} + }} + ] +}}""" + + return prompt + + +def extract_json_from_text(text: str) -> Optional[Dict]: + """Extract JSON object from text response""" + if not text: + return None + + # Try to find JSON in code blocks first (most reliable) + code_block_pattern = r'```(?:json)?\s*(\{.*?\})\s*```' + matches = re.finditer(code_block_pattern, text, re.DOTALL) + for match in matches: + try: + json_str = match.group(1).strip() + parsed = json.loads(json_str) + if "tool_calls" in parsed: + return parsed + except (json.JSONDecodeError, AttributeError): + continue + + # Try to find JSON object with tool_calls - look for opening brace before tool_calls + # Find all positions of "tool_calls" + tool_calls_positions = [m.start() for m in re.finditer(r'"tool_calls"', text)] + + for pos in tool_calls_positions: + # Find the opening brace before this position + start_pos = text.rfind('{', 0, pos) + if start_pos == -1: + continue + + # Find the matching closing brace + brace_count = 0 + end_pos = start_pos + for i in range(start_pos, len(text)): + if text[i] == '{': + brace_count += 1 + elif text[i] == '}': + brace_count -= 1 + if brace_count == 0: + end_pos = i + 1 + break + + if end_pos > start_pos: + try: + json_str = 
text[start_pos:end_pos]
+                parsed = json.loads(json_str)
+                if "tool_calls" in parsed:
+                    return parsed
+            except json.JSONDecodeError:
+                continue
+
+    # Fallback: try to find any JSON object that might contain tool_calls
+    # Look for patterns like { ... "tool_calls": ... }
+    json_pattern = r'\{[^{}]*"tool_calls"[^{}]*\}'
+    # Try to expand to include nested braces
+    for match in re.finditer(r'\{', text):
+        start = match.start()
+        brace_count = 0
+        end = start
+
+        for i in range(start, min(start + 5000, len(text))):  # Limit search
+            if text[i] == '{':
+                brace_count += 1
+            elif text[i] == '}':
+                brace_count -= 1
+                if brace_count == 0:
+                    end = i + 1
+                    break
+
+        if end > start:
+            try:
+                json_str = text[start:end]
+                if '"tool_calls"' in json_str:
+                    parsed = json.loads(json_str)
+                    if "tool_calls" in parsed:
+                        return parsed
+            except json.JSONDecodeError:
+                continue
+
+    return None
+
+
+def _extract_one_tool_call_from_str(json_str: str) -> Optional[Dict]:
+    """Find first complete JSON object in string and parse as {name, arguments}. Returns None on failure."""
+    brace_start = json_str.find("{")
+    if brace_start == -1:
+        return None
+    brace_count = 0
+    brace_end = -1
+    for i in range(brace_start, len(json_str)):
+        if json_str[i] == "{":
+            brace_count += 1
+        elif json_str[i] == "}":
+            brace_count -= 1
+            if brace_count == 0:
+                brace_end = i + 1
+                break
+    if brace_end == -1:
+        return None
+    try:
+        parsed = json.loads(json_str[brace_start:brace_end])
+        name = parsed.get("name")
+        arguments = parsed.get("arguments", {})
+        if name:
+            return {"name": name, "arguments": arguments}
+    except (json.JSONDecodeError, AttributeError):
+        pass
+    return None
+
+
+def extract_tool_calls_from_xml_tags(text: str) -> Optional[List[Dict]]:
+    """
+    Extract tool calls from <tool_call>...</tool_call> tags (OpenClaw/ChainKlawd format).
+    Also handles truncated stream: if </tool_call> is missing but JSON after <tool_call> is complete, parse it.
+    Each <tool_call> block contains JSON: {"name": "func_name", "arguments": {...}}
+    """
+    if not text or "<tool_call>" not in text:
+        return None
+
+    result = []
+    pos = 0
+    while True:
+        start_tag = text.find("<tool_call>", pos)
+        if start_tag == -1:
+            break
+        content_start = start_tag + len("<tool_call>")
+        end_tag = text.find("</tool_call>", content_start)
+        if end_tag == -1:
+            # Truncated or stream ended before </tool_call> - try to parse JSON from content to end of text
+            json_str = text[content_start:].strip()
+            pos = len(text)
+        else:
+            json_str = text[content_start:end_tag].strip()
+            pos = end_tag + len("</tool_call>")
+
+        one = _extract_one_tool_call_from_str(json_str)
+        if one:
+            result.append(one)
+        if end_tag == -1:
+            break
+
+    return result if result else None
+
+
+def split_content_for_streaming(content: str, original_tools: List[Dict]) -> tuple:
+    """
+    Split content into reasoning (text to stream) and tool_calls (to emit as delta).
+    Returns (reasoning_content, tool_calls_list or None).
+    If tool_calls found: reasoning = content before the tool_calls block.
+    If no tool_calls: reasoning = full content, tool_calls = None.
+    """
+    if not content:
+        return ("", None)
+
+    tool_calls = parse_tool_calls_from_response(content, original_tools)
+    if not tool_calls:
+        return (content, None)
+
+    # Find where tool_calls block starts to extract reasoning
+    # Try XML format first (simpler): <tool_call>...</tool_call>
+    xml_start = content.find("<tool_call>")
+    if xml_start >= 0:
+        return (content[:xml_start].rstrip(), tool_calls)
+
+    # Try JSON format: {"tool_calls": [...]}
+    pos = content.find('"tool_calls"')
+    if pos > 0:
+        start = content.rfind('{', 0, pos)
+        if start >= 0:
+            return (content[:start].rstrip(), tool_calls)
+
+    return (content, tool_calls)
+
+
+def parse_tool_calls_from_response(response_content: str, original_tools: List[Dict]) -> Optional[List[Dict]]:
+    """Parse tool calls from model response and format as OpenAI tool_calls.
+    Supports:
+    1. JSON with tool_calls array (emulation prompt format)
+    2.
{"name": "...", "arguments": {...}} (OpenClaw/ChainKlawd format) + """ + if not response_content: + return None + + # Get available function names for validation + available_functions = {tool["function"]["name"] for tool in original_tools} + tool_calls_data = None + + # Try JSON format first ({"tool_calls": [{"function": {...}}]}) + parsed_json = extract_json_from_text(response_content) + if parsed_json: + tool_calls_data = parsed_json.get("tool_calls", []) + + # Fallback: {"name": "...", "arguments": {...}} format + if not tool_calls_data: + xml_calls = extract_tool_calls_from_xml_tags(response_content) + if xml_calls: + tool_calls_data = [ + {"function": {"name": c["name"], "arguments": c["arguments"]}, "id": None} + for c in xml_calls + ] + + if not tool_calls_data: + return None + + # Format as OpenAI tool_calls + formatted_calls = [] + for i, call in enumerate(tool_calls_data): + func_info = call.get("function", {}) + func_name = func_info.get("name", "") + + # Validate function name (warn but still include so client can handle) + if available_functions and func_name not in available_functions: + logger.debug(f"Function {func_name} not in available tools list, including anyway") + + # Get arguments (should be JSON string) + arguments = func_info.get("arguments", "{}") + if isinstance(arguments, str): + # Try to parse to validate + try: + json.loads(arguments) + except json.JSONDecodeError: + logger.warning(f"Invalid JSON in arguments for {func_name}, using as-is") + elif isinstance(arguments, dict): + # Convert dict to JSON string + arguments = json.dumps(arguments) + + call_id = call.get("id", f"call_{i}_{int(time.time() * 1000)}") + + formatted_calls.append({ + "id": call_id, + "type": "function", + "function": { + "name": func_name, + "arguments": arguments if isinstance(arguments, str) else json.dumps(arguments) + } + }) + + return formatted_calls if formatted_calls else None + + +def emulate_tool_choice_auto(body: Dict) -> Dict: + """Emulate tool_choice: 
auto by converting tools to prompt instructions""" + tools = body.get("tools", []) + tool_choice = body.get("tool_choice") + + # Only emulate if tools are present + if not tools: + return body + + # Accept "auto" or None (None means default which is "auto") + if tool_choice is not None and tool_choice != "auto": + return body + + logger.info(f"Emulating tool_choice: auto for {len(tools)} tools") + + # Create modified request body (deep copy to avoid modifying original) + import copy + modified_body = copy.deepcopy(body) + + # Remove tools and tool_choice from request to model + if "tool_choice" in modified_body: + del modified_body["tool_choice"] + if "tools" in modified_body: + del modified_body["tools"] + + # Create system prompt with tool descriptions + system_prompt = create_tool_selection_prompt(tools) + + # Add or update system message + messages = modified_body.get("messages", []).copy() + + # Check if there's already a system message + has_system = messages and messages[0].get("role") == "system" + + if has_system: + # Prepend tool selection instructions to existing system message + existing_system = messages[0].get("content", "") + messages[0]["content"] = f"{system_prompt}\n\n{existing_system}" + else: + # Insert new system message at the beginning + messages.insert(0, { + "role": "system", + "content": system_prompt + }) + + modified_body["messages"] = messages + + # Store original tools separately (not in body, will be passed separately) + # Don't add _original_tools to body as it will be sent to API + + return modified_body + + +def process_response_with_tool_emulation(response: Dict, original_tools: Optional[List[Dict]] = None) -> Dict: + """Process model response and add tool_calls if detected""" + if not original_tools: + # Try to get from response metadata + original_tools = response.get("_original_tools") + + if not original_tools: + return response + + # Get response content + choices = response.get("choices", []) + if not choices: + return 
response + + message = choices[0].get("message", {}) + content = message.get("content", "") + + # Normalize: ML nodes (vLLM etc) may return content as array + if isinstance(content, list): + parts = [] + for block in content: + if isinstance(block, dict) and block.get("type") == "text" and "text" in block: + parts.append(block["text"]) + content = "\n".join(parts) if parts else "" + + if not content: + return response + + # Try to parse tool calls from response + tool_calls = parse_tool_calls_from_response(content, original_tools) + + if tool_calls: + tool_names = [tc["function"]["name"] for tc in tool_calls] + content_preview = (content or "")[:250].replace("\n", " ") + logger.info(f"Detected {len(tool_calls)} tool call(s): {tool_names} | content preview: {content_preview!r}") + + # Modify response to OpenAI format with tool_calls + # content=None: OpenAI spec - when tool_calls present, content is null + message["tool_calls"] = tool_calls + message["content"] = None + message["role"] = "assistant" + + # Update finish_reason + choices[0]["finish_reason"] = "tool_calls" + + return response + + +async def process_stream_with_tool_emulation( + stream, + original_tools: List[Dict], + chunk_callback=None +): + """ + Process streaming response with tool emulation. + Streams each event to the client immediately (so client gets data and does not timeout). + Accumulates content for parsing; when [DONE] and tool_calls are found, appends a tool_calls chunk. + Yields: raw bytes (SSE format). 
+ """ + total_content = "" + first_chunk_data = None + sse_line_buffer = "" + + async for chunk in stream: + try: + chunk_str = chunk.decode("utf-8") + except Exception: + yield chunk + continue + + sse_line_buffer += chunk_str + + while "\n\n" in sse_line_buffer: + event, sse_line_buffer = sse_line_buffer.split("\n\n", 1) + event = event.strip() + if not event.startswith("data: "): + continue + data_str = event[6:].strip() + event_bytes = f"data: {data_str}\n\n".encode("utf-8") + + if data_str == "[DONE]": + if chunk_callback: + chunk_callback(total_content) + try: + reasoning, tool_calls = split_content_for_streaming(total_content, original_tools) + except Exception as e: + logger.warning(f"[TOOL EMULATION STREAM] Parse error: {e}") + tool_calls = None + + if tool_calls: + logger.info(f"[TOOL EMULATION STREAM] Detected {len(tool_calls)} tool call(s)") + try: + base = first_chunk_data or { + "id": "chatcmpl-tool-emu", + "object": "chat.completion.chunk", + "choices": [{"index": 0, "delta": {}, "finish_reason": None}], + } + if first_chunk_data and "model" in first_chunk_data: + base["model"] = first_chunk_data["model"] + tc_deltas = [ + { + "index": idx, + "id": tc.get("id") or f"call_{idx}_{int(time.time() * 1000)}", + "type": "function", + "function": {"name": tc["function"]["name"], "arguments": tc["function"]["arguments"]}, + } + for idx, tc in enumerate(tool_calls) + ] + tc_chunk = { + **base, + "choices": [ + { + "index": 0, + "delta": {"tool_calls": tc_deltas}, + "finish_reason": "tool_calls", + } + ], + } + yield f"data: {json.dumps(tc_chunk)}\n\n".encode("utf-8") + except Exception as e: + logger.exception(f"[TOOL EMULATION STREAM] Re-emit error: {e}") + yield event_bytes + return + + try: + chunk_data = json.loads(data_str) + if first_chunk_data is None: + first_chunk_data = chunk_data + choices = chunk_data.get("choices", []) + delta = choices[0].get("delta", {}) if choices else {} + if "content" in delta: + total_content += delta["content"] + except 
(json.JSONDecodeError, IndexError, KeyError): + pass + + yield event_bytes + + if chunk_callback: + chunk_callback(total_content) + + # Stream ended without [DONE] - yield remainder as-is so client gets everything + if sse_line_buffer: + yield sse_line_buffer.encode("utf-8") + diff --git a/test_tool_emulation.py b/test_tool_emulation.py new file mode 100644 index 0000000..ab15416 --- /dev/null +++ b/test_tool_emulation.py @@ -0,0 +1,449 @@ +#!/usr/bin/env python3 +""" +Test script for tool emulation functionality +Tests the emulation of tool_choice: "auto" for models that don't support native tool calling +""" +import sys +import os +import json + +# Add app to path +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +from app.tool_emulation import ( + format_tools_for_prompt, + create_tool_selection_prompt, + extract_json_from_text, + extract_tool_calls_from_xml_tags, + parse_tool_calls_from_response, + emulate_tool_choice_auto, + process_response_with_tool_emulation +) + + +def test_format_tools_for_prompt(): + """Test formatting tools into prompt description""" + print("=" * 50) + print("Test 1: Format Tools for Prompt") + print("=" * 50) + + tools = [ + { + "type": "function", + "function": { + "name": "get_weather", + "description": "Get current weather", + "parameters": { + "type": "object", + "properties": { + "location": { + "type": "string", + "description": "City name" + }, + "units": { + "type": "string", + "enum": ["celsius", "fahrenheit"], + "description": "Temperature units" + } + }, + "required": ["location"] + } + } + } + ] + + result = format_tools_for_prompt(tools) + + if "get_weather" not in result: + print("✗ Function name not found in prompt") + return False + + if "Get current weather" not in result: + print("✗ Function description not found") + return False + + if "location" not in result: + print("✗ Parameter not found") + return False + + print("✓ Tools formatted correctly") + print(f"Prompt preview: {result[:100]}...") + 
print("✓ Test 1 passed!\n") + return True + + +def test_create_tool_selection_prompt(): + """Test creation of tool selection prompt""" + print("=" * 50) + print("Test 2: Create Tool Selection Prompt") + print("=" * 50) + + tools = [ + { + "type": "function", + "function": { + "name": "calculate", + "description": "Perform calculation", + "parameters": { + "type": "object", + "properties": { + "expression": {"type": "string", "description": "Math expression"} + }, + "required": ["expression"] + } + } + } + ] + + prompt = create_tool_selection_prompt(tools) + + if "calculate" not in prompt: + print("✗ Function name not in prompt") + return False + + if "tool_calls" not in prompt: + print("✗ tool_calls format not in prompt") + return False + + if "JSON" not in prompt: + print("✗ JSON format instruction not in prompt") + return False + + print("✓ Tool selection prompt created correctly") + print("✓ Test 2 passed!\n") + return True + + +def test_extract_json_from_text(): + """Test JSON extraction from text""" + print("=" * 50) + print("Test 3: Extract JSON from Text") + print("=" * 50) + + # Test 1: Simple JSON + text1 = 'Here is the response: {"tool_calls": [{"id": "call_1", "function": {"name": "test"}}]}' + result1 = extract_json_from_text(text1) + if not result1 or "tool_calls" not in result1: + print("✗ Failed to extract simple JSON") + return False + print("✓ Simple JSON extracted") + + # Test 2: JSON in code block + text2 = '```json\n{"tool_calls": [{"id": "call_2"}]}\n```' + result2 = extract_json_from_text(text2) + if not result2 or "tool_calls" not in result2: + print("✗ Failed to extract JSON from code block") + return False + print("✓ JSON from code block extracted") + + # Test 3: No JSON + text3 = "Just regular text without JSON" + result3 = extract_json_from_text(text3) + if result3 is not None: + print("✗ Should return None for text without JSON") + return False + print("✓ Correctly returns None for text without JSON") + + print("✓ Test 3 passed!\n") + 
return True + + +def test_parse_tool_calls_from_response(): + """Test parsing tool calls from response""" + print("=" * 50) + print("Test 4: Parse Tool Calls from Response") + print("=" * 50) + + original_tools = [ + { + "type": "function", + "function": { + "name": "get_weather", + "description": "Get weather" + } + }, + { + "type": "function", + "function": { + "name": "calculate", + "description": "Calculate" + } + } + ] + + # Test 1: Valid JSON response + response1 = '''{ + "reasoning": "User wants weather", + "tool_calls": [ + { + "id": "call_123", + "type": "function", + "function": { + "name": "get_weather", + "arguments": "{\\"location\\": \\"Moscow\\"}" + } + } + ] +}''' + + result1 = parse_tool_calls_from_response(response1, original_tools) + if not result1: + print("✗ Failed to parse valid tool calls") + return False + + if len(result1) != 1: + print(f"✗ Expected 1 tool call, got {len(result1)}") + return False + + if result1[0]["function"]["name"] != "get_weather": + print("✗ Wrong function name") + return False + + print("✓ Valid tool calls parsed correctly") + + # Test 2: Invalid function name + response2 = '''{ + "tool_calls": [ + { + "id": "call_456", + "function": { + "name": "unknown_function", + "arguments": "{}" + } + } + ] +}''' + + result2 = parse_tool_calls_from_response(response2, original_tools) + if result2 and len(result2) > 0: + print("✗ Should skip invalid function names") + return False + + print("✓ Invalid function names correctly skipped") + + # Test 3: No tool calls + response3 = "Just a regular text response" + result3 = parse_tool_calls_from_response(response3, original_tools) + if result3 is not None: + print("✗ Should return None when no tool calls found") + return False + + print("✓ Correctly returns None when no tool calls") + + # Test 4: OpenClaw/ChainKlawd format + response4 = '''I'll check the file. 
+ +{"name": "read", "arguments": {"path": "/root/.openclaw/workspace/BOOTSTRAP.md"}} +''' + original_tools_read = [{"type": "function", "function": {"name": "read", "description": "Read file"}}] + result4 = parse_tool_calls_from_response(response4, original_tools_read) + if not result4 or len(result4) != 1: + print(f"✗ Failed to parse format, got {result4}") + return False + if result4[0]["function"]["name"] != "read": + print(f"✗ Wrong function name: {result4[0]['function']['name']}") + return False + args = json.loads(result4[0]["function"]["arguments"]) + if args.get("path") != "/root/.openclaw/workspace/BOOTSTRAP.md": + print(f"✗ Wrong arguments: {args}") + return False + print("✓ format parsed correctly") + + print("✓ Test 4 passed!\n") + return True + + +def test_emulate_tool_choice_auto(): + """Test emulation of tool_choice: auto""" + print("=" * 50) + print("Test 5: Emulate Tool Choice Auto") + print("=" * 50) + + tools = [ + { + "type": "function", + "function": { + "name": "test_func", + "description": "Test function" + } + } + ] + + body = { + "model": "test-model", + "messages": [{"role": "user", "content": "Hello"}], + "tool_choice": "auto", + "tools": tools + } + + result = emulate_tool_choice_auto(body) + + # Check that tool_choice and tools are removed + if "tool_choice" in result: + print("✗ tool_choice not removed") + return False + + if "tools" in result: + print("✗ tools not removed") + return False + + # Check that system message is added + messages = result.get("messages", []) + if not messages or messages[0].get("role") != "system": + print("✗ System message not added") + return False + + # Check that system message contains tool descriptions + system_content = messages[0].get("content", "") + if "test_func" not in system_content: + print("✗ Tool description not in system message") + return False + + if "tool_calls" not in system_content: + print("✗ Tool calls format not in system message") + return False + + print("✓ Tool choice emulation 
works correctly") + print("✓ Test 5 passed!\n") + return True + + +def test_emulate_without_tools(): + """Test that emulation doesn't modify request without tools""" + print("=" * 50) + print("Test 6: Emulate Without Tools") + print("=" * 50) + + body = { + "model": "test-model", + "messages": [{"role": "user", "content": "Hello"}] + } + + result = emulate_tool_choice_auto(body) + + if result != body: + print("✗ Request modified when no tools present") + return False + + print("✓ Request unchanged when no tools") + print("✓ Test 6 passed!\n") + return True + + +def test_process_response_with_tool_emulation(): + """Test processing response with tool emulation""" + print("=" * 50) + print("Test 7: Process Response with Tool Emulation") + print("=" * 50) + + original_tools = [ + { + "type": "function", + "function": { + "name": "get_weather", + "description": "Get weather" + } + } + ] + + # Test 1: Response with tool calls + response1 = { + "choices": [{ + "message": { + "role": "assistant", + "content": '''{ + "tool_calls": [ + { + "id": "call_1", + "function": { + "name": "get_weather", + "arguments": "{\\"location\\": \\"Paris\\"}" + } + } + ] +}''' + } + }] + } + + result1 = process_response_with_tool_emulation(response1, original_tools) + + message = result1["choices"][0]["message"] + if "tool_calls" not in message: + print("✗ tool_calls not added to response") + return False + + if message.get("content") is not None: + print("✗ content should be None when tool_calls present") + return False + + if len(message["tool_calls"]) != 1: + print(f"✗ Expected 1 tool call, got {len(message['tool_calls'])}") + return False + + print("✓ Response processed correctly with tool calls") + + # Test 2: Response without tool calls + response2 = { + "choices": [{ + "message": { + "role": "assistant", + "content": "Just a regular response" + } + }] + } + + result2 = process_response_with_tool_emulation(response2, original_tools) + + if "tool_calls" in 
result2["choices"][0]["message"]: + print("✗ tool_calls should not be added when not present") + return False + + print("✓ Response unchanged when no tool calls") + print("✓ Test 7 passed!\n") + return True + + +def run_all_tests(): + """Run all tool emulation tests""" + print("\n" + "=" * 50) + print("Testing Tool Emulation Functionality") + print("=" * 50 + "\n") + + tests = [ + test_format_tools_for_prompt, + test_create_tool_selection_prompt, + test_extract_json_from_text, + test_parse_tool_calls_from_response, + test_emulate_tool_choice_auto, + test_emulate_without_tools, + test_process_response_with_tool_emulation + ] + + passed = 0 + failed = 0 + + for test in tests: + try: + if test(): + passed += 1 + else: + failed += 1 + except Exception as e: + print(f"✗ Test {test.__name__} raised exception: {e}") + import traceback + traceback.print_exc() + failed += 1 + + print("=" * 50) + print(f"Test Results: {passed} passed, {failed} failed") + print("=" * 50) + + return failed == 0 + + +if __name__ == "__main__": + success = run_all_tests() + sys.exit(0 if success else 1) + + From d8c28cec86b14f1103d59aa6c3b7d102bf81f02c Mon Sep 17 00:00:00 2001 From: MinglesAI Date: Tue, 31 Mar 2026 14:42:10 +0000 Subject: [PATCH 2/3] chore: add docker-compose.yml for easy local deployment --- docker-compose.yml | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 docker-compose.yml diff --git a/docker-compose.yml b/docker-compose.yml new file mode 100644 index 0000000..af80861 --- /dev/null +++ b/docker-compose.yml @@ -0,0 +1,13 @@ +services: + gonka-proxy: + build: . 
+ ports: + - "8000:8000" + env_file: + - .env + restart: unless-stopped + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:8000/health"] + interval: 10s + timeout: 5s + retries: 3 From 7b6db2dbfc5c4119e3c13465cec71c21086b8bbc Mon Sep 17 00:00:00 2001 From: MinglesAI Date: Tue, 31 Mar 2026 18:18:45 +0000 Subject: [PATCH 3/3] fix: remove login/auth UI from public index.html MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove Google OAuth login button and flow - Remove Keplr/Leap wallet connect button and flow - Remove balance display, deposit button, logout button - Remove Dashboard, API Keys, Transactions pages/nav - Remove session management (saveSession/loadSession/clearSession) - Remove billing/balance docs section - Remove GNK price display and token cost tracking - Add API key input field in Chat (env-var key for proxy access) - Simplify sendChatMessage to read key from input - Fix footer: CloudMine URL → https://cloudmine.mingles.ai/ - Fix footer: remove Support (t.me) link - Fix footer: add Gateway link → https://gonka-gateway.mingles.ai/ --- app/static/index.html | 2192 ++--------------------------------------- 1 file changed, 82 insertions(+), 2110 deletions(-) diff --git a/app/static/index.html b/app/static/index.html index 55cacf8..f7c7c9d 100644 --- a/app/static/index.html +++ b/app/static/index.html @@ -3,22 +3,14 @@ - Gonka AI Gateway - OpenAI-compatible LLM inference API with token-based billing + Gonka AI Gateway - OpenAI-compatible LLM proxy - - @@ -962,70 +825,33 @@

 [NOTE: the remaining hunks of the app/static/index.html diff were mangled during
 extraction (the HTML markup was stripped) and are not reproduced here. The changes
 still recoverable from the visible hunk text:
   - Header: tagline "OpenAI-compatible LLM inference API with token-based billing"
     replaced with "OpenAI-compatible LLM proxy"; the "GNK: $4.00" price display removed.
   - Dashboard cards (Balance, API Keys, Total Spent, Total Inferences) and the
     Dashboard / API Keys / Transaction History nav pages removed; Chat remains and
     gains an API key input field and model selector.
   - Docs: the "Getting Your API Key" dashboard walkthrough replaced with "Configure
     the API_KEY environment variable on the proxy server; clients authenticate using
     that key"; the "Token Billing" section (GNK pricing note, 402 Payment Required
     warning, GET /api/user/balance) removed; rate-limits text unchanged.
   - Support text changed from "contact support or check the dashboard for your
     transaction history" to "join our Discord or open an issue on GitHub".]
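
---

The SSE buffering scheme that `process_stream_with_tool_emulation` relies on — accumulate raw bytes, split complete events on the blank-line delimiter, strip the `data: ` prefix, stop at `[DONE]` — can be sketched standalone. This is an illustrative reduction, not code from the patch; `iter_sse_events` and the sample chunks are made up for the example, and the real function additionally re-emits each event and appends a `tool_calls` chunk.

```python
import json


def iter_sse_events(chunks):
    """Yield the payload of each complete `data: ...` SSE event from raw byte chunks.

    Mirrors the buffering in process_stream_with_tool_emulation: events are
    delimited by a blank line ("\n\n"), and a single network chunk may end
    mid-event, so incomplete data stays in the buffer until the next chunk.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk.decode("utf-8")
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            event = event.strip()
            if not event.startswith("data: "):
                continue  # ignore SSE comments and other fields
            data = event[6:].strip()
            if data == "[DONE]":
                return  # end-of-stream sentinel
            yield data


# Sample stream: one JSON event split across two network chunks, then [DONE].
chunks = [
    b'data: {"choices": [{"delta": {"con',
    b'tent": "Hi"}}]}\n\ndata: [DONE]\n\n',
]
events = list(iter_sse_events(chunks))
deltas = [json.loads(e)["choices"][0]["delta"]["content"] for e in events]
print(deltas)  # → ['Hi']
```

The split-across-chunks case is exactly why the module keeps `sse_line_buffer` between reads: without it, a `json.loads` on the first chunk alone would fail and the content delta would be lost.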