diff --git a/CHANGELOG.md b/CHANGELOG.md
index 90e6172..71954c5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,7 +8,7 @@ The format is intentionally lightweight and human-readable. Group entries by rel
 
 ### Added
 
-- Added `contract: image-provider` plus an OpenAI-compatible `POST /v1/images/generations` path for image-capable providers
+- Added `contract: image-provider` plus OpenAI-compatible `POST /v1/images/generations` and `POST /v1/images/edits` paths for image-capable providers
 - Added a shipped Dockerfile and tag-driven release-artifacts workflow for Python distributions, GHCR images, and optional PyPI publishing
 - Added public community-health and security baseline files: Code of Conduct, Security Policy, issue templates, PR template, Dependabot, and CodeQL
 - Added generic onboarding helpers (`foundrygate-bootstrap`, `foundrygate-doctor`) and a publish-dry-run workflow for GHCR and Python package validation
diff --git a/README.md b/README.md
index 160a02a..7ace5fc 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ FoundryGate is a local OpenAI-compatible router/proxy for OpenClaw and other cli
 ## Why FoundryGate
 
 - OpenAI-compatible API: expose `/v1/models` and `/v1/chat/completions` to OpenClaw or any OpenAI-style client.
-- Modality growth path: the runtime now includes an OpenAI-compatible `POST /v1/images/generations` path for providers marked as image-capable.
+- Modality growth path: the runtime now includes OpenAI-compatible image generation and image editing paths for providers marked as image-capable.
 - Single endpoint, multiple providers: clients call one local base URL while FoundryGate chooses the upstream provider.
 - Multi-provider routing: use `auto` for routing or target a provider directly by model id.
 - Multi-dimensional routing: score providers across locality, context headroom, token limits, cache metadata, latency, and recent failure state during provider selection.
@@ -217,6 +217,24 @@ curl -fsS http://127.0.0.1:8090/v1/images/generations \
   }'
 ```
 
+### `POST /v1/images/edits`
+
+OpenAI-compatible image editing endpoint.
+
+- expects `multipart/form-data`
+- currently supports one required `image` upload plus an optional `mask`
+- `model: "auto"` selects the best loaded provider with `capabilities.image_editing: true`
+- `model: ""` routes directly to a loaded image-edit-capable provider
+
+```bash
+curl -fsS http://127.0.0.1:8090/v1/images/edits \
+  -F 'model=auto' \
+  -F 'prompt=Remove the background and keep the subject centered' \
+  -F 'image=@input.png' \
+  -F 'mask=@mask.png' \
+  -F 'size=1024x1024'
+```
+
 ### Additional Stable Operational Endpoints
 
 - `POST /api/route`
@@ -513,14 +531,15 @@ providers:
 
 ### Image Provider Contract
 
-FoundryGate also supports `contract: image-provider` for OpenAI-compatible backends that expose `POST /images/generations`.
+FoundryGate also supports `contract: image-provider` for OpenAI-compatible backends that expose image generation or image editing paths.
 
 What the current runtime guarantees for `image-provider`:
 
 - backend must be `openai-compat`
 - `capabilities.image_generation` is normalized to `true`
-- explicit `image_editing: true` can be declared for future editing support
+- explicit `image_editing: true` enables `POST /v1/images/edits`
 - `model: "auto"` on `POST /v1/images/generations` selects only providers with image-generation capability
+- `model: "auto"` on `POST /v1/images/edits` selects only providers with image-editing capability
 
 Example:
 
diff --git a/config.yaml b/config.yaml
index 266ca0d..edeed3f 100644
--- a/config.yaml
+++ b/config.yaml
@@ -188,6 +188,7 @@ providers:
   #   model: "gpt-image-1"
   #   tier: default
   #   capabilities:
+  #     # image_generation is enabled automatically by the contract
   #     image_editing: true
 
   # ── Anthropic ───────────────────────────────────────────────────────────
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index 6ef50fa..28df513 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -94,6 +94,7 @@ The main operational endpoints are:
 - `GET /v1/models`
 - `POST /v1/chat/completions`
 - `POST /v1/images/generations`
+- `POST /v1/images/edits`
 - `POST /api/route`
 - `GET /api/stats`
 - `GET /api/recent`
diff --git a/docs/INTEGRATIONS.md b/docs/INTEGRATIONS.md
index 86053ed..4cb75c3 100644
--- a/docs/INTEGRATIONS.md
+++ b/docs/INTEGRATIONS.md
@@ -85,7 +85,7 @@ When onboarding a new client:
 
 These are roadmap items or early foundations:
 
-- image generation routing through `POST /v1/images/generations` for providers that declare `contract: image-provider`
+- image generation and image editing routing through `POST /v1/images/generations` and `POST /v1/images/edits` for providers that declare `contract: image-provider`
 - optional request hooks for context or optimization
 - richer CLI-sidecar adapters
 - provider and client onboarding helpers
diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md
index 7aacc25..ff373bd 100644
--- a/docs/TROUBLESHOOTING.md
+++ b/docs/TROUBLESHOOTING.md
@@ -83,9 +83,9 @@ If the worker is healthy but still loses route selection, inspect `POST /api/rou
 - `locality_score`
 - `latency_score`
 
-## Image generation fails
+## Image generation or image editing fails
 
-Check whether any loaded provider actually exposes image-generation capability:
+Check whether any loaded provider actually exposes the required image capability:
 
 ```bash
 curl -fsS http://127.0.0.1:8090/v1/models
@@ -101,7 +101,17 @@ curl -fsS https://api.example.com/v1/images/generations \
   -d '{"model":"gpt-image-1","prompt":"test"}'
 ```
 
-If `model: "auto"` still fails, verify that at least one loaded provider reports `capabilities.image_generation: true`.
+For image editing, validate the upstream edit surface too:
+
+```bash
+curl -fsS https://api.example.com/v1/images/edits \
+  -H 'Authorization: Bearer YOUR_KEY' \
+  -F 'model=gpt-image-1' \
+  -F 'prompt=test' \
+  -F 'image=@input.png'
+```
+
+If `model: "auto"` still fails, verify that at least one loaded provider reports `capabilities.image_generation: true` for generation or `capabilities.image_editing: true` for editing.
 
 ## Many-agent OpenClaw traffic is not separated
 
diff --git a/foundrygate/main.py b/foundrygate/main.py
index c1c50fa..0b92c7b 100644
--- a/foundrygate/main.py
+++ b/foundrygate/main.py
@@ -15,6 +15,7 @@
 
 from fastapi import FastAPI, Request
 from fastapi.responses import HTMLResponse, JSONResponse, StreamingResponse
+from starlette.datastructures import UploadFile
 
 from .config import Config, load_config
 from .hooks import AppliedHooks, HookExecutionError, RequestHookContext, apply_request_hooks
@@ -302,14 +303,79 @@ def _collect_image_request_fields(body: dict[str, Any]) -> dict[str, Any]:
     return fields
 
 
+def _parse_optional_positive_int(value: Any, *, field_name: str) -> int | None:
+    """Return one optional positive integer field from request data."""
+    if value in (None, ""):
+        return None
+    try:
+        parsed = int(value)
+    except (TypeError, ValueError) as exc:
+        raise ValueError(f"Field '{field_name}' must be a positive integer") from exc
+    if parsed <= 0:
+        raise ValueError(f"Field '{field_name}' must be a positive integer")
+    return parsed
+
+
+def _extract_image_edit_request_fields(form_data: dict[str, Any]) -> dict[str, Any]:
+    """Return the validated scalar fields for one image-edit request."""
+    prompt = form_data.get("prompt")
+    if not isinstance(prompt, str) or not prompt.strip():
+        raise ValueError("Image editing requires a non-empty 'prompt' field")
+
+    model = form_data.get("model")
+    if model is None:
+        model = "auto"
+    elif not isinstance(model, str) or not model.strip():
+        raise ValueError("Field 'model' must be a non-empty string when provided")
+
+    payload: dict[str, Any] = {
+        "prompt": prompt.strip(),
+        "model": model.strip() if isinstance(model, str) else "auto",
+    }
+
+    n = _parse_optional_positive_int(form_data.get("n"), field_name="n")
+    if n is not None:
+        payload["n"] = n
+
+    for key in ("size", "response_format", "user"):
+        value = form_data.get(key)
+        if isinstance(value, str) and value.strip():
+            payload[key] = value.strip()
+
+    return payload
+
+
+async def _read_uploaded_file(
+    value: Any, *, field_name: str, required: bool
+) -> dict[str, Any] | None:
+    """Read one uploaded file into a normalized payload."""
+    if value is None:
+        if required:
+            raise ValueError(f"Image editing requires file field '{field_name}'")
+        return None
+
+    if not isinstance(value, UploadFile):
+        raise ValueError(f"Field '{field_name}' must be an uploaded file")
+
+    content = await value.read()
+    if not content:
+        raise ValueError(f"Uploaded file '{field_name}' must not be empty")
+
+    return {
+        "filename": value.filename or field_name,
+        "content": content,
+        "content_type": value.content_type or "application/octet-stream",
+    }
+
+
 async def _resolve_image_route_preview(
-    body: dict[str, Any], headers: dict[str, str]
+    body: dict[str, Any], headers: dict[str, str], *, capability: str = "image_generation"
 ) -> tuple[RoutingDecision, str, str, list[str], str, AppliedHooks, dict[str, Any]]:
     """Resolve one image-generation request without calling a provider."""
     body, hook_state = await _apply_request_hooks(body, headers)
     prompt = body.get("prompt")
     if not isinstance(prompt, str) or not prompt.strip():
-        raise ValueError("Image generation requires a non-empty 'prompt' string")
+        raise ValueError("Image request requires a non-empty 'prompt' string")
 
     model_requested = str(body.get("model", "auto"))
     client_profile, profile_hints = _resolve_client_profile(
@@ -323,19 +389,19 @@ async def _resolve_image_route_preview(
         provider = _providers.get(model_requested)
         if not provider:
             raise ValueError(f"Unknown image provider '{model_requested}'")
-        if not provider.capabilities.get("image_generation"):
-            raise ValueError(f"Provider '{model_requested}' does not support image generation")
+        if not provider.capabilities.get(capability):
+            raise ValueError(f"Provider '{model_requested}' does not support {capability}")
         decision = RoutingDecision(
             provider_name=model_requested,
             layer="direct",
-            rule_name="explicit-image-model",
+            rule_name=f"explicit-{capability}-model",
             confidence=1.0,
             reason=f"Directly requested image provider: {model_requested}",
-            details={"required_capability": "image_generation"},
+            details={"required_capability": capability},
         )
     else:
         decision = _router.route_capability_request(
-            capability="image_generation",
+            capability=capability,
             request_text=prompt,
             model_requested=model_requested,
             client_profile=client_profile,
@@ -346,7 +412,7 @@ async def _resolve_image_route_preview(
             provider_health={name: p.health.to_dict() for name, p in _providers.items()},
         )
         if not decision:
-            raise ValueError("No image-generation provider is available")
+            raise ValueError(f"No provider with capability '{capability}' is available")
 
     return (
         decision,
@@ -354,7 +420,7 @@ async def _resolve_image_route_preview(
         client_tag,
         _build_attempt_order(
             decision.provider_name,
-            required_capabilities=["image_generation"],
+            required_capabilities=[capability],
         ),
         model_requested,
         hook_state,
@@ -678,6 +744,115 @@ async def image_generations(request: Request):
     )
 
 
+@app.post("/v1/images/edits")
+async def image_edits(request: Request):
+    """OpenAI-compatible image editing endpoint."""
+    try:
+        form = await request.form()
+        form_data = dict(form.multi_items())
+        body = _extract_image_edit_request_fields(form_data)
+        image = await _read_uploaded_file(form_data.get("image"), field_name="image", required=True)
+        mask = await _read_uploaded_file(form_data.get("mask"), field_name="mask", required=False)
+    except ValueError as exc:
+        return _invalid_request_response("Invalid image editing request", exc=exc)
+    except Exception as exc:
+        logger.warning("Failed to parse image editing form: %s", exc)
+        return _invalid_request_response("Invalid image editing request")
+
+    headers = _collect_routing_headers(request)
+    try:
+        (
+            decision,
+            client_profile,
+            client_tag,
+            attempt_order,
+            model_requested,
+            hook_state,
+            effective_body,
+        ) = await _resolve_image_route_preview(body, headers, capability="image_editing")
+    except HookExecutionError as exc:
+        return _request_hook_error_response(exc)
+    except ValueError as exc:
+        return _invalid_request_response("Invalid image editing request", exc=exc)
+
+    prompt = effective_body["prompt"].strip()
+    errors: list[str] = []
+
+    for provider_name in attempt_order:
+        provider = _providers.get(provider_name)
+        if not provider:
+            continue
+        if not provider.health.healthy and provider_name != attempt_order[0]:
+            continue
+
+        try:
+            result = await provider.edit_image(
+                prompt,
+                image=image,
+                mask=mask,
+                n=effective_body.get("n", 1),
+                size=effective_body.get("size"),
+                response_format=effective_body.get("response_format"),
+                user=effective_body.get("user"),
+            )
+            if _config.metrics.get("enabled") and isinstance(result, dict):
+                _metrics.log_request(
+                    provider=provider_name,
+                    model=provider.model,
+                    layer=decision.layer,
+                    rule_name=decision.rule_name,
+                    latency_ms=(result.get("_foundrygate") or {}).get("latency_ms", 0),
+                    requested_model=model_requested,
+                    client_profile=client_profile,
+                    client_tag=client_tag,
+                    decision_reason=decision.reason,
+                    confidence=decision.confidence,
+                    attempt_order=attempt_order,
+                )
+
+            resp = JSONResponse(result)
+            resp.headers["X-FoundryGate-Provider"] = provider_name
+            resp.headers["X-FoundryGate-Profile"] = client_profile
+            resp.headers["X-FoundryGate-Layer"] = decision.layer
+            resp.headers["X-FoundryGate-Rule"] = decision.rule_name
+            resp.headers["X-FoundryGate-Hooks"] = ",".join(hook_state.applied_hooks)
+            resp.headers["X-FoundryGate-Hook-Errors"] = str(len(hook_state.errors))
+            return resp
+        except ProviderError as exc:
+            errors.append(f"{provider_name}: {exc.detail}")
+            logger.warning(
+                "Image editing provider %s failed: %s, trying next...",
+                provider_name,
+                exc.detail[:200],
+            )
+            if _config.metrics.get("enabled"):
+                _metrics.log_request(
+                    provider=provider_name,
+                    model=provider.model,
+                    layer=decision.layer,
+                    rule_name=decision.rule_name,
+                    success=False,
+                    error=exc.detail[:500],
+                    requested_model=model_requested,
+                    client_profile=client_profile,
+                    client_tag=client_tag,
+                    decision_reason=decision.reason,
+                    confidence=decision.confidence,
+                    attempt_order=attempt_order,
+                )
+
+    return JSONResponse(
+        {
+            "error": {
+                "message": f"All image editing providers failed: {'; '.join(errors)}",
+                "type": "provider_error",
+                "attempts": errors,
+            }
+        },
+        status_code=502,
+    )
+
+
 @app.get("/dashboard", response_class=HTMLResponse)
 async def dashboard():
     """Minimal self-contained dashboard – no build step, no deps."""
diff --git a/foundrygate/providers.py b/foundrygate/providers.py
index d84b04e..16331e1 100644
--- a/foundrygate/providers.py
+++ b/foundrygate/providers.py
@@ -196,6 +196,101 @@ async def generate_image(
             self.health.record_failure(f"Connection error: {e}")
             raise ProviderError(self.name, 0, f"Connection error: {e}") from e
 
+    async def edit_image(
+        self,
+        prompt: str,
+        *,
+        image: dict[str, Any],
+        mask: dict[str, Any] | None = None,
+        model_override: str | None = None,
+        n: int = 1,
+        size: str | None = None,
+        response_format: str | None = None,
+        user: str | None = None,
+        extra_fields: dict[str, Any] | None = None,
+    ) -> dict:
+        """Send an OpenAI-compatible image editing request."""
+        if self.backend_type != "openai-compat":
+            raise ProviderError(
+                self.name,
+                0,
+                f"Image editing is not implemented for backend '{self.backend_type}'",
+            )
+
+        model = model_override or self.model
+        data: dict[str, str] = {
+            "model": model,
+            "prompt": prompt,
+            "n": str(n),
+        }
+        if size:
+            data["size"] = size
+        if response_format:
+            data["response_format"] = response_format
+        if user:
+            data["user"] = user
+        if extra_fields:
+            for key, value in extra_fields.items():
+                if value is None:
+                    continue
+                data[key] = str(value)
+
+        files = [
+            (
+                "image",
+                (
+                    image["filename"],
+                    image["content"],
+                    image.get("content_type") or "application/octet-stream",
+                ),
+            )
+        ]
+        if mask:
+            files.append(
+                (
+                    "mask",
+                    (
+                        mask["filename"],
+                        mask["content"],
+                        mask.get("content_type") or "application/octet-stream",
+                    ),
+                )
+            )
+
+        headers = {"Authorization": f"Bearer {self.api_key}"}
+        if "openrouter" in self.base_url:
+            headers["HTTP-Referer"] = "https://foundrygate.local"
+            headers["X-Title"] = "FoundryGate"
+
+        url = f"{self.base_url}/images/edits"
+        t0 = time.time()
+
+        try:
+            resp = await self._client.post(url, data=data, files=files, headers=headers)
+            latency = (time.time() - t0) * 1000
+
+            if resp.status_code >= 400:
+                error_text = resp.text[:500]
+                self.health.record_failure(f"HTTP {resp.status_code}: {error_text}")
+                raise ProviderError(self.name, resp.status_code, error_text)
+
+            self.health.record_success(latency)
+            data = resp.json()
+            data["_foundrygate"] = {
+                "provider": self.name,
+                "model": model,
+                "latency_ms": round(latency, 1),
+                "modality": "image_editing",
+            }
+            return data
+
+        except httpx.TimeoutException as e:
+            self.health.record_failure(f"Timeout: {e}")
+            raise ProviderError(self.name, 0, f"Timeout: {e}") from e
+        except httpx.ConnectError as e:
+            self.health.record_failure(f"Connection error: {e}")
+            raise ProviderError(self.name, 0, f"Connection error: {e}") from e
+
     # ── OpenAI-compatible completion ───────────────────────────
 
     async def complete(
diff --git a/pyproject.toml b/pyproject.toml
index 2dd96e3..fdff72d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -27,6 +27,7 @@ dependencies = [
     "httpx>=0.27.0",
     "pyyaml>=6.0.2",
     "python-dotenv>=1.0.1",
+    "python-multipart>=0.0.9",
 ]
 
 [project.optional-dependencies]
diff --git a/requirements.txt b/requirements.txt
index ca6c424..c3edb0e 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,3 +4,4 @@ httpx>=0.27.0
 pyyaml>=6.0.2
 python-dotenv>=1.0.1
 aiosqlite>=0.20.0
+python-multipart>=0.0.9
diff --git a/skills/foundrygate/SKILL.md b/skills/foundrygate/SKILL.md
index 5813911..81999bc 100644
--- a/skills/foundrygate/SKILL.md
+++ b/skills/foundrygate/SKILL.md
@@ -76,6 +76,17 @@ curl -s http://127.0.0.1:8090/v1/images/generations \
   }' | python3 -m json.tool
 ```
 
+### /foundrygate image-edit
+Send one image-edit request with a local input file and optional mask.
+
+```bash
+curl -s http://127.0.0.1:8090/v1/images/edits \
+  -F "model=auto" \
+  -F "prompt=PROMPT_HERE" \
+  -F "image=@INPUT_IMAGE.png" \
+  -F "mask=@OPTIONAL_MASK.png" | python3 -m json.tool
+```
+
 ### /foundrygate recent
 Show the last 10 requests with provider, layer, rule, tokens, cost, and status.
 
diff --git a/tests/test_providers.py b/tests/test_providers.py
index 0e22551..19b5b14 100644
--- a/tests/test_providers.py
+++ b/tests/test_providers.py
@@ -266,3 +266,65 @@ async def _fake_post(url, json=None, headers=None, **kw):
         assert captured["json"]["user"] == "tester"
         assert result["_foundrygate"]["provider"] == "image-cloud"
         assert result["_foundrygate"]["modality"] == "image_generation"
+
+    @pytest.mark.asyncio
+    async def test_openai_image_editing_posts_to_edits_endpoint(self):
+        backend = ProviderBackend(
+            "image-cloud",
+            {
+                "contract": "image-provider",
+                "backend": "openai-compat",
+                "base_url": "https://api.example.com/v1",
+                "api_key": "secret",
+                "model": "gpt-image-1",
+                "capabilities": {"image_editing": True},
+            },
+        )
+        captured: dict = {}
+
+        class _FakeResp:
+            status_code = 200
+
+            def json(self):
+                return {"created": 1, "data": [{"b64_json": "edited"}]}
+
+        async def _fake_post(url, data=None, files=None, headers=None, **kw):
+            captured["url"] = url
+            captured["data"] = data or {}
+            captured["files"] = files or []
+            captured["headers"] = headers or {}
+            return _FakeResp()
+
+        backend._client.post = _fake_post  # type: ignore[attr-defined]
+
+        result = await backend.edit_image(
+            "remove the background",
+            image={
+                "filename": "input.png",
+                "content": b"image-bytes",
+                "content_type": "image/png",
+            },
+            mask={
+                "filename": "mask.png",
+                "content": b"mask-bytes",
+                "content_type": "image/png",
+            },
+            n=2,
+            size="1024x1024",
+            response_format="b64_json",
+            user="tester",
+        )
+
+        assert captured["url"] == "https://api.example.com/v1/images/edits"
+        assert captured["data"]["model"] == "gpt-image-1"
+        assert captured["data"]["prompt"] == "remove the background"
+        assert captured["data"]["n"] == "2"
+        assert captured["data"]["size"] == "1024x1024"
+        assert captured["data"]["response_format"] == "b64_json"
+        assert captured["data"]["user"] == "tester"
+        assert captured["files"][0][0] == "image"
+        assert captured["files"][0][1][0] == "input.png"
+        assert captured["files"][1][0] == "mask"
+        assert captured["files"][1][1][0] == "mask.png"
+        assert result["_foundrygate"]["provider"] == "image-cloud"
+        assert result["_foundrygate"]["modality"] == "image_editing"
diff --git a/tests/test_route_introspection.py b/tests/test_route_introspection.py
index 3dd6117..3c48308 100644
--- a/tests/test_route_introspection.py
+++ b/tests/test_route_introspection.py
@@ -40,6 +40,7 @@ async def aclose(self):
 import foundrygate.main as main_module
 from foundrygate.config import load_config
 from foundrygate.main import (
+    _extract_image_edit_request_fields,
     _refresh_local_worker_probes,
     _resolve_image_route_preview,
     _resolve_route_preview,
@@ -120,6 +121,8 @@ def preview_config(tmp_path, monkeypatch):
     api_key: "secret"
     model: "gpt-image-1"
     tier: default
+    capabilities:
+      image_editing: true
 client_profiles:
   enabled: true
   default: generic
@@ -169,6 +172,7 @@ def preview_config(tmp_path, monkeypatch):
                     "cloud": True,
                     "network_zone": "public",
                     "image_generation": True,
+                    "image_editing": True,
                 },
             ),
         },
@@ -260,6 +264,57 @@ async def test_image_preview_selects_image_provider(self, preview_config):
         assert hook_state.applied_hooks == []
         assert effective_body["prompt"] == "Draw a blueprint-style gateway diagram."
 
+    @pytest.mark.asyncio
+    async def test_image_edit_preview_selects_edit_capable_provider(self, preview_config):
+        (
+            decision,
+            profile_name,
+            client_tag,
+            attempt_order,
+            model_requested,
+            hook_state,
+            effective_body,
+        ) = await _resolve_image_route_preview(
+            {
+                "model": "auto",
+                "prompt": "Remove the background and keep the subject.",
+            },
+            {},
+            capability="image_editing",
+        )
+
+        assert model_requested == "auto"
+        assert profile_name == "generic"
+        assert client_tag == "generic"
+        assert decision.provider_name == "image-cloud"
+        assert decision.details["required_capability"] == "image_editing"
+        assert attempt_order == ["image-cloud"]
+        assert hook_state.applied_hooks == []
+        assert effective_body["prompt"] == "Remove the background and keep the subject."
+
+    def test_extract_image_edit_request_fields_requires_prompt(self):
+        with pytest.raises(ValueError, match="non-empty 'prompt'"):
+            _extract_image_edit_request_fields({"model": "auto"})
+
+    def test_extract_image_edit_request_fields_parses_scalars(self):
+        payload = _extract_image_edit_request_fields(
+            {
+                "model": "image-cloud",
+                "prompt": "Retouch the lighting",
+                "n": "2",
+                "size": "1024x1024",
+                "response_format": "b64_json",
+                "user": "tester",
+            }
+        )
+
+        assert payload["model"] == "image-cloud"
+        assert payload["prompt"] == "Retouch the lighting"
+        assert payload["n"] == 2
+        assert payload["size"] == "1024x1024"
+        assert payload["response_format"] == "b64_json"
+        assert payload["user"] == "tester"
+
 
 class TestLocalWorkerProbeRefresh:
     @pytest.mark.asyncio
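
Reviewer note on the multipart assembly in `edit_image`: the way normalized uploads become multipart parts can be exercised without a server or an HTTP client. The following is a minimal standalone sketch, not code from this change. `build_file_parts` and `build_form_fields` are hypothetical helper names, and the upload dict shape (`filename`, `content`, `content_type`) is assumed from `_read_uploaded_file` in the diff above.

```python
from typing import Any, Optional

# Upload shape assumed from `_read_uploaded_file` in the diff:
# {"filename": str, "content": bytes, "content_type": str}
Upload = dict[str, Any]
FilePart = tuple[str, tuple[str, bytes, str]]

DEFAULT_CONTENT_TYPE = "application/octet-stream"


def build_file_parts(image: Upload, mask: Optional[Upload] = None) -> list[FilePart]:
    """Hypothetical helper: build (field, (filename, bytes, content_type)) tuples."""
    parts: list[FilePart] = []
    for field_name, upload in (("image", image), ("mask", mask)):
        if not upload:
            continue  # skip a missing upload; in practice only the mask is optional
        parts.append(
            (
                field_name,
                (
                    upload["filename"],
                    upload["content"],
                    # Fall back to a generic type when the client sent none.
                    upload.get("content_type") or DEFAULT_CONTENT_TYPE,
                ),
            )
        )
    return parts


def build_form_fields(prompt: str, model: str, n: int = 1, **optional: Any) -> dict[str, str]:
    """Hypothetical helper: stringify scalar fields and drop unset optional ones."""
    fields = {"model": model, "prompt": prompt, "n": str(n)}
    # Multipart form values are strings, so every kept value is converted.
    fields.update({key: str(value) for key, value in optional.items() if value})
    return fields


if __name__ == "__main__":
    parts = build_file_parts(
        {"filename": "input.png", "content": b"image-bytes", "content_type": "image/png"},
        {"filename": "mask.png", "content": b"mask-bytes"},
    )
    fields = build_form_fields("remove the background", "gpt-image-1", n=2, size="1024x1024")
    print([name for name, _ in parts], fields)
```

The tuple layout matches the `files=` shape that `httpx` accepts for multipart uploads, which is why the provider test in this diff can assert on `captured["files"][0][0]` and `captured["files"][1][1][0]` directly.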