From 0fe6490fc94f43440dc474b07671ddd77f897a51 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Tue, 17 Feb 2026 21:22:34 +0100 Subject: [PATCH 01/12] Document accept_http and update from_http and http Add documentation for the new `accept_http` operator that starts an HTTP server and forwards incoming requests as events. Update `from_http` to reflect its new client-only design with the `$response` pipeline variable replacing `metadata_field`, and the removal of `server=true` mode. Add `accept_http` to the See Also section of the `http` operator. Co-Authored-By: Claude Opus 4.6 --- .../docs/reference/operators/accept_http.mdx | 146 +++++++++++++++++ .../docs/reference/operators/from_http.mdx | 148 +++++------------- src/content/docs/reference/operators/http.mdx | 1 + 3 files changed, 183 insertions(+), 112 deletions(-) create mode 100644 src/content/docs/reference/operators/accept_http.mdx diff --git a/src/content/docs/reference/operators/accept_http.mdx b/src/content/docs/reference/operators/accept_http.mdx new file mode 100644 index 000000000..8b96c5b0c --- /dev/null +++ b/src/content/docs/reference/operators/accept_http.mdx @@ -0,0 +1,146 @@ +--- +title: accept_http +category: Inputs/Events +example: 'accept_http "0.0.0.0:8080" { read_json }' +--- + +import Op from '@components/see-also/Op.astro'; +import Guide from '@components/see-also/Guide.astro'; +import Integration from '@components/see-also/Integration.astro'; + +Accepts incoming HTTP requests and forwards them as events. + +```tql +accept_http url:string, [responses=record, max_request_size=int, + max_connections=int, tls=record] + { … } +``` + +## Description + +The `accept_http` operator starts an HTTP/1.1 server on the given address and +forwards incoming requests as events. Each request spawns a sub-pipeline that +processes the request body independently. + +The sub-pipeline has access to a `$request` variable containing the request +metadata. + +### `url: string` + +The endpoint to listen on. 
Must have the form `<host>:<port>`. Use `0.0.0.0` to +accept connections on all interfaces. + +### `responses = record (optional)` + +Specify custom responses for endpoints on the server. For example, + +```tql +responses = { + "/resource/create": { code: 200, content_type: "text/html", body: "Created!" }, + "/resource/delete": { code: 401, content_type: "text/html", body: "Unauthorized!" } +} +``` + +creates two special routes on the server with different responses. + +Each route must be a record with `code`, `content_type`, and `body` fields. + +Requests to endpoints not listed in `responses` receive an HTTP `200 OK` response. + +### `max_request_size = int (optional)` + +The maximum size of an incoming request to accept. + +Defaults to `10MiB`. + +### `max_connections = int (optional)` + +The maximum number of simultaneous incoming connections to accept. + +Defaults to `10`. + +import TLSOptions from '@partials/operators/TLSOptions.mdx'; + + + +### `{ … }` + +The pipeline to run for each incoming HTTP request. Inside the pipeline, the +`$request` variable is available as a record with the following fields: + +| Field | Type | Description | +| :--------- | :------- | :----------------------------------- | +| `headers` | `record` | The request headers. | +| `query` | `record` | The query parameters of the request. | +| `path` | `string` | The path requested. | +| `fragment` | `string` | The URI fragment of the request. | +| `method` | `string` | The HTTP method of the request. | +| `version` | `string` | The HTTP version of the request. | +| `body` | `blob` | The raw request body.
| + +## Examples + +### Accept JSON requests on port 8080 + +Listen on all interfaces and parse incoming request bodies as JSON: + +```tql +accept_http "0.0.0.0:8080" { + read_json +} +``` + +Send a request to the endpoint via `curl`: + +```bash +echo '{"key": "value"}' | curl localhost:8080 --data-binary @- -H 'Content-Type: application/json' +``` + +### Filter requests by path + +Use the `$request` variable to filter or route requests: + +```tql +accept_http "0.0.0.0:8080" { + read_json + where $request.path == "/events" and $request.method == "POST" +} +``` + +### Custom responses per endpoint + +Return different HTTP responses based on the request path: + +```tql +accept_http "0.0.0.0:8080", + responses={ + "/webhook": { + code: 201, + content_type: "text/plain", + body: "accepted", + }, + } { + read_json + where $request.path == "/webhook" +} +``` + +### Accept HTTPS requests with TLS + +```tql +accept_http "0.0.0.0:8443", + tls={ + certfile: "/path/to/cert.pem", + keyfile: "/path/to/key.pem", + } { + read_json +} +``` + +## See Also + +- from_http +- http +- serve +- collecting/fetch-via-http-and-apis +- http diff --git a/src/content/docs/reference/operators/from_http.mdx b/src/content/docs/reference/operators/from_http.mdx index 8ca5085f8..242ef6d51 100644 --- a/src/content/docs/reference/operators/from_http.mdx +++ b/src/content/docs/reference/operators/from_http.mdx @@ -1,25 +1,23 @@ --- title: from_http category: Inputs/Events -example: 'from_http "0.0.0.0:8080"' +example: 'from_http "https://example.com/api"' --- -Sends and receives HTTP/1.1 requests. +Sends an HTTP/1.1 request and returns the response as events. 
```tql from_http url:string, [method=string, body=record|string|blob, encode=string, - headers=record, metadata_field=field, error_field=field, - paginate=record->string|string, paginate_delay=duration, - connection_timeout=duration, max_retry_count=int, - retry_delay=duration, tls=record] -from_http url:string, server=true, [metadata_field=field, responses=record, - max_request_size=int, max_connections=int, tls=record] + headers=record, error_field=field, paginate=string, + paginate_delay=duration, connection_timeout=duration, + max_retry_count=int, retry_delay=duration, tls=record] + { … } ``` ## Description -The `from_http` operator issues HTTP requests or spins up an HTTP/1.1 server on -a given address and forwards received requests as events. +The `from_http` operator issues an HTTP request and returns the response as +events. :::tip[Format and Compression Inference] @@ -32,76 +30,19 @@ If neither the URL nor the HTTP headers provide enough information, you can expl ### `url: string` -URL to listen on or to connect to. - -Must have the form `:` when `server=true`. +URL to connect to. import HTTPClientOptions from '@partials/operators/HTTPClientOptions.mdx'; -### `metadata_field = field (optional)` - -Field to insert metadata into when using the parsing pipeline. - -The response metadata (when using the client mode) has the following schema: - -| Field | Type | Description | -| :-------- | :------- | :------------------------------------ | -| `code` | `uint64` | The HTTP status code of the response. | -| `headers` | `record` | The response headers. | - -The request metadata (when using the server mode) has the following schema: - -| Field | Type | Description | -| :--------- | :------- | :----------------------------------- | -| `headers` | `record` | The request headers. | -| `query` | `record` | The query parameters of the request. | -| `path` | `string` | The path requested. | -| `fragment` | `string` | The URI fragment of the request. 
| -| `method` | `string` | The HTTP method of the request. | -| `version` | `string` | The HTTP version of the request. | - ### `error_field = field (optional)` Field to insert the response body for HTTP error responses (status codes not in the 2xx or 3xx range). When set, any HTTP response with a status code outside the 200–399 range will -have its body stored in this field as a `blob`. Otherwise, error responses, -alongside the original event, are skipped and an error is emitted. - -### `server = bool (optional)` - -Whether to spin up an HTTP server or act as an HTTP client. - -Defaults to `false`, i.e., the HTTP client. - -### `responses = record (optional)` - -Specify custom responses for endpoints on the server. For example, - -```tql -responses = { - "/resource/create": { code: 200, content_type: "text/html", body: "Created!" }, - "/resource/delete": { code: 401, content_type: "text/html", body: "Unauthorized!" } -} -``` - -creates two special routes on the server with different responses. - -Requests to an unspecified endpoint are responded with HTTP Status `200 OK`. - -### `max_request_size = int (optional)` - -The maximum size of an incoming request to accept. - -Defaults to `10MiB`. - -### `max_connections = int (optional)` - -The maximum number of simultaneous incoming connections to accept. - -Defaults to `10`. +have its body stored in this field as a `blob`. Otherwise, error responses are +skipped and an error is emitted. import TLSOptions from '@partials/operators/TLSOptions.mdx'; @@ -113,6 +54,14 @@ A pipeline that receives the response body as bytes, allowing parsing per request. This is especially useful in scenarios where the response body can be parsed into multiple events. +Inside the pipeline, the `$response` variable is available as a record with the +following fields: + +| Field | Type | Description | +| :-------- | :------- | :------------------------------------ | +| `code` | `uint64` | The HTTP status code of the response. 
| +| `headers` | `record` | The response headers. | + If not provided, the operator will attempt to infer the parsing operator from the `Content-Type` header. Should this inference fail (e.g., unsupported or missing `Content-Type`), the operator raises an error. @@ -149,17 +98,26 @@ head 1 } ``` -### Paginate with a Lambda - -Use the `paginate` parameter with a lambda to extract the next page URL from the -response body: +### Send a POST request with JSON body ```tql -from_http "https://api.example.com/data", paginate=(x => x.next_url?) +from_http "https://httpbin.org/post", body={key: "value"}, encode="json" { + read_json +} ``` -This sends a GET request to the initial URL and evaluates the `x.next_url` field -in the response to determine the next URL for subsequent requests. +### Access response metadata + +Use the `$response` variable inside a parsing pipeline to access the HTTP +response code and headers: + +```tql +from_http "https://example.com/api", method="put" { + read_json + where $response.code == 200 + response = $response +} +``` ### Paginate via Link Headers @@ -185,43 +143,9 @@ from_http "https://api.example.com/data", max_retry_count=3, retry_delay=2s This tries up to 3 times, waiting 2 seconds between each retry. 
-### Listen on port 8080 - -Spin up a server with: - -```tql -from_http "0.0.0.0:8080", server=true, metadata_field=metadata -``` - -Send a request to the HTTP endpoint via `curl`: - -```sh -echo '{"key": "value"}' | gzip | curl localhost:8080 --data-binary @- -H 'Content-Encoding: gzip' -H 'Content-Type: application/json' -``` - -Observe the request in the Tenzir pipeline, parsed and decompressed: - -```tql -{ - key: "value", - metadata: { - headers: { - Host: "localhost:8080", - "User-Agent": "curl/8.13.0", - Accept: "*/*", - "Content-Encoding": "gzip", - "Content-Length": "37", - "Content-Type": "application/json", - }, - path: "/", - method: "post", - version: "HTTP/1.1", - }, -} -``` - ## See Also +- accept_http - http - serve - collecting/fetch-via-http-and-apis diff --git a/src/content/docs/reference/operators/http.mdx b/src/content/docs/reference/operators/http.mdx index f2db358c6..2e8ea0fdf 100644 --- a/src/content/docs/reference/operators/http.mdx +++ b/src/content/docs/reference/operators/http.mdx @@ -193,5 +193,6 @@ header with `rel=next`, such as GitHub, GitLab, and Jira. 
## See Also +- accept_http - from_http - collecting/fetch-via-http-and-apis From 3af03a64e4cfc3acacb60c7289eeb068e7640b83 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Wed, 18 Feb 2026 09:16:00 +0100 Subject: [PATCH 02/12] Update HTTP API guide for new operator APIs - Replace `metadata_field` with `$response` variable for `from_http` - Move lambda pagination examples to `http` operator (from_http only supports `paginate="link"` now) - Fix TLS syntax to use `tls={...}` record format - Fix incorrect parameter names (`max_retries` -> `max_retry_count`, `timeout` -> `connection_timeout`) - Update data enrichment and error handling examples - Clarify which features are `from_http`-only vs `http`-only Co-Authored-By: Claude Opus 4.6 --- .../collecting/fetch-via-http-and-apis.mdx | 152 +++++++++++------- 1 file changed, 90 insertions(+), 62 deletions(-) diff --git a/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx b/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx index d413f055a..4f2825f36 100644 --- a/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx +++ b/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx @@ -150,18 +150,26 @@ API tokens, as in the above example. ### TLS and Security -Enable TLS by setting the `tls` parameter to `true` and configure client -certificates using the `certfile` and `keyfile` parameters: +Configure TLS by passing a record to the `tls` parameter with certificate +paths: ```tql from_http "https://secure-api.example.com/data", - tls=true, - certfile="/path/to/client.crt", - keyfile="/path/to/client.key" + tls={ + certfile: "/path/to/client.crt", + keyfile: "/path/to/client.key", + } ``` Use these options when APIs require client certificate authentication. 
+To skip peer verification (e.g., for self-signed certificates in development): + +```tql +from_http "https://dev-api.example.com/data", + tls={skip_peer_verification: true} +``` + ### Timeout and Retry Configuration Configure timeouts and retry behavior by setting the `connection_timeout`, @@ -169,8 +177,8 @@ Configure timeouts and retry behavior by setting the `connection_timeout`, ```tql from_http "https://api.example.com/data", - timeout=10s, - max_retries=3, + connection_timeout=10s, + max_retry_count=3, retry_delay=2s ``` @@ -183,62 +191,45 @@ Use HTTP requests to enrich existing data with information from external APIs. ### Preserving Input Context Keep original event data while adding API responses by specifying the -`response_field` parameter to control where the response is stored: +`response_field` parameter on the [`http`](/reference/operators/http) operator to +control where the response is stored: ```tql from { domain: "example.com", severity: "HIGH", - api_url: "https://threat-intel.example.com/lookup", - response_field: "threat_data", } -http f"{api_url}?domain={domain}", response_field=response_field +http f"https://threat-intel.example.com/lookup?domain={domain}", + response_field=threat_data ``` This approach preserves your original data and adds API responses in a specific field. -### Adding Metadata +### Accessing Response Metadata -Capture HTTP response metadata by specifying the `metadata_field` parameter to -store status codes and headers separately from the response body: +With `from_http`, use the `$response` variable inside a parsing pipeline to +access HTTP status codes and headers: ```tql -from_http "https://api.example.com/status", metadata_field=http_meta +from_http "https://api.example.com/status" { + read_json + status_code = $response.code + server = $response.headers.Server +} ``` -The metadata includes status codes and response headers for debugging and -monitoring. 
- -## Pagination and Bulk Processing - -Handle APIs that return large datasets across multiple pages. - -### Lambda-Based Pagination - -Implement automatic pagination by providing a lambda function to the `paginate` -parameter that extracts the next page URL from the response: +With the `http` operator, use the `metadata_field` parameter instead: ```tql -from_http "https://api.example.com/search?q=query", - paginate=(response => "next_page_url" if response.has_more) +from {url: "https://api.example.com/status"} +http url, metadata_field=http_meta +where http_meta.code >= 200 and http_meta.code < 300 ``` -The operator continues making requests as long as the pagination lambda function -returns a valid URL. - -### Complex Pagination Logic - -Handle APIs with custom pagination schemes by building pagination URLs -dynamically using expressions that reference response data: - -```tql -let $base_url = "https://api.example.com/items" -from_http f"{$base_url}?page=1", - paginate=(x => f"{$base_url}?page={x.page + 1}" if x.page < x.total_pages), -``` +## Pagination and Bulk Processing -This example builds pagination URLs dynamically based on response data. +Handle APIs that return large datasets across multiple pages. ### Link Header Pagination @@ -271,6 +262,36 @@ from {url: "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10"} http url, paginate="link" ``` +### Lambda-Based Pagination + +The [`http`](/reference/operators/http) operator supports lambda-based +pagination for APIs that use custom pagination schemes. Provide a lambda +function to the `paginate` parameter that extracts the next page URL from the +response: + +```tql +from {} +http "https://api.example.com/search?q=query", + paginate=(x => x.next_url if x.has_more) +``` + +The operator continues making requests as long as the pagination lambda function +returns a valid URL. 
+ +### Complex Pagination Logic + +Handle APIs with custom pagination schemes by building pagination URLs +dynamically using expressions that reference response data: + +```tql +let $base_url = "https://api.example.com/items" +from {} +http f"{$base_url}?page=1", + paginate=(x => f"{$base_url}?page={x.page + 1}" if x.page < x.total_pages) +``` + +This example builds pagination URLs dynamically based on response data. + ### Rate Limiting Control request frequency by configuring the `paginate_delay` parameter to add @@ -278,15 +299,11 @@ delays between requests and the `parallel` parameter to limit concurrent requests: ```tql -from { - url: "https://api.example.com/data", - paginate_delay: 500ms, - parallel: 2 -} +from {url: "https://api.example.com/data"} http url, - paginate="next_url" if has_next, - paginate_delay=paginate_delay, - parallel=parallel + paginate=(x => x.next_url if x.has_next), + paginate_delay=500ms, + parallel=2 ``` Use `paginate_delay` and `parallel` to manage request rates appropriately. @@ -310,9 +327,11 @@ scenarios. Monitor API health and response times: ```tql -from_http "https://api.example.com/health", metadata_field=metadata -select date=metadata.headers.Date.parse_time("%a, %d %b %Y %H:%M:%S %Z") -latency = now() - date +from_http "https://api.example.com/health" { + read_json + date = $response.headers.Date.parse_time("%a, %d %b %Y %H:%M:%S %Z") + latency = now() - date +} ``` The above example parses the `Date` header from the HTTP response via @@ -320,8 +339,6 @@ The above example parses the `Date` header from the HTTP response via compares it to the current wallclock time using the now function. -Nit: `%T` is a shortcut for `%H:%M:%S`. - ## Error Handling Handle API errors and failures gracefully in your data pipelines. 
@@ -334,18 +351,28 @@ between retries: ```tql from_http "https://unreliable-api.example.com/data", - max_retries=5, + max_retry_count=5, retry_delay=2s ``` ### Status Code Handling -Check HTTP status codes by capturing metadata and filtering based on the -`code` field to handle different response types: +Check HTTP status codes using the `$response` variable to handle different +response types: ```tql -from_http "https://api.example.com/data", metadata_field=metadata -where metadata.code >= 200 and metadata.code < 300 +from_http "https://api.example.com/data" { + read_json + where $response.code >= 200 and $response.code < 300 +} +``` + +With the `http` operator, use `metadata_field` instead: + +```tql +from {url: "https://api.example.com/data"} +http url, metadata_field=meta +where meta.code >= 200 and meta.code < 300 ``` ## Best Practices @@ -358,8 +385,9 @@ Follow these practices for reliable and efficient API integration: handling transient failures. 3. **Respect rate limits**. Use `parallel` and `paginate_delay` to control request rates. -4. **Handle errors gracefully**. Check status codes in metadata - (`metadata_field`) and implement fallback logic. +4. **Handle errors gracefully**. Use `$response` in `from_http` parsing + pipelines or `metadata_field` with `http` to check status codes and implement + fallback logic. 5. **Secure credentials**. Access API keys and tokens via [secrets](/explanations/secrets), not in code. 6. **Monitor API usage**. Track response times and error rates for From 4193188ebc7d16a24ae3f7e52c7c3e2c435db6c3 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Wed, 18 Feb 2026 09:32:00 +0100 Subject: [PATCH 03/12] Update HTTP integration page for new operator landscape Restructure around use cases: fetching data, accepting requests, and sending data. Add accept_http coverage, TLS examples for both client and server, and webhook/enrichment/pagination examples. 
Co-Authored-By: Claude Opus 4.6 --- src/content/docs/integrations/http.mdx | 97 +++++++++++++++++++++----- 1 file changed, 79 insertions(+), 18 deletions(-) diff --git a/src/content/docs/integrations/http.mdx b/src/content/docs/integrations/http.mdx index 18bf50774..688eb7922 100644 --- a/src/content/docs/integrations/http.mdx +++ b/src/content/docs/integrations/http.mdx @@ -4,47 +4,108 @@ sidebar: label: HTTP(S) --- -Tenzir supports HTTP and HTTPS, both as sender and receiver. +[HTTP](https://en.wikipedia.org/wiki/HTTP) is the foundation of data exchange +on the web. Tenzir provides operators for both sides of an HTTP conversation: +fetching data from APIs and accepting incoming requests. + +## Fetching Data from APIs When retrieving data from an API or website, you prepare your HTTP request and get back the HTTP response body as your pipeline data: ![HTTP from](http-from.svg) +Use [`from_http`](/reference/operators/from_http) to issue a one-shot HTTP +request, or [`http`](/reference/operators/http) to enrich events flowing through +a pipeline with HTTP responses. Both operators automatically infer the response +format from the URL extension or `Content-Type` header. + +## Accepting Incoming Requests + +Use [`accept_http`](/reference/operators/accept_http) to spin up an HTTP server +that turns incoming requests into pipeline events. This is useful for receiving +webhooks, building custom API endpoints, or ingesting data pushed by external +systems. + +## Sending Data to APIs + When sending data from a pipeline to an API or website, the events in the -pipeline make up the HTTP request body. If the HTTP status code is not 2\*\*, -you will get a warning. +pipeline make up the HTTP request body: + +![HTTP to](http-to.svg) + +Use [`save_http`](/reference/operators/save_http) with a write operator to send +pipeline data as an HTTP request body. -![HTTP from](http-to.svg) +## SSL/TLS -In both cases, you can only provide static header data. 
+All HTTP operators support TLS. Pass `tls={}` to enable TLS with defaults, or +provide specific options: -Use from_http to perform HTTP requests or -run an HTTP server. This operator automatically tries to infer the format from the -`Content-Type` header. For sending, use -save_http with a write operator. +```tql +from_http "https://api.example.com/data", + tls={ + certfile: "/path/to/client.crt", + keyfile: "/path/to/client.key", + } +``` + +For server-side TLS with [`accept_http`](/reference/operators/accept_http): + +```tql +accept_http "0.0.0.0:8443", + tls={ + certfile: "/path/to/server.crt", + keyfile: "/path/to/server.key", + } { + read_json +} +``` ## Examples -### Perform a GET request with URL parameters +### Fetch JSON from an API ```tql -from_http "http://example.com:8888/api?query=tenzir" +from_http "https://api.example.com/data?query=tenzir" ``` -### Perform a POST request with JSON body +### Send a POST request with JSON body ```tql -from_http "http://example.com:8888/api", method="post", body={query: "tenzir"} +from_http "https://api.example.com/data", + method="post", + body={query: "tenzir"} ``` -### Call a webhook API with pipeline data +### Enrich events with API lookups ```tql -from { - x: 42, - y: "foo", +from {domain: "example.com"} +http f"https://api.example.com/lookup?q={domain}", + response_field=result +``` + +### Accept webhook events + +```tql +accept_http "0.0.0.0:8080" { + read_json + where $request.path == "/webhook" } +``` + +### Paginate through an API + +```tql +from_http "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10", + paginate="link" +``` + +### Push data to a remote API + +```tql +from {x: 42, y: "foo"} write_json -save_http "http://example.com:8888/api", method="POST" +save_http "https://api.example.com/ingest", method="POST" ``` From 148b25f65f05eae99af47434036132028f30a893 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Wed, 18 Feb 2026 09:40:59 +0100 Subject: [PATCH 04/12] Trim HTTP integration 
page to high-level overview Remove duplicated examples and link to the guide instead. Keep the page focused on explaining what operators exist and when to use each. Co-Authored-By: Claude Opus 4.6 --- src/content/docs/integrations/http.mdx | 74 ++------------------------ 1 file changed, 5 insertions(+), 69 deletions(-) diff --git a/src/content/docs/integrations/http.mdx b/src/content/docs/integrations/http.mdx index 688eb7922..cdf3369db 100644 --- a/src/content/docs/integrations/http.mdx +++ b/src/content/docs/integrations/http.mdx @@ -20,6 +20,10 @@ request, or [`http`](/reference/operators/http) to enrich events flowing through a pipeline with HTTP responses. Both operators automatically infer the response format from the URL extension or `Content-Type` header. +See the [Fetch via HTTP and APIs](/guides/collecting/fetch-via-http-and-apis) +guide for practical examples covering authentication, pagination, error +handling, and data enrichment. + ## Accepting Incoming Requests Use [`accept_http`](/reference/operators/accept_http) to spin up an HTTP server @@ -40,72 +44,4 @@ pipeline data as an HTTP request body. ## SSL/TLS All HTTP operators support TLS. 
Pass `tls={}` to enable TLS with defaults, or -provide specific options: - -```tql -from_http "https://api.example.com/data", - tls={ - certfile: "/path/to/client.crt", - keyfile: "/path/to/client.key", - } -``` - -For server-side TLS with [`accept_http`](/reference/operators/accept_http): - -```tql -accept_http "0.0.0.0:8443", - tls={ - certfile: "/path/to/server.crt", - keyfile: "/path/to/server.key", - } { - read_json -} -``` - -## Examples - -### Fetch JSON from an API - -```tql -from_http "https://api.example.com/data?query=tenzir" -``` - -### Send a POST request with JSON body - -```tql -from_http "https://api.example.com/data", - method="post", - body={query: "tenzir"} -``` - -### Enrich events with API lookups - -```tql -from {domain: "example.com"} -http f"https://api.example.com/lookup?q={domain}", - response_field=result -``` - -### Accept webhook events - -```tql -accept_http "0.0.0.0:8080" { - read_json - where $request.path == "/webhook" -} -``` - -### Paginate through an API - -```tql -from_http "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10", - paginate="link" -``` - -### Push data to a remote API - -```tql -from {x: 42, y: "foo"} -write_json -save_http "https://api.example.com/ingest", method="POST" -``` +provide a record with specific options like `certfile` and `keyfile`. From 594f29427f2eee3d3425670ed06c0ce1b411fbca Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Wed, 18 Feb 2026 09:42:11 +0100 Subject: [PATCH 05/12] Remove legacy save_http/load_http from HTTP integration page Co-Authored-By: Claude Opus 4.6 --- src/content/docs/integrations/http.mdx | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/src/content/docs/integrations/http.mdx b/src/content/docs/integrations/http.mdx index cdf3369db..442f3c544 100644 --- a/src/content/docs/integrations/http.mdx +++ b/src/content/docs/integrations/http.mdx @@ -31,16 +31,6 @@ that turns incoming requests into pipeline events. 
This is useful for receiving webhooks, building custom API endpoints, or ingesting data pushed by external systems. -## Sending Data to APIs - -When sending data from a pipeline to an API or website, the events in the -pipeline make up the HTTP request body: - -![HTTP to](http-to.svg) - -Use [`save_http`](/reference/operators/save_http) with a write operator to send -pipeline data as an HTTP request body. - ## SSL/TLS All HTTP operators support TLS. Pass `tls={}` to enable TLS with defaults, or From 429720ea6ea7bcc56566f962715c33426137a436 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Wed, 18 Feb 2026 09:46:09 +0100 Subject: [PATCH 06/12] Clarify from_http vs http and fix pagination examples Add a section explaining that from_http and http are both HTTP clients with nearly identical options, differing only in being a source vs transformation operator. Replace nonsensical `from {} / http` patterns with realistic examples that show http enriching actual input data. Co-Authored-By: Claude Opus 4.6 --- .../collecting/fetch-via-http-and-apis.mdx | 64 +++++++++---------- 1 file changed, 30 insertions(+), 34 deletions(-) diff --git a/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx b/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx index 4f2825f36..312e3ff69 100644 --- a/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx +++ b/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx @@ -7,32 +7,33 @@ This guide shows you how to fetch data from HTTP APIs using the http operators. You'll learn to make GET requests, handle authentication, and implement pagination for large result sets. +## Choosing the Right Operator + +Tenzir has two HTTP client operators that share nearly identical options: + +- [`from_http`](/reference/operators/from_http) is a **source** operator that + starts a pipeline with an HTTP request. Use it for standalone API calls. 
+- [`http`](/reference/operators/http) is a **transformation** operator that + enriches events flowing through a pipeline with HTTP responses. Use it when + you have existing data and want to make per-event API lookups. + +Most examples in this guide use `from_http`. Unless noted otherwise, the same +options work with `http` as well. + ## Basic API Requests Start with these fundamental patterns for making HTTP requests to APIs. ### Simple GET Requests -To fetch data from an API endpoint, pass the URL as the first parameter to the -`from_http` operator: +To fetch data from an API endpoint, pass the URL as the first parameter: ```tql from_http "https://api.example.com/data" ``` The operator makes a GET request by default and forwards the response as an -event. The `from_http` operator is an input operator, i.e., it starts a -pipeline. The companion operator `http` is a transformation, allowing you to -specify the URL as a field by referencing an event field that contains the URL: - -```tql -from {url: "https://api.example.com/data"} -http url -``` - -This pattern is useful when processing multiple URLs or when URLs are generated -dynamically. Most of our subsequent examples use `from_http`, as the operator -options are very similar. +event. ### Parsing the HTTP Response Body @@ -264,34 +265,29 @@ http url, paginate="link" ### Lambda-Based Pagination -The [`http`](/reference/operators/http) operator supports lambda-based -pagination for APIs that use custom pagination schemes. Provide a lambda -function to the `paginate` parameter that extracts the next page URL from the -response: +The [`http`](/reference/operators/http) operator additionally supports +lambda-based pagination for APIs with custom pagination schemes. 
Provide a +lambda function to the `paginate` parameter that extracts the next page URL from +the response: ```tql -from {} -http "https://api.example.com/search?q=query", +from {query: "tenzir"} +http f"https://api.example.com/search?q={query}", paginate=(x => x.next_url if x.has_more) ``` -The operator continues making requests as long as the pagination lambda function -returns a valid URL. +The operator continues making requests as long as the pagination lambda returns +a valid URL. -### Complex Pagination Logic - -Handle APIs with custom pagination schemes by building pagination URLs -dynamically using expressions that reference response data: +You can also build pagination URLs dynamically: ```tql -let $base_url = "https://api.example.com/items" -from {} -http f"{$base_url}?page=1", - paginate=(x => f"{$base_url}?page={x.page + 1}" if x.page < x.total_pages) +let $base = "https://api.example.com/items" +from {category: "security"} +http f"{$base}?category={category}&page=1", + paginate=(x => f"{$base}?category={category}&page={x.page + 1}" if x.page < x.total_pages) ``` -This example builds pagination URLs dynamically based on response data. 
- ### Rate Limiting Control request frequency by configuring the `paginate_delay` parameter to add @@ -299,8 +295,8 @@ delays between requests and the `parallel` parameter to limit concurrent requests: ```tql -from {url: "https://api.example.com/data"} -http url, +from {domain: "example.com"} +http f"https://api.example.com/scan?q={domain}", paginate=(x => x.next_url if x.has_next), paginate_delay=500ms, parallel=2 From 6730eb99ec33d1f08850140a6634146a137ae5f2 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Wed, 18 Feb 2026 12:29:00 +0100 Subject: [PATCH 07/12] Add to_http operator docs and cross-link HTTP operators Co-Authored-By: Claude Opus 4.6 --- src/content/docs/integrations/http.mdx | 10 + .../docs/reference/operators/accept_http.mdx | 1 + .../docs/reference/operators/from_http.mdx | 1 + src/content/docs/reference/operators/http.mdx | 1 + .../docs/reference/operators/to_http.mdx | 221 ++++++++++++++++++ 5 files changed, 234 insertions(+) create mode 100644 src/content/docs/reference/operators/to_http.mdx diff --git a/src/content/docs/integrations/http.mdx b/src/content/docs/integrations/http.mdx index 442f3c544..4b0b7f4d5 100644 --- a/src/content/docs/integrations/http.mdx +++ b/src/content/docs/integrations/http.mdx @@ -24,6 +24,16 @@ See the [Fetch via HTTP and APIs](/guides/collecting/fetch-via-http-and-apis) guide for practical examples covering authentication, pagination, error handling, and data enrichment. +## Sending Data to Webhooks and APIs + +Use [`to_http`](/reference/operators/to_http) to send events as HTTP requests to +a webhook or API endpoint. Each input event is sent as a separate request, with +the event JSON-encoded as the body by default. This is useful for pushing alerts +to webhooks, forwarding events to SIEMs, or calling external APIs for each +event. 
+ +![HTTP to](http-to.svg) + ## Accepting Incoming Requests Use [`accept_http`](/reference/operators/accept_http) to spin up an HTTP server diff --git a/src/content/docs/reference/operators/accept_http.mdx b/src/content/docs/reference/operators/accept_http.mdx index 8b96c5b0c..ddb73d516 100644 --- a/src/content/docs/reference/operators/accept_http.mdx +++ b/src/content/docs/reference/operators/accept_http.mdx @@ -141,6 +141,7 @@ accept_http "0.0.0.0:8443", - from_http - http +- to_http - serve - collecting/fetch-via-http-and-apis - http diff --git a/src/content/docs/reference/operators/from_http.mdx b/src/content/docs/reference/operators/from_http.mdx index 242ef6d51..b8e0d2eee 100644 --- a/src/content/docs/reference/operators/from_http.mdx +++ b/src/content/docs/reference/operators/from_http.mdx @@ -147,6 +147,7 @@ This tries up to 3 times, waiting 2 seconds between each retry. - accept_http - http +- to_http - serve - collecting/fetch-via-http-and-apis - enrichment/enrich-with-threat-intel diff --git a/src/content/docs/reference/operators/http.mdx b/src/content/docs/reference/operators/http.mdx index 2e8ea0fdf..1a9730d47 100644 --- a/src/content/docs/reference/operators/http.mdx +++ b/src/content/docs/reference/operators/http.mdx @@ -195,4 +195,5 @@ header with `rel=next`, such as GitHub, GitLab, and Jira. - accept_http - from_http +- to_http - collecting/fetch-via-http-and-apis diff --git a/src/content/docs/reference/operators/to_http.mdx b/src/content/docs/reference/operators/to_http.mdx new file mode 100644 index 000000000..a09c679b9 --- /dev/null +++ b/src/content/docs/reference/operators/to_http.mdx @@ -0,0 +1,221 @@ +--- +title: to_http +category: Outputs/Events +example: 'to_http "https://example.com/webhook"' +--- + +import Op from '@components/see-also/Op.astro'; +import Guide from '@components/see-also/Guide.astro'; +import Integration from '@components/see-also/Integration.astro'; + +Sends events as HTTP requests to a webhook or API endpoint. 
+ +```tql +to_http url:string, [method=string, body=record|string|blob, encode=string, + headers=record, paginate=string, paginate_delay=duration, + parallel=int, tls=record, connection_timeout=duration, + max_retry_count=int, retry_delay=duration] + { … } +``` + +## Description + +The `to_http` operator sends each input event as an HTTP request to a webhook or +API endpoint. By default, it JSON-encodes the entire event as the request body +and sends it as a POST request. + +The operator is fire-and-forget: non-success HTTP status codes do not cause +pipeline errors. Use the hook pipeline to observe and react to responses. + +### `url: string` + +URL to send the request to. This is an expression evaluated per event, so you +can use field values to construct the URL dynamically. + +### `method = string (optional)` + +One of the following HTTP methods to use: + +- `get` +- `head` +- `post` +- `put` +- `del` +- `connect` +- `options` +- `trace` + +Defaults to `post`. + +### `body = blob|record|string (optional)` + +Body to send with the HTTP request. + +If the value is a `record`, then the body is encoded according to the `encode` +option and an appropriate `Content-Type` is set for the request. + +If not specified, the entire input event is JSON-encoded as the request body. + +### `encode = string (optional)` + +Specifies how to encode `record` bodies. Supported values: + +- `json` +- `form` + +Defaults to `json`. + +### `headers = record (optional)` + +Record of headers to send with the request. This is an expression evaluated per +event, so you can use field values. + +### `paginate = string (optional)` + +The string `"link"` to automatically follow pagination links in the HTTP `Link` +response header per +[RFC 8288](https://datatracker.ietf.org/doc/html/rfc8288). The operator parses +`Link` headers and follows the `rel=next` relation to fetch the next page. 
+Pagination stops when the response no longer contains a `rel=next` link or when +a non-success status code is received. + +### `paginate_delay = duration (optional)` + +The duration to wait between consecutive pagination requests. + +Defaults to `0s`. + +### `parallel = int (optional)` + +Maximum number of requests that can be in progress at any time. + +Defaults to `1`. + +import TLSOptions from '@partials/operators/TLSOptions.mdx'; + + + +### `connection_timeout = duration (optional)` + +Timeout for the connection. + +Defaults to `5s`. + +### `max_retry_count = int (optional)` + +The maximum number of times to retry a failed request. Every request has its +own retry count. + +Defaults to `0`. + +### `retry_delay = duration (optional)` + +The duration to wait between each retry. + +Defaults to `1s`. + +### `{ … } (optional)` + +A hook pipeline that runs for each completed HTTP request. Use this to observe +responses, log failures, or trigger side effects. + +The hook pipeline receives events with the following fields: + +| Field | Type | Description | +| :--------- | :------- | :------------------------------------------------- | +| `request` | `record` | The original input event. | +| `url` | `string` | The URL the request was sent to. | +| `ok` | `bool` | `true` if the status code is in the 200–399 range. | +| `response` | `record` | The HTTP response (see below). Present on reply. | +| `error` | `string` | Error message. Present on connection failure. | + +The `response` record contains: + +| Field | Type | Description | +| :-------- | :------- | :------------------------------------ | +| `code` | `uint64` | The HTTP status code of the response. | +| `headers` | `record` | The response headers. | +| `body` | `blob` | The response body. | + +The hook pipeline must return events. It cannot be a sink.
+ +## Examples + +### Send events to a webhook + +Send each event as a JSON POST request: + +```tql +from {message: "hello", severity: "info"} +to_http "https://example.com/webhook" +``` + +The entire event is JSON-encoded as the request body. + +### Use a custom body + +Override the default body with a string: + +```tql +from {foo: "bar"} +to_http "https://example.com/api", body="custom-payload" +``` + +### Send form-encoded data + +```tql +from {user: "alice"} +to_http "https://example.com/api", + body={name: "alice", role: "admin"}, + encode="form" +``` + +### Set a custom method and headers + +```tql +from {foo: "bar"} +to_http "https://example.com/api", + method="put", + headers={"X-Custom": "value"} +``` + +### Observe responses with a hook pipeline + +Use a hook pipeline to react to HTTP responses: + +```tql +from {alert: "intrusion detected"} +to_http "https://siem.example.com/ingest" { + where not ok + publish "failed-webhooks" +} +``` + +The hook pipeline receives one event per HTTP request with the response details. +In this example, failed requests are published to a topic for monitoring. 
+ +### Send events with TLS + +```tql +from {data: "sensitive"} +to_http "https://secure.example.com/api", + tls={skip_peer_verification: true} +``` + +### Send requests in parallel + +Increase throughput by sending multiple requests concurrently: + +```tql +load_file "events.json" +read_json +to_http "https://example.com/ingest", parallel=4 +``` + +## See Also + +- from_http +- http +- accept_http +- collecting/fetch-via-http-and-apis +- http From 964a4b5a9ed85d4c06f8b89a55e781546bc04eb2 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Wed, 18 Feb 2026 15:18:07 +0100 Subject: [PATCH 08/12] Remove hook pipeline from to_http docs Co-Authored-By: Claude Opus 4.6 --- .../docs/reference/operators/to_http.mdx | 43 +------------------ 1 file changed, 1 insertion(+), 42 deletions(-) diff --git a/src/content/docs/reference/operators/to_http.mdx b/src/content/docs/reference/operators/to_http.mdx index a09c679b9..2de375be3 100644 --- a/src/content/docs/reference/operators/to_http.mdx +++ b/src/content/docs/reference/operators/to_http.mdx @@ -15,7 +15,6 @@ to_http url:string, [method=string, body=record|string|blob, encode=string, headers=record, paginate=string, paginate_delay=duration, parallel=int, tls=record, connection_timeout=duration, max_retry_count=int, retry_delay=duration] - { … } ``` ## Description @@ -25,7 +24,7 @@ API endpoint. By default, it JSON-encodes the entire event as the request body and sends it as a POST request. The operator is fire-and-forget: non-success HTTP status codes do not cause -pipeline errors. Use the hook pipeline to observe and react to responses. +pipeline errors. ### `url: string` @@ -114,31 +113,6 @@ The duration to wait between each retry. Defaults to `1s`. -### `{ … } (optional)` - -A hook pipeline that runs for each completed HTTP request. Use this to observe -responses, log failures, or trigger side effects. 
- -The hook pipeline receives events with the following fields: - -| Field | Type | Description | -| :--------- | :------- | :------------------------------------------------- | -| `request` | `record` | The original input event. | -| `url` | `string` | The URL the request was sent to. | -| `ok` | `bool` | `true` if the status code is in the 200–399 range. | -| `response` | `record` | The HTTP response (see below). Present on reply. | -| `error` | `string` | Error message. Present on connection failure. | - -The `response` record contains: - -| Field | Type | Description | -| :-------- | :------- | :------------------------------------ | -| `code` | `uint64` | The HTTP status code of the response. | -| `headers` | `record` | The response headers. | -| `body` | `blob` | The response body. | - -The hook pipeline must return events. It cannot be a sink. - ## Examples ### Send events to a webhook @@ -179,21 +153,6 @@ to_http "https://example.com/api", headers={"X-Custom": "value"} ``` -### Observe responses with a hook pipeline - -Use a hook pipeline to react to HTTP responses: - -```tql -from {alert: "intrusion detected"} -to_http "https://siem.example.com/ingest" { - where not ok - publish "failed-webhooks" -} -``` - -The hook pipeline receives one event per HTTP request with the response details. -In this example, failed requests are published to a topic for monitoring. 
- ### Send events with TLS ```tql From ba40348f4ef4c3bcc38f02e13f589e5db93e7e65 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Wed, 18 Feb 2026 17:09:03 +0100 Subject: [PATCH 09/12] Add serve_http docs and expose-data-as-server guide Co-Authored-By: Claude Opus 4.6 --- .../guides/routing/expose-data-as-server.mdx | 93 ++++++++++++ src/content/docs/integrations/http.mdx | 21 ++- .../docs/reference/operators/accept_http.mdx | 1 + .../docs/reference/operators/from_http.mdx | 1 + src/content/docs/reference/operators/http.mdx | 2 + .../docs/reference/operators/serve_http.mdx | 132 ++++++++++++++++++ .../docs/reference/operators/to_http.mdx | 1 + src/sidebar.ts | 1 + 8 files changed, 247 insertions(+), 5 deletions(-) create mode 100644 src/content/docs/guides/routing/expose-data-as-server.mdx create mode 100644 src/content/docs/reference/operators/serve_http.mdx diff --git a/src/content/docs/guides/routing/expose-data-as-server.mdx b/src/content/docs/guides/routing/expose-data-as-server.mdx new file mode 100644 index 000000000..dd106c33a --- /dev/null +++ b/src/content/docs/guides/routing/expose-data-as-server.mdx @@ -0,0 +1,93 @@ +--- +title: Expose data as a server +--- + +import Op from '@components/see-also/Op.astro'; + +This guide shows you how to make pipeline data available to external consumers +by spinning up servers. You'll learn to stream events over HTTP and configure +server endpoints for different use cases. + +## Spin up an HTTP server + +Use [`serve_http`](/reference/operators/serve_http) at the end of a pipeline to +start an HTTP server that streams events as NDJSON to connected clients: + +```tql +from_file "example.yaml" +serve_http "0.0.0.0:8080" +``` + +Any HTTP client connecting to `http://host:8080/` receives a continuous NDJSON +stream. 
Each event is JSON-encoded on a single line, separated by newlines: + +```bash +curl http://localhost:8080/ +``` + +```json +{"timestamp":"2025-01-15T10:30:00Z","src_ip":"192.168.1.100","event":"login"} +{"timestamp":"2025-01-15T10:30:01Z","src_ip":"10.0.0.50","event":"file_access"} +``` + +Multiple clients can connect simultaneously — each receives a copy of every +event. + +### Custom path and method + +By default, clients connect with a GET request to `/`. Customize both: + +```tql +from_file "alerts.json" +serve_http "0.0.0.0:9090", path="/alerts", method="post" +``` + +Clients must now POST to `/alerts` to receive the stream. Requests to other +paths return a `404` response. + +### Health check endpoints + +Production deployments often need health check endpoints for load balancers +or monitoring. Use the `responses` option to serve static content alongside +the event stream: + +```tql +subscribe "my-topic" +serve_http "0.0.0.0:8080", + responses={ + "/health": {code: 200, content_type: "text/plain", body: "ok"}, + "/ready": {code: 200, content_type: "application/json", body: "{\"status\":\"ready\"}"}, + } +``` + +Clients hitting `/health` or `/ready` get the static response immediately. +Clients connecting to `/` (the default stream path) receive the event stream. + +### Connection limits + +Control the maximum number of simultaneous client connections: + +```tql +from_file "data.csv" +serve_http "0.0.0.0:8080", max_connections=10 +``` + +Connections beyond the limit are rejected with a `503` response. 
+ +### TLS encryption + +Serve data over HTTPS by providing TLS certificates: + +```tql +from_file "secret.json" +serve_http "0.0.0.0:8443", + tls={ + certfile: "/path/to/cert.pem", + keyfile: "/path/to/key.pem", + } +``` + +## See Also + +- serve_http +- to_http diff --git a/src/content/docs/integrations/http.mdx b/src/content/docs/integrations/http.mdx index 4b0b7f4d5..3fcd53c41 100644 --- a/src/content/docs/integrations/http.mdx +++ b/src/content/docs/integrations/http.mdx @@ -5,10 +5,11 @@ sidebar: --- [HTTP](https://en.wikipedia.org/wiki/HTTP) is the foundation of data exchange -on the web. Tenzir provides operators for both sides of an HTTP conversation: -fetching data from APIs and accepting incoming requests. +on the web. Tenzir provides operators for all sides of an HTTP conversation: +fetching data from APIs, sending events to webhooks, streaming data to clients, +and accepting incoming requests. -## Fetching Data from APIs +## Fetching data from APIs When retrieving data from an API or website, you prepare your HTTP request and get back the HTTP response body as your pipeline data: @@ -24,7 +25,7 @@ See the [Fetch via HTTP and APIs](/guides/collecting/fetch-via-http-and-apis) guide for practical examples covering authentication, pagination, error handling, and data enrichment. -## Sending Data to Webhooks and APIs +## Sending data to webhooks and APIs Use [`to_http`](/reference/operators/to_http) to send events as HTTP requests to a webhook or API endpoint. Each input event is sent as a separate request, with @@ -34,7 +35,17 @@ event. ![HTTP to](http-to.svg) -## Accepting Incoming Requests +## Streaming data to HTTP clients + +Use [`serve_http`](/reference/operators/serve_http) to start an HTTP server that +streams pipeline events as NDJSON to connected clients. Each client receives a +copy of every event. This is useful when external systems need to pull data from +your pipeline over HTTP. 
+ +See the [Expose data as a server](/guides/routing/expose-data-as-server) guide +for practical examples covering health checks, connection limits, and TLS. + +## Accepting incoming requests Use [`accept_http`](/reference/operators/accept_http) to spin up an HTTP server that turns incoming requests into pipeline events. This is useful for receiving diff --git a/src/content/docs/reference/operators/accept_http.mdx b/src/content/docs/reference/operators/accept_http.mdx index ddb73d516..c78096a49 100644 --- a/src/content/docs/reference/operators/accept_http.mdx +++ b/src/content/docs/reference/operators/accept_http.mdx @@ -142,6 +142,7 @@ accept_http "0.0.0.0:8443", - from_http - http - to_http +- serve_http - serve - collecting/fetch-via-http-and-apis - http diff --git a/src/content/docs/reference/operators/from_http.mdx b/src/content/docs/reference/operators/from_http.mdx index b8e0d2eee..accba435a 100644 --- a/src/content/docs/reference/operators/from_http.mdx +++ b/src/content/docs/reference/operators/from_http.mdx @@ -148,6 +148,7 @@ This tries up to 3 times, waiting 2 seconds between each retry. - accept_http - http - to_http +- serve_http - serve - collecting/fetch-via-http-and-apis - enrichment/enrich-with-threat-intel diff --git a/src/content/docs/reference/operators/http.mdx b/src/content/docs/reference/operators/http.mdx index 1a9730d47..fce16f782 100644 --- a/src/content/docs/reference/operators/http.mdx +++ b/src/content/docs/reference/operators/http.mdx @@ -196,4 +196,6 @@ header with `rel=next`, such as GitHub, GitLab, and Jira. 
- accept_http - from_http - to_http +- serve_http - collecting/fetch-via-http-and-apis +- routing/expose-data-as-server diff --git a/src/content/docs/reference/operators/serve_http.mdx b/src/content/docs/reference/operators/serve_http.mdx new file mode 100644 index 000000000..a042f0ae0 --- /dev/null +++ b/src/content/docs/reference/operators/serve_http.mdx @@ -0,0 +1,132 @@ +--- +title: serve_http +category: Outputs/Events +example: 'serve_http "0.0.0.0:8080"' +--- + +import Op from '@components/see-also/Op.astro'; +import Guide from '@components/see-also/Guide.astro'; +import Integration from '@components/see-also/Integration.astro'; + +Starts an HTTP server that streams events as NDJSON to connected clients. + +```tql +serve_http url:string, [path=string, method=string, responses=record, + max_connections=int, tls=record] +``` + +## Description + +The `serve_http` operator starts an HTTP server and streams pipeline events as +[NDJSON](https://github.com/ndjson/ndjson-spec) to any HTTP client that +connects. Each connected client receives a copy of every event. + +The operator waits for at least one client to connect before delivering data. +When the pipeline finishes, the server shuts down and all client connections are +closed. + +Clients that connect to a path other than the stream path, or use the wrong HTTP +method, receive a `404` or `405` response respectively. Use the `responses` +option to serve static content on additional paths, such as health checks. + +### `url: string` + +The endpoint to listen on. Must have the form `:`. Use `0.0.0.0` to +accept connections on all interfaces. + +### `path = string (optional)` + +The URL path that clients connect to for the event stream. + +Defaults to `"/"`. + +### `method = string (optional)` + +The HTTP method that clients must use to connect to the stream. + +Defaults to `"GET"`. + +### `responses = record (optional)` + +Serve fixed responses on auxiliary paths that are separate from the event stream. 
+This is useful for health checks or status endpoints. For example: + +```tql +responses={ + "/health": {code: 200, content_type: "text/plain", body: "ok"}, +} +``` + +Clients hitting `/health` receive the static `body` defined here, while clients +connecting to the stream path receive the live NDJSON event stream from the +pipeline. + +Each route must be a record with `code`, `content_type`, and `body` fields. The +stream path itself cannot appear in `responses`. + +### `max_connections = int (optional)` + +The maximum number of simultaneous client connections to accept. + +Defaults to `128`. + +import TLSOptions from '@partials/operators/TLSOptions.mdx'; + + + +## Examples + +### Stream events to HTTP clients + +Serve events on port 8080. Any HTTP client connecting to `http://host:8080/` +receives the events as NDJSON: + +```tql +export +serve_http "0.0.0.0:8080" +``` + +Connect with `curl`: + +```bash +curl http://localhost:8080/ +``` + +### Serve on a custom path and method + +Require clients to POST to `/events`: + +```tql +export +serve_http "0.0.0.0:8080", path="/events", method="post" +``` + +### Add a health check endpoint + +Expose a health endpoint alongside the event stream: + +```tql +export +serve_http "0.0.0.0:8080", + responses={ + "/health": {code: 200, content_type: "text/plain", body: "ok"}, + } +``` + +### Serve over HTTPS + +```tql +export +serve_http "0.0.0.0:8443", + tls={ + certfile: "/path/to/cert.pem", + keyfile: "/path/to/key.pem", + } +``` + +## See Also + +- accept_http +- to_http +- routing/expose-data-as-server +- http diff --git a/src/content/docs/reference/operators/to_http.mdx b/src/content/docs/reference/operators/to_http.mdx index 2de375be3..5dfa58c41 100644 --- a/src/content/docs/reference/operators/to_http.mdx +++ b/src/content/docs/reference/operators/to_http.mdx @@ -176,5 +176,6 @@ to_http "https://example.com/ingest", parallel=4 - from_http - http - accept_http +- serve_http - collecting/fetch-via-http-and-apis - http diff 
--git a/src/sidebar.ts b/src/sidebar.ts index 654c1391a..5bbaefa7e 100644 --- a/src/sidebar.ts +++ b/src/sidebar.ts @@ -148,6 +148,7 @@ export const guides = [ collapsed: true, items: [ "guides/routing/send-to-destinations", + "guides/routing/expose-data-as-server", "guides/routing/split-and-merge-streams", "guides/routing/load-balance-pipelines", ], From 44427c0dc7e40f9e3ce0d8d5e6fdfeac62e86c72 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Thu, 19 Feb 2026 11:28:02 +0100 Subject: [PATCH 10/12] Document restored HTTP metadata and error-field behavior --- .../docs/reference/operators/accept_http.mdx | 22 +++++++++++++++++-- .../docs/reference/operators/from_http.mdx | 18 ++++++++++++--- src/content/docs/reference/operators/http.mdx | 11 +++++++++- 3 files changed, 45 insertions(+), 6 deletions(-) diff --git a/src/content/docs/reference/operators/accept_http.mdx b/src/content/docs/reference/operators/accept_http.mdx index c78096a49..42851685a 100644 --- a/src/content/docs/reference/operators/accept_http.mdx +++ b/src/content/docs/reference/operators/accept_http.mdx @@ -11,8 +11,8 @@ import Integration from '@components/see-also/Integration.astro'; Accepts incoming HTTP requests and forwards them as events. ```tql -accept_http url:string, [responses=record, max_request_size=int, - max_connections=int, tls=record] +accept_http url:string, [responses=record, metadata_field=field, + max_request_size=int, max_connections=int, tls=record] { … } ``` @@ -47,6 +47,24 @@ Each route must be a record with `code`, `content_type`, and `body` fields. Requests to an unspecified endpoint are responded with HTTP Status `200 OK`. +### `metadata_field = field (optional)` + +Field to insert request metadata into emitted events. + +The inserted metadata has the same shape as `$request`: + +| Field | Type | Description | +| :--------- | :------- | :----------------------------------- | +| `headers` | `record` | The request headers. 
| +| `query` | `record` | The query parameters of the request. | +| `path` | `string` | The path requested. | +| `fragment` | `string` | The URI fragment of the request. | +| `method` | `string` | The HTTP method of the request. | +| `version` | `string` | The HTTP version of the request. | +| `body` | `blob` | The raw request body. | + +This works both with and without a parsing sub-pipeline. + ### `max_request_size = int (optional)` The maximum size of an incoming request to accept. diff --git a/src/content/docs/reference/operators/from_http.mdx b/src/content/docs/reference/operators/from_http.mdx index accba435a..6a4f7d67f 100644 --- a/src/content/docs/reference/operators/from_http.mdx +++ b/src/content/docs/reference/operators/from_http.mdx @@ -8,8 +8,8 @@ Sends an HTTP/1.1 request and returns the response as events. ```tql from_http url:string, [method=string, body=record|string|blob, encode=string, - headers=record, error_field=field, paginate=string, - paginate_delay=duration, connection_timeout=duration, + headers=record, metadata_field=field, error_field=field, + paginate=string, paginate_delay=duration, connection_timeout=duration, max_retry_count=int, retry_delay=duration, tls=record] { … } ``` @@ -36,13 +36,25 @@ import HTTPClientOptions from '@partials/operators/HTTPClientOptions.mdx'; +### `metadata_field = field (optional)` + +Field to insert response metadata into emitted events. + +The metadata has the following schema: + +| Field | Type | Description | +| :-------- | :------- | :------------------------------------ | +| `code` | `uint64` | The HTTP status code of the response. | +| `headers` | `record` | The response headers. | + ### `error_field = field (optional)` Field to insert the response body for HTTP error responses (status codes not in the 2xx or 3xx range). When set, any HTTP response with a status code outside the 200–399 range will have its body stored in this field as a `blob`. 
Otherwise, error responses are -skipped and an error is emitted. +skipped and an error is emitted. If `metadata_field` is also set, error events +also include response metadata in that field. import TLSOptions from '@partials/operators/TLSOptions.mdx'; diff --git a/src/content/docs/reference/operators/http.mdx b/src/content/docs/reference/operators/http.mdx index fce16f782..4ccf7584e 100644 --- a/src/content/docs/reference/operators/http.mdx +++ b/src/content/docs/reference/operators/http.mdx @@ -46,7 +46,7 @@ Defaults to `this`. ### `metadata_field = field (optional)` -Field to insert metadata into when using the parsing pipeline. +Field to insert response metadata into emitted events. The metadata has the following schema: @@ -62,6 +62,8 @@ Field to insert the response body for HTTP error responses (status codes not in When set, any HTTP response with a status code outside the 200–399 range will have its body stored in this field as a `blob`. Otherwise, error responses, alongside the original event, are skipped and a warning is emitted. +If `metadata_field` is also set, emitted error events also include response +metadata. ### `parallel = int (optional)` @@ -77,6 +79,13 @@ A pipeline that receives the response body as bytes, allowing parsing per request. This is especially useful in scenarios where the response body can be parsed into multiple events. +Inside the pipeline, the following dollar variables are available: + +| Variable | Type | Description | +| :---------- | :------- | :-------------------------- | +| `$request` | `record` | The current input event. | +| `$response` | `record` | The HTTP response metadata. | + If not provided, the operator will attempt to infer the parsing operator from the `Content-Type` header. Should this inference fail (e.g., unsupported or missing `Content-Type`), the operator raises a warning and skips the request. 
From 8f439d5ad3b485324d9441a9afee95f6c4de6969 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Mon, 16 Mar 2026 20:11:17 +0100 Subject: [PATCH 11/12] Document serve_http backlog policy Document the new `on_backlog` option for `serve_http` and explain the `buffer`, `block`, and `drop` behaviors. Add an example that shows how to apply backpressure until a client connects. --- .../docs/reference/operators/serve_http.mdx | 33 ++++++++++++++++--- 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/src/content/docs/reference/operators/serve_http.mdx b/src/content/docs/reference/operators/serve_http.mdx index a042f0ae0..644346f43 100644 --- a/src/content/docs/reference/operators/serve_http.mdx +++ b/src/content/docs/reference/operators/serve_http.mdx @@ -12,18 +12,20 @@ Starts an HTTP server that streams events as NDJSON to connected clients. ```tql serve_http url:string, [path=string, method=string, responses=record, - max_connections=int, tls=record] + on_backlog=string, max_connections=int, tls=record] ``` ## Description The `serve_http` operator starts an HTTP server and streams pipeline events as [NDJSON](https://github.com/ndjson/ndjson-spec) to any HTTP client that -connects. Each connected client receives a copy of every event. +connects. Each connected client receives a copy of every event that the server +delivers while that client is connected. -The operator waits for at least one client to connect before delivering data. -When the pipeline finishes, the server shuts down and all client connections are -closed. +By default, the operator buffers events while no clients are connected. When +the pipeline finishes, the server shuts down and all client connections are +closed. Use `on_backlog` to change what happens when the pipeline produces +events while no clients are connected. Clients that connect to a path other than the stream path, or use the wrong HTTP method, receive a `404` or `405` response respectively. 
Use the `responses` @@ -64,6 +66,18 @@ pipeline. Each route must be a record with `code`, `content_type`, and `body` fields. The stream path itself cannot appear in `responses`. +### `on_backlog = string (optional)` + +Controls what `serve_http` does when the pipeline produces events while no +clients are connected. + +Accepted values are: + +- `"buffer"`: Buffer events and replay them to the next client. This is the + default. +- `"block"`: Apply backpressure until a client connects. +- `"drop"`: Drop events until a client connects. + ### `max_connections = int (optional)` The maximum number of simultaneous client connections to accept. @@ -113,6 +127,15 @@ serve_http "0.0.0.0:8080", } ``` +### Block until a client connects + +Use `on_backlog="block"` when you need backpressure instead of buffering: + +```tql +export +serve_http "0.0.0.0:8080", on_backlog="block" +``` + ### Serve over HTTPS ```tql From 89e315c685c52b45dace726bb4d814008b4dc053 Mon Sep 17 00:00:00 2001 From: Matthias Vallentin Date: Mon, 16 Mar 2026 20:40:43 +0100 Subject: [PATCH 12/12] Use semantic components in updated docs --- .../collecting/fetch-via-http-and-apis.mdx | 47 ++++++++--------- .../guides/routing/expose-data-as-server.mdx | 4 +- src/content/docs/integrations/http.mdx | 40 +++++++------- src/content/docs/integrations/mysql.mdx | 5 +- src/content/docs/integrations/tcp.mdx | 52 ++++++++++++++++--- 5 files changed, 92 insertions(+), 56 deletions(-) diff --git a/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx b/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx index 312e3ff69..170fa9771 100644 --- a/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx +++ b/src/content/docs/guides/collecting/fetch-via-http-and-apis.mdx @@ -2,20 +2,19 @@ title: Fetch via HTTP and APIs --- -This guide shows you how to fetch data from HTTP APIs using the -from_http and -http operators. 
You'll learn to make GET -requests, handle authentication, and implement pagination for large result sets. +This guide shows you how to fetch data from HTTP APIs using the from_http +and http operators. You'll learn to make GET requests, handle +authentication, and implement pagination for large result sets. ## Choosing the Right Operator Tenzir has two HTTP client operators that share nearly identical options: -- [`from_http`](/reference/operators/from_http) is a **source** operator that - starts a pipeline with an HTTP request. Use it for standalone API calls. -- [`http`](/reference/operators/http) is a **transformation** operator that - enriches events flowing through a pipeline with HTTP responses. Use it when - you have existing data and want to make per-event API lookups. +- from_http is a **source** operator that starts a pipeline with an +HTTP request. Use it for standalone API calls. +- http is a **transformation** operator that enriches events flowing +through a pipeline with HTTP responses. Use it when you have existing data +and want to make per-event API lookups. Most examples in this guide use `from_http`. Unless noted otherwise, the same options work with `http` as well. @@ -146,8 +145,9 @@ from_http "https://api.example.com/data", headers={ ``` Headers help you authenticate with APIs and specify request formats. Use the -secret function to retrieve sensitive -API tokens, as in the above example. + +secret function to retrieve sensitive API tokens, as in the above +example. ### TLS and Security @@ -192,8 +192,8 @@ Use HTTP requests to enrich existing data with information from external APIs. 
### Preserving Input Context Keep original event data while adding API responses by specifying the -`response_field` parameter on the [`http`](/reference/operators/http) operator to -control where the response is stored: +`response_field` parameter on the http operator to control where the +response is stored: ```tql from { @@ -224,8 +224,8 @@ With the `http` operator, use the `metadata_field` parameter instead: ```tql from {url: "https://api.example.com/status"} -http url, metadata_field=http_meta -where http_meta.code >= 200 and http_meta.code < 300 +http url, metadata_field=meta +where meta.code >= 200 and meta.code < 300 ``` ## Pagination and Bulk Processing @@ -263,12 +263,11 @@ from {url: "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10"} http url, paginate="link" ``` -### Lambda-Based Pagination +### Lambda-based pagination -The [`http`](/reference/operators/http) operator additionally supports -lambda-based pagination for APIs with custom pagination schemes. Provide a -lambda function to the `paginate` parameter that extracts the next page URL from -the response: +The http operator also supports lambda-based pagination for APIs with +custom pagination schemes. Provide a lambda function to the `paginate` +parameter that extracts the next page URL from the response: ```tql from {query: "tenzir"} @@ -331,9 +330,9 @@ from_http "https://api.example.com/health" { ``` The above example parses the `Date` header from the HTTP response via -parse_time into a timestamp and then -compares it to the current wallclock time using the -now function. + +parse_time into a timestamp and then compares it to the current +wallclock time using the now function. ## Error Handling @@ -385,7 +384,7 @@ Follow these practices for reliable and efficient API integration: pipelines or `metadata_field` with `http` to check status codes and implement fallback logic. 5. **Secure credentials**. Access API keys and tokens via - [secrets](/explanations/secrets), not in code. 
+ secrets, not in code. 6. **Monitor API usage**. Track response times and error rates for performance. 7. **Leverage automatic format inference**. Use descriptive file extensions in diff --git a/src/content/docs/guides/routing/expose-data-as-server.mdx b/src/content/docs/guides/routing/expose-data-as-server.mdx index dd106c33a..1545d8d5a 100644 --- a/src/content/docs/guides/routing/expose-data-as-server.mdx +++ b/src/content/docs/guides/routing/expose-data-as-server.mdx @@ -10,8 +10,8 @@ server endpoints for different use cases. ## Spin up an HTTP server -Use [`serve_http`](/reference/operators/serve_http) at the end of a pipeline to -start an HTTP server that streams events as NDJSON to connected clients: +Use serve_http at the end of a pipeline to start an HTTP server that +streams events as NDJSON to connected clients: ```tql from_file "example.yaml" diff --git a/src/content/docs/integrations/http.mdx b/src/content/docs/integrations/http.mdx index 3fcd53c41..0a2403040 100644 --- a/src/content/docs/integrations/http.mdx +++ b/src/content/docs/integrations/http.mdx @@ -16,41 +16,37 @@ get back the HTTP response body as your pipeline data: ![HTTP from](http-from.svg) -Use [`from_http`](/reference/operators/from_http) to issue a one-shot HTTP -request, or [`http`](/reference/operators/http) to enrich events flowing through -a pipeline with HTTP responses. Both operators automatically infer the response -format from the URL extension or `Content-Type` header. +Use from_http to issue a one-shot HTTP request, or http to +enrich events flowing through a pipeline with HTTP responses. Both operators +automatically infer the response format from the URL extension or +`Content-Type` header. -See the [Fetch via HTTP and APIs](/guides/collecting/fetch-via-http-and-apis) -guide for practical examples covering authentication, pagination, error -handling, and data enrichment. 
+See collecting/fetch-via-http-and-apis for practical examples +covering authentication, pagination, error handling, and data enrichment. ## Sending data to webhooks and APIs -Use [`to_http`](/reference/operators/to_http) to send events as HTTP requests to -a webhook or API endpoint. Each input event is sent as a separate request, with -the event JSON-encoded as the body by default. This is useful for pushing alerts -to webhooks, forwarding events to SIEMs, or calling external APIs for each -event. +Use to_http to send events as HTTP requests to a webhook or API +endpoint. Each input event is sent as a separate request, with the event +JSON-encoded as the body by default. This is useful for pushing alerts to +webhooks, forwarding events to SIEMs, or calling external APIs for each event. ![HTTP to](http-to.svg) ## Streaming data to HTTP clients -Use [`serve_http`](/reference/operators/serve_http) to start an HTTP server that -streams pipeline events as NDJSON to connected clients. Each client receives a -copy of every event. This is useful when external systems need to pull data from -your pipeline over HTTP. +Use serve_http to start an HTTP server that streams pipeline events as +NDJSON to connected clients. Each client receives a copy of every event. This +is useful when external systems need to pull data from your pipeline over HTTP. -See the [Expose data as a server](/guides/routing/expose-data-as-server) guide -for practical examples covering health checks, connection limits, and TLS. +See routing/expose-data-as-server for practical examples +covering health checks, connection limits, and TLS. ## Accepting incoming requests -Use [`accept_http`](/reference/operators/accept_http) to spin up an HTTP server -that turns incoming requests into pipeline events. This is useful for receiving -webhooks, building custom API endpoints, or ingesting data pushed by external -systems. 
+Use accept_http to spin up an HTTP server that turns incoming requests +into pipeline events. This is useful for receiving webhooks, building custom +API endpoints, or ingesting data pushed by external systems. ## SSL/TLS diff --git a/src/content/docs/integrations/mysql.mdx b/src/content/docs/integrations/mysql.mdx index 2199bb694..09e7192b3 100644 --- a/src/content/docs/integrations/mysql.mdx +++ b/src/content/docs/integrations/mysql.mdx @@ -3,7 +3,6 @@ title: MySQL --- import Op from '@components/see-also/Op.astro'; -import { Steps } from '@astrojs/starlight/components'; [MySQL](https://www.mysql.com/) is an open-source relational database management system widely used for web applications, data warehousing, and enterprise @@ -11,7 +10,9 @@ applications. ![MySQL Diagram](mysql.excalidraw) -Tenzir connects to MySQL over the network using the MySQL wire protocol. Tenzir communicates with MySQL via the host and port you specify in the from_mysql operator. This means: +Tenzir connects to MySQL over the network using the MySQL wire protocol. Tenzir +communicates with MySQL via the host and port you specify in the from_mysql +operator. This means: - **Network**: Tenzir and MySQL can run on the same machine (using `localhost`) or on different machines in the same network. You just need to make sure that diff --git a/src/content/docs/integrations/tcp.mdx b/src/content/docs/integrations/tcp.mdx index f6e1eda45..d3bb452fb 100644 --- a/src/content/docs/integrations/tcp.mdx +++ b/src/content/docs/integrations/tcp.mdx @@ -13,7 +13,9 @@ connections, and serving data to connected clients. Use the IP address `0.0.0.0` to listen on all available network interfaces. :::tip[URL support] -The URL schemes `tcp://` and `tcps://` dispatch to load_tcp and save_tcp for seamless URL-style use via from and to. +The URL schemes `tcp://` and `tcps://` dispatch to load_tcp and + +save_tcp for seamless URL-style use via from and to. 
::: ## Connecting to remote endpoints @@ -38,8 +40,46 @@ See collecting/get-data-from-the-network for practical examples. ## SSL/TLS -All TCP operators support TLS via the `tls` option. Pass an empty record -(`tls={}`) for defaults, or provide specific options like `certfile` and -`keyfile`. - like `certfile` and -`keyfile`. +To enable TLS, pass `tls={}` for the defaults, or provide a record +with specific options like `certfile` and `keyfile`. Both from_tcp +and load_tcp support server-side TLS for accepting encrypted +connections. + +For testing purposes, you can quickly generate a self-signed certificate as +follows: + +```bash +openssl req -x509 -newkey rsa:2048 -keyout key_and_cert.pem -out key_and_cert.pem -days 365 -nodes +``` + +An easy way to test a TLS connection is to try connecting via OpenSSL: + +```bash +openssl s_client -connect 127.0.0.1:443 +``` + +## Examples + +### Listen for incoming JSON over TCP + +```tql +from_tcp "0.0.0.0:8090" { + read_json +} +``` + +### Read data by connecting to a remote TCP server + +```tql +from "tcp://127.0.0.1:443", connect=true { + read_json +} +``` + +### Listen on localhost with TLS enabled + +```tql +from_tcp "127.0.0.1:443", tls={certfile: "cert.pem", keyfile: "key.pem"} { + read_json +} +```