feat: QWP ingestion protocol#130
Conversation
… columns Cover all column types over QWP/UDP transport with server-side validation: TimestampMicros, TimestampNanos, datetime (TIMESTAMP / TIMESTAMP_NS), f64 arrays (contiguous + transposed), decimal into pre-created DECIMAL(18,3) table, and VARCHAR string columns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-flush: bytes threshold triggers, row count triggers, disabled mode only sends on explicit flush. Buffer lifecycle: reuse after flush, independent buffers don't interfere, flush(clear=False) retains content, standalone Buffer.qwp() clear and reuse across multiple flushes. Edge cases: multi-table rows in one flush, Unicode symbols/columns (Zürich, 你好世界, 🚀), None values silently skipped, empty flush is no-op, double close, context manager flushes on exit, ServerTimestamp vs explicit TimestampNanos, max_name_len enforcement. Stress: 500 rows exercising multiple datagrams and auto-flush. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-flush interval: verify time-based trigger after 500ms. Datagram splitting: 30 rows with max_datagram_size=200, single flush forces Rust to split across multiple datagrams — all arrive. Interleave HTTP and QWP/UDP senders in same process. from_env construction via QDB_CLIENT_CONF env var. Sender reuse: close and re-create, both sessions' data arrives. Large string: 1000-char value roundtrips through QWP/UDP. Symbols-only row (no columns). Mixed timestamps: ServerTimestamp and explicit TimestampNanos in same batch. DataFrame with datetime column as designated timestamp (at='ts'). new_buffer inherits init_buf_size and max_name_len from sender. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirrors the existing HTTP test_decimal_pyarrow but over QWP/UDP: PyArrow decimal128(18,2) column flushed via sender.dataframe() into a pre-created DECIMAL(18,3) table, values verified server-side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Zero, negative zero, max precision (999999999999999.999), min non-zero (0.001), NaN/Inf rejection, multiple decimal columns with different scales, and PyArrow decimal128 with null values in DataFrame. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Buffer type mismatch: ILP buffer rejected by QWP sender, QWP buffer rejected by HTTP sender — Rust layer returns clear error messages. Network edge cases: wrong port flushes silently (UDP fire-and-forget), unresolvable host fails at establish() with DNS error. Data shape: wide row (50 columns + 5 symbols), row ordering across multiple datagrams (100 rows split with max_datagram_size=200), tiny datagram (max_datagram_size=1) rejected at flush. Load: 2000 rows rapid-fire with pure auto-flush, accept >= 90% arrival (UDP may drop under load). Config: protocol_version in QWP conf string rejected. Concurrency: two senders from different threads to same port. Lifecycle: double establish rejected, establish after close rejected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The designated timestamp column type is TIMESTAMP_NS (not TIMESTAMP) on newer QuestDB when using TimestampNanos. Use col_types dict lookup instead of full columns list comparison to avoid brittleness. Verified: 77/77 system tests pass with QDB_REPO_PATH. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- sender.rst overview: add QWP/UDP alongside ILP/TCP and ILP/HTTP - sender.rst tips: note QWP/UDP for fire-and-forget use cases - conf.rst: note protocol_version is not applicable for QWP/UDP Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers: no delivery guarantee, no error feedback, buffer inspection differences (bytes() empty, len() is estimate), standalone buffer requirements (Buffer.qwp()), auto-flush byte defaults, datagram size limit errors, and protocol_version inapplicability. Also adds QWP/UDP row to the protocol version auto-detection table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds QWP/UDP and QWP/WebSocket support across C bindings, Cython runtime and stubs, Buffer factory refactor, Sender QWP/WebSocket runtime APIs, docs/examples, manifest, CI/build updates, and extensive unit/system tests. ChangesQWP Protocol Core Implementation
Documentation and Examples
Unit and Integration Tests
Maintenance
Sequence DiagramsequenceDiagram
participant ClientPy as Python Sender API
participant IngressPy as ingress.pyx runtime
participant LineSenderC as line_sender C lib
participant Trampoline as C trampoline
ClientPy->>IngressPy: Sender.flush / publish / flush_and_get_fsn
IngressPy->>LineSenderC: call C publish/flush APIs
LineSenderC->>Trampoline: emit QWPWS error/event view (if any)
Trampoline->>IngressPy: invoke Python error handler callback
IngressPy->>ClientPy: expose diagnostics via poll_qwp_ws_error / raise IngressServerRejectionError
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
examples/qwp_udp.py (1)
10-14: 💤 Low valueConsider noting that 1400 is the default
max_datagram_size.The
max_datagram_size=1400parameter is set to its default value. While being explicit in examples is fine, you might consider adding a comment to note this is the default, helping users understand they can omit it or adjust it as needed.📝 Optional comment addition
with Sender( Protocol.QwpUdp, host, port, - max_datagram_size=1400) as sender: + max_datagram_size=1400) as sender: # default is 1400🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/qwp_udp.py` around lines 10 - 14, The example explicitly passes max_datagram_size=1400 which is the default; update the Sender usage (the call to Sender with Protocol.QwpUdp) to either remove the parameter or add a short inline comment (next to max_datagram_size=1400) indicating "1400 is the default" so readers know it can be omitted or tuned as needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@examples/qwp_udp.py`:
- Around line 10-14: The example explicitly passes max_datagram_size=1400 which
is the default; update the Sender usage (the call to Sender with
Protocol.QwpUdp) to either remove the parameter or add a short inline comment
(next to max_datagram_size=1400) indicating "1400 is the default" so readers
know it can be omitted or tuned as needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d1faad2b-3030-485c-bb29-bf483b70cb9c
📒 Files selected for processing (12)
c-questdb-clientdocs/conf.rstdocs/examples.rstdocs/installation.rstdocs/sender.rstexamples.manifest.yamlexamples/qwp_udp.pysrc/questdb/ingress.pyisrc/questdb/ingress.pyxsrc/questdb/line_sender.pxdtest/system_test.pytest/test.py
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/questdb/ingress.pyx (2)
2755-2760:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake sender setup rollback-safe when buffer reservation fails.
reserve_buffer()can now raise from_new_buffer_for_sender(). If that happens afterline_sender_build()succeeds,establish()leaves_impland_optslive on the sameSender, so retryingestablish()can leak or double-initialize native state.Suggested fix
if self._impl == NULL: raise c_err_to_py(err) + line_sender_opts_free(self._opts) + self._opts = NULL if self._buffer is None: - self._buffer = self._new_buffer_for_sender() - - line_sender_opts_free(self._opts) - self._opts = NULL + try: + self._buffer = self._new_buffer_for_sender() + except Exception: + line_sender_close(self._impl) + self._impl = NULL + raise
2214-2218:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject unknown
qwp_ws_progressvalues explicitly.
QwpWsProgress.parse()returnsNonefor an unmatched tag, so this turns bad user input into anAttributeErroron.c_valueinstead of a config error.Suggested fix
if qwp_ws_progress is not None: - c_qwp_ws_progress = QwpWsProgress.parse(qwp_ws_progress).c_value + progress = QwpWsProgress.parse(qwp_ws_progress) + if progress is None: + raise IngressError( + IngressErrorCode.ConfigError, + f'"qwp_ws_progress" has invalid value: {qwp_ws_progress!r}') + c_qwp_ws_progress = progress.c_value if not line_sender_opts_qwpws_progress( self._opts, c_qwp_ws_progress, &err): raise c_err_to_py(err)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/questdb/ingress.pyx` around lines 2214 - 2218, QwpWsProgress.parse(qwp_ws_progress) can return None for unknown tags, which currently leads to an AttributeError when accessing .c_value; update the logic around QwpWsProgress.parse(...) in ingress.pyx to check the parse result for None, and if None raise a clear config/validation error (rather than accessing .c_value), otherwise assign c_qwp_ws_progress = parsed.c_value and call line_sender_opts_qwpws_progress(self._opts, c_qwp_ws_progress, &err) as before; reference QwpWsProgress.parse, c_qwp_ws_progress, line_sender_opts_qwpws_progress, and c_err_to_py when implementing the explicit None check and error raise.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@setup.py`:
- Line 149: The build currently unconditionally appends the Rust feature
"insecure-skip-verify" to cargo_args; change this so "insecure-skip-verify" is
only added when an explicit opt-in environment variable is set (e.g.,
RUST_INSECURE_SKIP_VERIFY or ALLOW_INSECURE_SKIP_VERIFY), otherwise append only
"confstr-ffi". Locate the cargo_args assembly in setup.py where the line adds
['--features', 'confstr-ffi,insecure-skip-verify'] and replace it with logic
that conditionally includes "insecure-skip-verify" based on os.environ; also
document the env var in the packaging README. After that, inspect the Rust
workspace (search for the feature name "insecure-skip-verify" in the Rust
submodules/crates) to confirm where the feature is defined/used and ensure it
does not alter TLS verification for default release builds when the env opt-in
is not set.
In `@src/questdb/ingress.pyx`:
- Around line 2352-2366: The code currently treats bool as an int for
retry_max_backoff, so True/False are accepted; update the type checks in the
retry_max_backoff handling (symbols: retry_max_backoff,
line_sender_opts_retry_max_backoff, cp_timedelta, _timedelta_to_millis,
c_err_to_py, _fqn) to explicitly exclude bools—for example, change the int
branch to require isinstance(retry_max_backoff, int) and not
isinstance(retry_max_backoff, bool) (or check type(...) is int), and ensure bool
values fall through to the TypeError branch so passing True/False raises a
TypeError with the same message.
---
Outside diff comments:
In `@src/questdb/ingress.pyx`:
- Around line 2214-2218: QwpWsProgress.parse(qwp_ws_progress) can return None
for unknown tags, which currently leads to an AttributeError when accessing
.c_value; update the logic around QwpWsProgress.parse(...) in ingress.pyx to
check the parse result for None, and if None raise a clear config/validation
error (rather than accessing .c_value), otherwise assign c_qwp_ws_progress =
parsed.c_value and call line_sender_opts_qwpws_progress(self._opts,
c_qwp_ws_progress, &err) as before; reference QwpWsProgress.parse,
c_qwp_ws_progress, line_sender_opts_qwpws_progress, and c_err_to_py when
implementing the explicit None check and error raise.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 52ba2082-709e-4246-808f-df47556aa5e5
📒 Files selected for processing (9)
c-questdb-clientdocs/conf.rstdocs/sender.rstsetup.pysrc/questdb/ingress.pyisrc/questdb/ingress.pyxsrc/questdb/line_sender.pxdtest/system_test.pytest/test.py
✅ Files skipped from review due to trivial changes (2)
- c-questdb-client
- docs/sender.rst
| del env['CXX'] | ||
| subprocess.check_call( | ||
| cargo_args + ['--features', 'confstr-ffi'], | ||
| cargo_args + ['--features', 'confstr-ffi,insecure-skip-verify'], |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Locate cargo manifests in the Rust submodule.
fd -a '^Cargo.toml$' c-questdb-client
# 2) Show feature definitions/usages related to insecure TLS behavior.
rg -n --no-heading 'insecure-skip-verify|confstr-ffi|tls_verify|unsafe_off|feature' c-questdb-clientRepository: questdb/py-questdb-client
Length of output: 51
Gate insecure-skip-verify behind an explicit opt-in (not default-on)
At setup.py:149, release builds unconditionally add Cargo features confstr-ffi,insecure-skip-verify to cargo_args, which can weaken TLS certificate verification behavior in the Rust FFI for all published wheels/sdist. Make insecure-skip-verify opt-in via an explicit env var and omit it by default.
cargo_args + ['--features', 'confstr-ffi,insecure-skip-verify'],Requested check (Rust feature wiring): the earlier c-questdb-client Cargo feature search returned no matches in this environment; after initializing the Rust submodule, inspect where insecure-skip-verify is defined/used and confirm it doesn’t change cert verification semantics for release builds.
🧰 Tools
🪛 Ruff (0.15.13)
[warning] 149-149: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@setup.py` at line 149, The build currently unconditionally appends the Rust
feature "insecure-skip-verify" to cargo_args; change this so
"insecure-skip-verify" is only added when an explicit opt-in environment
variable is set (e.g., RUST_INSECURE_SKIP_VERIFY or ALLOW_INSECURE_SKIP_VERIFY),
otherwise append only "confstr-ffi". Locate the cargo_args assembly in setup.py
where the line adds ['--features', 'confstr-ffi,insecure-skip-verify'] and
replace it with logic that conditionally includes "insecure-skip-verify" based
on os.environ; also document the env var in the packaging README. After that,
inspect the Rust workspace (search for the feature name "insecure-skip-verify"
in the Rust submodules/crates) to confirm where the feature is defined/used and
ensure it does not alter TLS verification for default release builds when the
env opt-in is not set.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
test/system_test.py (2)
796-814:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument
Sender.new_buffer()lifecycle preconditions in public stubs/docs
Sender.new_buffer()raisesIngressErrorif called beforeSender.establish()("new_buffer() can't be called before establish()") or afterSender.close()("new_buffer() can't be called: Sender is closed."), pertest/system_test.py(lines 796-814).src/questdb/ingress.pyi(anddocs/sender.rst) still describenew_buffer()without these required preconditions—update the public contract accordingly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/system_test.py` around lines 796 - 814, The public type stub and docs don't state that Sender.new_buffer() may raise IngressError when called before Sender.establish() or after Sender.close(); update src/questdb/ingress.pyi and docs/sender.rst to document these lifecycle preconditions: indicate that new_buffer() requires prior Sender.establish() and will raise qi.IngressError with message "new_buffer() can't be called before establish()" if not established, and will raise qi.IngressError with message "new_buffer() can't be called: Sender is closed" if the Sender has been closed; reference the Sender.new_buffer, Sender.establish, Sender.close methods and the IngressError exception in the public contract.
1043-1058:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix UDP auto-flush system tests to avoid false positives from context-manager flush
In
test/system_test.py, the auto-flush tests (1043-1058,1060-1073,1344-1361,1592-1607) usewith self._mk_qwpudp_sender(...) as sender:. On normalwith-exit the sender context manager callsclose(flush=True), which flushes any buffered rows even if the specific byte/row/interval/rapid-fire trigger never fires—so these tests can pass while the trigger logic is broken.Use a manual sender lifetime instead: create the sender without the context manager, call
sender.establish(), and infinallycallsender.close(flush=False)so buffered data is not flushed on exit; keep/retain the existing server-side assertions (e.g.,retry_check_table(..., min_rows=10)andcountchecks) to validate that only the intended auto-flush path publishes rows.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/system_test.py` around lines 1043 - 1058, The auto-flush UDP tests currently use the context manager from _mk_qwpudp_sender which calls close(flush=True) on exit and masks failures; replace the with self._mk_qwpudp_sender(...) as sender: usage by creating the sender manually, call sender.establish() before sending rows, and ensure you close the sender in a finally block with sender.close(flush=False) so the context exit doesn't flush buffered rows; keep the existing server-side assertions (e.g., qdb_plain.retry_check_table(..., min_rows=10) and resp['count'] checks) unchanged to validate that only the intended auto-flush trigger causes publishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/system_test.py`:
- Around line 165-168: The _unused_tcp_port() function currently returns a port
after closing the socket, causing a TOCTOU race; instead bind and keep the
socket reserved and return the bound socket (and/or the port) so callers can use
the reserved endpoint and close the socket only when failover is set up; update
any call sites (including the other occurrence around lines 384-386) to accept
and later close the returned socket once the sender/receiver are using the port.
---
Outside diff comments:
In `@test/system_test.py`:
- Around line 796-814: The public type stub and docs don't state that
Sender.new_buffer() may raise IngressError when called before Sender.establish()
or after Sender.close(); update src/questdb/ingress.pyi and docs/sender.rst to
document these lifecycle preconditions: indicate that new_buffer() requires
prior Sender.establish() and will raise qi.IngressError with message
"new_buffer() can't be called before establish()" if not established, and will
raise qi.IngressError with message "new_buffer() can't be called: Sender is
closed" if the Sender has been closed; reference the Sender.new_buffer,
Sender.establish, Sender.close methods and the IngressError exception in the
public contract.
- Around line 1043-1058: The auto-flush UDP tests currently use the context
manager from _mk_qwpudp_sender which calls close(flush=True) on exit and masks
failures; replace the with self._mk_qwpudp_sender(...) as sender: usage by
creating the sender manually, call sender.establish() before sending rows, and
ensure you close the sender in a finally block with sender.close(flush=False) so
the context exit doesn't flush buffered rows; keep the existing server-side
assertions (e.g., qdb_plain.retry_check_table(..., min_rows=10) and
resp['count'] checks) unchanged to validate that only the intended auto-flush
trigger causes publishes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| def _unused_tcp_port(): | ||
| with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: | ||
| sock.bind(('127.0.0.1', 0)) | ||
| return sock.getsockname()[1] |
There was a problem hiding this comment.
Reserve the failover port instead of probing and releasing it.
This is a TOCTOU race: _unused_tcp_port() closes the socket before the sender uses the endpoint, so another process can claim that port and make the failover path nondeterministic.
Proposed fix
+import contextlib
import socket
...
- `@staticmethod`
- def _unused_tcp_port():
- with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
- sock.bind(('127.0.0.1', 0))
- return sock.getsockname()[1]
+ `@staticmethod`
+ `@contextlib.contextmanager`
+ def _reserved_tcp_port():
+ with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
+ sock.bind(('127.0.0.1', 0))
+ yield sock.getsockname()[1]
...
- endpoints = [
- (self.qdb_plain.host, self._unused_tcp_port()),
- (self.qdb_plain.host, self.qdb_plain.http_server_port)]
+ with self._reserved_tcp_port() as dead_port:
+ endpoints = [
+ (self.qdb_plain.host, dead_port),
+ (self.qdb_plain.host, self.qdb_plain.http_server_port),
+ ]Also applies to: 384-386
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/system_test.py` around lines 165 - 168, The _unused_tcp_port() function
currently returns a port after closing the socket, causing a TOCTOU race;
instead bind and keep the socket reserved and return the bound socket (and/or
the port) so callers can use the reserved endpoint and close the socket only
when failover is set up; update any call sites (including the other occurrence
around lines 384-386) to accept and later close the returned socket once the
sender/receiver are using the port.
Introduce a new Client class wrapping the c-questdb-client questdb_db pool. Client.dataframe() emits via column_sender_chunk_* adapters for the v1 supported subset: fixed table name, NumPy int64/float64, NumPy datetime64[ns/us] field and designated timestamp columns, pandas categorical symbols, Arrow UTF-8 string fields, and large_string columns (cast to UTF-8 before publication). - Extract dataframe_plan_t in dataframe.pxi so row and columnar emitters share argument resolution and dtype planning. - Add column_sender FFI declarations and UnsupportedDataFrameShapeError carrying structured per-column rejection details before publication. - Schema-aware chunk planning with 8-row alignment when validity is present; mandatory column_sender_sync(ok) before return, with sync-and-retry for deferred flushes hitting the in-flight reserve. - Benchmark harness (test/benchmark_pandas_columnar.py) covering Layer 1 plan/populate, Layer 2 client-ACK with pool-reuse check, and Layer 3 against a real QuestDB. - Local ACK fixture (test/qwp_ws_ack_server.py) and a fixture-backed Layer 3 runner that spins up a QuestDB jar and verifies row counts. - Deterministic seed-controlled fuzz coverage (test/test_client_dataframe_fuzz.py) exercising the planner and emitter via the ACK fixture; reproduce with QDB_CLIENT_FUZZ_SEED / QDB_CLIENT_FUZZ_ITER_SEED. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add value-edge and multi-chunk coverage to the seeded fuzz: - Numeric generators emit IEEE-754 specials (NaN, ±Inf, ±0.0, subnormals) and int64 boundary values (INT64_MIN/MAX) with 5% probability each; string generator emits empty strings with 5% probability to exercise zero-length offset slices. - test_multi_chunk_with_nulls: 100 003 rows of nullable categorical + int64 forces multi-chunk emission across the 8-row validity-bitmap alignment boundary. - test_high_cardinality_symbol_i16: 200 categories forces pandas to allocate i16 codes, exercising the column_sender_chunk_symbol_dict_i16 path that random fuzz rarely reaches. - test_wide_frame_multi_chunk: 12 field columns + 64 001 rows hits the planner's n_cols>8 → 64 000 cap branch. - test_sequential_client_lifecycle: 30 open/use/close cycles flush any close-then-reopen lifecycle leaks; verifies accepted_connections matches cycle count. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the Cython pxd and Client.dataframe() implementation to track the c-questdb-client submodule's rename of the borrowed pool handle from column_sender to qwpws_conn. No behaviour change at the Python level; Client.from_conf / .dataframe / .close / .reap_idle keep their signatures. Submodule bump: c-questdb-client jh_conn_pool_refactor (7740b7a) covers the C header, Rust FFI shim, and ABI doc. See plan-conn-pool-and-writers.md for the multi-step plan this is part of (Step 1 of 5). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the pxd declaration for the new Arrow appender. The arrow_c_data_interface module already provides the ArrowArray / ArrowSchema struct definitions, so the new entry point can take the same types pandas / pyarrow / polars already expose via _export_to_c. No call site wires this in yet — Client.dataframe still uses the per-type appenders. Wiring lands in Step 2c. Submodule bump: c-questdb-client jh_conn_pool_refactor (632c647) brings the new FFI surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c-questdb-client jh_conn_pool_refactor (6c53ea7) adds LargeUtf8 (Arrow 'U') support to column_sender_chunk_append_arrow_column. No Python-side change yet — Step 2c wires it in via the planner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Client.dataframe()'s varchar (string) and symbol (dictionary)
columns now dispatch through column_sender_chunk_append_arrow_column
instead of the per-type column_sender_chunk_column_varchar and
symbol_dict_i* calls. Numeric / timestamp columns stay on the
per-type path — they were already direct-write to wire (see
Open Q1) and the Arrow appender adds no per-row benefit there.
What this changes:
- The Cython side no longer manually slices Arrow offsets / bytes
/ dict-codes for chunked emission; the Arrow appender does
that slicing in Rust, with one dispatch step per format.
- Categorical i8/i16/i32 dispatch lives in Rust now, not Cython.
- The v1 validator accepts both col_source_str_utf8_arrow and
col_source_str_lrg_utf8_arrow — large_string flows through the
Arrow appender's `U` path natively. (The legacy row path still
needs the planner-shared large_string → utf8 cast, so it stays;
the Rust appender's `U` support is latent capability today but
used when the cast is removed in a future change.)
No public Python API change. 929 tests pass; 4 fixed seeds × 200
fuzz iters pass.
Submodule bump: c-questdb-client jh_conn_pool_refactor (0650c40)
adds the row_offset / row_count parameters to the Arrow appender.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Update plan-conn-pool-and-writers.md with Step 1, 2a, 2b, 2c completion (with submodule + parent SHAs). - Correct the Step 1 claim that it resolves the round-3 dirty-sender concern — it doesn't; the rename is cosmetic at that layer. Add an explicit caveat block describing the open follow-up. - Step 2 expansion: enumerate the three sub-steps and the known gaps surfaced during implementation (dead `lrg` validator arm, unreachable `U` format path, two-place 8-row alignment). - Step 3 reframed per Q1: "no perf win, only narrower-dtype / stride / endian convenience" + the open widening-policy decisions to make before implementing. Submodule bump: c-questdb-client jh_conn_pool_refactor (f35123d) applies the Arrow mirror-type and naming polish. 929 Python tests + 836 Rust unit tests still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Client.dataframe` now accepts `pa.int8`, `pa.int16`, `pa.int32`,
and `pa.float32` Arrow columns and sends them as the corresponding
narrow QWP wire types (BYTE / SHORT / INT / FLOAT) instead of
silently failing at the column-path serializer. Pre-PR these
sources were classified by the shared planner but had no dispatch
arm on the column-QWP side — they failed at runtime with
"Unsupported columnar int/float source".
Architecture:
- New `col_target_t` narrow targets used *only* by column-QWP;
not added to `_FIELD_TARGETS`, so the shared resolver still
picks the wide targets (`col_target_column_i64` /
`col_target_column_f64`) for row-ILP. Row-ILP serializer is
untouched.
- `_dataframe_columnar_rewrite_to_narrow_arrow_targets` runs
after validation as a pure post-processing pass: rewrites the
(target, dispatch_code) pair for the four Arrow narrow sources.
- `_dataframe_columnar_finalize_plan` bundles
validate-then-retarget so the three column-QWP entry points
call one helper instead of orchestrating both steps.
Validator's i64/f64 source allowlists extended to accept the new
Arrow sources — they're transitionally classified to the wide
target by the shared planner, then narrowed by the retarget
step. The validator's catch-all stays clean (no per-narrow-target
pass-through branch).
Tests: 9 system tests in `TestColumnIngressNarrowTypes` covering
round-trip (i8/i16/i32/f32 against BYTE/SHORT/INT/FLOAT target
columns), server-side widening (i8 → LONG, f32 → DOUBLE),
INT32_MIN sentinel collision, SHORT non-nullable contract, and
`pa.uint8` rejection pin.
Doc: `pandas-paths-comparison.md` gains a "Closing the
type-support gap" section pinning the design — strict Arrow
mirror client-side, server-side coercion for everything else —
plus the categorised work breakdown and the four verification
items.
DATE / `pa.timestamp('ms')` deferred to a follow-up; it touches
the shared row-ILP classifier and needs row-ILP serializer
support to ship safely.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_dataframe_resolve_target` now takes the field-targets tuple as a parameter; each protocol passes its own ordering. Row-ILP gets `_FIELD_TARGETS_ROW` (today's behaviour, wide targets only). Column-QWP gets `_FIELD_TARGETS_QWP` with the narrow targets listed in front of their wide counterparts, so the resolver picks BYTE / SHORT / INT / FLOAT directly for Arrow narrow sources instead of widening to LONG / DOUBLE. This deletes the post-resolution scaffold added in the previous PR — the rewrite step exists only because the resolver was making the wrong target choice for column-QWP. Once the resolver gets the path-specific tuple, the right target falls out on the first pass, and the rewrite + the finalize bundle + the validator allowlist extensions all become unnecessary. Net: -38 lines, no behaviour change. Prerequisite for PR 2 (UUID) — UUID's `FixedSizeBinary(16)` source has no existing wide target whose source-set can transitionally accept it, so the rewrite pattern would not have extended to that case. Listing `col_target_column_uuid` directly in `_FIELD_TARGETS_QWP` is the natural way to add it. - `_FIELD_TARGETS` → `_FIELD_TARGETS_ROW` + `_FIELD_TARGETS_QWP` - Parameter plumbed through `_dataframe_resolve_target`, `_dataframe_resolve_cols_target_name_and_dc`, `_dataframe_resolve_args`, `_dataframe_plan_build`. Five callsites updated (2 row-ILP, 3 column-QWP). - Deleted `_dataframe_columnar_finalize_plan` and `_dataframe_columnar_rewrite_to_narrow_arrow_targets`. - Validator's i64 / f64 source allowlists reverted to their pre-PR-1 shape; explicit narrow-target branch added so the catch-all doesn't fire on the new targets. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Client.dataframe` now accepts `pa.fixed_size_binary(16)` columns
and the `arrow.uuid` extension type wrapping the same storage,
forwarding the bytes to QuestDB's UUID wire encoding via
`column_sender_chunk_column_uuid`.
Path-asymmetry contract: column-QWP routes `fsb16_arrow` through
the new `col_target_column_uuid` (listed in `_FIELD_TARGETS_QWP`);
row-ILP's `_FIELD_TARGETS_ROW` deliberately omits it, so the
shared resolver fails to map `fsb16_arrow` to a target when
`Sender.dataframe` is called with a UUID column — pinned by the
`test_fsb16_rejected_by_row_ilp` test.
Byte order: bytes are passed through verbatim in QuestDB's UUID
wire layout ("bytes 0..8 lo half LE, bytes 8..16 hi half LE", per
`column_sender.h:303`). This matches the convention shared across
the c-questdb-client family — Polars ingress
(`questdb-rs/src/ingress/polars.rs`, downcast to
`FixedSizeBinaryArray` + `extend_from_slice`) and egress
(unchanged FSB storage emitted with the `arrow.uuid` metadata
label) do the same. Users who want `uuid.UUID.bytes` (RFC 4122
big-endian) round-trip convert at their boundary.
Source detection: extended `_dataframe_series_resolve_arrow` to
unwrap `pa.BaseExtensionType` and to recognise
`FixedSizeBinary(16)` as `col_source_fsb16_arrow`. Other
fixed-width sizes (e.g. `FSB(8)`, `FSB(32)`) are not routed by
this PR and surface as the existing "Unsupported arrow type"
rejection — LONG256 (FSB(32)) lands in PR 3.
Tests: 7 new system tests in `TestColumnIngressNarrowTypes`:
canonical FSB(16) round-trip, `arrow.uuid` extension-type
round-trip (skip if pyarrow lacks `pa.uuid()`), null-bitmap
round-trip, server-side string→UUID coercion, invalid-UUID-
string rejection, row-ILP UUID rejection, and `FSB(k != 16)`
rejection. Includes a `_uuid_to_wire` helper for tests and
docstring examples; consider exposing this publicly in a
follow-up if user friction emerges.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two more QuestDB-extension types to `Client.dataframe`,
following the UUID pattern from PR 2:
- IPV4: `pa.uint32()` → `column_sender_chunk_column_ipv4`. The
Arrow source overlaps with the existing
`col_target_column_i64` (which widens any uint to LONG for
row-ILP). `_FIELD_TARGETS_QWP` lists IPV4 before i64 so the
column-QWP resolver picks IPV4 first; row-ILP's
`_FIELD_TARGETS_ROW` doesn't list IPV4, so its existing widening
behaviour is preserved. Per the strict-mirror policy:
`pa.uint32()` is unambiguously IPV4 on column-QWP — counts that
the user wants stored as LONG must cast to `pa.int64()` before
ingest. Pin this with `test_pa_uint32_is_routed_to_ipv4_not_long`.
- LONG256: `pa.fixed_size_binary(32)` →
`column_sender_chunk_column_long256`. No source-overlap; no
ordering subtlety. Bytes forwarded verbatim in the QuestDB wire
layout ("four LE 64-bit limbs, least-significant first"),
matching the same opaque-bytes convention as UUID + Polars (see
PR #150).
GEOHASH deferred: its Arrow canonical mirror is a sized int
(int8/16/32/64 depending on bit precision), which conflicts with
the narrow-int routing from PR 1. Strict-mirror dispatch has no
way to distinguish a `pa.int64()` LONG column from a `pa.int64()`
GEOHASH column without consulting Arrow field metadata — and the
agreed policy is no metadata-based dispatch. GEOHASH will need
either a different policy decision or string-coercion-only
support; revisit before declaring the type-support gap closed.
Also unlocks `pa.uint32()` on row-ILP (Arrow detection was
gated at `_dataframe_series_resolve_arrow` before this PR, even
though the row-ILP serializer at
`_dataframe_serialize_cell_column_i64__u32_arrow` already
existed). Side effect of adding Type_UINT32 detection; the
serializer's behaviour is unchanged.
Tests: 8 new system tests in `TestColumnIngressNarrowTypes` —
IPV4 round-trip, IPV4-string-coercion-currently-unsupported,
IPV4 server-rejection on invalid string, IPV4 routes-to-ipv4-not-
long pin, LONG256 round-trip, LONG256 with nulls, LONG256 row-ILP
rejection. The IPV4 server-coercion test pins a current QuestDB
limitation (no VARCHAR → IPV4 coercion at insert time); flip it
when QuestDB adds coercion support.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Route Client.dataframe Arrow publishes through one QWP buffer per call, add benchmark counters, and remove the manual preflight from real-client measurements. Update tests for Rust Arrow timestamp validation, multi-chunk buffer reuse, and expected fuzz rejections.
Let Client.dataframe use the capsule/Rust Arrow route for pandas frames whose dtypes are already Arrow-backed, while keeping object columns and non-pyarrow string columns on the manual planner. That fixes pandas ArrowDtype UInt64, Float16, and timestamp-unit e2e ingestion without losing the Python-side bad-cell validation that must fail before borrowing a connection. Teach capsule symbol resolution about pandas Arrow/string[pyarrow] dtypes and slice pandas frames with iloc so max_rows_per_batch still bounds each Arrow publish. Timestamp-only pandas frames deliberately fall back to the manual planner to preserve the existing UnsupportedDataFrameShapeError API and no-publication contract. Update the Rust submodule to accept plain FSB16 as UUID and reject null timestamp fields, matching the Python and real-server contracts. The tradeoff is that FSB16 is now treated as UUID in Arrow ingestion rather than generic fixed-size binary; supporting a separate LONG128/binary policy would need an explicit future override or metadata distinction. Also run the narrow-type e2e class against QuestDB 9.4.1+ instead of only local QDB_REPO_PATH builds, so the release-server QWP/WebSocket path is covered.
WIP
Summary by CodeRabbit
New Features
Documentation
Tests
Chores