Description
What happens?
When upgrading from DuckDB 1.4.4 (LTS) to 1.5.0, querying a Parquet file that contains deeply nested LIST(STRUCT) columns and exporting the result to Python via .fetch_arrow_table().to_pylist() (or Polars .pl().to_dicts()) results in a fatal C++ assertion failure.
The error suggests that DuckDB closes the transaction and frees memory before the Arrow C-Data interface has finished evaluating/copying the nested struct pointers into Python objects.
This worked flawlessly in version 1.4.3 but crashes instantly in 1.5.0.
Error Message:
DuckDB IO Error: INTERNAL Error: TransactionContext::ActiveTransaction called without active transaction
This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
Stack Trace:
0 duckdb_adbc_init + 3548796
1 duckdb_adbc_init + 3444796
2 PyInit__duckdb + 16858592
3 duckdb_adbc_init + 9895212
4 PyInit__duckdb + 18135300
5 PyInit__duckdb + 18104516
6 PyInit__duckdb + 18143968
7 duckdb_adbc_init + 2213140
8 duckdb_adbc_init + 154540
9 duckdb_adbc_init + 84192
10 duckdb_adbc_init + 65780
11 duckdb_adbc_init + 62340
12 duckdb_adbc_init + 69328
13 PyInit__duckdb + 832548
14 PyInit__duckdb + 739972
15 PyInit__duckdb + 551400
16 _duckdb.cpython-313-darwin.so + 41708
17 cfunction_vectorcall_FASTCALL_KEYWORDS.llvm.6373996594270193143 + 88
18 _PyEval_EvalFrameDefault + 38980
19 gen_iternext + 148
20 builtin_next + 72
21 cfunction_vectorcall_FASTCALL.llvm.6373996594270193143 + 92
22 _PyEval_EvalFrameDefault + 38980
Workaround tested: if we bypass Arrow/Polars and use DuckDB's native .fetchall() (i.e. rows = [dict(zip(columns, row)) for row in conn.execute(query).fetchall()]), the query succeeds without crashing in 1.5.0, confirming the issue is specific to the Arrow zero-copy memory handoff for nested structs. However, since our data also contains GEOMETRY columns, this workaround introduces another problem: that type is not handled by the native fetch path.
Environment:
- OS: macOS (Apple Silicon / Darwin)
- DuckDB Version: 1.5.0
- DuckDB Client: Python (duckdb via uv)
- Python Version: 3.12 / 3.13
- Extensions Used: httpfs
To Reproduce
*Context: Our dataset is a wide Parquet file (~1M rows) stored on GCS (`gs://...`), containing scraped real estate data. Several columns are arrays of structs, some nested, e.g., `STRUCT(entity_type VARCHAR, locality VARCHAR, mail VARCHAR, name VARCHAR, phone VARCHAR)[]`.*
```python
import duckdb
# Setup
conn = duckdb.connect()
conn.execute("INSTALL httpfs; LOAD httpfs;")
# Setup GCP secrets...
# ...
query = """
SELECT *
FROM read_parquet('gs://my-bucket/data/offers.parquet')
LIMIT 1000
"""
# CRASHES IN 1.5.0 / WORKS IN 1.4.3
arrow_table = conn.execute(query).fetch_arrow_table()
rows = arrow_table.to_pylist() # <--- Fatal crash occurs here
# (Note: conn.execute(query).pl().to_dicts() also crashes identically)
```
OS:
macOS (Apple Silicon / Darwin)
DuckDB Package Version:
1.5.0
Python Version:
3.12
Full Name:
Mohamed Ali Ag Ibrahim
Affiliation:
Upfund
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - I cannot share the data sets because they are confidential
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration to reproduce the issue?
- Yes, I have