[Python] INTERNAL Error: TransactionContext::ActiveTransaction when fetching nested STRUCT arrays via Arrow in v1.5.0 #385

@Alistorm

Description

What happens?

When upgrading from DuckDB 1.4.4 (LTS) to 1.5.0, querying a Parquet file that contains deeply nested LIST(STRUCT) columns and exporting it to Python via .fetch_arrow_table().to_pylist() (or Polars .pl().to_dicts()) results in a fatal C++ assertion failure.

The error suggests that DuckDB is closing the transaction and freeing memory before the Arrow C-Data interface finishes evaluating/copying the nested struct pointers into Python objects.

This worked flawlessly in version 1.4.3 but crashes instantly in 1.5.0.

Error Message:

DuckDB IO Error: INTERNAL Error: TransactionContext::ActiveTransaction called without active transaction
This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.

Stack Trace:

0 duckdb_adbc_init + 3548796 
1 duckdb_adbc_init + 3444796 
2 PyInit__duckdb + 16858592 
3 duckdb_adbc_init + 9895212 
4 PyInit__duckdb + 18135300 
5 PyInit__duckdb + 18104516 
6 PyInit__duckdb + 18143968 
7 duckdb_adbc_init + 2213140 
8 duckdb_adbc_init + 154540 
9 duckdb_adbc_init + 84192 
10 duckdb_adbc_init + 65780 
11 duckdb_adbc_init + 62340 
12 duckdb_adbc_init + 69328 
13 PyInit__duckdb + 832548 
14 PyInit__duckdb + 739972 
15 PyInit__duckdb + 551400 
16 _duckdb.cpython-313-darwin.so + 41708 
17 cfunction_vectorcall_FASTCALL_KEYWORDS.llvm.6373996594270193143 + 88 
18 _PyEval_EvalFrameDefault + 38980 
19 gen_iternext + 148 
20 builtin_next + 72 
21 cfunction_vectorcall_FASTCALL.llvm.6373996594270193143 + 92 
22 _PyEval_EvalFrameDefault + 38980

Workaround tested: if we bypass Arrow/Polars and use DuckDB's native .fetchall() (i.e. rows = [dict(zip(columns, row)) for row in conn.execute(query).fetchall()]), the query succeeds without crashing in 1.5.0, confirming the issue is specific to the Arrow zero-copy memory handoff for nested structs. However, since our data also contains GEOMETRY columns, this introduces another problem: the native fetch path does not handle that type.

Environment:

  • OS: macOS (Apple Silicon / Darwin)
  • DuckDB Version: 1.5.0
  • DuckDB Client: Python (duckdb via uv)
  • Python Version: 3.12 / 3.13
  • Extensions Used: httpfs

To Reproduce


*Context: Our dataset is a wide Parquet file (~1M rows) stored on GCS (`gs://...`), containing scraped real estate data. Several columns are arrays of structs, some nested, e.g., `STRUCT(entity_type VARCHAR, locality VARCHAR, mail VARCHAR, name VARCHAR, phone VARCHAR)[]`.*

```python
import duckdb

# Setup
conn = duckdb.connect()
conn.execute("INSTALL httpfs; LOAD httpfs;")

# Setup GCP secrets...
# ...

query = """
    SELECT * 
    FROM read_parquet('gs://my-bucket/data/offers.parquet') 
    LIMIT 1000
"""

# CRASHES IN 1.5.0 / WORKS IN 1.4.3
arrow_table = conn.execute(query).fetch_arrow_table()
rows = arrow_table.to_pylist() # <--- Fatal crash occurs here

# (Note: conn.execute(query).pl().to_dicts() also crashes identically)
```

OS:

macOS (Apple Silicon / Darwin)

DuckDB Package Version:

1.5.0

Python Version:

3.12

Full Name:

Mohamed Ali Ag Ibrahim

Affiliation:

Upfund

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - I cannot share the data sets because they are confidential

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
