
Parallel Race Condition on Windows with DuckDB Extension #60

@ddahawkins-TUDelft


Issue

Several Snakemake jobs run the Overture boundary download script in parallel, and each job appears to call DuckDB's install_extension("spatial") (or equivalent setup code). On Windows, all of these jobs share the same global DuckDB extension cache:

C:\Users\<user>\.duckdb\extensions\...
When several jobs try to install, update, or load the same spatial.duckdb_extension file at the same time, Windows may hold a lock on the file, and one of the jobs fails with "Access is denied".

The fix belongs in module_geo_boundaries itself, not in each user's local setup.

Recommended framework-level fix

Use a project-local DuckDB extension directory

Instead of using DuckDB’s global user cache, configure DuckDB to use a workflow-local extension directory, for example:

.snakemake/duckdb_extensions
or:

.duckdb_extensions
In Python, this would look roughly like:

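from pathlib import Path
import duckdb

# Resolve to an absolute path so every job points at the same directory.
extension_dir = Path(".snakemake/duckdb_extensions").resolve()
extension_dir.mkdir(parents=True, exist_ok=True)

# Use the project-local directory instead of the global ~/.duckdb cache.
con = duckdb.connect(
    config={"extension_directory": str(extension_dir)}
)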
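To keep the setup rule and download_country_overture.py from drifting apart, the directory logic could live in one small helper inside module_geo_boundaries. A minimal sketch, assuming a hypothetical helper module (resolve_extension_dir, duckdb_config and connect are illustrative names, not existing code) and assuming every job runs from the workflow's working directory, so the relative path resolves to the same place:

```python
from pathlib import Path

# Hypothetical shared default, taken from the first location suggested above.
DEFAULT_EXTENSION_DIR = ".snakemake/duckdb_extensions"


def resolve_extension_dir(path=DEFAULT_EXTENSION_DIR):
    """Return the absolute extension directory, creating it if it is missing.

    A relative path is resolved against the current working directory, so all
    jobs must share a working directory to end up with the same location.
    """
    extension_dir = Path(path).resolve()
    extension_dir.mkdir(parents=True, exist_ok=True)
    return extension_dir


def duckdb_config(path=DEFAULT_EXTENSION_DIR):
    """Config mapping for duckdb.connect(config=...); creates the directory."""
    return {"extension_directory": str(resolve_extension_dir(path))}


def connect(path=DEFAULT_EXTENSION_DIR):
    """Open an in-memory DuckDB connection that uses the project-local directory."""
    import duckdb  # imported here so the path helpers work without DuckDB installed

    return duckdb.connect(config=duckdb_config(path))
```

Both the install_duckdb_spatial rule and the download script could then call connect() instead of each spelling out the path.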

Install the spatial extension once before the parallel download jobs

Add a small Snakemake setup rule that installs the DuckDB spatial extension once:

rule install_duckdb_spatial:
    output:
        # Marker file; Snakemake creates it only after the run block succeeds.
        touch(".snakemake/duckdb_extensions/spatial_installed.txt")
    run:
        from pathlib import Path
        import duckdb

        extension_dir = Path(".snakemake/duckdb_extensions").resolve()
        extension_dir.mkdir(parents=True, exist_ok=True)

        con = duckdb.connect(
            config={"extension_directory": str(extension_dir)}
        )
        # Download the extension once; loading it confirms the install works.
        con.install_extension("spatial")
        con.load_extension("spatial")
        con.close()
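Inside the rule's run: block it may be worth confirming that spatial really ended up in the project-local directory rather than in the global cache. A sketch with a hypothetical install_spatial helper; it relies on DuckDB's duckdb_extensions() table function and its loaded and install_path columns, and on installed extensions being stored in subfolders of the configured extension directory:

```python
from pathlib import Path


def install_spatial(con, extension_dir, name="spatial"):
    """Install and load a DuckDB extension, then check where it was installed.

    `con` is a DuckDB connection opened with extension_directory=extension_dir.
    Raises RuntimeError if the extension is not loaded or was installed outside
    `extension_dir` (for example, into the global ~/.duckdb cache).
    """
    con.install_extension(name)
    con.load_extension(name)
    row = con.execute(
        "SELECT loaded, install_path FROM duckdb_extensions() WHERE extension_name = ?",
        [name],
    ).fetchone()
    if row is None or not row[0]:
        raise RuntimeError(f"DuckDB extension {name!r} is not loaded: {row}")
    install_path = Path(row[1]).resolve()
    root = Path(extension_dir).resolve()
    if root not in install_path.parents:
        raise RuntimeError(f"{name!r} was installed to {install_path}, outside {root}")
    return install_path
```

In the rule, con would be the connection created with the project-local config, and install_spatial(con, extension_dir) would take the place of the separate install and load calls.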

Make Overture download rules depend on this setup rule

The Overture download rules should include that marker file as an input, so the extension is installed before parallel jobs start:

input:
    spatial_ext=".snakemake/duckdb_extensions/spatial_installed.txt"

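The marker input turns the setup rule into a shared prerequisite, so in the job graph every download waits for a single install job. The effect can be illustrated with Python's graphlib; this is a toy model of the DAG, not Snakemake's scheduler, and the country codes and job names are hypothetical:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_batches(graph):
    """Group jobs into batches that can run concurrently, dependencies first.

    graph maps each job to the set of jobs it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical countries: each download rule lists the marker file as input,
# i.e. it depends on the single install_duckdb_spatial job.
countries = ["BE", "DE", "NL"]
graph = {f"download_overture_{c}": {"install_duckdb_spatial"} for c in countries}
batches = parallel_batches(graph)
# [['install_duckdb_spatial'],
#  ['download_overture_BE', 'download_overture_DE', 'download_overture_NL']]
```

The install runs alone in the first batch; only after it finishes do the downloads become ready together, so none of them can race on installing the extension.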
Only load the extension inside the parallel jobs

Inside download_country_overture.py, avoid repeated concurrent installation. Use:

con.load_extension("spatial")

rather than repeatedly calling:

con.install_extension("spatial")

The connection should use the same project-local extension_directory.
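On the worker side, download_country_overture.py only needs to point at the same directory and load. A minimal sketch with hypothetical worker_config and load_spatial helpers. Disabling DuckDB's autoinstall_known_extensions setting is an added assumption, not part of the proposal above; it is meant to stop DuckDB from implicitly installing an extension during autoloading, which could otherwise reintroduce concurrent writes:

```python
from pathlib import Path

SETUP_HINT = "run the install_duckdb_spatial rule first"


def worker_config(extension_dir):
    """Config for duckdb.connect() in a parallel job that only loads extensions.

    Assumption: turning off autoinstall_known_extensions keeps workers from
    writing into extension_dir if DuckDB tries to autoload an extension.
    """
    return {
        "extension_directory": str(Path(extension_dir).resolve()),
        "autoinstall_known_extensions": "false",
    }


def load_spatial(con, name="spatial"):
    """Load an already-installed extension; never install it from a worker."""
    try:
        con.load_extension(name)
    except Exception as exc:  # DuckDB raises a duckdb.Error subclass here
        raise RuntimeError(
            f"DuckDB extension {name!r} could not be loaded; {SETUP_HINT}"
        ) from exc
```

A worker would open its connection with duckdb.connect(config=worker_config(extension_dir)) and call load_spatial(con) before any spatial SQL. A missing install then fails fast with a pointer to the setup rule instead of racing on the shared file.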

Optional defensive fallback: serialize only DuckDB-spatial rules

As an extra Windows-safe measure, add a Snakemake resource to the Overture/DuckDB rules:

resources:
    duckdb_spatial=1

Then the workflow profile (or the command line, via --resources duckdb_spatial=1) can cap that resource globally:

resources:
  - duckdb_spatial=1

Because each DuckDB-spatial job requests one unit of a resource that has only one unit available, the workflow as a whole can still run on multiple cores, while the jobs that depend on the spatial extension never run at the same time.
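How a global resource limit serializes only some jobs can be illustrated with a toy scheduler. This is a simplified model under stated assumptions (every job takes one time step, resources without a global limit are unconstrained, and jobs start greedily in submission order); it is not Snakemake's actual scheduler, and the job names are hypothetical:

```python
def fits(needs, free):
    """True if every requested resource is available (unlimited if no limit is set)."""
    return all(free.get(res, amount) >= amount for res, amount in needs.items())


def schedule(jobs, limits):
    """Greedy, round-based scheduling: each round starts every pending job that fits.

    jobs:   list of (name, {resource: amount}) in submission order
    limits: global limits, comparable to --resources or the profile entry
    Returns a list of rounds; jobs in the same round run concurrently.
    """
    pending = list(jobs)
    rounds = []
    while pending:
        free = dict(limits)
        started, waiting = [], []
        for name, needs in pending:
            if fits(needs, free):
                for res, amount in needs.items():
                    if res in free:
                        free[res] -= amount
                started.append(name)
            else:
                waiting.append((name, needs))
        if not started:
            raise ValueError("a job needs more of a resource than the limit allows")
        rounds.append(started)
        pending = waiting
    return rounds


jobs = [(f"download_overture_{c}", {"cores": 1, "duckdb_spatial": 1}) for c in ("BE", "DE", "NL")]
jobs += [(f"other_{i}", {"cores": 1}) for i in range(3)]
rounds = schedule(jobs, {"cores": 4, "duckdb_spatial": 1})
# [['download_overture_BE', 'other_0', 'other_1', 'other_2'],
#  ['download_overture_DE'],
#  ['download_overture_NL']]
```

Only the Overture downloads are spread across separate rounds; the unrelated jobs still share the first round with one of them. Real job durations vary, but the invariant is the same: at most one duckdb_spatial job runs at any time.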

Labels: bug (Something isn't working)

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions