Issue
Several parallel Snakemake jobs call the Overture boundary download script, and each job appears to call DuckDB’s install_extension("spatial") or an equivalent setup step. On Windows, these jobs share the same DuckDB extension cache:
C:\Users\<user>\.duckdb\extensions\...
When multiple jobs try to install, update, or load the same spatial.duckdb_extension file at the same time, Windows may lock the file, and one job fails with "Access is denied".
The fix should be in module_geo_boundaries, not in individual user setup.
Recommended framework-level fix
Use a project-local DuckDB extension directory
Instead of using DuckDB’s global user cache, configure DuckDB to use a workflow-local extension directory, for example:
.snakemake/duckdb_extensions
or:
.duckdb_extensions
In Python, this would look roughly like:
from pathlib import Path

import duckdb

# Resolve to an absolute path so every job points at the same directory.
extension_dir = Path(".snakemake/duckdb_extensions").resolve()
extension_dir.mkdir(parents=True, exist_ok=True)

con = duckdb.connect(
    config={"extension_directory": str(extension_dir)}
)
Install the spatial extension once before the parallel download jobs
Add a small Snakemake setup rule that installs the DuckDB spatial extension once:
rule install_duckdb_spatial:
    output:
        touch(".snakemake/duckdb_extensions/spatial_installed.txt")
    run:
        from pathlib import Path
        import duckdb
        extension_dir = Path(".snakemake/duckdb_extensions").resolve()
        extension_dir.mkdir(parents=True, exist_ok=True)
        con = duckdb.connect(
            config={"extension_directory": str(extension_dir)}
        )
        # Install once here; the parallel jobs only load.
        con.install_extension("spatial")
        con.load_extension("spatial")
Make Overture download rules depend on this setup rule
The Overture download rules should include that marker file as an input, so the extension is installed before parallel jobs start:
input:
    spatial_ext=".snakemake/duckdb_extensions/spatial_installed.txt"
Only load the extension inside the parallel jobs
Inside download_country_overture.py, avoid repeated concurrent installation. Use:
con.load_extension("spatial")
rather than repeatedly calling:
con.install_extension("spatial")
The connection should use the same project-local extension_directory.
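One way to keep the setup rule and the download script on the same directory is a small shared helper. The sketch below is only an illustration: the names resolve_extension_dir, connect_spatial, and DEFAULT_EXTENSION_DIR are hypothetical and not part of module_geo_boundaries, and the default path simply mirrors the examples above.

```python
from pathlib import Path

# Assumed default; matches the directory used in the examples above.
DEFAULT_EXTENSION_DIR = Path(".snakemake/duckdb_extensions")


def resolve_extension_dir(base=DEFAULT_EXTENSION_DIR):
    """Return the absolute extension directory, creating it if needed."""
    path = Path(base).resolve()
    path.mkdir(parents=True, exist_ok=True)
    return path


def connect_spatial(extension_dir=DEFAULT_EXTENSION_DIR, install=False):
    """Open an in-memory DuckDB connection with the spatial extension loaded.

    install=True is meant only for the one-off setup rule; the parallel
    download jobs should keep the default and only load.
    """
    import duckdb  # imported lazily so the path helper works without DuckDB

    con = duckdb.connect(
        config={"extension_directory": str(resolve_extension_dir(extension_dir))}
    )
    if install:
        con.install_extension("spatial")
    con.load_extension("spatial")
    return con
```

With this in place, the setup rule would call connect_spatial(install=True) once, and download_country_overture.py would call connect_spatial() with the default, so no parallel job writes to the extension directory.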
Optional defensive fallback: serialize only DuckDB-spatial rules
As an extra Windows-safe measure, add a Snakemake resource to the Overture/DuckDB rules:
resources:
    duckdb_spatial=1
Then the workflow/profile can cap the total amount of that resource at 1, for example by running Snakemake with --resources duckdb_spatial=1.
This would allow the overall workflow to run with multiple cores, while preventing only the DuckDB-spatial extension-dependent jobs from running simultaneously.
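Putting the marker input and the resource together, a download rule could look roughly like the fragment below. The rule name, the {country} wildcard, the output path, and the script path are all hypothetical placeholders, not the actual layout of module_geo_boundaries.

```snakemake
# Hypothetical rule name, output path, and script path, for illustration only.
rule download_country_overture:
    input:
        # Marker written by install_duckdb_spatial; forces the install to run first.
        spatial_ext=".snakemake/duckdb_extensions/spatial_installed.txt",
    output:
        "results/overture/{country}.parquet",
    resources:
        # Each job claims the single available unit, so these jobs run one at a time.
        duckdb_spatial=1,
    script:
        "scripts/download_country_overture.py"
```

Invoked as, say, snakemake --cores 8 --resources duckdb_spatial=1, other rules can still use all eight cores while at most one rule holding duckdb_spatial runs at any moment.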