Fix h5py "Unable to synchronously open file" on Windows and some Linux systems#23
Conversation
- _write_h5: remove resource leak / Windows file-lock bug. Old code opened an empty file with mode='w', immediately got a KeyError on f['data'], then tried to unlink the still-open handle — raising PermissionError on Windows and leaving a corrupt file that caused downstream errno=2 read failures. Fix: clean up before opening, use 'with' context manager. - H5Reader: add SWMR fallback for pre-downloaded / non-SWMR files. When h5.File(..., swmr=True) raises OSError and assert_swmr=False, fall back to a plain mode='r' open and record _swmr=False so all subsequent reads use the same non-SWMR path. - Add tests/test_io.py covering both fixes. Co-authored-by: lappalainenj <34949352+lappalainenj@users.noreply.github.com>
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment Thanks for integrating Codecov - We've got you covered ☂️ |
There was a problem hiding this comment.
Pull request overview
Fixes cross-platform h5py file-open failures by tightening HDF5 write cleanup/handle management and adding a non-SWMR fallback path for readers when explicitly allowed.
Changes:
- Refactors
_write_h5to perform cleanup before opening and to use a context manager to avoid Windows file-lock/resource-leak issues. - Updates
H5Readerto optionally fall back to non-SWMR reads (assert_swmr=False) and centralizes open logic via_open_file(). - Adds regression tests covering overwrite behavior, parent-dir creation, and SWMR/non-SWMR reader behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
datamate/io.py |
Fixes _write_h5 handle/cleanup ordering; adds H5Reader SWMR fallback and _open_file() helper for consistent reads. |
tests/test_io.py |
Adds tests to prevent regressions for Windows locks and SWMR/non-SWMR open behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except OSError: | ||
| if assert_swmr: | ||
| assert f.swmr_mode, "File is not in SWMR mode." | ||
| assert "data" in f | ||
| self.shape = f["data"].shape | ||
| self.dtype = f["data"].dtype | ||
| raise | ||
| # Fall back to opening without SWMR mode (e.g. pre-existing files | ||
| # that were not written with SWMR, or systems where SWMR is | ||
| # unsupported). | ||
| self._swmr = False | ||
| with h5.File(self.path, mode="r") as f: | ||
| assert "data" in f | ||
| self.shape = f["data"].shape | ||
| self.dtype = f["data"].dtype |
There was a problem hiding this comment.
In H5Reader.__init__, the broad except OSError will also catch non-SWMR-related failures (e.g., missing file, permission errors). When assert_swmr=False, if the fallback open (mode="r") also fails, the original exception context is lost. Consider except OSError as e and, if the fallback open raises, re-raise with from e (or otherwise preserve the original exception) so callers get a useful exception chain; optionally limit the SWMR fallback to known SWMR-open errors.
There was a problem hiding this comment.
Fixed in d2aa3b1. The except OSError now captures as e, the fallback open is wrapped in a try/except OSError as fallback_e, and failure raises raise fallback_e from e to preserve the original exception context.
| f = h5.File(path, libver="latest", mode="w") | ||
| path.parent.mkdir(parents=True, exist_ok=True) | ||
| if path.is_dir(): | ||
| path.rmdir() |
There was a problem hiding this comment.
_write_h5 uses path.rmdir() when the target path is a directory. rmdir() only removes empty directories, so a non-empty directory at *.h5 will raise and the function won’t actually “replace” it. If replacement is intended, use shutil.rmtree(path) (or explicitly detect non-empty dirs and raise a clearer error).
| path.rmdir() | |
| shutil.rmtree(path) |
There was a problem hiding this comment.
Fixed in d2aa3b1. Replaced path.rmdir() with shutil.rmtree(path) to handle non-empty directories correctly.
There was a problem hiding this comment.
@copilot please revert to path.rmdir(), as it should fail if the dir has content.
There was a problem hiding this comment.
Reverted in ba8596c. _write_h5 now uses path.rmdir() again, which will raise OSError for non-empty directories. The associated non-empty-dir test has been removed.
… replacement - H5Reader.__init__: capture `except OSError as e`; wrap fallback open in try/except and `raise fallback_e from e` so the original SWMR OSError is preserved in the exception chain. - _write_h5: replace `path.rmdir()` with `shutil.rmtree(path)` to correctly remove non-empty directories at the target path. - tests: rename dir-replacement test, add non-empty dir test, add exception-chaining test for H5Reader double failure. Co-authored-by: lappalainenj <34949352+lappalainenj@users.noreply.github.com>
As requested: _write_h5 should fail (via rmdir) if a non-empty directory sits at the target path, rather than silently removing it. Remove the non-empty-dir test that was added for the rmtree behavior. Co-authored-by: lappalainenj <34949352+lappalainenj@users.noreply.github.com>
Two bugs in
datamate/io.pycauseOSError: Unable to synchronously open file (errno=2)on Windows and certain Linux environments (e.g. Colab)._write_h5— resource leak / Windows file-lockmode="w"creates an empty file, makingf["data"]always raiseKeyError. Theexceptblock then calledpath.unlink()whilefwas still open —PermissionErroron Windows, leaving a truncated.h5on disk. Reads on that corrupt file produced the downstreamerrno=2error.Fix: Move cleanup (
mkdir,unlink) before the open; use awithcontext manager. If a directory exists at the target path,path.rmdir()is used — this intentionally raisesOSErrorfor non-empty directories.H5Reader— unconditional SWMR fails for pre-downloaded filesH5Reader.__init__always opened withswmr=True. On some h5py/HDF5 builds, this raisesOSErrorfor files not written in SWMR mode (e.g. downloaded model weights)._read_h5sawerrno=2in the error string and re-raised without retry.Fix: Catch the
OSError as ein__init__; whenassert_swmr=False, fall back to plainmode="r"and recordself._swmr = False. If the fallback open also fails, the newOSErroris re-raised chained from the original (raise fallback_e from e) so callers get the full exception context. A new_open_file()helper uses that flag for all subsequent reads (__getitem__,__getattr__).assert_swmr=True(default) still re-raises immediately.<issue_title>h5py Error: "Unable to synchronously open file" #16</issue_title>
Original prompt
🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.