Close the TSV file when building the UMLS semantic-type tree#581
Open
Chessing234 wants to merge 1 commit into
Open
Close the TSV file when building the UMLS semantic-type tree#581Chessing234 wants to merge 1 commit into
Chessing234 wants to merge 1 commit into
Conversation
construct_umls_tree_from_tsv iterated the source TSV with
for line in open(cached_path(filepath), "r"):
which never closes the handle; the file object is only released when
garbage collection happens to run, and on CPython that means the file
descriptor can stay open well past the function's return. Every other
file open in scispacy uses an explicit context manager (see
file_cache.py, linking_utils.py, etc.), so match that convention by
wrapping the loop in `with open(...) as tsv_file:`. Behaviour is
otherwise unchanged: the same lines are parsed in the same order.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug
construct_umls_tree_from_tsvinscispacy/umls_semantic_type_tree.pyleaks the TSV file handle onevery call: it iterates the file via
and never closes the returned file object. CPython will eventually
close it when reference counting drops to zero, but that's non-trivial
to guarantee in practice (e.g. long-running processes that build the
tree repeatedly, or runtimes without reference counting such as PyPy),
so each call can leave one file descriptor open until a GC cycle runs.
Root cause
The loop stores the open file only in the implicit iterator of the
for, so there's no handle to close. There's also nowithblock to tie the lifetime of the file to the parsing.
Why the fix is correct
file_cache.py,linking_utils.py, etc.)all open files via
with open(...) as f:; the new code matchesthat convention.
parsing logic inside the loop; only the lifetime management changes.
return value are unchanged.
Change
scispacy/umls_semantic_type_tree.py: wrap the TSV iteration inwith open(cached_path(filepath), "r") as tsv_file:and iteratetsv_file. Small indentation change inside theforblock; noother logic touched.