
UMLS Entity Linker throws BadZipFile error #534

@markediger-mdsol

I am trying to run a basic example of the UMLS Entity Linker:

```python
import spacy
import scispacy
from scispacy.umls_linking import UmlsEntityLinker
nlp = spacy.load('en_core_sci_md')
linker = UmlsEntityLinker()

nlp.add_pipe(linker)
doc = nlp("Spinal and bulbar muscular atrophy (SBMA) is an \
           inherited motor neuron disease caused by the expansion \
           of a polyglutamine tract within the androgen receptor (AR). \
           SBMA can be caused by this easily.")

entity = doc.ents[1]
print("Name: ", entity)

for umls_ent in entity._.umls_ents:
    print(linker.umls.cui_to_entity[umls_ent[0]])
```

I get the following error, which seems to imply that scispacy cannot load the UMLS dictionaries:

```
Traceback (most recent call last):
  File "H:\integrated_evidence\indication_coding\indication-master\src\scispacy_test.py", line 5, in <module>
    linker = UmlsEntityLinker()
             ^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\scispacy\linking.py", line 85, in __init__
    self.candidate_generator = candidate_generator or CandidateGenerator(
                                                      ^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\scispacy\candidate_generation.py", line 222, in __init__
    self.ann_index = ann_index or load_approximate_nearest_neighbours_index(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\scispacy\candidate_generation.py", line 133, in load_approximate_nearest_neighbours_index
    concept_alias_tfidfs = scipy.sparse.load_npz(
                           ^^^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\scipy\sparse\_matrix_io.py", line 134, in load_npz
    with np.load(file, **PICKLE_KWARGS) as loaded:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\numpy\lib\npyio.py", line 444, in load
    ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\numpy\lib\npyio.py", line 190, in __init__
    _zip = zipfile_factory(fid)
           ^^^^^^^^^^^^^^^^^^^^
  File "H:\integrated_evidence\indication_coding\indication-master\.venv\Lib\site-packages\numpy\lib\npyio.py", line 103, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\apps\python\Lib\zipfile.py", line 1301, in __init__
    self._RealGetContents()
  File "D:\apps\python\Lib\zipfile.py", line 1368, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
```
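The traceback shows that the linker files themselves are found; the failure happens when `scipy.sparse.load_npz` opens the cached TF-IDF `.npz` file. An `.npz` file is a zip archive, and `np.load` only passes a file to `NpzFile`/`zipfile` when it starts with the zip signature, so one plausible reading is that the cached copy begins like a zip archive but is incomplete (for example, an interrupted download). Below is a minimal sketch for checking that. `find_corrupt_npz` is a hypothetical helper, and it assumes that the scispacy cache lives under `~/.scispacy/datasets` and that the cached filenames still end in `.npz`. Neither assumption is confirmed in this issue.

```python
import zipfile
from pathlib import Path
from typing import List, Tuple, Union


def find_corrupt_npz(cache_dir: Union[str, Path]) -> List[Tuple[Path, int]]:
    """Return (path, size in bytes) for each *.npz under cache_dir that zipfile cannot open.

    cache_dir is assumed to be something like ~/.scispacy/datasets
    (an assumed default location, not confirmed here).
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []  # nothing cached here (or wrong directory)
    corrupt: List[Tuple[Path, int]] = []
    for path in sorted(root.rglob("*.npz")):
        # is_zipfile looks for the end-of-central-directory record, so a
        # truncated archive that still starts with b"PK\x03\x04" fails here,
        # just as it fails inside np.load in the traceback above.
        if not zipfile.is_zipfile(path):
            corrupt.append((path, path.stat().st_size))
    return corrupt
```

For example, you could run `find_corrupt_npz(Path.home() / ".scispacy" / "datasets")` and compare the reported size with the size of the file on the download server. Deleting a file it reports should make scispacy fetch a fresh copy on the next run, assuming its cache only reuses files that are still present on disk.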

I am using scispacy version 0.5.5 and en_core_sci_md version 0.5.4.
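A separate issue, independent of the download error: scispacy 0.5.x is built on spaCy 3. In spaCy 3, `nlp.add_pipe` takes the registered name of a component factory rather than a component instance, so `nlp.add_pipe(linker)` in the snippet above would likely fail once `UmlsEntityLinker()` itself succeeds. The scispacy README for these versions registers the linker as `"scispacy_linker"` and reads candidates from `ent._.kb_ents` and `linker.kb.cui_to_entity`. The sketch below follows that pattern. `link_umls_entities` is a hypothetical wrapper, not part of scispacy. It loads the same TF-IDF `.npz` file, so it will raise the same `BadZipFile` until the cached file is readable.

```python
from typing import List, Tuple


def link_umls_entities(
    text: str, model_name: str = "en_core_sci_md"
) -> List[Tuple[str, str, float, str]]:
    """Return (mention, CUI, score, canonical name) for every UMLS candidate in text.

    Hypothetical wrapper following the spaCy 3 / scispacy 0.5.x README pattern.
    """
    # Imported lazily so this module can be imported without scispacy installed.
    import spacy
    from scispacy.linking import EntityLinker  # noqa: F401  (importing registers "scispacy_linker")

    nlp = spacy.load(model_name)
    # spaCy 3: add the component by its registered factory name, not by instance.
    # This step downloads (or reuses cached) linker files, including the .npz.
    nlp.add_pipe("scispacy_linker", config={"linker_name": "umls"})
    linker = nlp.get_pipe("scispacy_linker")

    doc = nlp(text)
    results = []
    for ent in doc.ents:
        # kb_ents holds the (CUI, score) pairs the linker kept for this mention.
        for cui, score in ent._.kb_ents:
            results.append((ent.text, cui, score, linker.kb.cui_to_entity[cui].canonical_name))
    return results
```

Calling `link_umls_entities("Spinal and bulbar muscular atrophy (SBMA) is an inherited motor neuron disease.")` would return one tuple per candidate concept for each detected entity.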
