
Incorrect handling of Unicode keys when creating npz files #49

@cerisola


Hi, I am running into an issue when using NPZ to create an npz file with Unicode strings as keys.

To be clear, everything works when I create the file with NumPy and read it with NPZ. Writing and reading the file in Python works:

>>> import numpy as np
>>> np.savez("file.npz", α=1)
>>> D = np.load("file.npz")
>>> print(D["α"])
1

and reading the same file in Julia with NPZ also works as expected:

julia> using NPZ

julia> D = npzread("file.npz")
Dict{String, Int64} with 1 entry:
  "α" => 1

julia> D["α"]
1
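One plausible reason the NumPy-written file reads back correctly everywhere: as far as I can tell, `np.savez` writes its archive through Python's `zipfile` module, and `zipfile` sets general-purpose bit 11 (the "file name is UTF-8" flag, mask `0x800`) whenever a member name is not plain ASCII. The sketch below checks that flag on a member named `"α.npy"`, which I am assuming mirrors the member name `np.savez` uses for the key `"α"`; the payload is placeholder bytes, not a real `.npy` array.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # ZIP general-purpose bit 11: member name is encoded as UTF-8

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Assumption: "α.npy" stands in for what np.savez would write for key "α"
    zf.writestr("α.npy", b"placeholder bytes, not a real .npy payload")

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    # flag_bits is read back from the central directory entry
    print(info.filename, bool(info.flag_bits & UTF8_FLAG))  # α.npy True
```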

However, if I create the file with NPZ, NPZ reads it back as expected, but NumPy cannot read it properly.
From the NPZ side:

julia> npzwrite("file.npz", Dict("α" => 1))

julia> D = npzread("file.npz")
Dict{String, Int64} with 1 entry:
  "α" => 1

julia> D["α"]
1

everything works. However, when I open the file with NumPy, it does load, but the keys are not what I expect:

>>> D = np.load("file.npz")
>>> print(D["α"])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-17-7d756a0b03cf> in <module>
----> 1 print(D["α"])

/usr/lib/python3.9/site-packages/numpy/lib/npyio.py in __getitem__(self, key)
    258                 return self.zip.read(key)
    259         else:
--> 260             raise KeyError("%s is not a file in the archive" % key)
    261 
    262 

KeyError: 'α is not a file in the archive'
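The `KeyError` means that no archive member name, as decoded on the Python side, equals `"α"` (or `"α.npy"`). One explanation, which is an assumption and not confirmed in this issue, is that the name was stored as UTF-8 bytes and decoded by Python's `zipfile` as CP437, the legacy ZIP filename encoding. The sketch below shows what that mis-decode would produce for `"α"`, plus a hypothetical helper `recover_utf8_name` (named here only for illustration) that reverses it.

```python
def recover_utf8_name(name: str) -> str:
    """Reverse a UTF-8 -> CP437 mis-decode of a ZIP member name.

    Hypothetical helper. Assumes the archive stored the name as UTF-8 bytes
    without the UTF-8 flag, so Python's zipfile decoded it as CP437.
    If the round trip is impossible, the name is returned unchanged.
    """
    try:
        return name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


# What a reader that decodes UTF-8 bytes as CP437 would report for "α"
misread = "α".encode("utf-8").decode("cp437")
print(misread)                         # ╬▒
print(recover_utf8_name(misread))      # α
print(recover_utf8_name("ascii_key"))  # ascii_key (ASCII round-trips unchanged)
```

If that assumption holds, the keys of an already-loaded archive could be remapped with something like `{recover_utf8_name(k): D[k] for k in D.files}`. The helper can misfire on a name that genuinely is CP437 text but happens to form valid UTF-8 bytes, so it is a workaround rather than a fix.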

Indeed, printing the keys of the loaded file shows a different string:

>>> list(D.keys())
['╬▒']
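The key `'╬▒'` is what the UTF-8 bytes of `α` (`0xCE 0xB1`) look like when decoded as CP437, which Python's `zipfile` uses for member names unless general-purpose bit 11 (`0x800`) is set. Assuming the NPZ-written archive stores the UTF-8 name without that bit (consistent with the output above, but not confirmed here), the behaviour can be reproduced with a hand-built, uncompressed one-member ZIP whose name bytes are identical in both cases and only the flag differs. `single_entry_zip` and `member_names` are hypothetical helpers written for this illustration.

```python
import io
import struct
import zipfile
import zlib

UTF8_FLAG = 0x800  # general-purpose bit 11: "file name is UTF-8"


def single_entry_zip(name: str, data: bytes, flags: int) -> bytes:
    """Build a minimal stored (uncompressed) ZIP with one member.

    The name is always written as UTF-8 bytes; only `flags` changes.
    """
    raw_name = name.encode("utf-8")
    crc = zlib.crc32(data)
    dos_time, dos_date = 0, (1 << 5) | 1  # 1980-01-01 00:00:00
    local = struct.pack(
        "<IHHHHHIIIHH",  # local file header (30 bytes)
        0x04034B50, 20, flags, 0, dos_time, dos_date,
        crc, len(data), len(data), len(raw_name), 0,
    ) + raw_name + data
    central = struct.pack(
        "<IHHHHHHIIIHHHHHII",  # central directory header (46 bytes)
        0x02014B50, 20, 20, flags, 0, dos_time, dos_date,
        crc, len(data), len(data), len(raw_name), 0, 0, 0, 0, 0,
        0,  # offset of the local header
    ) + raw_name
    end = struct.pack(
        "<IHHHHIIH",  # end of central directory record (22 bytes)
        0x06054B50, 0, 0, 1, 1, len(central), len(local), 0,
    )
    return local + central + end


def member_names(archive: bytes) -> list:
    """Return the member names as Python's zipfile decodes them."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.namelist()


print(member_names(single_entry_zip("α.npy", b"x", flags=0)))          # ['╬▒.npy']
print(member_names(single_entry_zip("α.npy", b"x", flags=UTF8_FLAG)))  # ['α.npy']
```

If this is the cause, the fix would belong on the writer side: set bit 11 in both the local file header and the central directory entry whenever a key is not plain ASCII, which is what Python's `zipfile` does when it writes such names.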
