Fix: Attributes look like h5py attributes when using the pyfive backend. #6

Merged (2 commits) on Jan 28, 2025
h5netcdf/attrs.py: 21 changes (16 additions, 5 deletions)
@@ -31,6 +31,7 @@ def __getitem__(self, key):
if self._h5py.__name__ == "h5py":
attr = self._h5attrs.get_id(key)
else:
# pyfive backend
attr = self._h5attrs[key]

# handle Empty types
@@ -66,11 +67,21 @@ def __getitem__(self, key):
# transform string array to list
if not np.isscalar(output):
output = output.tolist()

# return item if single element list/array
# see https://github.com/h5netcdf/h5netcdf/issues/116
if not np.isscalar(output) and len(output) == 1:
return output[0]
else:
# pyfive backend: There is no '_h5py.check_string_dtype'
# method, but we only have to deal with
# the case of a numpy array of strings.
try:
if output.dtype == object:

Collaborator

Does this work OK with attributes? Is output the actual array at this point? If so, I imagine it would work fine. This is all attribute handling, not data, right?

Collaborator

Actually, I wonder whether implementing a check_string_dtype method might solve my other problems with dtypes?

Collaborator Author

Yes, this is only attributes. No data arrays here!

> Actually, I wonder whether implementing a check_string_dtype method might solve my other problems with dtypes?

I think implementing it would indeed work. It looks straightforward: https://github.com/h5py/h5py/blob/master/h5py/h5t.pyx#L1893-L1913.

Collaborator

Do you fancy doing that in pyfive? It would fit in with the other things you are cleaning up. Then this code can be more consistent with h5py, and we would have more support for this in pyfive.

Collaborator Author

Yes, I do.

Collaborator Author

Bit of a can of worms.

  • In h5netcdf, h5py's check_string_dtype is applied to the dtype of an h5py.h5a.AttrID object, whereas pyfive gives us numpy or str objects. I could delve into the h5py object to see what's in it, but I do know that the check relies on the np.dtype.metadata mappingproxy, whose contents are set by h5py rather than by standard numpy, and about which the numpy docs say it was "long undocumented and is not well supported. Some aspects of metadata propagation are expected to change in the future."

  • Then there's Python 2 code in the function: if vlen_kind is unicode: (https://github.com/h5py/h5py/blob/master/h5py/h5t.pyx#L1905). I guess this works because Cython still understands it, but plain Python 3 would not. Is unicode in Python 2 the same as str in Python 3? Probably. Am I reading this right?

Anyway, there's enough here to make me not want to implement our own check_string_dtype at this time.
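For illustration only, the metadata-based check discussed in this thread can be sketched in a few lines of Python. This is a minimal sketch, not h5py's implementation: the helper name check_string_dtype_sketch and the StringInfo tuple are hypothetical, and the metadata keys "vlen" and "h5py_encoding" are assumed to follow h5py's convention. It also reads the unicode check in Python 3 terms, comparing the variable-length kind against str.

```python
from collections import namedtuple

import numpy as np

# Hypothetical result type, modeled on the (encoding, length) pair that
# h5py's check_string_dtype reports.
StringInfo = namedtuple("StringInfo", ["encoding", "length"])


def check_string_dtype_sketch(dt):
    """Classify a numpy dtype as a string type by inspecting dt.metadata.

    Sketch only; the metadata key names are assumptions.
    """
    # dt.metadata is None unless a library (e.g. h5py) attached a mapping.
    meta = dt.metadata or {}
    vlen_kind = meta.get("vlen")
    # Python 2's `unicode` corresponds to Python 3's `str`.
    if vlen_kind is str:
        return StringInfo("utf-8", None)
    if vlen_kind is bytes:
        return StringInfo("ascii", None)
    if dt.kind == "S":
        # Fixed-length byte strings: length is the item size in bytes.
        return StringInfo(meta.get("h5py_encoding", "ascii"), dt.itemsize)
    # Anything else, including a bare object dtype, is not recognized.
    return None
```

A plain object dtype with no metadata attached falls through to None, so a metadata-based check gives no answer for such arrays. If pyfive returns object arrays without h5py-style metadata (an assumption), that would be one reason a direct output.dtype == object test is the simpler route.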

# transform string array to list
output = output.tolist()
except AttributeError:
pass

# return item if single element list/array see
# https://github.com/h5netcdf/h5netcdf/issues/116
if not np.isscalar(output) and len(output) == 1:
return output[0]

return output
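To make the new pyfive branch easier to follow, here is a standalone sketch of the same post-processing. The function name normalize_pyfive_attr is hypothetical (in the diff the logic lives inside __getitem__), and the inputs are assumed to be what the diff's comment describes: numpy arrays, possibly of object dtype holding strings, or plain Python scalars such as str.

```python
import numpy as np


def normalize_pyfive_attr(output):
    """Hypothetical standalone version of the pyfive branch in the diff."""
    try:
        # Only object-dtype arrays (strings, per the diff's comment)
        # are turned into Python lists.
        if output.dtype == object:
            output = output.tolist()
    except AttributeError:
        # Python scalars such as str or int have no .dtype; keep them as-is.
        pass

    # Unwrap single-element lists/arrays (h5netcdf issue #116).
    # Like the diff, this assumes non-scalar values are at least 1-d.
    if not np.isscalar(output) and len(output) == 1:
        return output[0]
    return output
```

Numeric arrays with more than one element come back unchanged as numpy arrays; only object-dtype arrays become lists.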

h5netcdf/core.py: 1 change (1 addition, 0 deletions)
@@ -419,6 +419,7 @@ def __getitem__(self, key):

# get padding
padding = self._get_padding(key)

# apply padding with fillvalue (both api)
if padding:
fv = self.dtype.type(self._h5ds.fillvalue)
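The context lines in this hunk pad the dataset with its fill value after casting it with self.dtype.type. A minimal sketch of that step, assuming padding is a per-axis list of (before, after) pairs in the form np.pad accepts (the body of _get_padding is not shown here), and using a hypothetical helper name pad_with_fillvalue:

```python
import numpy as np


def pad_with_fillvalue(stored, pad_width, fillvalue):
    """Hypothetical helper: pad `stored` with a constant fill value.

    `pad_width` is assumed to be a list of (before, after) pairs per axis.
    """
    # Cast the fill value to the array's scalar type, analogous to
    # self.dtype.type(self._h5ds.fillvalue) in the diff.
    fv = stored.dtype.type(fillvalue)
    return np.pad(stored, pad_width=pad_width, mode="constant", constant_values=fv)
```

np.pad returns an array with the input's dtype; casting the fill value first makes its conversion to that type explicit.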
Binary file modified h5netcdf/tests/test.nc
Binary file not shown.
Binary file modified test.nc
Binary file not shown.