Save and load in binary, compatible with NumPy/Matlab and others #486

certik · 2021-08-18T22:36:43Z

NumPy: https://numpy.org/devdocs/reference/generated/numpy.lib.format.html
Matlab: https://www.mathworks.com/help/matlab/import_export/mat-file-versions.html
SciPy code to load/write MAT files: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
Julia library to read/write MAT files: https://github.com/JuliaIO/MAT.jl
Specification of MAT file format: https://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf

First requested here.

milancurcic · 2021-08-25T14:50:42Z

This seems useful and in scope. I often work with MAT files (of various versions) from colleagues, and I use scipy.io.loadmat and scipy.io.savemat to read and write them.

For my own interoperable binary data between Fortran and Python, I use NetCDF. I don't think any of the language-specific binary formats will beat it in terms of features, performance, or stability. Likewise for HDF5 which is suitable for unstructured data.

certik · 2021-08-26T20:09:51Z

Both NetCDF and HDF5 are great. The only issue with HDF5 is that there is literally only one library that can read and write it, and it's not that easy to build and ship. Writing an HDF5 writer in pure Fortran, for example, is not easy. By contrast, it is easy for the .npy NumPy array format: I've done it in the past, although I can't find the code right now. :(
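To illustrate why the .npy format is considered easy to write, here is a minimal Python sketch of a version 1.0 writer that uses only the standard library. It is an illustration of the file layout described in the NumPy format documentation linked above, not the code mentioned in this comment. The names `npy_bytes` and `parse_npy_header` are hypothetical, and the sketch assumes little-endian float64 data that is already laid out in the chosen memory order.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"  # magic string that starts every .npy file


def npy_bytes(values, shape, fortran_order=True):
    """Serialize float64 values as .npy version 1.0 (hypothetical helper).

    `values` must already be in the memory order named by `fortran_order`
    (column-major for a Fortran array).
    """
    count = 1
    for extent in shape:
        count *= extent
    if count != len(values):
        raise ValueError("shape does not match the number of values")

    # The header is the text of a Python dict literal.
    header = (
        f"{{'descr': '<f8', 'fortran_order': {bool(fortran_order)}, "
        f"'shape': {tuple(shape)!r}, }}"
    )
    # Pad with spaces and end with a newline so the data starts at a
    # multiple of 64 bytes (magic + 2 version bytes + 2 length bytes + header).
    preamble = len(MAGIC) + 2 + 2
    padding = -(preamble + len(header) + 1) % 64
    header_bytes = (header + " " * padding + "\n").encode("latin1")

    return (
        MAGIC
        + bytes([1, 0])                           # format version 1.0
        + struct.pack("<H", len(header_bytes))    # little-endian header length
        + header_bytes
        + struct.pack(f"<{len(values)}d", *values)
    )


def parse_npy_header(blob):
    """Return (header dict, offset of the first data byte) for a v1.0 blob."""
    if blob[:6] != MAGIC:
        raise ValueError("not an .npy file")
    (header_len,) = struct.unpack("<H", blob[8:10])
    header = ast.literal_eval(blob[10:10 + header_len].decode("latin1"))
    return header, 10 + header_len


# A 2x3 array stored in column-major (Fortran) order.
blob = npy_bytes([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (2, 3))
header, offset = parse_npy_header(blob)
```

The whole format is a short fixed preamble, a text header, and the raw array bytes, which is why a pure Fortran writer needs nothing beyond stream I/O.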

So that makes me hesitant to just depend on HDF5. However, it is worth investigating what it would take to support just a very small subset of HDF5, say for writing a set of double precision arrays. It might not be that difficult to write a writer for such a small subset in pure Fortran. Here is the format: https://portal.hdfgroup.org/display/HDF5/File+Format+Specification

The huge advantage of that would be no dependency on the hdf5 library, and using a widely supported format.

Beliavsky · 2021-08-27T00:51:23Z

There is NPY for Fortran by MRedies, which allows saving numerical Fortran arrays in NumPy's .npy or .npz format. I have not tried it.

MarDiehl · 2021-09-03T09:50:27Z

There is already an HDF5 writer/reader which looks promising: https://github.com/geospace-code/h5fortran. It uses the Fortran bindings of the C library.
I think it is reasonable to keep HDF5 support out of stdlib; it is part of neither the C nor the Python standard library.

awvwgk · 2021-11-28T23:15:09Z

I got the basic structure for reading and writing npy files implemented in #581. It needs some polishing, especially the reading, and many more unit tests to cover all the errors that loading can encounter.

TejasAvinashShetty · 2021-12-02T09:19:41Z

libnpy seems to be a library that provides simple routines for saving a C or Fortran array to a data file using NumPy's own binary format.
Please see https://scipy-cookbook.readthedocs.io/items/InputOutput.html

Not my idea; see CAZT's first comment on CAZT's Stack Overflow answer.

jvdp1 · 2021-12-02T09:41:24Z

> There is already an HDF5 writer/reader which looks promising: https://github.com/geospace-code/h5fortran. It uses the Fortran bindings of the C library. I think it is reasonable to keep HDF5 support out of stdlib; it is part of neither the C nor the Python standard library.

I agree with @MarDiehl. I recently used @scivision's h5fortran and found it great and really easy to use. Therefore, I also think it is reasonable to keep HDF5 support out of stdlib for the moment.

awvwgk · 2021-12-10T20:02:57Z

How do we want to handle the npz format? It is a zip archive with npy files. Probably, we have to develop a general interface for interacting with compressed archives first.

For the mat format I found a specification of the layout (linked in the description at the top). It should be straightforward to code up. I don't think I have a MATLAB version I could use to verify it, but I could try SciPy.

ivan-pi · 2021-12-12T15:55:24Z

> For the mat format I found a specification of the layout (linked in the description at the top). It should be straightforward to code up. I don't think I have a MATLAB version I could use to verify it, but I could try SciPy.

Was your idea to implement the reader/writer entirely in Fortran based on the PDF document, or to call into the MATLAB C API to Read MAT-File Data? The latter requires that the client have the libmat shared run-time library, located in matlabroot/bin/arch.

awvwgk · 2021-12-12T16:03:20Z

I was reading the specs, sounds easy enough to implement this from scratch and verify using SciPy. Unfortunately, the data can be compressed, and we need an interface to zlib or similar first.

Having the possibility to dynamically load a library with dlopen in case the MATLAB runtime libraries are around would be another option. However, then we first need an interface for dynamic loading.
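As a rough illustration of where zlib enters, the following Python sketch reads the fixed header and the first data element of a Level 5 MAT-file, as I understand the layout from the specification PDF linked at the top. It assumes a 128-byte header (116 bytes of text, 8 bytes of subsystem offset, a 2-byte version, a 2-byte endian indicator), 8-byte element tags, and the type codes 14 (miMATRIX) and 15 (miCOMPRESSED). It ignores the small data element format and padding. The function name `read_first_element` is hypothetical.

```python
import struct
import zlib

MI_MATRIX = 14       # type code of an array element (per the MAT-file spec)
MI_COMPRESSED = 15   # type code of a zlib-compressed element


def read_first_element(blob):
    """Return (description, data_type, payload) of the first data element.

    Sketch only: assumes the regular 8-byte tag (uint32 type, uint32 size).
    """
    if len(blob) < 136:
        raise ValueError("too short for a header plus one element tag")

    description = blob[:116].decode("latin1").rstrip()
    # The endian indicator holds the characters M and I; it appears as the
    # bytes "IM" when the file was written on a little-endian machine.
    endian = "<" if blob[126:128] == b"IM" else ">"

    data_type, size = struct.unpack(endian + "II", blob[128:136])
    payload = blob[136:136 + size]

    if data_type == MI_COMPRESSED:
        # The payload is a zlib stream that wraps another tagged element.
        inner = zlib.decompress(payload)
        data_type, size = struct.unpack(endian + "II", inner[:8])
        payload = inner[8:8 + size]

    return description, data_type, payload


# Build a tiny synthetic file to exercise the reader.
file_header = (
    b"MATLAB 5.0 MAT-file".ljust(116)
    + b"\x00" * 8
    + struct.pack("<H", 0x0100)
    + b"IM"
)
inner_element = struct.pack("<II", MI_MATRIX, 4) + b"abcd"
compressed = zlib.compress(inner_element)
mat_blob = file_header + struct.pack("<II", MI_COMPRESSED, len(compressed)) + compressed
result = read_first_element(mat_blob)
```

Everything outside the `zlib.decompress` call is plain byte unpacking, so the compression interface is the only real external dependency for this part of the format.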

arjenmarkus · 2021-12-13T08:09:53Z

My dynlib module in https://sourceforge.net/p/flibs/svncode/HEAD/tree/trunk/src/dynlib/ could serve as a starting point.

adenchfi · 2022-02-09T20:41:39Z

I see stdlib has save_npy and load_npy functionality! I tested it out and it works great! I was wondering whether rank-4 arrays, i.e. dimension(:,:,:,:), could also be supported. I only see interfaces up to rank 3.

awvwgk · 2022-02-09T20:44:36Z

They should be supported up to the maximum rank stdlib was configured for. The docs are only generated up to rank 3 to save space, while the fpm version allows up to rank 4 and the CMake version can go up to rank 15.

adenchfi · 2022-02-09T20:58:23Z

Oh, thanks! I should have tested first, I took the docs too literally.

ivan-pi · 2022-05-04T11:33:33Z

While scrolling through the ARCHER2 supercomputing service documentation I learned there is a BSD-licensed library for MATLAB MAT files called matio. It also has a Fortran interface (help wanted tbeu/matio#51); however, it doesn't appear to use the standard Fortran/C interoperability.

As @awvwgk has remarked above, supporting MATLAB binary files would require a zlib interface and potentially also HDF5, both of which are available as C libraries. It looks more straightforward to just have a thin Fortran wrapper of a C/C++ implementation, than to write an interface/implementation for zlib (and HDF5) first.

ivan-pi · 2022-05-04T12:38:04Z

> How do we want to handle the npz format? It is a zip archive with npy files. Probably, we have to develop a general interface for interacting with compressed archives first.

In case of the compressed npz files created with numpy.savez_compressed, the NumPy documentation states zipfile.ZIP_DEFLATED is used which requires zlib behind the scenes.
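To make the archive layout concrete, here is a minimal Python sketch using only the standard `zipfile` module. It packs already-serialized .npy payloads into a zip archive, either stored or deflated, and then lists the members. The helper names `write_npz` and `list_npz` are hypothetical, and the payloads below are placeholder bytes, not valid .npy data.

```python
import io
import zipfile


def write_npz(members, compressed=False):
    """Pack {name: npy_payload_bytes} into an in-memory zip archive.

    Each array becomes one member called "<name>.npy". With
    compressed=True the members use ZIP_DEFLATED, which needs zlib.
    """
    method = zipfile.ZIP_DEFLATED if compressed else zipfile.ZIP_STORED
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=method) as archive:
        for name, payload in members.items():
            archive.writestr(name + ".npy", payload)
    return buffer.getvalue()


def list_npz(blob):
    """Return (member name, compression method, uncompressed size) tuples."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return [
            (info.filename, info.compress_type, info.file_size)
            for info in archive.infolist()
        ]


# Placeholder payloads stand in for real .npy bytes.
archive_bytes = write_npz({"A": b"x" * 100, "B": b"y" * 50}, compressed=True)
listing = list_npz(archive_bytes)
```

Only the deflated variant needs a compression library. An archive whose members are stored uncompressed could in principle be written with plain byte I/O plus a CRC-32 routine.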

Irrespective of how we manage to do the zipping/compression (either in C or Fortran), with respect to the zipped format a big question is how to replace positional and keyword arguments in Fortran, without getting overwhelmed by the combinatorial explosion of type/kind/rank + number of saved arrays.

ivan-pi · 2022-07-19T23:37:53Z

Since Fortran doesn't have positional or keyword arguments in the way Python does, for .npz files it seems more natural to adopt an API similar to the one in NPY for Fortran:

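subroutine add_npz(zipfile, var_name, array)
   character(len=*), intent(in) :: zipfile    ! path of the .npz archive
   character(len=*), intent(in) :: var_name   ! name of the array inside the archive
   ! pseudocode: stands for one specific procedure per type and kind
   real|complex|integer, intent(in) :: array(..)
end subroutine add_npz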

Alternatively, we could have a handle based approach:

integer :: npz_unit
real :: A(2,2)
complex :: B(3,3)

call open_npz(newunit=npz_unit,filename="foo.npz")
call stage_npz(npz_unit,A,"A")
call stage_npz(npz_unit,B,"B")
call close_npz(npz_unit)

Since Fortran uses integer units as file handles, the concept should be familiar already.

ivan-pi · 2022-07-20T09:45:44Z

The .npz format is also useful for reading SciPy sparse matrix formats (CSC, CSR, BSR, DIA, COO). See scipy.sparse.save_npz for a description. The implementation can be found here. Note that the keywords in the dictionary creation specify the array names.
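As a small illustration of what a reader would do with such an archive, the sketch below expands CSR arrays into a dense matrix in plain Python. It assumes the archive members for a CSR matrix are named `data`, `indices`, `indptr` and `shape`, which is my understanding of what scipy.sparse.save_npz writes. The function name `csr_to_dense` is hypothetical, and lists stand in for the loaded arrays.

```python
def csr_to_dense(data, indices, indptr, shape):
    """Expand CSR arrays into a dense list-of-lists matrix.

    Row i owns the entries data[indptr[i]:indptr[i + 1]], and
    indices[k] is the column of data[k].
    """
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for row in range(rows):
        for k in range(indptr[row], indptr[row + 1]):
            # Duplicate entries are summed.
            dense[row][indices[k]] += data[k]
    return dense


# CSR arrays for the matrix [[1, 0, 2], [0, 0, 3], [4, 5, 6]].
dense = csr_to_dense(
    data=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    indices=[0, 2, 2, 0, 1, 2],
    indptr=[0, 2, 3, 6],
    shape=(3, 3),
)
```

A Fortran reader would follow the same pattern: load the named members from the archive, then hand them to whatever sparse type is available.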

milancurcic mentioned this issue Aug 27, 2021

Collaborate with Fortran Standard Library to support npy format I/O MRedies/NPY-for-Fortran#5

Open

awvwgk added the topic: IO Common input/output related features label Sep 18, 2021

awvwgk mentioned this issue Nov 27, 2021

Add routines for saving/loading arrays in npy format #581

Merged

4 tasks

ivan-pi mentioned this issue Jan 17, 2024

Support for I/O of standard formats #763

Open

minhqdao mentioned this issue Feb 18, 2024

Add minizip and extract .npz files #771

Closed

8 tasks

This was referenced Aug 23, 2024

Handle npz files minhqdao/stdlib#2

Closed

Handle npz files #865

Open
