This repository has been archived by the owner on Apr 10, 2023. It is now read-only.

Help on read/write to disk with parallel compression #26

Open
snehashis-roy opened this issue Jun 3, 2020 · 1 comment


@snehashis-roy

Hello.
I am trying to use the parallel feature of Blosc compression to write (and read) a large (~100 GB) dataset to disk, similar to h5py's (or zarr's) create_dataset. I could not find a simple example of how to write a numpy ndarray to disk. Could you please point me in the right direction?
Thanks.

@aleixalcacer
Member

Hi @piby2 ,

In the latest version, there is a benchmark script that can help you: https://github.com/Blosc/cat4py/blob/master/bench/compare_getslice.py

To run it, you need the latest version: either clone cat4py again (recommended)

git clone --recurse-submodules https://github.com/Blosc/cat4py

or update your master branch to the latest commit.

Then compile it using:

rm -rf _skbuild cat4py/*.so*  # If you have a previous build
python setup.py build_ext --build-type=RelWithDebInfo

To check the installation, run:

PYTHONPATH=. pytest

Finally, run the benchmark:

PYTHONPATH=. python bench/compare_getslice.py 1  # 1 enables persistency in this benchmark

PS: You shouldn't use the -O0 flag; it disables all compiler optimizations.
