This repository has been archived by the owner on Apr 10, 2023. It is now read-only.

Help on read/write to disk with parallel compression #26

Open
snehashis-roy opened this issue Jun 3, 2020 · 1 comment


@snehashis-roy

Hello.
I am trying to use the parallel feature of Blosc compression to write (and read) a large (~100 GB) dataset to disk, similar to h5py's (or zarr's) create_dataset. I could not find a simple example of how to write a numpy ndarray to disk. Could you please point me in the right direction?
Thanks.

@aleixalcacer
Member

Hi @piby2 ,

In the latest version, there is a benchmark script that can help you: https://github.com/Blosc/cat4py/blob/master/bench/compare_getslice.py

To run it, you need the latest version: either clone cat4py again (recommended)

git clone --recurse-submodules https://github.com/Blosc/cat4py

or update your master branch to the latest commit.

Then compile it using:

rm -rf _skbuild cat4py/*.so*  # If you have a previous build
python setup.py build_ext --build-type=RelWithDebInfo

To check the installation, run:

PYTHONPATH=. pytest

Finally, run the benchmark:

PYTHONPATH=. python bench/compare_getslice.py 1  # 1 enables persistency in this benchmark

PS: You shouldn't use the -O0 flag; it disables all compiler optimizations.
