PathIterable dataset with a single file and no validation will fail [BUG] #160

jdeschamps · 2024-06-21T12:50:30Z

Describe the bug
If no validation is given, the PathIterableDataset will try to split files (rather than patches) between train and validation. However, with a single file this will always throw an error.

from pathlib import Path

import numpy as np
from tifffile import imwrite

from careamics.config import DataConfig
from careamics.config.support import SupportedData
from careamics.lightning import CAREamicsTrainData

rng = np.random.default_rng(42)
data = rng.integers(0, 255, (32, 32))
data_path = Path(".") / "data.tif"
imwrite(data_path, data)

data_config = DataConfig(
    data_type=SupportedData.TIFF.value,
    patch_size=(16, 16),
    axes="YX",
    batch_size=1,
)
data_module = CAREamicsTrainData(
    data_config=data_config,
    train_data=str(data_path),
    use_in_memory=False,
)
data_module.prepare_data()
data_module.setup()

Error:

ValueError: Not enough files to split a minimum of 5 files, got 1 files.
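
To make the failure mode concrete, here is a minimal sketch of a file-level train/validation split with a minimum-file guard. This is not the CAREamics implementation: `split_files`, `val_fraction`, `min_files`, and `seed` are hypothetical names, and the minimum of 5 is only taken from the error message above.

```python
import random
from pathlib import Path


def split_files(files, val_fraction=0.1, min_files=5, seed=42):
    """Split a list of files into (train, val) lists. Hypothetical helper."""
    if len(files) < min_files:
        # With a single file there is nothing to split at the file level.
        raise ValueError(
            f"Not enough files to split a minimum of {min_files} files, "
            f"got {len(files)} files."
        )
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    # Always hold out at least one file for validation.
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Under these assumptions, any call with a single file, such as `split_files([Path("data.tif")])`, raises the `ValueError` regardless of the file's content.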

This only applies when the data does not fit in memory (according to the CAREamics definition), which is an impossible case: if the data does not fit in memory, then we cannot train from this single file.

This issue cannot really be fixed unless we make the dataset even more complex (e.g. keep the validation set in memory and extract it randomly in a first pass).
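
A rough sketch of what such a first pass could look like, assuming non-overlapping 2D patches: the patch coordinates are split at random, and only the validation patches are extracted and kept in memory. The helpers `split_patch_coords` and `extract_patches` and the `val_fraction` parameter are hypothetical and do not exist in CAREamics.

```python
import numpy as np


def split_patch_coords(shape, patch_size, val_fraction=0.25, seed=42):
    """Return (train_coords, val_coords), the top-left corners of
    non-overlapping patches tiling a 2D image. Hypothetical helper."""
    rng = np.random.default_rng(seed)
    coords = [
        (y, x)
        for y in range(0, shape[0] - patch_size[0] + 1, patch_size[0])
        for x in range(0, shape[1] - patch_size[1] + 1, patch_size[1])
    ]
    order = rng.permutation(len(coords))
    # Hold out at least one patch for validation.
    n_val = max(1, int(len(coords) * val_fraction))
    val = [coords[i] for i in order[:n_val]]
    train = [coords[i] for i in order[n_val:]]
    return train, val


def extract_patches(image, coords, patch_size):
    """Stack the patches at the given coordinates into one in-memory array."""
    ph, pw = patch_size
    return np.stack([image[y : y + ph, x : x + pw] for y, x in coords])
```

For the 32x32 image and 16x16 patches from the example above, this gives four patch positions, one of which is held out for validation. The training side would still need to iterate over the remaining coordinates lazily, which is where the extra complexity comes from.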

I am leaning towards waiting for the Zarr dataset and then simply retiring the PathIterableDataset. We could then provide a convenience function to convert train/validation/test files into a single Zarr archive and use it for training/prediction.


melisande-c · 2024-06-21T15:32:06Z

> This only applies when the data does not fit in memory (according to the CAREamics definition), which is an impossible case: if the data does not fit in memory, then we cannot train from this single file.

Probably the error that the file is too big should be raised before we get to this point!
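
A minimal sketch of such an early check, assuming the on-disk size is an acceptable proxy for the in-memory size (it is not for compressed files). The function `check_single_file_fits` and the `max_bytes` threshold are hypothetical and not part of CAREamics.

```python
from pathlib import Path


def check_single_file_fits(files, max_bytes):
    """Raise early if a lone training file exceeds the in-memory limit.

    Hypothetical helper: meant to run before any file-level split.
    """
    files = [Path(f) for f in files]
    if len(files) == 1:
        size = files[0].stat().st_size
        if size > max_bytes:
            raise ValueError(
                f"{files[0].name} is {size} bytes, above the {max_bytes}-byte "
                "in-memory limit; a single file cannot be split into train "
                "and validation sets."
            )
```

Called before the split, this would replace the confusing "Not enough files" message with one that names the actual problem.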

jdeschamps · 2024-12-17T13:36:24Z

Probably superseded by #292

jdeschamps added the "bug" label (Something isn't working) on Jun 21, 2024

jdeschamps added the "wontfix" label (This will not be worked on) on Dec 17, 2024
