Why the difference in patch extraction approaches (random vs. sequential) for iterable and in-memory datasets? #282

Open
carshadi opened this issue Nov 26, 2024 · 3 comments
Labels
example Examples of using CAREamics question Further information is requested

Comments

@carshadi

Hello, thanks for creating this package!

I am training various Noise2Void (+V2, Struct) models on 3D SPIM data and need to use the PathIterableDataset, since my training images are rather large (1024, 1024, 1024) and the patches (64, 128, 128) don't all fit into system memory, let alone GPU memory, particularly when using augmentations. I was wondering why the patch extraction scheme differs between the iterable dataset (random) and the in-memory dataset (sequential). In particular, I am wondering whether random patch extraction has a higher probability of over-representing the same regions of the training data, since the total number of patches is still given by image_shape / patch_shape.
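
For context, here is a rough back-of-the-envelope calculation of why the patches don't fit in memory (assuming float32 data; the exact numbers depend on dtype and augmentations):

```python
import numpy as np

image_shape = (1024, 1024, 1024)
patch_shape = (64, 128, 128)

# Number of non-overlapping patches per image: 16 * 8 * 8 = 1024
n_patches = np.prod([i // p for i, p in zip(image_shape, patch_shape)])

# float32 memory per patch (~4 MiB) and for all patches of one image (~4 GiB)
bytes_per_patch = np.prod(patch_shape) * 4
total_gib = n_patches * bytes_per_patch / 1024**3

print(n_patches, total_gib)  # 1024 patches, ~4 GiB per image (x7 training volumes)
```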

I have noticed that my trained models can have trouble when inferring on images with more high-intensity content, even though these images were also used for training. Please see the image below, where the top row is the validation images and the bottom three rows are the model predictions, all set to the same display range (0-100). Certain predicted images have their minimum intensity raised abnormally high. I am wondering if this could be due in part to the patch extraction method, and whether a non-overlapping sequential method could help; I will work on implementing this myself. Otherwise, I am curious if you have any advice on why this could occur and how to mitigate it. Thanks!

[Image: validation images (top row) and model predictions (bottom three rows), all at display range 0-100]
@carshadi carshadi added the feature New feature or request label Nov 26, 2024
@jdeschamps
Member

Hi @carshadi !

Thanks for sharing your example!!!

Could you give more details on your images? Are you training on multiple images? Can you show some full slice examples? That would help to get a sense of the heterogeneity of the training data. What is the "cratio", and are some plots totally white because of the intensity scaling (your abnormally raised background)? And what are the histograms at the bottom?

I am wondering how ubiquitous the high-intensity areas are, and whether, if they are scarce, the reason for your issue is that they are not seen often enough to allow the model to converge well on these structures.

structN2V seems like it did not converge well; the images look strange.

If you have multiple images, I'd try training independent networks on each image and compare the results. In case the noise models are very different from one image to the other for some reason, this could prevent proper convergence.

I would be surprised if the problem is the patching. Long story short, the reason for the different patching strategies has more to do with technical debt than with reasoning and experience... We are currently rewriting the datasets to be easier to maintain and compatible with NGFF, and we will probably stick with random patching.

Random patching is usually the way to go. It provides more diversity than non-overlapping sequential patching for two reasons: (i) the patches differ from one epoch to the next, and (ii) a fixed sequential patching limits where different structures can appear in the patches (fixed grid). With a lot of data the difference should be very minimal. Note that random patching simply gives pixels near the edges (within patch // 2 of the border) a lower probability of being seen, while all other pixels have the same probability of being selected each epoch. Some overlap happens, but it averages out over the epochs. As part of the dataset rewrite, we will benchmark this properly to settle it once and for all in this project.
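
To illustrate the difference, here is a minimal sketch of the two strategies for a single 3D image (not the actual CAREamics implementation):

```python
import numpy as np

def sequential_patches(image, patch_shape):
    """Fixed, non-overlapping grid: the same patches every epoch."""
    starts = [range(0, s - p + 1, p) for s, p in zip(image.shape, patch_shape)]
    for z in starts[0]:
        for y in starts[1]:
            for x in starts[2]:
                yield image[z:z + patch_shape[0],
                            y:y + patch_shape[1],
                            x:x + patch_shape[2]]

def random_patches(image, patch_shape, n_patches, seed=None):
    """Uniformly random corners: different patches every epoch.

    Pixels near the image borders are sampled less often; all other
    pixels have the same probability of being selected each epoch."""
    rng = np.random.default_rng(seed)
    for _ in range(n_patches):
        z, y, x = (rng.integers(0, s - p + 1)
                   for s, p in zip(image.shape, patch_shape))
        yield image[z:z + patch_shape[0],
                    y:y + patch_shape[1],
                    x:x + patch_shape[2]]
```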

If you do implement the sequential patching for testing purposes, please let us know how it goes!

@carshadi
Author

carshadi commented Dec 4, 2024

Hi @jdeschamps ,

I appreciate your quick response and apologize for my late reply!

Could you give more details on your images?

Certainly, I am training these networks using 7 sub-volumes of shape 1024^3 that are all taken from the same whole-brain volume with shape (Z: 28790, Y: 28746, X: 62301). I split each training volume into 8 tiles of shape 512^3 and hold out 1 tile for validation; the held-out tiles are the ones shown in my previous image.
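
Schematically, the split looks something like this (a sketch, not my exact pipeline):

```python
def split_into_octants(volume):
    """Split a (1024, 1024, 1024) sub-volume into eight (512, 512, 512) tiles."""
    tiles = []
    for z in (0, 512):
        for y in (0, 512):
            for x in (0, 512):
                tiles.append(volume[z:z + 512, y:y + 512, x:x + 512])
    return tiles

# For each of the 7 sub-volumes: 7 tiles go to training, 1 is held out for validation.
```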

The full dataset was acquired using light sheet fluorescence microscopy on a ~3X expanded, cleared specimen. In our case, the foreground consists of neurites (axons + dendrites) labelled with a fluorescent marker (GFP + AF488). The stochastic nature of the labelling and expression means that some neurons are brightly labelled while others are much dimmer, resulting in a wide range of signal intensities that are roughly correlated with neuron type and brain area. This is also true for the background noise distribution, which can vary due to differences in tissue autofluorescence across the brain. The 7 training volumes were taken from different regions of the brain and have different intensity characteristics, which should ideally cover much of the possible variation.

Here is a neuroglancer link showing the full brain dataset these images were taken from. You'll notice that the background is very low (~10-30 counts) compared to the signal (~100-50000 counts).

The histogram at the bottom shows the intensity distribution for each raw validation image. Some examples have much brighter neurons, which accounts for the longer histogram tail. These brighter examples were "whited out" after prediction when using the same intensity scaling for all images.

Our goal with the denoising is to improve compression ratios for lower storage costs, since in theory we could remove much of the high frequency information while retaining signal fidelity. The 'cratio' is the compression ratio of the predicted data after encoding with Blosc Zstd.
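
In case it is useful, the 'cratio' is computed roughly as below (a sketch using numcodecs; our actual encoding settings may differ):

```python
import numpy as np
from numcodecs import Blosc

def compression_ratio(volume: np.ndarray, clevel: int = 5) -> float:
    """Ratio of raw bytes to Blosc Zstd compressed bytes."""
    codec = Blosc(cname="zstd", clevel=clevel, shuffle=Blosc.SHUFFLE)
    compressed = codec.encode(np.ascontiguousarray(volume))
    return volume.nbytes / len(compressed)
```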

If you have multiple images, I'd try training independent networks on each image and compare the results. In case the noise models are very different from one image to the other for some reason, this could prevent proper convergence.

I can try this. In the end, though, we will probably want a single trained network that covers all of the possible variation within a single brain, since it could be non-trivial to apply multiple networks to different regions of the same dataset. I will also try including more training examples and training for longer to see if the model converges. I only trained these models for ~25k steps, so perhaps more epochs will help as well.

I see the reasoning behind the random patching, and agree that this is probably not the reason behind the intensity scaling issue. I will keep training and let you know how it goes. Thanks!

@jdeschamps jdeschamps added question Further information is requested example Examples of using CAREamics and removed feature New feature or request labels Dec 6, 2024
@jdeschamps
Member

Interesting, thanks for sharing! (just love the neuroglancer link, so handy)

So all the tiles from the microscope have different dynamic ranges. CAREamics will compute a mean and std over the entire training dataset, which will lead to non-optimal normalization of the patches (since, depending on the tile, they would have different intensities).

Have you considered normalizing the tiles individually before feeding them to CAREamics?

You could apply a zero-mean and unit-variance normalization per tile, then train CAREamics, predict, and apply the inverse normalization operation if you want to recover the intensities.
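
For example, something along these lines (a minimal sketch, independent of CAREamics' internal normalization):

```python
import numpy as np

def normalize_tile(tile: np.ndarray):
    """Zero-mean, unit-variance normalization of a single tile."""
    mean, std = float(tile.mean()), float(tile.std())
    return (tile - mean) / std, (mean, std)

def denormalize_tile(tile: np.ndarray, stats) -> np.ndarray:
    """Inverse operation to recover the original intensity range."""
    mean, std = stats
    return tile * std + mean

# normalized, stats = normalize_tile(raw_tile)    # before feeding to CAREamics
# restored = denormalize_tile(prediction, stats)  # after prediction
```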
