alexriedel1/HPA-Single-Cell-Classification-Place-21-Solution

HPA Single Cell Classification Place 21 Solution

This is my solution for the Kaggle HPA Cell Classification Challenge 2021.
I'm very happy to have earned my first silver medal on Kaggle and want to share my approach with you.

And thanks to phalanx and his post on Puzzle-CAM for good inspiration.
Cheers also to Sanghyun Jo and In-Jae Yu for their publication on Puzzle-CAM. If this approach is not familiar to you, you can find more information in the paper.

General Approach

Like many of you, I read a lot about weakly labeled instance segmentation and eventually decided to go for an image-level training and inferencing method. To achieve this, I needed a model that produces good Class-Activation-Maps (CAMs), so I decided to try Puzzle-CAM and do some mapping magic during inferencing to get probabilities from my CAMs.

Remarks about this approach:

  1. No single-cell segmentation is needed for training
  2. The more explainable the model is, the better the classification results will be

Highly discriminative Class-Activation-Maps for the predicted classes

Training

I trained according to the Puzzle-CAM paper: each image is tiled into four sub-images, and a loss function compares the full-image CAMs with the tiled-image CAMs. I used a ResNeSt-101 and an EfficientNet-B4, each with the corresponding GAP layers added, and a Focal Loss function.
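The tiling idea can be sketched in a few lines of NumPy. This is only an illustration of the Puzzle-CAM consistency term, not the training code from this repository: `toy_cam_model` is a hypothetical stand-in for the CNN, the function names are made up for this example, and the L1 form of the reconstruction term is taken from the Puzzle-CAM paper rather than from this README.

```python
import numpy as np

def tile_image(image):
    """Split an (H, W, C) image into four equal tiles: TL, TR, BL, BR."""
    h, w = image.shape[0] // 2, image.shape[1] // 2
    return [image[:h, :w], image[:h, w:2 * w],
            image[h:2 * h, :w], image[h:2 * h, w:2 * w]]

def merge_cams(tile_cams):
    """Reassemble four (h, w, K) tile CAMs into one (2h, 2w, K) CAM."""
    top = np.concatenate([tile_cams[0], tile_cams[1]], axis=1)
    bottom = np.concatenate([tile_cams[2], tile_cams[3]], axis=1)
    return np.concatenate([top, bottom], axis=0)

def reconstruction_loss(full_cam, tile_cams):
    """Mean absolute (L1) difference between full-image and merged tile CAMs."""
    return float(np.mean(np.abs(full_cam - merge_cams(tile_cams))))

def toy_cam_model(image):
    """Hypothetical stand-in for a CNN: a per-pixel 'CAM' with K=2 classes."""
    return np.stack([image.mean(axis=-1), image.max(axis=-1)], axis=-1)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 4))  # toy image; 4 channels chosen for illustration
full_cam = toy_cam_model(image)
tile_cams = [toy_cam_model(tile) for tile in tile_image(image)]
loss = reconstruction_loss(full_cam, tile_cams)
```

Because the toy model works pixel by pixel, the tiled and full-image CAMs agree and the loss is zero. A real CNN sees less context in each tile, so the term is non-zero and pushes the two views toward each other during training.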

The following diagram shows the training process for the CAM models, using ResNeSt-101 as an example

Training Time
3 epochs for CAM models on V100:
  • 3x EfficientNet-B4: 120 min per model = 360 min
  • 2x ResNeSt-101: 360 min per model = 720 min
10 epochs for image-level models on TPU:
  • ViT, ResNeXt, EfficientNet-B7: 120 min per model = 360 min
Total: 1440 min = 24 h

Inferencing

Here's the interesting part. I simply multiply the CAM of each class by the mask of each cell and by the class probability the model produces (using a Swish activation to obtain the CAMs gives slightly better results than raw CAMs or ReLU). This yields very large class-activation values for each class and each cell, which have to be mapped to real class probabilities. I used two approaches for this:

  • Standardize the values of each image using sklearn.preprocessing.StandardScaler and apply a sigmoid function to them (this works surprisingly well).
  • Do the inferencing on the single-class-labeled train data to get the raw values, then train a gradient boosting regressor to learn the corresponding label (0..1) for each class. This makes sure the right mapping function is found, which might differ from the sigmoid function.
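A minimal sketch of the per-cell scoring and the first mapping approach, under several assumptions that are not stated above: the masked CAM is summed over the pixels of each cell, the scores form a (cells, classes) matrix, and standardization is done per class column. The column-wise `(x - mean) / std` with population standard deviation is what `StandardScaler` computes by default; plain NumPy is used here only to keep the example dependency-light. All function names and the toy data are hypothetical.

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def raw_cell_scores(cams, cell_masks, class_probs):
    """cams: (K, H, W), cell_masks: (N, H, W) binary, class_probs: (K,).
    Returns (N, K): Swish-activated CAM summed inside each cell mask
    (the sum is an assumption), weighted by the image-level probability."""
    activated = swish(cams)
    per_cell = np.einsum("nhw,khw->nk", cell_masks.astype(float), activated)
    return per_cell * class_probs[None, :]

def standardize_and_squash(scores):
    """Column-wise standardization (as StandardScaler does), then a sigmoid."""
    std = scores.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # constant columns are left unscaled
    z = (scores - scores.mean(axis=0)) / std
    return 1.0 / (1.0 + np.exp(-z))

# Toy image: 3 cells side by side, 2 classes.
cams = np.zeros((2, 4, 6))
cams[0, :, :2] = 5.0   # class 0 lights up over cell 0
cams[1, :, 4:] = 5.0   # class 1 lights up over cell 2
cell_masks = np.zeros((3, 4, 6), dtype=bool)
cell_masks[0, :, :2] = True
cell_masks[1, :, 2:4] = True
cell_masks[2, :, 4:] = True
class_probs = np.array([0.9, 0.8])

raw = raw_cell_scores(cams, cell_masks, class_probs)
probs = standardize_and_squash(raw)
```

In this toy case the raw scores are in the tens while `probs` lies strictly between 0 and 1, with cell 0 ranked highest for class 0 and cell 2 highest for class 1.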

A gradient boosting regressor (GBR) is trained for every model used in the final inference, so that potentially different CAM intensities across the models don't affect the prediction quality. The two GBRs for the ResNeSt-101 models show the following function for mapping the normalized CAM intensities to the probability of the target class
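The regressor idea can be sketched with scikit-learn's `GradientBoostingRegressor`. Everything specific here is an assumption for illustration: the data is synthetic (the real CAM-intensity data is linked in the remarks further down), a single intensity feature is used, the hyperparameters are arbitrary, and the final clipping to [0, 1] is added for this example rather than taken from the repository.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for normalized CAM intensities of single-class images:
# cells without the class cluster low, cells with the class cluster high.
negatives = rng.normal(-1.0, 0.5, size=200)
positives = rng.normal(1.5, 0.5, size=200)
X = np.concatenate([negatives, positives]).reshape(-1, 1)
y = np.concatenate([np.zeros(200), np.ones(200)])  # labels in 0..1

# Hyperparameters are illustrative, not the ones used for the submission.
gbr = GradientBoostingRegressor(n_estimators=100, max_depth=2, random_state=0)
gbr.fit(X, y)

def intensity_to_probability(model, intensities):
    """Map CAM intensities to values in [0, 1] with a fitted regressor."""
    features = np.asarray(intensities, dtype=float).reshape(-1, 1)
    return np.clip(model.predict(features), 0.0, 1.0)

probs = intensity_to_probability(gbr, [-2.0, 0.0, 3.0])
```

Unlike a fixed sigmoid, the fitted mapping can take whatever shape the data suggests. Fitting one regressor per CAM model keeps differences in intensity scale between models out of the final probabilities.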

In the end I combined both approaches. Getting probability values from the intensity of the CAMs was the most crucial step in my inferencing pipeline, and improving this component gave the largest gains in mAP on the test images.

The following diagram shows the inferencing process

Remarks on making the code work

  • Train the Puzzle-CAM models starting here: train-cam-model/train_cam.py
    The models are trained on the standard data provided by HPA and on an additional HPA-provided dataset containing only "rare" classes. Be sure to place these datasets in train-cam-model/train_data/. Also check train-cam-model/config.py carefully for hyperparameters and models.

  • Train the Image-Level-Models starting here: train-image-level-model/train_img.py
    This script is best run in a Kaggle TPU instance: it trains on TPU and does not download the image data to local storage but reads it directly from GCS, where the Kaggle datasets are stored.

  • Train the Gradient Boosting Regressor starting here: train-gradboost-regressor/train_grad_boost.py
    I'm providing the inferencing CAM-intensity data for every model used in my final submission here to train the regressor.

  • Inferencing starting here: inference/inference.py
    I'm providing my models via the same link as above. Be sure to sort out the paths and everything; this can be painful in Python...
