by Zhi-Yi Chin
This repository is implementation of homework1 for IOC5008 Selected Topics in Visual Recognition using Deep Learning course in 2021 fall semester at National Yang Ming Chiao Tung University.
In this homework, we participated a bird image classification challenge hosted on CodaLab. This challenge provided a total of 6,033 bird images (3000 for training and 3033 for testing) belonging to 200 bird species. For the fairness of this challenge, external data is not allowed to train our model. We can use pre-trained models that have been released by other people online, but is limited to pre-train on ImageNet.
In data-preprocessing, we applied many data augmentations: RandomAffine, RandomGrayscale, RandomHorizontalFlip, RandomPerspective, RandomVerticalFlip, ColorJitter, RandomShift, color transform, RandomRotation. We applied EfficientNet-b4 as our classification model.
You can download a copy of all the files in this repository by cloning this repository:
git clone https://github.com/joycenerd/bird-images-classification.git
You need to have Anaconda or Miniconda already installed in your environment. To install requirements:
conda env create -f environment.yml
conda activate classify
You can download the raw data after you have registered the challenge mention above. You can get my training and evaluation data split by moving two text files in data/
into your downloaded raw data directory. Or you can generate these two text files by yourself by running this command:
python train_eval_split.py --data-root <path_to_data>
There will be two files generated in your raw data directory:
new_train_label.txt
: record the training images path and labelsnew_eval_label.txt
: record the validation images path and labels
You should have Graphics card to train the model. For your reference, we trained on 2 NVIDIA RTX 1080Ti. To train the model, run this command:
python train.py [-h] --data-root DATA_ROOT [--model MODEL] [--lr LR] --gpu GPU
[GPU ...] [--epochs EPOCHS] [--num-classes NUM_CLASSES]
[--train-batch-size TRAIN_BATCH_SIZE]
[--dev-batch-size DEV_BATCH_SIZE] [--num-workers NUM_WORKERS] --logs
LOGS [--img-size IMG_SIZE]
optional arguments:
-h, --help show this help message and exit
--data-root DATA_ROOT
Your dataset root directory
--model MODEL which model
--lr LR learning rate
--gpu GPU [GPU ...] gpu device
--epochs EPOCHS num of epoch
--num-classes NUM_CLASSES
The number of classes for your classification problem
--train-batch-size TRAIN_BATCH_SIZE
The batch size for training data
--dev-batch-size DEV_BATCH_SIZE
The batch size for validation data
--num-workers NUM_WORKERS
The number of worker while training
--logs LOGS Directory to save all your checkpoint.pth
--img-size IMG_SIZE Input image size
Recommended training command:
python train.py --data-root <path_to_data> --gpu 0 1 --logs ./efficientnet_b4 --lr 0.005 --model efficientnet-b4 --img-size 380 --train-batch-size 21 --dev-batch-size 8
The logging directory will be generated in the path you specified for --logs
. Inside this logging directory you can find:
checkpoints/
: All the training checkpoints will be saved inside here. Checkpoints is saved every 10 epochs andbest_model.pth
save the current best model based on evaluation accuracy.events/
: Tensorboard event files, you can visualize your experiment bytensorboard --logdir events/
logger/
: Some information that print on screen while training will be log into here for your future reference.
You can test your training results by running this command:
python test.py [-h] [--data-root DATA_ROOT] [--ckpt CKPT] [--img-size IMG_SIZE]
[--num-classes NUM_CLASSES] [--net NET] [--gpu GPU]
optional arguments:
-h, --help show this help message and exit
--data-root DATA_ROOT
data root dir
--ckpt CKPT checkpoint path
--img-size IMG_SIZE image size in model
--num-classes NUM_CLASSES
number of classes
--net NET which model
--gpu GPU gpu id
Run this command to zip
your submission file:
zip answer.zip answer.txt
You can upload answer.zip
to the challenge. Then you can get your testing score.
Click into Releases. Under EfficientNet-b4 model download efficientnet-b4_best_model.pth
. This pre-trained model get accuracy 72.53% on the test set.
Recommended testing command:
python test.py --data-root <path_to_data> --ckpt <path_to_checkpoint> --img-size 380 --net efficientnet-b4 --gpu 0
answer.txt
will be generated in this directory. This file is the submission file.
To reproduce our results, run this command:
python inference.py --data-root <path_to_data> --ckpt <pre-trained_model_path> --img-size 380 --net efficientnet-b4 --gpu 0
To reproduce our submission without retraining, do the following steps
- Getting the code
- Install the dependencies
- Download the data
- Download pre-trained models
- Inference
- Submit the results
Our model achieves the following performance:
EfficientNet-b4 w/o sched | EfficientNet-b4 with sched | |
---|---|---|
acc | 55.29% | 72.53% |
If you find our work useful in your project, please cite:
@misc{
title = {bird_image_classification},
author = {Zhi-Yi Chin},
url = {https://github.com/joycenerd/bird-images-classification},
year = {2021}
}
If you'd like to contribute, or have any suggestions, you can contact us at [email protected] or open an issue on this GitHub repository.
All contributions welcome! All content in this repository is licensed under the MIT license.