Malaria Detection using Cell Images

How to Run the Code

  • Prepare the data by using the data_download.ipynb notebook found in the 'Data Download' directory.
    • Tune the required height and width (parameters at the top of the notebook).
    • Running the notebook should create a Data directory containing the original cell images and a Resized_data_ directory containing the resized images.
  • Label the data using the labelling.ipynb notebook found in the 'Data Labelling' directory.
    • It will save a CSV of relative filenames and labels in the specified directory.
  • Create train and test splits using the train_test_split.ipynb notebook.
  • Modeling scripts are in the 'Modeling' directory.
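
The labelling and splitting steps above can be sketched with the standard library alone. This is a minimal sketch, not the notebooks' actual code: the class folder names (`Parasitized`, `Uninfected`), the `.png` extension, the CSV column names, and the helper names `write_labels` and `stratified_split` are all assumptions made for illustration.

```python
import csv
import random
from collections import defaultdict
from pathlib import Path

# Hypothetical class folders and label values; adjust to the actual layout.
CLASS_LABELS = {"Parasitized": 1, "Uninfected": 0}


def write_labels(data_dir, csv_path, pattern="*.png"):
    """Write one (relative filename, label) row per image under data_dir."""
    data_dir = Path(data_dir)
    rows = []
    for class_name, label in CLASS_LABELS.items():
        for image_path in sorted((data_dir / class_name).glob(pattern)):
            # Store paths relative to data_dir so the CSV stays portable.
            rows.append((image_path.relative_to(data_dir).as_posix(), label))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "label"])  # assumed column names
        writer.writerows(rows)
    return rows


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split rows into train and test lists, keeping class proportions."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Splitting per class keeps the ratio of parasitized to non-parasitized cells the same in both sets, which matters when comparing models on the held-out data.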

Contributors

  1. Srishti Singh
  2. Shreya Bhatia
  3. Madhava Krishna
  4. Harshit Goyal

Motivation

Malaria is a life-threatening disease that affects many people worldwide and is spread through the bites of infected Anopheles mosquitoes. Earlier studies have shown that physicians agree poorly on the severity of the disease in a given patient's sample. Preliminary computer-aided detection can therefore support faster and more reliable diagnosis. We aim to create a classifier that distinguishes parasitized from non-parasitized cells to aid medical professionals in this effort.

Related Work

  • Pan, et al. (2018) created a model based on deep CNN architectures. They were able to obtain accuracies of over 90% on the training and validation samples using data augmentation.
  • Raihan and Nahid (2021) created a model based on boosted trees with feature engineering and determined feature importance using SHapley Additive exPlanations (SHAP).
  • Fuhad et al. (2020) implemented a computationally efficient CNN-based model with accuracy over 99%.

Suggested Outcomes

Automating the diagnosis process can make diagnosis more accurate and consistent and, as a result, could bring dependable healthcare to places with limited resources. We aim to implement several classification algorithms and search for parameters that balance training time, computational complexity, and performance. We will apply transformations, feature engineering, and feature extraction to the dataset. We will train machine learning models such as SVMs, logistic regression, decision trees, and random forests, and compare their performance. We also intend to try grayscale conversion and observe how it changes the models' behavior.
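
As an illustration of the grayscale conversion mentioned above, here is a minimal sketch in plain Python. It assumes an image is represented as rows of `(R, G, B)` tuples with values from 0 to 255 and uses the common ITU-R BT.601 luma weights; the function name `to_grayscale` is hypothetical, and the project may well use a different method or library.

```python
def to_grayscale(image):
    """Convert rows of (R, G, B) tuples into rows of integer gray levels.

    Uses the ITU-R BT.601 luma weights (an assumption; the project's
    actual conversion is not specified).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


# A tiny 2x2 example image: white, black, pure red, pure blue.
example = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(example)
```

Grayscale input has one channel instead of three, so flattened feature vectors shrink to a third of their size. In practice an imaging library would do this on whole arrays, for example Pillow's `Image.convert("L")`.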


Project Proposal

The project proposal is available as a PDF in the repository.