Predicting cellular phospholipidosis on different cell lines using repurposing libraries and machine learning

About

This is the source code and data repository for using ML models to predict the cellular phospholipidosis activity of compounds. An accompanying publication titled "Predicting cellular phospholipidosis on different cell lines using repurposing libraries and machine learning" is in preparation.

For our model training workflow, we have leveraged both KNIME and Python frameworks to allow both communities to reuse our work. Below we describe in detail the Python framework only. For the KNIME framework, please take a look here. # TODO: Add link to KNIME space

Data organization

TODO: Add tree here

How to use our model?

How did we build the dataset?

The dataset was built with the KNIME workflow. More details can be found in our manuscript or in the KNIME workflow itself.

How to build your own model in Python?

We use a conda environment to build and run our code. Please follow the steps below to create the conda environment with all the necessary Python packages:

git clone https://github.com/Fraunhofer-ITMP/PLD.git
conda create --name=pld python=3.9
conda activate pld
cd PLD
pip install -r requirements.txt
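
As an optional sanity check after the installation, the sketch below reports which packages named in requirements.txt are not installed in the active environment. It is an illustration and not part of the repository: the helper names `parse_requirement_names` and `missing_packages` are hypothetical, and the parser assumes simple requirement lines such as `name==version` (it skips comments and option lines like `-r other.txt`).

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(lines):
    """Extract bare package names from simple requirements.txt lines."""
    names = []
    for line in lines:
        # Drop inline comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The package name is the leading run of name characters.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the root of the cloned PLD folder.
    requirements = Path("requirements.txt")
    if requirements.exists():
        names = parse_requirement_names(requirements.read_text().splitlines())
        print("Missing packages:", missing_packages(names) or "none")
```

Run it with the "pld" environment activated. Any package it lists can be installed by re-running the pip command above.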

To use the Jupyter notebooks, you need to make the conda environment available as a kernel. To do so, run the following lines in the terminal.

pip install ipykernel
python -m ipykernel install --user --name=pld

After this, "pld" should be displayed as a kernel in your VSCode environment. Alternatively, you could start Jupyter Notebook from within the conda environment itself using the following command: jupyter notebook
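
To confirm that a notebook is really running on the "pld" kernel, one option is to inspect the interpreter path from a notebook cell. The sketch below is a minimal check and not part of the repository. It assumes the usual conda layout, in which the environment name appears as a folder in the interpreter path (for example `.../envs/pld/bin/python`), and the helper name `interpreter_in_env` is hypothetical.

```python
import sys
from pathlib import PurePath


def interpreter_in_env(executable, env_name="pld"):
    """Return True if env_name is one of the folders in the interpreter path."""
    return env_name in PurePath(executable).parts


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside 'pld' environment:", interpreter_in_env(sys.executable))
```

If the check prints False, the notebook is probably using a different kernel, and the packages from requirements.txt may not be importable.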

A sample of the modelling effort on phospholipidosis together with Karolinska data.

It contains the data input KNIME workflow and a Python notebook that implements the XGBoost classification model reported in the publication (ref).

Show_database_app.py is a Streamlit app that lets the user inspect the training set that was used and, optionally, the XOR dataset, which is not part of the training set. It also shows the top 10 most important features of any model saved as a pickle file and provides a set of boxplots to visualize how much these features differ between the labelling groups ('Active' and 'Inactive').
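
The feature ranking described above can be sketched outside the app as follows. This is an illustration under stated assumptions, not the code used in Show_database_app.py: it assumes the pickle file holds a scikit-learn style estimator that exposes `feature_importances_` (and, if available, `feature_names_in_`), and the helper name `top_n_features` is hypothetical. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def top_n_features(names, importances, n=10):
    """Return the n (name, importance) pairs with the highest importance."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Assumes the script is run from the repository root, where the model file lives.
    model_path = Path("final_model_PLD_XGBoost.pkl")
    if model_path.exists():
        with model_path.open("rb") as handle:
            model = pickle.load(handle)
        # Assumption: the estimator follows the scikit-learn attribute names.
        importances = list(getattr(model, "feature_importances_", []))
        # Fall back to column indices when no feature names were stored.
        names = list(getattr(model, "feature_names_in_", range(len(importances))))
        for name, score in top_n_features(names, importances):
            print(f"{name}: {score:.4f}")
```

Loading the pickle requires the same packages that were used to save it, so run the script inside the "pld" environment.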

Folders and files

data
figures
notebooks
.gitignore
Chembl_ML_PLD.ipynb
Confusion_Matrix_XGBoost_SMOTE_chemphys.JPG
Confusion_matrix_XGBoost.PNG
LICENSE
PLD_tSNE.html
Phospholipidosis_v4_AND.knwf
README.md
Show_database_app.py
Top10_feature_importance_python.PNG
final_model_PLD_XGBoost.pkl
requirements.txt
top10_features_boxplots_with_significance.png
