TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models
Congratulations! This paper has been accepted at the 2024 ACM-BCB Conference!
This repository contains the code for the paper "TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models." The project combines a Deep & Cross Network with features derived from large language models to predict whether a clinical trial will succeed in enrolling participants.
Requirements:
- Python 3.10
- PyTorch 2.3
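Download the raw trial records from ClinicalTrials.gov and build a sorted list of the trial XML files:
cd data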
wget https://clinicaltrials.gov/AllPublicXML.zip
unzip AllPublicXML.zip -d trials
find trials/* -name "NCT*.xml" | sort > trials/all_xml.txt
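The commands above leave one XML path per line in trials/all_xml.txt. As a minimal sketch of how those records can be read, the snippet below parses a few fields from one file. The element paths (id_info/nct_id, enrollment, eligibility/minimum_age, condition) are assumptions based on the legacy ClinicalTrials.gov XML layout, and iter_xml_paths and parse_trial are hypothetical helpers, not functions from this repository.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator, List, Optional, Union


def iter_xml_paths(index_file: str) -> Iterator[Path]:
    """Yield one path per non-empty line of an index file such as trials/all_xml.txt."""
    with open(index_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield Path(line)


def parse_trial(xml_text: Union[str, bytes]) -> dict:
    """Extract a few fields from one trial record (hypothetical helper).

    Element paths follow the legacy ClinicalTrials.gov XML layout (assumption).
    """
    root = ET.fromstring(xml_text)

    def text(path: str) -> Optional[str]:
        node = root.find(path)
        return node.text.strip() if node is not None and node.text else None

    enrollment = root.find("enrollment")
    count = text("enrollment")
    conditions: List[str] = [c.text.strip() for c in root.findall("condition") if c.text]
    return {
        "nct_id": text("id_info/nct_id"),
        "enrollment": int(count) if count else None,
        "enrollment_type": enrollment.get("type") if enrollment is not None else None,
        "min_age": text("eligibility/minimum_age"),
        "conditions": conditions,
    }


# Example (run from data/):
# records = [parse_trial(p.read_bytes()) for p in iter_xml_paths("trials/all_xml.txt")]
```

Reading the files as bytes lets the XML parser honor each file's own encoding declaration.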
Download the IQVIA label data from here and place it in the data/ directory. Rename the file trial_outcomes_v1.csv to IQVIA_trial_outcomes.csv.
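Presumably the labels are joined to trials by their NCT identifiers. A minimal sketch of loading the label file with the standard csv module follows. The column names are not given here, so id_col and label_col are placeholders to be set from the header of IQVIA_trial_outcomes.csv, and load_labels is a hypothetical helper rather than code from this repository.

```python
import csv
from typing import Dict, Iterable


def load_labels(lines: Iterable[str], id_col: str, label_col: str) -> Dict[str, int]:
    """Map trial ID -> integer label; rows with an empty label are skipped.

    id_col and label_col are placeholders (assumption): read the real names
    from the CSV header before using this.
    """
    labels: Dict[str, int] = {}
    for row in csv.DictReader(lines):
        raw = (row.get(label_col) or "").strip()
        if raw:
            labels[row[id_col].strip()] = int(float(raw))  # accepts "1" and "1.0"
    return labels


# Example (hypothetical column names):
# with open("data/IQVIA_trial_outcomes.csv", newline="", encoding="utf-8") as fh:
#     labels = load_labels(fh, id_col="nctid", label_col="label")
```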
Navigate to the preprocessing directory and run the following scripts:
cd TrialEnroll/preprocess
python collect_age.py
python collect_location.py
python collect_str.py
python collect_time.py
python save_df.py
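The script names suggest that eligibility ages, locations, free-text fields, and dates are extracted into tabular features before save_df.py writes them out. As an illustration only, the sketch below normalizes age strings such as "18 Years" and computes a trial duration from legacy-style dates ("January 2015" or "March 1, 2015"). The unit table, the accepted date formats, and the helper names are assumptions, not the logic of collect_age.py or collect_time.py.

```python
import re
from datetime import datetime
from typing import Optional

# Number of each unit per year (assumed conversion table).
_UNITS_PER_YEAR = {
    "year": 1.0,
    "month": 12.0,
    "week": 365.25 / 7,
    "day": 365.25,
    "hour": 365.25 * 24,
    "minute": 365.25 * 24 * 60,
}


def age_in_years(raw: Optional[str]) -> Optional[float]:
    """'18 Years' -> 18.0, '6 Months' -> 0.5; 'N/A', None, or unknown units -> None."""
    if not raw:
        return None
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", raw)
    if m is None:
        return None
    per_year = _UNITS_PER_YEAR.get(m.group(2).lower().rstrip("s"))
    return None if per_year is None else float(m.group(1)) / per_year


def parse_ct_date(raw: str) -> datetime:
    """Parse 'March 1, 2015' or 'January 2015' (month-only dates map to day 1)."""
    for fmt in ("%B %d, %Y", "%B %Y"):
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def duration_days(start: str, end: str) -> int:
    """Days between two trial dates, e.g. a start date and a completion date."""
    return (parse_ct_date(end) - parse_ct_date(start)).days
```

Dividing by units-per-year rather than multiplying by a fractional factor keeps common cases such as "6 Months" exact.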
Download the Mistral-7B-Instruct model:
huggingface-cli download --resume-download mistralai/Mistral-7B-Instruct-v0.3 --local-dir 7B-Instruct-v0.3
Set the Mistral model path in llm_request_MistralInstruct.py to the downloaded 7B-Instruct-v0.3 directory. Then create the output directories and run the following scripts:
cd TrialEnroll/llm_emb
mkdir -p data_llm/disease/MistralInstruct data_llm/drug/MistralInstruct
python preprocess.py
python llm_request_MistralInstruct.py
python embedding.py
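How embedding.py turns LLM outputs into fixed-length vectors for the drug and disease inputs is not described here. One common approach is masked mean pooling of per-token hidden states, sketched below in plain Python under that assumption; the function name is hypothetical and the repository may pool differently (for example, by taking the last token's state).

```python
from typing import List, Sequence


def masked_mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is non-zero.

    hidden: one vector per token (seq_len x dim); mask: 1 for real tokens,
    0 for padding. Batched PyTorch equivalent, for a tensor h of shape
    (B, T, D) and a float mask m of shape (B, T):
        (h * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)
    """
    kept = [vec for vec, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

Ignoring padded positions matters because batched inputs are padded to a common length, and averaging the padding vectors would make each embedding depend on the longest sequence in its batch.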
Finally, build the model features and run the model:
cd TrialEnroll
python protocol_encode.py
python col_preprocessing.py
python stack_features_dcn.py
python hatten_cross.py
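The title names a Deep & Cross Network (DCN), and stack_features_dcn.py appears to assemble its input features. For orientation, the sketch below implements the cross layer from the original DCN formulation (Wang et al., 2017), x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, in plain Python. Whether hatten_cross.py uses this form, the later DCN-V2 matrix form, or additional components is not stated here, and the helper names are hypothetical.

```python
from typing import List, Sequence


def cross_layer(x0: Sequence[float], xl: Sequence[float],
                w: Sequence[float], b: Sequence[float]) -> List[float]:
    """One DCN (v1) cross layer: x_{l+1} = x0 * (x_l . w) + b + x_l."""
    s = sum(wi * xi for wi, xi in zip(w, xl))  # scalar x_l^T w
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]


def cross_network(x0: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[Sequence[float]]) -> List[float]:
    """Stack cross layers; every layer re-uses the original input x0."""
    x = list(x0)
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x
```

Each layer adds one degree of explicit feature interaction while keeping the residual term x_l, which is what lets a small number of cross layers model bounded-degree feature crosses cheaply next to the deep (MLP) branch.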
The model achieved a PR AUC (area under the precision-recall curve) of 0.7015.
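How PR AUC is computed is not stated here. One common estimator is average precision, the step-wise sum of precision weighted by recall increments, which is also what scikit-learn's average_precision_score uses; a dependency-free sketch follows (the function name is illustrative). Trapezoidal integration of the precision-recall curve can give slightly different values, so this sketch is not guaranteed to reproduce 0.7015 exactly.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR AUC: sum over thresholds of (R_n - R_{n-1}) * P_n.

    Examples with tied scores are treated as a single threshold.
    """
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example that shares this score.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```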
If you use this code, please cite our paper:
@article{yue2024trialenroll,
title={{TrialEnroll}: Predicting clinical trial enrollment success with deep \& cross network and large language models},
author={Yue, Ling and Xing, Sixue and Chen, Jintai and Fu, Tianfan},
journal={arXiv preprint arXiv:2407.13115},
year={2024}
}