TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models

Congratulations! This paper has been accepted at the 2024 ACM-BCB Conference!

Overview

This repository contains the code for the paper "TrialEnroll: Predicting Clinical Trial Enrollment Success with Deep & Cross Network and Large Language Models." The project combines a Deep & Cross Network (DCN) with features generated by a large language model to predict whether clinical trial enrollment will succeed.

Requirements

Python 3.10
PyTorch 2.3

Data Sources

Getting Started

1. Download the ClinicalTrials.gov Data

cd data
wget https://clinicaltrials.gov/AllPublicXML.zip

2. Decompress the Data File

unzip AllPublicXML.zip -d trials
find trials/* -name "NCT*.xml" | sort > trials/all_xml.txt
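
The file list written by the find command can be consumed from Python. The following is a minimal sketch, not the repository's preprocessing code: it reads the list and pulls one field out of a trial's XML file using only the standard library. The helper names read_file_list and extract_text are hypothetical, and the default element path id_info/nct_id is an assumption about the XML layout; adjust it to the fields you need.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def read_file_list(list_path):
    """Return the non-empty lines of the file list as Path objects."""
    with open(list_path, encoding="utf-8") as fh:
        return [Path(line.strip()) for line in fh if line.strip()]


def extract_text(xml_path, element_path="id_info/nct_id"):
    """Return the stripped text of one element, or None if it is absent.

    The default element path is an assumption about the XML layout.
    """
    root = ET.parse(xml_path).getroot()
    value = root.findtext(element_path)
    return value.strip() if value else None
```

Run from the data directory, read_file_list("trials/all_xml.txt") returns the sorted paths, and extract_text(path) can then be called on each one.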

3. Download IQVIA Label Data

Download the IQVIA label data from here and place it in the data/ directory. Rename the file trial_outcomes_v1.csv to IQVIA_trial_outcomes.csv.
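
The rename can also be scripted. This is a small sketch under stated assumptions: rename_label_file and read_header are hypothetical helpers, not part of the repository, and the header check prints whatever columns the file has without assuming their names.

```python
import csv
from pathlib import Path


def rename_label_file(data_dir, old_name="trial_outcomes_v1.csv",
                      new_name="IQVIA_trial_outcomes.csv"):
    """Rename the downloaded label file and return its new path."""
    data_dir = Path(data_dir)
    source, target = data_dir / old_name, data_dir / new_name
    if target.exists():
        return target  # already renamed on an earlier run
    if not source.exists():
        raise FileNotFoundError(f"expected {source}; download it first")
    source.rename(target)
    return target


def read_header(csv_path):
    """Return the column names from the first row of the CSV."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh))
```

Calling rename_label_file("data") followed by read_header on the returned path is a quick way to confirm the file is in place and readable before preprocessing.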

Data Preprocessing

1. Preprocess Clinical Trial Data

Navigate to the preprocessing directory and run the following scripts:

cd TrialEnroll/preprocess
python collect_age.py
python collect_location.py
python collect_str.py
python collect_time.py
python save_df.py
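
Because the scripts are run in a fixed order, it can help to stop as soon as one of them fails. The following is a minimal sketch of such a runner, assuming each script exits with a non-zero code on failure; run_in_order and PREPROCESS_SCRIPTS are hypothetical names, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in this step, in the order given.
PREPROCESS_SCRIPTS = [
    "collect_age.py",
    "collect_location.py",
    "collect_str.py",
    "collect_time.py",
    "save_df.py",
]


def run_in_order(scripts, workdir):
    """Run each script with the current interpreter; stop on first failure."""
    completed = []
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later scripts never run on incomplete inputs.
        subprocess.run([sys.executable, script], cwd=Path(workdir), check=True)
        completed.append(script)
    return completed
```

For this step the call would be run_in_order(PREPROCESS_SCRIPTS, "TrialEnroll/preprocess"), with the working directory adjusted to wherever the repository is checked out.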

2. Preprocess LLM Generated Features

Download the Mistral-7B-Instruct model:

huggingface-cli download --resume-download mistralai/Mistral-7B-Instruct-v0.3 --local-dir 7B-Instruct-v0.3

Set the Mistral path in llm_request_MistralInstruct.py. Then, create the necessary directories and run the preprocessing scripts:

cd TrialEnroll/llm_emb
mkdir -p data_llm/disease/MistralInstruct data_llm/drug/MistralInstruct
python preprocess.py
python llm_request_MistralInstruct.py
python embedding.py

3. Prepare Criteria Embedding

cd TrialEnroll
python protocol_encode.py

Model Training

1. Run DCN Preprocessing

python col_preprocess.py
python stack_features_dcn.py
python hatten_cross.py
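
For readers unfamiliar with the "cross" part of a Deep & Cross Network, the sketch below shows one cross layer as defined in the original DCN formulation, x_{l+1} = x_0 (x_l . w_l) + b_l + x_l. It is an illustration only and is not taken from hatten_cross.py; the function names are hypothetical, and it uses plain Python lists so it runs without PyTorch (a PyTorch version would use tensors and learnable parameters instead).

```python
def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl.

    x0, xl, w and b are equal-length lists of floats.
    """
    if not (len(x0) == len(xl) == len(w) == len(b)):
        raise ValueError("x0, xl, w and b must have the same length")
    scale = sum(x * wi for x, wi in zip(xl, w))  # scalar dot product xl . w
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0, weights, biases):
    """Apply the cross layers in sequence, always crossing with the input x0."""
    xl = list(x0)
    for w, b in zip(weights, biases):
        xl = cross_layer(x0, xl, w, b)
    return xl
```

Each layer multiplies the original input by a scalar projection of the current state and adds the current state back as a residual, so stacking layers raises the degree of the feature interactions the network can represent.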

Results

The model achieved a PR AUC of 0.7015.
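
The README does not say how the PR AUC was computed. One common estimator of the area under the precision-recall curve is average precision, sketched below with the standard library only. The function name is hypothetical, and this is not necessarily the metric implementation used for the reported number.

```python
def average_precision(y_true, y_score):
    """Average precision: mean of the precision at the rank of each positive.

    y_true holds 0/1 labels and y_score the predicted scores.
    Ties in y_score are broken by input order, which is a simplification.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos
```

For example, labels [1, 0, 1, 0] with scores [0.9, 0.8, 0.7, 0.1] place the positives at ranks 1 and 3, giving (1/1 + 2/3) / 2, or about 0.833.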

Citation

If you use this code, please cite our paper:

@article{yue2024trialenroll,
  title={Trialenroll: Predicting clinical trial enrollment success with deep \& cross network and large language models},
  author={Yue, Ling and Xing, Sixue and Chen, Jintai and Fu, Tianfan},
  journal={arXiv preprint arXiv:2407.13115},
  year={2024}
}

