Drug-Drug Interaction #18

Merged on Oct 4, 2022 (63 commits; changes shown from 62 commits)

Commits
f1da0ff
setup initial skeleton, it only runs one fold.
andrew-thach Aug 17, 2022
d1a5511
refactored data tree, now need to add in blocked similarities and blo…
andrew-thach Aug 22, 2022
1445550
Finished migrating over blocked similarities and blocked validInterac…
andrew-thach Aug 22, 2022
ae32a5f
changed to AUCEvaluator instead of DiscreteEvaluator
andrew-thach Aug 22, 2022
b83c9e4
the similarity data is now blocked
andrew-thach Aug 23, 2022
bc40778
changed csv to txt
andrew-thach Aug 25, 2022
5dd56f0
skeleton of README
andrew-thach Aug 25, 2022
ec72f92
changed to AUCeval
andrew-thach Aug 25, 2022
ba2f5c0
implemented weight learning for one fold
andrew-thach Aug 27, 2022
96a3a39
converted csv similarity files to txt
andrew-thach Aug 28, 2022
95c6873
fixed similarity permissions, and renamed some data files
andrew-thach Aug 28, 2022
1c119cf
refactored fold 01
andrew-thach Aug 29, 2022
9c6d024
refactored fold 02
andrew-thach Aug 29, 2022
6941064
refactored fold 03
andrew-thach Aug 29, 2022
3b12134
refactored fold 04
andrew-thach Aug 29, 2022
5a2eb3c
refactored fold 05
andrew-thach Aug 29, 2022
8126723
refactored fold 06
andrew-thach Aug 29, 2022
72ef0d3
refactored fold 07
andrew-thach Aug 29, 2022
0df9c62
refactored fold 08
andrew-thach Aug 29, 2022
13294f0
refactored fold 09
andrew-thach Aug 29, 2022
1505bc3
changed name of some directories
andrew-thach Aug 29, 2022
8a1e42a
updated data files to the renames
andrew-thach Aug 29, 2022
08f35be
updated README for datasets and threshold spec
andrew-thach Aug 29, 2022
a4e6a16
update title size
andrew-thach Aug 29, 2022
dc4dc95
added a DrugBankIDs map file
andrew-thach Aug 30, 2022
2b1233e
added external script for running 10 fold CV
andrew-thach Aug 30, 2022
ea7b010
Fixed accidental stdout dump from inference/weight learning
andrew-thach Aug 30, 2022
43f6b3b
generalized the regex for the sed
andrew-thach Sep 1, 2022
0646aa5
updated fold directories
andrew-thach Sep 1, 2022
0cd3266
updated directories for different blocking mechanisms. The general-i…
andrew-thach Sep 6, 2022
c2f2529
Specified different blocking mechanisms
andrew-thach Sep 6, 2022
0d7c42d
added a couple data folds for evaluation sanity checks
andrew-thach Sep 6, 2022
ba99e66
moved similarities into the folds
andrew-thach Sep 8, 2022
2a8dd9a
fixed incorrect info about ncrd similarities
andrew-thach Sep 12, 2022
d5a9f63
Fixed the discrete/auc evaluator inconsistency. Hardcoded t=0.4 for …
andrew-thach Sep 12, 2022
00dddd4
added 2 new datasets: crd-interactions and ncrd-interactions
andrew-thach Sep 19, 2022
ee67ff9
updated experimental setup with threshold info
andrew-thach Sep 19, 2022
39f2837
offloaded data directory and a 10-fold external run script
andrew-thach Sep 19, 2022
a13f52a
renamed to directory to drug-drug-interaction
andrew-thach Sep 20, 2022
d1caea9
cleaned up lint and specified more details
andrew-thach Sep 20, 2022
b8d4f49
cleaned up lint
andrew-thach Sep 20, 2022
fe2ec95
Added integer IDs to DrugBanks.txt. Offloaded file location to data d…
andrew-thach Sep 23, 2022
16eca36
changed filenames
andrew-thach Sep 23, 2022
c572ae8
inserted drug-drug-interaction to config
andrew-thach Sep 23, 2022
2538caa
the data zip file now extracts into 3 different directories. This is …
andrew-thach Sep 23, 2022
ec3c7f6
hardcoded weights from the original run, also fixed a rule that was s…
andrew-thach Sep 26, 2022
0253fc6
modernized syntax
andrew-thach Sep 26, 2022
2ab49cb
modernized syntax with the != operator
andrew-thach Sep 27, 2022
6f56486
changed data path
andrew-thach Sep 27, 2022
dd1d31c
updated syntax
andrew-thach Sep 27, 2022
114bf47
removed double newlines and double spaces after periods.
andrew-thach Sep 27, 2022
169c4f6
removed trailing spaces
andrew-thach Sep 27, 2022
274f31b
refactored origin section
andrew-thach Sep 27, 2022
642fd05
Edited dataset intro
andrew-thach Sep 28, 2022
3a323f1
added a sed example to show how to change thresholds
andrew-thach Sep 28, 2022
c80bb1f
Edited sed command for in-place replacement
andrew-thach Sep 28, 2022
9dbc8f2
made line breaks consistent
andrew-thach Sep 28, 2022
853d45d
edited origin information. removed sed example
andrew-thach Sep 28, 2022
1fd4514
added command line usage for threshold changes
andrew-thach Sep 28, 2022
fd0bad6
Removed redundant flags
andrew-thach Sep 28, 2022
caef6d8
changed weight learning method to Guided Random Grid Search
andrew-thach Oct 3, 2022
d56fb04
added weight learning section
andrew-thach Oct 3, 2022
c1a1fa5
Changed weight learning section
andrew-thach Oct 4, 2022
7 changes: 7 additions & 0 deletions .templates/config.json
@@ -13,6 +13,13 @@
"evalOptions": "--eval CategoricalEvaluator -D categoricalevaluator.categoryindexes=1 -D eval.includeobs=true",
"weightLearningOptions": ""
},
"drug-drug-interaction": {
"dataURL": "https://linqs-data.soe.ucsc.edu/public/psl-examples-data/drug-drug-interaction.zip",
"weightLearning": true,
"pslOptions": "--int-ids --eval AUCEvaluator -D aucevaluator.threshold=0.4",
"evalOptions": "--eval DiscreteEvaluator -D discreteevaluator.threshold=0.4",
"weightLearningOptions": "GuidedRandomGridSearch -D gridsearch.weights=0.01:0.1:1.0 -D randomgridsearch.maxlocations=75 -D weightlearning.evaluator=AUCEvaluator"
},
"entity-resolution": {
"dataURL": "https://linqs-data.soe.ucsc.edu/public/psl-examples-data/entity-resolution/entity-resolution-small.zip",
"weightLearning": true,
39 changes: 39 additions & 0 deletions drug-drug-interaction/README.md
@@ -0,0 +1,39 @@
### Drug-Drug Interaction

## Problem
In this example, we attempt to infer unknown drug-drug interactions from a network of known interactions and multiple drug-based similarities.

## Dataset
There are three datasets, all derived from [DrugBank 4.3](https://go.drugbank.com/downloads/archived) and [Drugs.com](https://www.drugs.com). Each dataset contains seven drug-drug similarities. Four of these similarity measures are drug-based: chemical-based, ligand-based, side-effect-based, and annotation-based. The other three are similarities between drug targets, computed by aggregating over the known targets of each drug: sequence-based, PPI network-based, and Gene Ontology-based.
- general-interactions: This dataset has a total of 4,293 interactions across 315 drugs. It also includes a file, DrugBankIDs, which maps integer IDs to DrugBank IDs. All folds share the same similarity data.
- crd-interactions: This dataset has a total of 10,106 CRD interactions across 807 drugs. These IDs are anonymized (so there is no DrugBankID mapping). Each fold uses different similarity data (since blocking is stricter in the CRD setting).
- ncrd-interactions: This dataset has a total of 45,737 NCRD interactions across 807 drugs. These IDs are also anonymized. All folds use the same similarity data.

## Experimental Setup
The default settings in the run script are for the "general-interactions" dataset, so the evaluator thresholds must be changed when running the other datasets. For example, `./run.sh -D discreteevaluator.threshold=<NEW_THRESHOLD> -D aucevaluator.threshold=<NEW_THRESHOLD>` runs both evaluators with the threshold `<NEW_THRESHOLD>`.
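
As a rough sketch of the override described above, the helper below builds the two threshold options for a given value. `threshold_options` is a hypothetical name, not part of `run.sh`, and the sketch assumes each property is passed with its own `-D` flag, as the run script does for its defaults.

```bash
#!/bin/bash

# Hypothetical helper (not part of run.sh): print the PSL options that set
# both evaluator thresholds to the same value.
function threshold_options() {
    local threshold=$1
    echo "-D discreteevaluator.threshold=${threshold} -D aucevaluator.threshold=${threshold}"
}

# Example: preview the options for a threshold of 0.5 (an arbitrary value).
threshold_options 0.5
```

Because `run.sh` forwards its arguments to the PSL CLI, the output can be passed through unquoted, for example `./run.sh $(threshold_options 0.5)`.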

## Weight Learning
For each dataset, we used Guided Random Grid Search to tune the weights. This produces slightly better results than the original weight learning method. To reproduce the results of the original experiment, disable weight learning and use the default hardcoded weights.
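
To give a sense of the search space, the sketch below counts the locations in a full weight grid, assuming every weighted rule can independently take any of the candidate weights. The model in this example has eight weighted rules (seven similarity rules plus the prior), and the configured options list three candidate weights (`0.01:0.1:1.0`). `grid_size` is a hypothetical helper, not part of PSL.

```bash
#!/bin/bash

# Hypothetical helper: number of locations in a full weight grid,
# assuming each rule independently takes one of the candidate weights.
function grid_size() {
    local numWeights=$1
    local numRules=$2
    echo $(( numWeights ** numRules ))
}

# Three candidate weights, eight weighted rules.
grid_size 3 8
```

Under that assumption the full grid has 6561 locations, so a cap of 75 (`randomgridsearch.maxlocations=75`) visits only a small sample of it.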

## Origin
This example is based on [A Probabilistic Approach for Collective Similarity-Based Drug-Drug Interaction Prediction](https://linqs.org/publications/#id:sridhar-bio16). This [repo](https://bitbucket.org/linqs/psl-drug-interaction-prediction/src/master/) contains the original data and experiments from the paper. The data in this example were preprocessed and exported from the original experiment. To reference the original work, please use this citation:

```
@article{sridhar:bio16,
title = {A Probabilistic Approach for Collective Similarity-Based Drug-Drug Interaction Prediction},
author = {Dhanya Sridhar and Shobeir Fakhraei and Lise Getoor},
journal = {Bioinformatics},
year = {2016},
publisher = {Oxford},
pages = {3175--3182},
volume = {32},
number = {20},
}
```

## Keywords
- `cli`
- `evaluation`
- `inference`
- `real data`
- `weight learning`
27 changes: 27 additions & 0 deletions drug-drug-interaction/cli/drug-drug-interaction-eval.data
@@ -0,0 +1,27 @@
predicates:
  ATCSimilarity/2: closed
  SideEffectSimilarity/2: closed
  GOSimilarity/2: closed
  ligandSimilarity/2: closed
  chemicalSimilarity/2: closed
  seqSimilarity/2: closed
  distSimilarity/2: closed
  validInteraction/2: closed
  interacts/2: open

observations:
  ATCSimilarity: ../data/drug-drug-interaction/general-interactions/00/eval/similarity_atc.txt
  chemicalSimilarity: ../data/drug-drug-interaction/general-interactions/00/eval/similarity_chemical.txt
  distSimilarity: ../data/drug-drug-interaction/general-interactions/00/eval/similarity_dist.txt
  GOSimilarity: ../data/drug-drug-interaction/general-interactions/00/eval/similarity_go.txt
  ligandSimilarity: ../data/drug-drug-interaction/general-interactions/00/eval/similarity_ligand.txt
  seqSimilarity: ../data/drug-drug-interaction/general-interactions/00/eval/similarity_seq.txt
  SideEffectSimilarity: ../data/drug-drug-interaction/general-interactions/00/eval/similarity_sideeffects.txt
  validInteraction: ../data/drug-drug-interaction/general-interactions/00/eval/valid_interactions_obs.txt
  interacts: ../data/drug-drug-interaction/general-interactions/00/eval/interacts_obs.txt

targets:
  interacts: ../data/drug-drug-interaction/general-interactions/00/eval/interacts_target.txt

truth:
  interacts: ../data/drug-drug-interaction/general-interactions/00/eval/interacts_truth.txt
27 changes: 27 additions & 0 deletions drug-drug-interaction/cli/drug-drug-interaction-learn.data
@@ -0,0 +1,27 @@
predicates:
  ATCSimilarity/2: closed
  SideEffectSimilarity/2: closed
  GOSimilarity/2: closed
  ligandSimilarity/2: closed
  chemicalSimilarity/2: closed
  seqSimilarity/2: closed
  distSimilarity/2: closed
  validInteraction/2: closed
  interacts/2: open

observations:
  ATCSimilarity: ../data/drug-drug-interaction/general-interactions/00/learn/similarity_atc.txt
  chemicalSimilarity: ../data/drug-drug-interaction/general-interactions/00/learn/similarity_chemical.txt
  distSimilarity: ../data/drug-drug-interaction/general-interactions/00/learn/similarity_dist.txt
  GOSimilarity: ../data/drug-drug-interaction/general-interactions/00/learn/similarity_go.txt
  ligandSimilarity: ../data/drug-drug-interaction/general-interactions/00/learn/similarity_ligand.txt
  seqSimilarity: ../data/drug-drug-interaction/general-interactions/00/learn/similarity_seq.txt
  SideEffectSimilarity: ../data/drug-drug-interaction/general-interactions/00/learn/similarity_sideeffects.txt
  validInteraction: ../data/drug-drug-interaction/general-interactions/00/learn/valid_interactions_obs.txt
  interacts: ../data/drug-drug-interaction/general-interactions/00/learn/interacts_obs.txt

targets:
  interacts: ../data/drug-drug-interaction/general-interactions/00/learn/interacts_target.txt

truth:
  interacts: ../data/drug-drug-interaction/general-interactions/00/learn/interacts_truth.txt
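
Both data files hardcode the dataset `general-interactions` and fold `00`. One possible way to point them at another dataset or fold is a `sed` substitution over the path prefix, sketched below. `retarget_data_file` is a hypothetical helper, and it assumes the other datasets and folds follow the same `<dataset>/<fold>/{learn,eval}` layout, which this diff does not show.

```bash
#!/bin/bash

# Hypothetical helper: rewrite the dataset name and fold in .data file content
# read from stdin, and write the result to stdout.
# Assumes paths look like: ../data/drug-drug-interaction/<dataset>/<fold>/...
function retarget_data_file() {
    local dataset=$1
    local fold=$2
    sed -E "s#(drug-drug-interaction/)[a-z-]+/[0-9]+/#\1${dataset}/${fold}/#"
}

# Example: retarget one observation line to fold 03 of crd-interactions.
echo 'interacts: ../data/drug-drug-interaction/general-interactions/00/eval/interacts_obs.txt' \
    | retarget_data_file 'crd-interactions' '03'
```

When switching datasets this way, the evaluator thresholds would also need to change, as the README notes.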
12 changes: 12 additions & 0 deletions drug-drug-interaction/cli/drug-drug-interaction.psl
@@ -0,0 +1,12 @@
5: ATCSimilarity(D1, D2) & interacts(D1, D3) & validInteraction(D1, D3) & validInteraction(D2, D3) & (D2 != D3) & (D1 != D2) -> interacts(D2, D3) ^2
5: SideEffectSimilarity(D1, D2) & interacts(D1, D3) & validInteraction(D1, D3) & validInteraction(D2, D3) & (D2 != D3) & (D1 != D2) -> interacts(D2, D3) ^2
5: GOSimilarity(D1, D2) & interacts(D1, D3) & validInteraction(D1, D3) & validInteraction(D2, D3) & (D2 != D3) & (D1 != D2) -> interacts(D2, D3) ^2
5: ligandSimilarity(D1, D2) & interacts(D1, D3) & validInteraction(D1, D3) & validInteraction(D2, D3) & (D2 != D3) & (D1 != D2) -> interacts(D2, D3) ^2
5: chemicalSimilarity(D1, D2) & interacts(D1, D3) & validInteraction(D1, D3) & validInteraction(D2, D3) & (D2 != D3) & (D1 != D2) -> interacts(D2, D3) ^2
5: seqSimilarity(D1, D2) & interacts(D1, D3) & validInteraction(D1, D3) & validInteraction(D2, D3) & (D2 != D3) & (D1 != D2) -> interacts(D2, D3) ^2
5: distSimilarity(D1, D2) & interacts(D1, D3) & validInteraction(D1, D3) & validInteraction(D2, D3) & (D2 != D3) & (D1 != D2) -> interacts(D2, D3) ^2

// prior
5: validInteraction(D1,D2) -> !interacts(D1,D2) ^2

interacts(D1, D2) = interacts(D2, D1) .
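
For readers less familiar with how PSL scores these rules, the block below writes out the standard squared hinge-loss potential for one grounding of a similarity rule under the Łukasiewicz relaxation. The symbols are shorthand introduced here, not names from the model: $s_{12}$ is the similarity of D1 and D2, $i_{13}$ and $i_{23}$ are the `interacts` values, $v_{13}$ and $v_{23}$ are the `validInteraction` values, and $w$ is the rule weight. The `!=` terms only filter groundings and do not appear in the potential.

```latex
% One grounding of a similarity rule (four soft body atoms, one head atom):
\phi_{\text{sim}} = w \cdot \max\bigl(0,\; s_{12} + i_{13} + v_{13} + v_{23} - 3 - i_{23}\bigr)^{2}

% When both validInteraction atoms are 1, this reduces to:
\phi_{\text{sim}} = w \cdot \max\bigl(0,\; s_{12} + i_{13} - 1 - i_{23}\bigr)^{2}

% The prior, validInteraction(D1, D2) -> !interacts(D1, D2), for a pair with v_{12} = 1:
\phi_{\text{prior}} = w \cdot \max\bigl(0,\; v_{12} + i_{12} - 1\bigr)^{2} = w \cdot i_{12}^{2}
```

Read this way, a similarity rule is penalized only when the similar drug's known interaction is stronger than the predicted one, while the prior pulls every candidate interaction toward zero.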
162 changes: 162 additions & 0 deletions drug-drug-interaction/cli/run.sh
@@ -0,0 +1,162 @@
#!/bin/bash

# Options can also be passed on the command line.
# These options are passed blindly to the PSL CLI.
# Ex: ./run.sh -D log4j.threshold=DEBUG

readonly THIS_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

readonly PSL_VERSION='3.0.0-SNAPSHOT'
readonly JAR_PATH="${THIS_DIR}/psl-cli-${PSL_VERSION}.jar"
readonly RUN_SCRIPT_VERSION='1.3.6'

readonly BASE_NAME='drug-drug-interaction'
readonly OUTPUT_DIRECTORY="${THIS_DIR}/inferred-predicates"

readonly ADDITIONAL_PSL_OPTIONS='--int-ids --eval AUCEvaluator -D aucevaluator.threshold=0.4'
readonly ADDITIONAL_WL_OPTIONS='--learn GuidedRandomGridSearch -D gridsearch.weights=0.01:0.1:1.0 -D randomgridsearch.maxlocations=75 -D weightlearning.evaluator=AUCEvaluator'
readonly ADDITIONAL_EVAL_OPTIONS='--infer --eval DiscreteEvaluator -D discreteevaluator.threshold=0.4'

function main() {
trap exit SIGINT

bash "${THIS_DIR}/../data/fetchData.sh"

# Make sure we can run PSL.
check_requirements
fetch_psl

# Run PSL.
run_weight_learning "$@"
run_inference "$@"
}

function run_weight_learning() {
echo "Running PSL Weight Learning."

java -jar "${JAR_PATH}" \
--model "${THIS_DIR}/${BASE_NAME}.psl" \
--data "${THIS_DIR}/${BASE_NAME}-learn.data" \
${ADDITIONAL_PSL_OPTIONS} ${ADDITIONAL_WL_OPTIONS} "$@"

if [[ "$?" -ne 0 ]]; then
echo 'ERROR: Failed to run weight learning.'
exit 60
fi
}

function run_inference() {
echo "Running PSL Inference."

java -jar "${JAR_PATH}" \
--model "${THIS_DIR}/${BASE_NAME}-learned.psl" \
--data "${THIS_DIR}/${BASE_NAME}-eval.data" \
--output "${OUTPUT_DIRECTORY}" \
${ADDITIONAL_PSL_OPTIONS} ${ADDITIONAL_EVAL_OPTIONS} "$@"

if [[ "$?" -ne 0 ]]; then
echo 'ERROR: Failed to run inference.'
exit 70
fi
}

function check_requirements() {
local hasWget
local hasCurl

type wget > /dev/null 2> /dev/null
hasWget=$?

type curl > /dev/null 2> /dev/null
hasCurl=$?

if [[ "${hasWget}" -ne 0 ]] && [[ "${hasCurl}" -ne 0 ]]; then
echo 'ERROR: wget or curl required to download the jar.'
exit 10
fi

type java > /dev/null 2> /dev/null
if [[ "$?" -ne 0 ]]; then
echo 'ERROR: java required to run project.'
exit 13
fi
}

function get_fetch_command() {
type curl > /dev/null 2> /dev/null
if [[ "$?" -eq 0 ]]; then
echo "curl -o"
return
fi

type wget > /dev/null 2> /dev/null
if [[ "$?" -eq 0 ]]; then
echo "wget -O"
return
fi

echo 'ERROR: wget or curl not found.'
exit 20
}

function fetch_file() {
local url=$1
local path=$2

local name=$(basename "${path}")

if [[ -e "${path}" ]]; then
echo "${name} file found cached, skipping download."
return
fi

echo "Downloading ${name} file located at: '${url}'."
`get_fetch_command` "${path}" "${url}"
if [[ "$?" -ne 0 ]]; then
echo "ERROR: Failed to download ${name}."
exit 30
fi
}

# Fetch the jar from a remote or local location and put it in this directory.
# Non-snapshot builds are fetched from Maven Central.
# For snapshot builds, the local maven cache ($HOME/.m2) is checked first, and then the snapshot deployment servers.
function fetch_psl() {
if [[ -e "${JAR_PATH}" ]] ; then
echo "Using PSL jar found at ${JAR_PATH}. To fetch a new version, delete this cached jar."
return
fi

if [[ "${PSL_VERSION}" =~ .*-SNAPSHOT$ ]]; then
local snapshotLocalPath="$HOME/.m2/repository/org/linqs/psl-cli/${PSL_VERSION}/psl-cli-${PSL_VERSION}.jar"
if [[ -e "${snapshotLocalPath}" ]] ; then
echo "Using local PSL snapshot build."
cp "${snapshotLocalPath}" "${JAR_PATH}"
return
fi

echo "Using remote PSL snapshot build."
local snapshotMetadataURL="https://oss.sonatype.org/content/repositories/snapshots/org/linqs/psl-cli/${PSL_VERSION}/maven-metadata.xml"
local metadataFilename='._maven-metadata.xml'

rm -f "${metadataFilename}"
fetch_file "${snapshotMetadataURL}" "${metadataFilename}"

local snapshotDate=$(grep -m 1 'timestamp' "${metadataFilename}" | sed -E 's/^.*>([0-9]+\.[0-9]+)<.*$/\1/')
local snapshotNumber=$(grep -m 1 'buildNumber' "${metadataFilename}" | sed -E 's/^.*>([0-9]+)<.*$/\1/')
rm -f "${metadataFilename}"

local baseVersion=$(echo "${PSL_VERSION}" | sed -E 's/-SNAPSHOT$//')
local version="${baseVersion}-${snapshotDate}-${snapshotNumber}"

local snapshotJarURL="https://oss.sonatype.org/content/repositories/snapshots/org/linqs/psl-cli/${PSL_VERSION}/psl-cli-${version}.jar"
fetch_file "${snapshotJarURL}" "${JAR_PATH}"
else
echo "Using remote PSL build."
local remoteJarURL="https://repo1.maven.org/maven2/org/linqs/psl-cli/${PSL_VERSION}/psl-cli-${PSL_VERSION}.jar"
fetch_file "${remoteJarURL}" "${JAR_PATH}"
fi
}

[[ "${BASH_SOURCE[0]}" == "${0}" ]] && main "$@"
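
The snapshot branch of `fetch_psl` builds the jar's version string from two fields in `maven-metadata.xml`. The sketch below applies the same `grep`/`sed` pipeline to a small made-up metadata file. The timestamp and build number are invented for illustration, and `resolve_snapshot_version` is a hypothetical wrapper that does not exist in `run.sh`.

```bash
#!/bin/bash

# Hypothetical wrapper around the parsing steps used in fetch_psl.
# $1: PSL version (e.g. 3.0.0-SNAPSHOT), $2: path to a maven-metadata.xml file.
function resolve_snapshot_version() {
    local pslVersion=$1
    local metadataPath=$2

    # Take the first timestamp and build number found in the metadata.
    local snapshotDate=$(grep -m 1 'timestamp' "${metadataPath}" | sed -E 's/^.*>([0-9]+\.[0-9]+)<.*$/\1/')
    local snapshotNumber=$(grep -m 1 'buildNumber' "${metadataPath}" | sed -E 's/^.*>([0-9]+)<.*$/\1/')

    # Strip the -SNAPSHOT suffix and append the resolved build identifiers.
    local baseVersion=$(echo "${pslVersion}" | sed -E 's/-SNAPSHOT$//')

    echo "${baseVersion}-${snapshotDate}-${snapshotNumber}"
}

# Made-up metadata, for illustration only.
sampleMetadata=$(mktemp)
cat > "${sampleMetadata}" <<'EOF'
<metadata>
  <versioning>
    <snapshot>
      <timestamp>20220101.000000</timestamp>
      <buildNumber>1</buildNumber>
    </snapshot>
  </versioning>
</metadata>
EOF

resolve_snapshot_version '3.0.0-SNAPSHOT' "${sampleMetadata}"
rm -f "${sampleMetadata}"
```

For the sample file this yields `3.0.0-20220101.000000-1`, which is the form the script substitutes into the remote jar filename.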
3 changes: 3 additions & 0 deletions drug-drug-interaction/data/.gitignore
@@ -0,0 +1,3 @@
*
!fetchData.sh
!.gitignore
93 changes: 93 additions & 0 deletions drug-drug-interaction/data/fetchData.sh
@@ -0,0 +1,93 @@
#!/bin/bash

readonly THIS_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

readonly DATA_URL='https://linqs-data.soe.ucsc.edu/public/psl-examples-data/drug-drug-interaction.zip'
readonly DATA_FILE=$(basename "${DATA_URL}")
readonly DATA_DIR='drug-drug-interaction'
readonly SCRIPT_VERSION='1.3.6'

function main() {
trap exit SIGINT

cd "${THIS_DIR}"

check_requirements

fetch_file "${DATA_URL}" "${DATA_FILE}"
extract_zip "${DATA_FILE}" "${DATA_DIR}"
}

function check_requirements() {
local hasWget
local hasCurl

type wget > /dev/null 2> /dev/null
hasWget=$?

type curl > /dev/null 2> /dev/null
hasCurl=$?

if [[ "${hasWget}" -ne 0 ]] && [[ "${hasCurl}" -ne 0 ]]; then
echo 'ERROR: wget or curl required to download the data.'
exit 10
fi
}

function get_fetch_command() {
type curl > /dev/null 2> /dev/null
if [[ "$?" -eq 0 ]]; then
echo "curl -o"
return
fi

type wget > /dev/null 2> /dev/null
if [[ "$?" -eq 0 ]]; then
echo "wget -O"
return
fi

echo 'ERROR: wget or curl not found.'
exit 20
}

function fetch_file() {
local url=$1
local path=$2

local name=$(basename "${path}")

if [[ -e "${path}" ]]; then
echo "${name} file found cached, skipping download."
return
fi

echo "Downloading ${name} file located at: '${url}'."
`get_fetch_command` "${path}" "${url}"
if [[ "$?" -ne 0 ]]; then
echo "ERROR: Failed to download ${name}."
exit 30
fi
}


function extract_zip() {
local path=$1
local expectedDir=$2

local name=$(basename "${path}")

if [[ -e "${expectedDir}" ]]; then
echo "Extracted ${name} zip found cached, skipping extract."
return
fi

echo "Extracting the ${name} zip"
unzip "${path}"
if [[ "$?" -ne 0 ]]; then
echo "ERROR: Failed to extract ${name}."
exit 40
fi
}

[[ "${BASH_SOURCE[0]}" == "${0}" ]] && main "$@"