Commit

updated files
oriolfornes committed Oct 4, 2021
1 parent cff77bd commit a3422f7
Showing 2,352 changed files with 29,152 additions and 14,381 deletions.
18 changes: 9 additions & 9 deletions README.md
@@ -6,10 +6,10 @@ This repository contains the data and code used to generate the JASPAR UCSC Geno

## Content
* The `genomes` folder contains scripts to download and process different genome assemblies
* The `profiles` folder contains the output from the script [`get_profiles.py`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/profiles/get_profiles.py), which downloads the JASPAR CORE profiles for different taxons
* The file [`environment.yml`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/environment.yml) contains the conda environment used to generate the genomic tracks for JASPAR 2020 (see installation)
* The `profiles` folder contains the output from the script [`get-profiles.py`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/profiles/get_profiles.py), which downloads the JASPAR CORE profiles for different taxons
* The file [`environment.yml`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/environment.yml), within the `conda` folder, contains the conda environment used to generate the genomic tracks for JASPAR 2022 (see installation)
* The script [`install-pwmscan.sh`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/install-pwmscan.sh) downloads and installs PWMScan and places its binaries in the `bin` folder.
* The script [`scan_sequence.py`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/scan_sequence.py) takes as its input the `profiles` folder and a nucleotide sequence in [FASTA format](https://en.wikipedia.org/wiki/FASTA_format)</br>(*e.g.* a genome), and outputs TFBS predictions
* The script [`scan-sequence.py`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/scan_sequence.py) takes as its input the `profiles` folder and a nucleotide sequence in [FASTA format](https://en.wikipedia.org/wiki/FASTA_format)</br>(*e.g.* a genome), and outputs TFBS predictions
* The script [`scans2bigBed`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/scans2bigBed) creates a [bigBed track file](https://genome.ucsc.edu/goldenPath/help/bigBed.html) from TFBS predictions

The original scripts used for the publication of [JASPAR 2018](https://doi.org/10.1093/nar/gkx1126) have been placed in the folder [`version-1.0`](https://github.com/wassermanlab/JASPAR-UCSC-tracks/tree/master/version-1.0).
@@ -26,26 +26,26 @@ To install PWMScan, execute the script [`install-pwmscan.sh`](https://github.com

The remaining dependencies can be installed through the [conda](https://docs.conda.io/en/latest/) package manager:
```
conda env create -f ./environment.yml
conda env create -f ./conda/environment.yml
```

## Availability
Genomic tracks and TFBS predictions for human and six other model organisms are available online:
* [http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2020/](http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2020/)
Genomic tracks and TFBS predictions for human and **seven** other model organisms, covering **11** genome assemblies, are available online:
* [http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/](http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/)

## Usage
To illustrate how the genomic tracks are generated, we provide an example for the [baker's yeast genome](https://www.ncbi.nlm.nih.gov/assembly/GCF_000146045.2/):
* Download the genome sequence and chromosome sizes (automated in this [script](https://github.com/wassermanlab/JASPAR-UCSC-tracks/blob/master/genomes/sacCer3/sacCer3.sh))
* Scan the genome sequence using [**all** fungi profiles from the JASPAR CORE](http://jaspar.genereg.net/search?q=&collection=CORE&tax_group=fungi)
```
./scan_sequence.py --fasta-file ./genomes/sacCer3/sacCer3.fa --profiles-dir ./profiles/ \
./scan-sequence.py --fasta-file ./genomes/sacCer3/sacCer3.fa --profiles-dir ./profiles/ \
--output-dir ./tracks/sacCer3/ --threads 4 --latest --taxon fungi
```
For this example, the scanning step should take no longer than a minute. For human and other similar genomes, this step is usually finished within a few hours (the final amount of time will depend on the number of `--threads` specified).
* Create the genomic track
```
./scans2bigBed -c ./genomes/sacCer3/sacCer3.chrom.sizes -i ./tracks/sacCer3/ -o ./tracks/sacCer3.bb -t 4
./scans2bigBed -c ./genomes/sacCer3/sacCer3.fa.sizes -i ./tracks/sacCer3/ -o ./tracks/sacCer3.bb -t 4
```
TFBS predictions from the previous step are merged into a [bigBed track file](https://genome.ucsc.edu/goldenPath/help/bigBed.html). In column five, we use as scores the <i>p</i>-values from PWMScan (scaled between 0-1000, where 0 corresponds to <i>p</i>-value = 1 and 1000 to <i>p</i>-value ≤ 10<sup>-10</sup>). This allows for comparison of prediction confidence across TFBSs. Again, for this example, this step should be completed within a few minutes, while for larger genomes it can take a few hours.

**Important note:** both disk space and memory requirements for large genomes (*i.e.* danRer11, hg19, hg38 and mm10) are substantial. In these cases, we highly recommend allocating at least 1Tb of disk space and 512Gb of ram.
**Important note:** disk space requirements for large genomes (*i.e.* danRer11, hg19, hg38, mm10, and mm39) are substantial. In these cases, we highly recommend allocating at least 1Tb of disk space.
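
The column-five score described in the README can be sketched in Python. This is only an illustration, not the code used by `scans2bigBed`: the README fixes just the two endpoints (score 0 at <i>p</i>-value = 1, score 1000 at <i>p</i>-value ≤ 10<sup>-10</sup>), so the linear interpolation in -log10(<i>p</i>) and the name `pvalue_to_score` are assumptions.

```python
import math


def pvalue_to_score(p_value: float, cap_exponent: int = 10, max_score: int = 1000) -> int:
    """Map a p-value to an integer track score in [0, max_score].

    Assumption (not confirmed by the README): the score grows linearly with
    -log10(p), so p = 1 gives 0 and p <= 10**-cap_exponent gives max_score.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    scaled = -math.log10(p_value) * max_score / cap_exponent
    # Clamp so that anything at or below the cap p-value maps to max_score.
    return min(max_score, int(round(scaled)))


if __name__ == "__main__":
    for p in (1.0, 1e-5, 1e-10, 1e-12):
        print(p, pvalue_to_score(p))
```

Under this assumed mapping, every tenfold drop in <i>p</i>-value adds 100 to the score until the cap is reached.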
1 change: 1 addition & 0 deletions environment.yml → conda/environment.yml
100644 → 100755
@@ -14,6 +14,7 @@ dependencies:
- pip=19.2.2
- pyfaidx=0.5.5.2
- python=3.7.3
- seaborn=0.11.2
- setuptools=41.0.1
- tqdm=4.35.0
- ucsc-bedtobigbed
2 changes: 1 addition & 1 deletion genomes/get-genomes.sh
@@ -6,7 +6,7 @@ if ! [ -f araTha1/araTha1.fa.sizes ]; then
wget https://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/araTha1/araTha1.2bit
twoBitToFa araTha1.2bit araTha1.fa
faidx -x araTha1.fa
faidx araTha1.fa -i chromsizes > araTha1.chrom.sizes
faidx araTha1.fa -i chromsizes > araTha1.fa.sizes
rm araTha1.2bit chr*.fa
cd ..
fi
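
The `faidx ... -i chromsizes` step in this hunk writes one line per sequence with its name and length. A rough standard-library equivalent is sketched below. It assumes tab-separated output and that the sequence name is the first whitespace-delimited token of each FASTA header. The function names `chrom_sizes` and `format_sizes` are hypothetical and do not come from this repository.

```python
from typing import Dict, Iterable


def chrom_sizes(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} for FASTA text given line by line."""
    sizes: Dict[str, int] = {}
    name = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def format_sizes(sizes: Dict[str, int]) -> str:
    """Render sizes as 'name<TAB>length' lines, in input order."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes.items())


if __name__ == "__main__":
    example = [">chrI example", "ACGT", "AC", ">chrII", "GG"]
    print(format_sizes(chrom_sizes(example)), end="")
```

For real genomes the `faidx` command from the script is preferable, since it works from an index rather than reading every line.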
16 changes: 16 additions & 0 deletions profiles/fungi/MA0265.1.pwm
@@ -0,0 +1,16 @@
-103 66 -208 76
-346 188 -346 -297
-345 -345 190 -345
-345 -345 -345 190
66 -50 -17 -29
-17 -5 5 15
59 -57 15 -50
41 -42 -29 15
63 -23 -5 -65
69 -343 114 -343
-94 -1 -127 105
-346 -346 160 -36
178 -139 -346 -346
-346 79 -126 82
52 -17 -42 -11
41 -42 -74 41
20 changes: 20 additions & 0 deletions profiles/fungi/MA0265.2.pwm
@@ -0,0 +1,20 @@
20 -56 -45 53
-4 -30 -57 62
-8 -65 -78 87
-59 -54 -288 133
-346 193 -418 -444
-361 -444 192 -346
-346 -444 -378 192
79 -65 -16 -43
-28 -22 -67 76
106 -137 -54 -25
52 -107 -69 56
140 -107 -92 -151
23 -418 143 -378
-361 -128 -333 177
-418 -397 193 -361
192 -378 -418 -361
-378 64 -247 113
107 -97 -63 -41
-1 -20 -43 48
39 -85 -26 39
7 changes: 7 additions & 0 deletions profiles/fungi/MA0266.1.pwm
@@ -0,0 +1,7 @@
-42 59 0 -42
-126 -126 -261 158
-346 190 -346 -346
-346 -346 -346 190
190 -346 -346 -346
-346 -346 190 -346
181 -262 -262 -262
14 changes: 7 additions & 7 deletions profiles/fungi/MA0267.1.pwm
@@ -1,7 +1,7 @@
86 18 -24 -231
-464 188 -206 -464
-10000 200 -10000 -10000
185 -132 -10000 -10000
-10000 -10000 200 -10000
-10000 200 -10000 -10000
97 -32 -56 -84
80 16 -21 -186
-297 178 -169 -297
-346 190 -346 -346
175 -114 -346 -346
-346 -346 190 -346
-346 190 -346 -346
91 -29 -50 -74
14 changes: 7 additions & 7 deletions profiles/fungi/MA0268.1.pwm
@@ -1,7 +1,7 @@
71 -6 -94 -18
-10000 200 -10000 -10000
-10000 177 -10000 -74
-10000 200 -10000 -10000
-10000 200 -10000 -10000
136 -206 -32 -132
44 103 -206 -147
66 -5 -83 -17
-346 190 -346 -346
-346 167 -346 -65
-346 190 -346 -346
-346 190 -346 -346
127 -169 -29 -114
41 96 -169 -126
21 changes: 21 additions & 0 deletions profiles/fungi/MA0269.1.pwm
@@ -0,0 +1,21 @@
-118 -27 -11 85
-66 0 -18 57
52 -33 -17 -19
-46 65 -16 -31
75 -11 -67 -40
-84 -158 -78 134
149 -297 -273 -14
-75 -273 20 108
-262 -223 -246 178
-62 -385 167 -343
-384 192 -377 -384
191 -393 -349 -356
-402 193 -393 -385
-297 190 -470 -343
-326 154 -362 -14
-44 4 96 -182
8 1 -5 -5
20 -66 -43 57
14 -45 -176 90
-97 -14 50 22
-67 -8 86 -71
8 changes: 8 additions & 0 deletions profiles/fungi/MA0270.1.pwm
@@ -0,0 +1,8 @@
-139 79 10 -29
140 -346 24 -346
-346 190 -346 -346
190 -346 -346 -346
-346 190 -346 -346
-346 190 -346 -346
-346 190 -346 -346
-124 71 11 -21
6 changes: 6 additions & 0 deletions profiles/fungi/MA0271.1.pwm
@@ -0,0 +1,6 @@
110 -351 -351 75
-438 -438 195 -438
190 -353 -353 -353
-438 195 -438 -438
-12 -377 68 49
-426 184 -426 -161
16 changes: 8 additions & 8 deletions profiles/fungi/MA0272.1.pwm
@@ -1,8 +1,8 @@
25 -204 75 -17
-277 -10000 -336 191
-169 -10000 188 -10000
200 -10000 -10000 -10000
-10000 199 -10000 -609
-537 -10000 -10000 199
-10000 200 -10000 -10000
34 34 -354 46
24 -176 71 -16
-221 -362 -255 182
-153 -423 183 -423
191 -366 -366 -366
-430 194 -430 -395
-328 -363 -363 190
-434 195 -434 -434
32 32 -271 43
42 changes: 21 additions & 21 deletions profiles/fungi/MA0273.1.pwm
@@ -1,21 +1,21 @@
-129 -69 109 -25
40 -37 27 -52
-99 -10 35 37
70 2 -30 -88
64 -37 -12 -42
31 -33 -89 51
-159 -70 43 77
-264 157 -306 -43
-564 -11 -496 160
-564 198 -796 -564
-796 -350 196 -796
-796 -564 199 -696
-406 52 24 40
143 -596 -538 35
147 -188 -206 -46
-125 -17 58 26
11 -107 -37 74
50 -25 -28 -12
4 -20 -49 47
-8 -128 48 31
16 1 -55 26
-123 -66 107 -24
39 -36 26 -50
-94 -10 34 36
69 2 -29 -84
63 -36 -12 -41
30 -32 -85 50
-150 -67 42 75
-243 155 -277 -42
-432 -11 -402 157
-432 195 -485 -432
-486 -311 193 -486
-485 -432 195 -470
-349 51 23 39
140 -444 -421 34
144 -176 -192 -45
-119 -16 56 25
10 -102 -36 73
49 -24 -27 -11
4 -20 -47 46
-8 -121 47 30
15 1 -53 25
8 changes: 8 additions & 0 deletions profiles/fungi/MA0274.1.pwm
@@ -0,0 +1,8 @@
191 -368 -368 -368
-97 55 -73 51
-59 55 -389 85
-369 -369 -369 191
30 -420 141 -420
144 -14 -167 -386
162 -376 -49 -284
-163 -372 -70 158
6 changes: 6 additions & 0 deletions profiles/fungi/MA0275.1.pwm
@@ -0,0 +1,6 @@
-346 151 -346 -5
-346 190 -346 -346
-346 -346 190 -346
-346 -346 190 -346
145 -101 -56 -345
101 -346 -346 85
10 changes: 10 additions & 0 deletions profiles/fungi/MA0276.1.pwm
@@ -0,0 +1,10 @@
-419 169 -256 -84
-434 193 -434 -350
11 -128 119 -212
4 -6 80 -193
143 -371 21 -371
-357 -161 -194 173
-71 135 -235 -64
65 -380 85 -84
-425 -262 180 -167
-424 -424 185 -173
9 changes: 9 additions & 0 deletions profiles/fungi/MA0277.1.pwm
@@ -0,0 +1,9 @@
123 -85 -85 -85
190 -346 -346 -346
190 -346 -346 -346
171 -84 -347 -347
149 -346 0 -346
-346 -346 190 -346
155 -57 -232 -232
162 -168 -168 -168
123 -85 -85 -85
21 changes: 21 additions & 0 deletions profiles/fungi/MA0278.1.pwm
@@ -0,0 +1,21 @@
-184 -19 75 22
-120 112 -71 -36
94 -111 -108 20
-158 67 -41 41
85 -213 55 -99
-1 -262 140 -226
-128 151 -77 -265
-210 185 -411 -320
47 46 16 -301
-369 -456 194 -421
192 -432 -306 -444
-292 -421 191 -444
-393 -385 -444 193
-223 186 -411 -349
180 -237 -343 -218
94 -98 35 -176
20 -80 33 2
57 -113 -153 78
-47 54 -42 10
67 -80 44 -108
78 -12 -5 -134
20 changes: 10 additions & 10 deletions profiles/fungi/MA0279.1.pwm
@@ -1,10 +1,10 @@
187 -249 -252 -10000
-250 -10000 -10000 193
-10000 -10000 -10000 200
181 -246 -10000 -165
8 -142 136 -10000
-10000 -10000 -10000 200
185 -139 -10000 -10000
200 -10000 -10000 -10000
-10000 -10000 136 53
-10000 193 -10000 -249
185 -236 -239 -562
-236 -558 -558 191
-556 -556 -556 198
179 -233 -556 -159
8 -137 134 -564
-556 -556 -556 198
183 -134 -555 -555
198 -556 -556 -556
-559 -559 134 52
-558 191 -558 -236
16 changes: 16 additions & 0 deletions profiles/fungi/MA0279.2.pwm
@@ -0,0 +1,16 @@
38 -24 -47 18
-47 -32 -11 63
-75 -56 -85 112
-172 -172 155 -109
23 132 -218 -337
-285 -285 -337 186
-285 -337 -24 155
186 -337 -285 -285
-153 146 -85 -153
-97 -337 -218 166
181 -218 -337 -247
186 -337 -285 -285
-172 -285 -56 153
18 47 -56 -32
38 7 -17 -39
33 -65 7 7
12 changes: 6 additions & 6 deletions profiles/fungi/MA0280.1.pwm
@@ -1,6 +1,6 @@
-10000 200 -10000 -10000
-10000 200 -10000 -10000
-10000 -10000 200 -10000
-10000 -10000 200 -10000
140 -94 -25 -10000
36 -47 16 -18
-346 190 -346 -346
-346 190 -346 -346
-346 -346 190 -346
-346 -346 190 -346
132 -83 -23 -346
33 -42 15 -17
8 changes: 8 additions & 0 deletions profiles/fungi/MA0281.1.pwm
@@ -0,0 +1,8 @@
-34 -138 103 -34
-233 177 -233 -233
190 -346 -346 -346
-346 190 -346 -346
-346 -346 190 -346
-346 -346 -346 190
-346 -346 190 -346
142 -51 -171 -171
15 changes: 15 additions & 0 deletions profiles/fungi/MA0281.2.pwm
@@ -0,0 +1,15 @@
18 -50 -44 51
22 -88 39 -2
105 -192 18 -92
-209 -150 -20 135
-377 192 -377 -409
192 -409 -350 -377
-377 186 -350 -228
-263 -409 190 -409
-409 -409 -409 193
-409 -377 193 -409
169 -128 -218 -277
-170 114 -239 37
6 51 -144 22
36 -33 -42 23
29 -88 4 27
16 changes: 8 additions & 8 deletions profiles/fungi/MA0282.1.pwm
@@ -1,8 +1,8 @@
-264 116 -264 53
-10000 -10000 -10000 200
-10000 200 -10000 -10000
-10000 -10000 200 -10000
-10000 -10000 200 -10000
186 -464 -464 -184
187 -307 -307 -307
75 -25 -25 -64
-208 109 -208 49
-346 -346 -346 190
-346 190 -346 -346
-346 -346 190 -346
-346 -346 190 -346
177 -297 -297 -153
177 -233 -233 -233
69 -23 -23 -57
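
Each `.pwm` file in this commit holds one row of four integer scores per motif position. The sketch below parses such a matrix and scores a candidate site by summing one entry per position. It assumes the columns are ordered A, C, G, T, which is the usual PWMScan convention but is not stated in this diff. The names `parse_pwm`, `best_site`, and `score_site` are illustrative and are not functions from this repository.

```python
from typing import List, Tuple

BASES = "ACGT"  # assumed column order; not stated in the diff

# Matrix copied from profiles/fungi/MA0266.1.pwm as added in this commit.
MA0266_1 = """\
-42 59 0 -42
-126 -126 -261 158
-346 190 -346 -346
-346 -346 -346 190
190 -346 -346 -346
-346 -346 190 -346
181 -262 -262 -262
"""


def parse_pwm(text: str) -> List[List[int]]:
    """Parse whitespace-separated integer rows, one row per motif position."""
    matrix = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != len(BASES):
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        matrix.append([int(value) for value in fields])
    return matrix


def best_site(matrix: List[List[int]]) -> Tuple[str, int]:
    """Return the highest-scoring sequence and its total score."""
    site = "".join(BASES[row.index(max(row))] for row in matrix)
    return site, sum(max(row) for row in matrix)


def score_site(matrix: List[List[int]], site: str) -> int:
    """Sum the matrix entry selected by each base of `site`."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal the number of matrix rows")
    return sum(row[BASES.index(base)] for row, base in zip(matrix, site.upper()))


if __name__ == "__main__":
    pwm = parse_pwm(MA0266_1)
    print(best_site(pwm))
    print(score_site(pwm, "ATCTAGA"))
```

Under the assumed column order, the top-scoring site for MA0266.1 is `CTCTAGA` with a total of 1158. Replacing the first base with `A` lowers the total to 1057.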