Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bias matrices for other experiments #6

Open
cahn20 opened this issue Nov 16, 2023 · 8 comments
Open

Bias matrices for other experiments #6

cahn20 opened this issue Nov 16, 2023 · 8 comments

Comments

@cahn20
Copy link

cahn20 commented Nov 16, 2023

Hello,

It looks like SELMA has built-in bias score matrices for DNaseI and Tn5, each called by the '-t DATATYPE' flag. Is it possible for the user to generate/use custom bias matrices for other enzymes as well? (e.g. MNase). Thanks in advance.

Best,
Chris

@Tarela
Copy link
Owner

Tarela commented Nov 16, 2023

Hi Chris, Sorry for replying late. Are you still suffering from the cutSum issue? I double-checked the code with the testing data (provided in this github repo), and other real data before, and everything looks good. I just reprocessed the data with the cmd you pasted, and it looks good in my environment. If you still suffer from that, could you please rerun it with the --keep-tmp parameter and let me know what the folder structure looks like (e.g., file sizes of all the files in main/tmp folders).

For the DATATYPE question, 1. using MNase/benzonase/cyanse or other enzymes for bias matrix is a good idea, and I've tried them before. Some of them have a significant/enzyme-specific bias pattern. SELMA used DNase and Tn5 only because these two are most popular in transcriptional regulation studies (DNase-seq/ATAC-seq). Does your MNase-seq data have enough chrM reads? If so, you can just set the "--bias chrM" parameter to let SELMA estimate the bias from chrM reads. If not, We can further discuss an alternative solution, and I can also make an updated version specific to your issue if necessary.

I hope that helps. Feel free to provide more information or request, and I'd be very happy to help.
Best
Shawn

@cahn20
Copy link
Author

cahn20 commented Nov 17, 2023

I used the --keeptmp flag and I'm still getting the same error. I don't see a tmp folder, so I just have the contents of the output folder below:

[ 416] .
├── [ 11K] hg38.sizes
├── [ 0] testsc_chrM.bed
├── [1.4G] testsc_chromatin.bed
├── [1.4G] testsc_highQcellReads.bed
├── [ 46] testsc_peakFeatures.txt
├── [ 18K] testsc_peakXcellMat.txt
├── [1.1K] testsc_progress.log
├── [ 0] testsc_scOVcleavage.bed
├── [241K] testsc_summitEXT.bed
├── [ 95K] testsc_tmpSCpeaks.bed
└── [ 59M] testsc_tmpSCreads.bed

No need to rush though, as I probably won't be using sc mode.

As for using other enzymes, I don't have any chrM reads available. Is it not feasible to create duplicates of my fragments and rename them as chrM as you've mentioned?

@cahn20
Copy link
Author

cahn20 commented Nov 20, 2023

I actually do have chrM reads available, sorry for the misinformation.
Also, is there a way to obtain "bias corrected" bw tracks?

@Tarela
Copy link
Owner

Tarela commented Nov 20, 2023 via email

@cahn20
Copy link
Author

cahn20 commented Nov 21, 2023

Thanks for the reply Shawn.

I have around 10k reads on chrM so I guess it might not be enough, but as you've mentioned I still tried running SELMA with the --bias chrM parameter and I'm getting an error saying that there is no naked DNA bias matrix, cannot estimate bias. The -t parameter does not matter here, right?

The command I used and the output log are as follows:

Command:
SELMA -m bulk -i fragment.bed.gz -p peak.bed -g hg38 -f PE -o test_chrM -t ATAC --bias chrM -s hg38.2bit --keeptmp --kmer 6

Output Log:
/usr/local/bin/SELMA:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import('pkg_resources').run_script('SELMA==1.0', 'SELMA')
process log is printed into test_chrM_progress.log
Start SELMA bulk
Step0: check input Data and parameters
detected bed file format is SE
detected bed file format SE is different from the input file format PE. Please double check the parameter -f(format)
bedtools installed
bigWigSummary(UCSCtools) installed
twoBitToFa(UCSCtools) installed
twobitInfo(UCSCtools) installed
bedGraphToBigWig(UCSCtools) installed
Rscript installed
no naked DNA bias matrix, use mtDNA reads to estimate
Step0 check input Data and parameters DONE
Step1: check quality and reformat data
summarize reads count distribution
Step1: check quality and reformat data DONE
running time for Step1 : 14.279722929000854
Step2: bias estimation
chrM reads number < 500k, obtain pre-processed bias matrix from naked DNA data
[ERROR] no naked DNA bias matrix, cannot estimate bias
error occurs, check log file~!

The files created during the SELMA run look like the following:
[ 192] test_chrM
├── [ 11K] hg38.sizes
├── [678K] test_chrM_chrM.bed
├── [430M] test_chrM_chromatin.bed
└── [ 885] test_chrM_progress.log

Thanks.

@cahn20
Copy link
Author

cahn20 commented Nov 21, 2023

I apologize for all the questions.

I tried to see if running the test command for bulk mode provided by the SELMA readme with --keeptmp gave the bias "corrected" cleavage tracks, but I can't find them. Perhaps I'm missing something?

Command
SELMA -m bulk -i testdata_reads.bed.gz -p testpeak.bed -g hg38 -f PE -o testbulk -t ATAC -s ~/Downloads/hg38.2bit --keeptmp

Output
/usr/local/bin/SELMA:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import('pkg_resources').run_script('SELMA==1.0', 'SELMA')
process log is printed into testbulk_progress.log
Start SELMA bulk
Step0: check input Data and parameters
detected bed file format is PE
bedtools installed
bigWigSummary(UCSCtools) installed
twoBitToFa(UCSCtools) installed
twobitInfo(UCSCtools) installed
bedGraphToBigWig(UCSCtools) installed
Rscript installed
Step0 check input Data and parameters DONE
Step1: check quality and reformat data
summarize reads count distribution
Step1: check quality and reformat data DONE
running time for Step1 : 67.17461681365967
Step2: bias estimation
obtain pre-processed bias matrix from naked DNA data
Step2: bias estimation
running time for Step2: bias estimation: 1.9309027194976807
Step3: peak detection
obtain 6607 peaks from (-p) inputted, extend peak to +/- 200bp from peak center
running time for Step3: peak detection 0.06051301956176758
Step4: generate cleavage and bias profile on peaks
split fragments to strand specific cleavage sites
remove redundant position from the extended peak file
readin sequence from 2bit
calculate bias expected cleavages
pile up cleavage sites
running time for Step4: generate cleavage and bias profile on peaks 359.35685992240906
StepFinal: summary
Collect results
--keeptmp was set, keep intermediate results
Generate summary reports
pdflatex was not installed, SELMA will not generate pdf version of summary report. Please copy the testbulk_summaryReports.tex to an environment with pdflatex installed and complie the pdf file
running time for StepFinal: summary 0.497211217880249
SELMA finished. Check the folder: testbulk/summary/ for results

tmpResults Directory Structure

[ 832] tmpResults/
├── [ 95K] chr18_mergePeaks.bed
├── [ 25M] chr18_minusCuts.bed
├── [ 14M] chr18_minusCutsOnPeak.bed
├── [ 25M] chr18_plusCuts.bed
├── [ 14M] chr18_plusCutsOnPeak.bed
├── [146K] chr19_mergePeaks.bed
├── [ 72M] chr19_minusCuts.bed
├── [ 68M] chr19_minusCutsOnPeak.bed
├── [ 72M] chr19_plusCuts.bed
├── [ 68M] chr19_plusCutsOnPeak.bed
├── [ 11K] hg38.sizes
├── [ 80M] testbulk_biasExpCuts_minus.bdg
├── [ 80M] testbulk_biasExpCuts_minus_sorted.bdg
├── [ 80M] testbulk_biasExpCuts_plus.bdg
├── [ 80M] testbulk_biasExpCuts_plus_sorted.bdg
├── [ 0] testbulk_chrM.bed
├── [1.4G] testbulk_chromatin.bed
├── [ 64M] testbulk_cleavage_minus.bdg
├── [1.0G] testbulk_cleavage_minus.bed
├── [ 64M] testbulk_cleavage_minus_sorted.bdg
├── [ 64M] testbulk_cleavage_plus.bdg
├── [1.0G] testbulk_cleavage_plus.bed
├── [ 64M] testbulk_cleavage_plus_sorted.bdg
└── [152K] testbulk_summitEXTmerge.bed

@Tarela
Copy link
Owner

Tarela commented Nov 21, 2023 via email

@Tarela
Copy link
Owner

Tarela commented Nov 21, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants