-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bias matrices for other experiments #6
Comments
Hi Chris, Sorry for replying late. Are you still suffering from the cutSum issue? I double-checked the code with the testing data (provided in this github repo), and other real data before, and everything looks good. I just reprocessed the data with the cmd you pasted, and it looks good in my environment. If you still suffer from that, could you please rerun it with the --keep-tmp parameter and let me know what the folder structure looks like (e.g., file sizes of all the files in main/tmp folders). For the DATATYPE question, 1. using MNase/benzonase/cyanse or other enzymes for bias matrix is a good idea, and I've tried them before. Some of them have a significant/enzyme-specific bias pattern. SELMA used DNase and Tn5 only because these two are most popular in transcriptional regulation studies (DNase-seq/ATAC-seq). Does your MNase-seq data have enough chrM reads? If so, you can just set the "--bias chrM" parameter to let SELMA estimate the bias from chrM reads. If not, We can further discuss an alternative solution, and I can also make an updated version specific to your issue if necessary. I hope that helps. Feel free to provide more information or request, and I'd be very happy to help. |
I used the --keeptmp flag and I'm still getting the same error. I don't see a tmp folder, so I just have the contents of the output folder below: [ 416] . No need to rush though, as I probably won't be using sc mode. As for using other enzymes, I don't have any chrM reads available. Is it not feasible to create duplicates of my fragments and rename them as chrM as you've mentioned? |
I actually do have chrM reads available, sorry for the misinformation. |
Hi Chris,
I am glad you found the chrM reads. You can simply run the code and
generate chrMbiasMatrix from two samples (if you have one) and compare the
correlation coefficient between the estimated bias from the two samples if
they should have a good correlation (e.g., a correlated scatter plot with a
0.8 or higher correlation coefficient). BTW according to my experience on
MNase from my MNase-seq data analysis many years ago, 10-mer (k=10) might
not be the best option. I believe 6-mer or 8-mer might be better (try
6-mer, especially when your chrM reads are not too many, like < 50k).
I believe running SELMA bulk mode with --keeptmp parameter will output the
biasCorrected bigwig file (with a similar filename). It would be great if
you could try that as well. I'll also confirm that later and let you know.
Hope that helps.
Best
Shawn
…On Mon, Nov 20, 2023 at 8:35 AM Chris ***@***.***> wrote:
I actually do have chrM reads available, sorry for the misinformation.
Also, is there a way to obtain "bias corrected" bw tracks?
—
Reply to this email directly, view it on GitHub
<#6 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA6QQAEMQNITY7YTW74ZDYLYFNMDBAVCNFSM6AAAAAA7OY7B4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJZGA3TOMBVGA>
.
You are receiving this because you commented.Message ID:
***@***.***>
--
Sheng'en Shawn Hu, PhD
Postdoctoral research associate
University of Virginia
|
Thanks for the reply Shawn. I have around 10k reads on chrM so I guess it might not be enough, but as you've mentioned I still tried running SELMA with the --bias chrM parameter and I'm getting an error saying that there is no naked DNA bias matrix, cannot estimate bias. The -t parameter does not matter here, right? The command I used and the output log are as follows: Command: Output Log: The files created during the SELMA run look like the following: Thanks. |
I apologize for all the questions. I tried to see if running the test command for bulk mode provided by the SELMA readme with --keeptmp gave the bias "corrected" cleavage tracks, but I can't find them. Perhaps I'm missing something? Command Output tmpResults Directory Structure [ 832] tmpResults/ |
Thanks for letting me know. The error is caused by limited chrM reads as you mentioned. I preset a cutoff for sufficient chrM reads to ensure the estimation is robust. I’ll customize a script with chromatin reads for bias estimation for you. I’ll try to send you ASAP. Have a wonderful thanksgiving!BestShawnOn Nov 21, 2023, at 2:07 PM, Chris ***@***.***> wrote:
I apologize for all the questions.
I tried to see if running the test command for bulk mode provided by the SELMA readme with --keeptmp gave the bias "corrected" cleavage tracks, but I can't find them. Perhaps I'm missing something?
Command
SELMA -m bulk -i testdata_reads.bed.gz -p testpeak.bed -g hg38 -f PE -o testbulk -t ATAC -s ~/Downloads/hg38.2bit --keeptmp
Output
/usr/local/bin/SELMA:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import('pkg_resources').run_script('SELMA==1.0', 'SELMA')
process log is printed into testbulk_progress.log
Start SELMA bulk
Step0: check input Data and parameters
detected bed file format is PE
bedtools installed
bigWigSummary(UCSCtools) installed
twoBitToFa(UCSCtools) installed
twobitInfo(UCSCtools) installed
bedGraphToBigWig(UCSCtools) installed
Rscript installed
Step0 check input Data and parameters DONE
Step1: check quality and reformat data
summarize reads count distribution
Step1: check quality and reformat data DONE
running time for Step1 : 67.17461681365967
Step2: bias estimation
obtain pre-processed bias matrix from naked DNA data
Step2: bias estimation
running time for Step2: bias estimation: 1.9309027194976807
Step3: peak detection
obtain 6607 peaks from (-p) inputted, extend peak to +/- 200bp from peak center
running time for Step3: peak detection 0.06051301956176758
Step4: generate cleavage and bias profile on peaks
split fragments to strand specific cleavage sites
remove redundant position from the extended peak file
readin sequence from 2bit
calculate bias expected cleavages
pile up cleavage sites
running time for Step4: generate cleavage and bias profile on peaks 359.35685992240906
StepFinal: summary
Collect results
--keeptmp was set, keep intermediate results
Generate summary reports
pdflatex was not installed, SELMA will not generate pdf version of summary report. Please copy the testbulk_summaryReports.tex to an environment with pdflatex installed and complie the pdf file
running time for StepFinal: summary 0.497211217880249
SELMA finished. Check the folder: testbulk/summary/ for results
Output Directory Structure
[ 832] tmpResults/
├── [ 95K] chr18_mergePeaks.bed
├── [ 25M] chr18_minusCuts.bed
├── [ 14M] chr18_minusCutsOnPeak.bed
├── [ 25M] chr18_plusCuts.bed
├── [ 14M] chr18_plusCutsOnPeak.bed
├── [146K] chr19_mergePeaks.bed
├── [ 72M] chr19_minusCuts.bed
├── [ 68M] chr19_minusCutsOnPeak.bed
├── [ 72M] chr19_plusCuts.bed
├── [ 68M] chr19_plusCutsOnPeak.bed
├── [ 11K] hg38.sizes
├── [ 80M] testbulk_biasExpCuts_minus.bdg
├── [ 80M] testbulk_biasExpCuts_minus_sorted.bdg
├── [ 80M] testbulk_biasExpCuts_plus.bdg
├── [ 80M] testbulk_biasExpCuts_plus_sorted.bdg
├── [ 0] testbulk_chrM.bed
├── [1.4G] testbulk_chromatin.bed
├── [ 64M] testbulk_cleavage_minus.bdg
├── [1.0G] testbulk_cleavage_minus.bed
├── [ 64M] testbulk_cleavage_minus_sorted.bdg
├── [ 64M] testbulk_cleavage_plus.bdg
├── [1.0G] testbulk_cleavage_plus.bed
├── [ 64M] testbulk_cleavage_plus_sorted.bdg
└── [152K] testbulk_summitEXTmerge.bed
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>
|
The one ends with .bdg is the one you need(biasExpCut). Usually it will automatically transform to bigwig format but it seems that you don’t have UCSCtools installed so the software cannot find the bdg2bw script for the auto transformation. Here you have two option 1. Download and install UCSCtools and transform it yourself or rerun Selma. 2. The bdg file is actually an old format called .wiggle file. You can replace the suffix too .wig and then you can load it using IGV or other browserI’m sorry that I didn’t propose a bias corrected track in SElMa (sorry I just realized that’s what you need probably). If you need a bias corrected cut track you can simply subtract the biasExpCut track (before or after normalization) from the observed cut track in basepair resolution or other resolutions. Hope that helps. Any further comments, suggestions and questions are welcome. BestShawn On Nov 21, 2023, at 2:07 PM, Chris ***@***.***> wrote:
I apologize for all the questions.
I tried to see if running the test command for bulk mode provided by the SELMA readme with --keeptmp gave the bias "corrected" cleavage tracks, but I can't find them. Perhaps I'm missing something?
Command
SELMA -m bulk -i testdata_reads.bed.gz -p testpeak.bed -g hg38 -f PE -o testbulk -t ATAC -s ~/Downloads/hg38.2bit --keeptmp
Output
/usr/local/bin/SELMA:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import('pkg_resources').run_script('SELMA==1.0', 'SELMA')
process log is printed into testbulk_progress.log
Start SELMA bulk
Step0: check input Data and parameters
detected bed file format is PE
bedtools installed
bigWigSummary(UCSCtools) installed
twoBitToFa(UCSCtools) installed
twobitInfo(UCSCtools) installed
bedGraphToBigWig(UCSCtools) installed
Rscript installed
Step0 check input Data and parameters DONE
Step1: check quality and reformat data
summarize reads count distribution
Step1: check quality and reformat data DONE
running time for Step1 : 67.17461681365967
Step2: bias estimation
obtain pre-processed bias matrix from naked DNA data
Step2: bias estimation
running time for Step2: bias estimation: 1.9309027194976807
Step3: peak detection
obtain 6607 peaks from (-p) inputted, extend peak to +/- 200bp from peak center
running time for Step3: peak detection 0.06051301956176758
Step4: generate cleavage and bias profile on peaks
split fragments to strand specific cleavage sites
remove redundant position from the extended peak file
readin sequence from 2bit
calculate bias expected cleavages
pile up cleavage sites
running time for Step4: generate cleavage and bias profile on peaks 359.35685992240906
StepFinal: summary
Collect results
--keeptmp was set, keep intermediate results
Generate summary reports
pdflatex was not installed, SELMA will not generate pdf version of summary report. Please copy the testbulk_summaryReports.tex to an environment with pdflatex installed and complie the pdf file
running time for StepFinal: summary 0.497211217880249
SELMA finished. Check the folder: testbulk/summary/ for results
Output Directory Structure
[ 832] tmpResults/
├── [ 95K] chr18_mergePeaks.bed
├── [ 25M] chr18_minusCuts.bed
├── [ 14M] chr18_minusCutsOnPeak.bed
├── [ 25M] chr18_plusCuts.bed
├── [ 14M] chr18_plusCutsOnPeak.bed
├── [146K] chr19_mergePeaks.bed
├── [ 72M] chr19_minusCuts.bed
├── [ 68M] chr19_minusCutsOnPeak.bed
├── [ 72M] chr19_plusCuts.bed
├── [ 68M] chr19_plusCutsOnPeak.bed
├── [ 11K] hg38.sizes
├── [ 80M] testbulk_biasExpCuts_minus.bdg
├── [ 80M] testbulk_biasExpCuts_minus_sorted.bdg
├── [ 80M] testbulk_biasExpCuts_plus.bdg
├── [ 80M] testbulk_biasExpCuts_plus_sorted.bdg
├── [ 0] testbulk_chrM.bed
├── [1.4G] testbulk_chromatin.bed
├── [ 64M] testbulk_cleavage_minus.bdg
├── [1.0G] testbulk_cleavage_minus.bed
├── [ 64M] testbulk_cleavage_minus_sorted.bdg
├── [ 64M] testbulk_cleavage_plus.bdg
├── [1.0G] testbulk_cleavage_plus.bed
├── [ 64M] testbulk_cleavage_plus_sorted.bdg
└── [152K] testbulk_summitEXTmerge.bed
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>
|
Hello,
It looks like SELMA has built-in bias score matrices for DNaseI and Tn5, each called by the '-t DATATYPE' flag. Is it possible for the user to generate/use custom bias matrices for other enzymes as well? (e.g. MNase). Thanks in advance.
Best,
Chris
The text was updated successfully, but these errors were encountered: