Imported HiC .tgz files created by HiCPro2DNAlandscapeR fail #26

Open
bioinformike opened this issue Dec 12, 2019 · 0 comments
Overview
There is currently an issue, or at least a misunderstanding, with importing and visualizing HiC-Pro data exported by HiCPro2DNAlandscapeR.

sparseHiC::HiCPro2DNAlandscapeR
The HiCPro2DNAlandscapeR function converts HiC-Pro data into an RDS file, then writes that RDS file and a metadata file into a .tgz archive:

MySample-HiC.tgz
|-- MySample-HiC.tar
|   |-- MySample-HiC.rds
|   |-- MySample-HiC.sparseHiC.meta

DNAlandscapeR Import
On the import tab of DNAlandscapeR, in the 'Data type' drop-down, HiC data is referenced as 'Hi-C/.tgz', giving the impression that the user should select a .tgz file for import, such as the one output by HiCPro2DNAlandscapeR.

[Image: dna_lands]

However, if you import the .tgz file and try to visualize it, readRDS inside DNAlandscapeR throws the following error:

(message = "unknown input format", call = readRDS(file)

This appears to be due to DNAlandscapeR handing the .tgz file directly to readRDS without first extracting the contained .rds file.
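The failure can be reproduced outside DNAlandscapeR with a stand-in archive. The snippet below is a minimal sketch, not the package's own code: the file names mirror the layout shown earlier, and the saved object is a placeholder list rather than real Hi-C data.

```r
# Build a small stand-in archive; the saved object is a placeholder,
# not the structure that sparseHiC actually writes.
work_dir <- tempfile("hic_demo_")
dir.create(work_dir)
saveRDS(list(placeholder = 1:3), file.path(work_dir, "MySample-HiC.rds"))

tgz_path <- file.path(work_dir, "MySample-HiC.tgz")
old_wd <- setwd(work_dir)
utils::tar(tgz_path, files = "MySample-HiC.rds",
           compression = "gzip", tar = "internal")
setwd(old_wd)

# Hand the archive straight to readRDS(), as the import path appears to do,
# and capture the resulting condition instead of stopping.
direct <- tryCatch(readRDS(tgz_path), error = function(e) e)
print(conditionMessage(direct))
```

readRDS decompresses the gzip layer transparently, but what it finds underneath is a tar stream rather than a serialized R object, which is why it rejects the input.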

Possible Fixes

  1. The 'Data type' field for HiC could be changed to 'Hi-C/.rds' so that users know to select the .rds file for importing and not the .tgz file.

  2. Code could be inserted into server.R that handles unzipping the .tgz file and giving DNAlandscapeR the path to the .rds file that is inside.

    For example, the following change to server.R, starting at line 560:

# Hi C data
} else { #datType is 6
     old <- curfile
+   if (tolower(tools::file_ext(old)) == 'tgz')
+   {
+        # List the archive contents and find the .rds member (case-insensitive)
+        comp_files <- untar(old, list = TRUE)
+        rds_member <- grep("\\.rds$", comp_files, ignore.case = TRUE, value = TRUE)[1]
+
+        # Extract only that member; untar(list = TRUE) alone does not unpack anything
+        ex_dir <- tempfile("hic_")
+        untar(old, files = rds_member, exdir = ex_dir)
+
+        # Point curfile at the extracted RDS file
+        curfile <- file.path(ex_dir, rds_member)
+   } else
+   {
           curfile <- paste0(curfile, name, ".rds")
           file.rename(old, curfile)
+   }
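The same logic can be tried in isolation before touching server.R. This is a minimal sketch under stated assumptions: `resolve_hic_rds` is a hypothetical helper name that does not exist in DNAlandscapeR, and the demo archive holds placeholder content rather than real Hi-C data.

```r
# Hypothetical helper (not part of DNAlandscapeR): given an uploaded file,
# return a path that readRDS() can open.
resolve_hic_rds <- function(path, exdir = tempfile("hic_")) {
  # Anything that is not a .tgz is passed through unchanged
  if (tolower(tools::file_ext(path)) != "tgz") {
    return(path)
  }
  # List members without extracting, then pick the single .rds entry
  members <- utils::untar(path, list = TRUE)
  rds_member <- grep("\\.rds$", members, ignore.case = TRUE, value = TRUE)
  if (length(rds_member) != 1L) {
    stop("expected exactly one .rds file in ", path,
         ", found ", length(rds_member))
  }
  # Extract only that member; exdir is created if it does not exist
  utils::untar(path, files = rds_member, exdir = exdir)
  file.path(exdir, rds_member)
}

# Demo archive with the two members shown in the layout above
demo_dir <- tempfile("hic_demo_")
dir.create(demo_dir)
saveRDS(list(placeholder = 1:3), file.path(demo_dir, "MySample-HiC.rds"))
writeLines("placeholder metadata",
           file.path(demo_dir, "MySample-HiC.sparseHiC.meta"))
demo_tgz <- file.path(demo_dir, "MySample-HiC.tgz")
old_wd <- setwd(demo_dir)
utils::tar(demo_tgz,
           files = c("MySample-HiC.rds", "MySample-HiC.sparseHiC.meta"),
           compression = "gzip", tar = "internal")
setwd(old_wd)

rds_path <- resolve_hic_rds(demo_tgz)
hic_obj <- readRDS(rds_path)
```

Matching on `\\.rds$` rather than a bare `rds` avoids picking up other members whose names merely contain those letters, and stopping when the match is not unique keeps an unexpected archive layout from failing silently later on.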

Notes

  • I don't know what downstream or unintended consequences either fix might have, as I only tested the HiC visualization functionality after implementing the code change described in fix 2.

  • My code in fix 2 uses the tools package. This should not add a real dependency, as tools ships with base R and I call it with a namespace-qualified tools::file_ext(), so no library() call is needed.
