Analysis tab format is inconsistent with the other tabs #35

Open
ivelsko opened this issue Aug 26, 2020 · 4 comments

ivelsko commented Aug 26, 2020

This is more of a formatting change request than an issue. I wanted to use some information in the Analysis tab, but this information is presented differently from the other tabs. That made selecting/filtering for what I wanted more involved, because I kept losing samples and had to figure out why.

Instead of having each entry in Analysis as a column, it's all as rows under 2 columns (analysis.Title, analysis.Result). Is it necessary that the information is presented this way instead of making each entry an individual column?

The way it is currently means that if a sequenced library isn't run through this analysis, it won't have an entry, not even a placeholder NA. So when I filtered for "Initial reads", a bunch of samples I wanted to include were lost from my table (yes, these were blanks, and yes, I absolutely still need them).
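The dropped-samples effect described here can be reproduced on a toy table. This is only a sketch: the `library_id` column and all IDs are made up for illustration, and only `analysis.Title` and `analysis.Result` come from this thread. Joining the filtered rows back onto the full list of libraries is one possible way to get the NA placeholders back.

```r
library(dplyr)

# Toy data: library_id and the IDs are hypothetical; only the
# analysis.Title / analysis.Result column names come from the thread.
libraries <- tibble(library_id = c("LIB001", "LIB002", "BLANK001"))

analysis_long <- tibble(
  library_id      = c("LIB001", "LIB001", "LIB002"),
  analysis.Title  = c("Initial reads", "Failed reads", "Initial reads"),
  analysis.Result = c("1000", "50", "2000")
)

# Filtering the long table alone silently drops BLANK001,
# because it has no analysis rows at all.
initial_reads <- analysis_long %>%
  filter(analysis.Title == "Initial reads") %>%
  select(library_id, initial_reads = analysis.Result)

# Joining back onto the full library list keeps BLANK001 with an NA.
with_placeholders <- libraries %>%
  left_join(initial_reads, by = "library_id")
```

Starting from the complete list of libraries, rather than from the analysis rows, is what keeps libraries without any analysis in the result.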

I did notice that the entries under analysis.Title are not consistent. That inconsistency comes from Pandora itself, and it is a problem. For example, GRG003.B0101.SG1.1.Human_Shotgun has:

Initial reads (forward+reverse): 
Failed reads (fwd+rev): 
Failed reads (fwd+rev) in %: 
Merged reads: 
Merged reads in %: 
Mapped reads (fwd+rev+merged): 
Mapped reads (fwd+rev+merged) in %: 
Mapped fragments: 
Mapped fragments in %: 
Mapped fragments (L>=30): 
Mapped fragments (L>=30) in %:

while GRG004.A0101.SG1.1.Human_Shotgun has:

Initial reads: 
Failed reads: 
Failed reads in %: 
Mapped reads/fragments: 
Mapped reads/fragments in %: 
Mapped reads/fragments (L>=30): 
Mapped reads/fragments (L>=30) in %:

Is that difference b/c the human shotgun screening pipeline changed? Can it be normalized across Pandora, so that you can make the entries columns like for the other tabs?

nevrome (Member) commented Aug 26, 2020

Concerning your first question: Changing the structure from this long to a wide format is something you could usually do easily e.g. with tidyr::pivot_wider().

Unfortunately the second issue you raise -- the inconsistency of the analysis titles -- makes exactly this transformation way more tricky.
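A minimal sketch of both points, assuming a hypothetical `library_id` column and made-up values (the titles are taken from the two examples in the first comment). `tidyr::pivot_wider()` does the long-to-wide reshaping, but titles that differ between libraries end up in separate, mostly-NA columns.

```r
library(tidyr)
library(tibble)

# Toy data: library_id and the values are hypothetical.
analysis_long <- tibble(
  library_id = c("LIB001", "LIB001", "LIB002", "LIB002"),
  analysis.Title = c(
    "Initial reads (forward+reverse)", "Failed reads (fwd+rev)",
    "Initial reads", "Failed reads"
  ),
  analysis.Result = c("1000", "50", "2000", "80")
)

# One row per library, one column per distinct title.
analysis_wide <- pivot_wider(
  analysis_long,
  id_cols     = library_id,
  names_from  = analysis.Title,
  values_from = analysis.Result
)

# Four distinct titles give four value columns; each library has NA
# in the two columns that use the other naming scheme.
print(analysis_wide)
```

With consistent titles the same call would give two value columns and no NAs, which is why the naming question matters for this transformation.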

Maybe @jfy133 or even @kaypruefer has some input here how this information could be standardized?

jfy133 (Member) commented Aug 26, 2020

The inconsistency of the analysis titles I think is because of different versions of the pipeline which in principle should be indicated at the Analysis level (not Results String). That said, I don't think Kay has officially announced that these pipelines are stable yet so maybe that is the problem.

kaypruefer commented

The two Analysis entries differ because the runs differed. One is paired end, the other is single read. I've chosen different naming conventions to make clear how exactly the reads are counted.

There are no conventions on what Analysis entries can look like, deliberately so. You'll have to check what type of Analysis was run and then interpret the fields based on that type. Documentation for this is completely absent at the moment. I hope to change that, eventually.
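One way to follow this advice in R is to split the long table by analysis type before reshaping, so that each type gets its own set of columns. This is a sketch under assumptions: the `analysis_type` column, its values, and `library_id` are hypothetical stand-ins, since the thread does not say how the type is exposed.

```r
library(dplyr)
library(tidyr)

# Toy data: analysis_type, library_id and the values are hypothetical.
analysis_long <- tibble(
  library_id    = c("LIB001", "LIB001", "LIB002", "LIB002"),
  analysis_type = c("paired_end", "paired_end", "single_read", "single_read"),
  analysis.Title = c(
    "Initial reads (forward+reverse)", "Failed reads (fwd+rev)",
    "Initial reads", "Failed reads"
  ),
  analysis.Result = c("1000", "50", "2000", "80")
)

# Split into one long table per analysis type ...
long_by_type <- split(analysis_long, analysis_long$analysis_type)

# ... and reshape each one separately, so titles are only compared
# within a single type of analysis.
wide_by_type <- lapply(long_by_type, function(x) {
  pivot_wider(
    x,
    id_cols     = library_id,
    names_from  = analysis.Title,
    values_from = analysis.Result
  )
})
```

Each element of `wide_by_type` then contains only the fields that belong to its own type, at the cost of handling one table per type.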

ivelsko (Author) commented Aug 26, 2020

I talked w/ @jfy133 this morning, and if you're going to keep the rows instead of columns format, I strongly suggest the difference be documented in the readme. Speaking from the perspective of someone who isn't familiar w/ how you've structured the tabs (any new user), this is a completely unexpected change. Since every other tab is formatted the same way, there's no way to know or reason to suspect that this one is different.

So going along pulling out what I need, all of my select() commands work until suddenly they fail for every possible analysis.xyz column, b/c that information now exists in rows of 2 columns with names that don't exist in Pandora. The user would either have to know in advance what those column names are, or open and look through the table beforehand.
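The `select()` failure described above can be illustrated on a toy table (the `library_id` column and the values are made up). Because the entry names are values of `analysis.Title` rather than column names, one possible workflow is to list the available titles first and then pick rows with `filter()`.

```r
library(dplyr)

# Toy data: library_id and the values are hypothetical.
analysis_long <- tibble(
  library_id      = c("LIB001", "LIB001", "LIB002", "LIB002"),
  analysis.Title  = c("Initial reads", "Failed reads",
                      "Initial reads", "Failed reads"),
  analysis.Result = c("1000", "50", "2000", "80")
)

# select() looks for columns, so asking for an entry by name fails.
select_failed <- inherits(
  try(select(analysis_long, `Initial reads`), silent = TRUE),
  "try-error"
)

# The entry names are values of analysis.Title: list them first ...
available_titles <- analysis_long %>% count(analysis.Title)

# ... then pull the wanted rows with filter().
failed_reads <- analysis_long %>%
  filter(analysis.Title == "Failed reads")
```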

I suggest adding something like "The Analysis tab is formatted differently from the other tabs in sidora. Instead of each entry existing as a column, there are 2 columns, analysis.Title and analysis.Result, where the entries (i.e. Initial reads, Failed reads, etc.) and their values are the rows of these 2 columns."

jfy133 added the enhancement and upstream labels and removed the enhancement label on Nov 18, 2020