Enable JSON <-> YAML, JSON <-> binary conversion? #16

Open
julesjacobsen opened this issue Oct 27, 2021 · 8 comments

Comments

@julesjacobsen
Collaborator

Currently the converter only handles JSON. It might be worth offering conversion to and from other formats too.

@pnrobinson
Collaborator

@julesjacobsen not sure this is absolutely needed? If we stick with JSON that will mean we encourage people to use JSON as the primary format?

@pnrobinson
Collaborator

@julesjacobsen see new class DefaultPhenopacketIngestor. We could add some functions to this class such as public fromYamlFile(...) and DefaultPhenopacketIngestor(Message message). Thoughts?

@pnrobinson
Collaborator

@ielis is this issue closable? I think this is supported for some operations

@ielis
Collaborator

ielis commented Nov 29, 2022

In principle yes. Each command that reads or writes a phenopacket accepts or produces a phenopacket, family, or cohort in any of these formats (JSON, YAML, or protobuf binary).
The commands have the -f | --format option for the input data. The convert command has the --output-format option for choosing the, well, output format.

We do not have a command solely for the format conversion (something similar to cat sample.bam | samtools view -S > file.sam). Implementing the command is a no-brainer, since we already have all the nuts and bolts. I just need some use case.
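As a rough illustration of the filter-style command described above, here is a minimal Python sketch that reads JSON from one stream and writes YAML to another. It is not phenopacket-tools code: the names `to_yaml_lines` and `convert_stream` are hypothetical, only the JSON-to-YAML direction is covered, and it uses nothing beyond the standard library. It relies on YAML 1.2 accepting JSON scalars and empty containers as flow-style values.

```python
import io
import json


def _quote(value):
    # JSON syntax for scalars and empty containers is also valid YAML 1.2.
    return json.dumps(value, ensure_ascii=False)


def to_yaml_lines(value, indent=0):
    """Yield block-style YAML lines for a value decoded from JSON."""
    pad = "  " * indent
    if isinstance(value, dict) and value:
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                # Non-empty container: key on its own line, body indented.
                yield f"{pad}{_quote(key)}:"
                yield from to_yaml_lines(item, indent + 1)
            else:
                yield f"{pad}{_quote(key)}: {_quote(item)}"
    elif isinstance(value, list) and value:
        for item in value:
            if isinstance(item, (dict, list)) and item:
                # Non-empty container: dash on its own line, body indented.
                yield f"{pad}-"
                yield from to_yaml_lines(item, indent + 1)
            else:
                yield f"{pad}- {_quote(item)}"
    else:
        yield f"{pad}{_quote(value)}"


def convert_stream(src, dst):
    """Read one JSON document from src and write it to dst as YAML."""
    document = json.load(src)
    dst.write("\n".join(to_yaml_lines(document)) + "\n")
```

Used as a shell filter, this would be wired up as `convert_stream(sys.stdin, sys.stdout)`. A real command would also need the protobuf and YAML readers that the existing tool already has.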

@andrewpatto

> @julesjacobsen not sure this is absolutely needed? If we stick with JSON that will mean we encourage people to use JSON as the primary format?

Just revisiting this - is JSON the primary format for phenopackets? Is this written somewhere else?

I am trying to do some dataset sharing (à la EGA) - and was considering placing a phenopacket alongside each individual's genomic artifacts. But I was assuming that, to count as a primary-format phenopacket, it needed to be a protobuf file with some sort of known file suffix such as pxf.

e.g.

ABC.bam
ABC.vcf
ABC.pxf

And so to that end - I was going to store some v2 JSON or YAML phenopackets for ease of editing - and then convert them to protobuf using the CLI tool. So this is my +1 for the general feature of converting between formats with just the CLI tool, which is currently not possible: convert requires the input to be in v1 format.

But if JSON is the primary way we think phenopackets are to be exchanged in the wild - then I can skip using protobuf entirely.

Are there any suggested file naming conventions to let people know a file is a phenopacket (in JSON)?

@andrewpatto

I should add that I am starting by hand-crafting some examples for a demonstration of how this would all work - hence the hand-editing of JSON or YAML.

Obviously, for a real system I would be translating from some clinical source such as an EHR or REDCap, so I guess I would do that using the Java library and easily output whichever format I wanted.

I think the broader thought is still there - if I have unlimited choice here - what is the primary "phenopacket" file format and how should I name them to make this clear?

@pnrobinson
Collaborator

Hi Andrew, there could be lossless conversion between protobuf (binary), JSON, YAML, XML, SQL ... so there really isn't a primary format. My guess is that almost everybody would prefer JSON because of the tooling for JSON.
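Since, per the comment above, any of these encodings may turn up, a consumer may have to guess the format when the file name gives no hint. The following is a minimal Python sketch of content sniffing under stated assumptions: the function name `sniff_format` is hypothetical, only JSON, YAML, and protobuf binary are considered, and the check is a heuristic rather than anything phenopacket-tools is known to do. In particular, a short binary message whose bytes all happen to be printable could be mistaken for text, so an explicit format option remains the safer choice.

```python
import json


def sniff_format(data: bytes) -> str:
    """Guess 'json', 'yaml' or 'protobuf' from raw content (heuristic only)."""
    if not data.strip():
        raise ValueError("empty input, nothing to sniff")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so it cannot be one of the text encodings.
        return "protobuf"
    if any(ch < " " and ch not in "\t\n\r" for ch in text):
        # Control characters are typical of binary field tags and lengths.
        return "protobuf"
    try:
        json.loads(text)
        return "json"
    except ValueError:
        # Plain text that does not parse as JSON; assume YAML.
        return "yaml"
```

For example, `sniff_format(b'id: example-1\n')` falls through the JSON check and is reported as YAML.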

@andrewpatto

In which case - having a tool that seamlessly converts between the formats might be useful (if I get a batch of phenopackets in protobuf but would prefer them in JSON) - I can just run the CLI tool to convert (rather than dusting off my Java and writing a small snippet using the library to do the same).
