Unable to set only one haplotype of a diploid to reference-sense in gbwt #4533

Open

ASLeonard opened this issue Feb 28, 2025 · 1 comment

@ASLeonard

Hi,

Setting the reference samples via the RS:Z: tag in the GFA header, or with vg gbwt --set-reference for a given PanSN sample name, works fine. However, for trio-binned assemblies (particularly in agriculture), one haplotype might be taken as the reference while the other is not. Setting the header tag to RS:Z:sample#hap still results in HAPLOTYPE-sense paths, while vg gbwt --set-reference "sample#hap" fails because "#" is a prohibited character.
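A minimal sketch of why RS:Z:sample#hap may end up HAPLOTYPE-sense, assuming (not confirmed by vg source here) that the RS:Z: value is a space-separated list of sample names and that each path's sample name is matched against it verbatim. The function names are hypothetical. Because references are whole samples, "cow#1" never equals the sample name "cow", so no path becomes a reference.

```python
def parse_reference_samples(header_line: str) -> set[str]:
    """Collect sample names from an RS:Z: tag on a GFA H line.

    Assumption: the tag value is a space-separated list of sample names.
    """
    fields = header_line.rstrip("\n").split("\t")
    if not fields or fields[0] != "H":
        raise ValueError("not a GFA header line")
    for field in fields[1:]:
        if field.startswith("RS:Z:"):
            return set(field[len("RS:Z:"):].split())
    return set()


def path_sense(path_sample: str, reference_samples: set[str]) -> str:
    """Whole-sample granularity: every path of a listed sample is REFERENCE."""
    return "REFERENCE" if path_sample in reference_samples else "HAPLOTYPE"


# "cow#1" is not the sample name "cow", so the cow paths stay HAPLOTYPE-sense.
refs = parse_reference_samples("H\tVN:Z:1.1\tRS:Z:cow#1")
print(path_sense("cow", refs))
```

Under these assumptions, listing the plain sample name ("RS:Z:cow") would make every haplotype of "cow" REFERENCE-sense, which matches the whole-sample behavior described in this thread.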

Perhaps the easiest solution is to promote the entire sample to reference and simply ignore that the non-reference haplotype becomes REFERENCE-sense, but do you have any other ideas? I guess handling a half-reference, half-haplotype sample might complicate much of the internal logic for diploid sampling, but parsing the haplotype field of the PanSN name when setting reference paths seems doable.
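The proposed per-haplotype rule could look roughly like the sketch below. It is purely hypothetical: vg and GBWT do not accept "sample#haplotype" reference entries, and the function name and entry format are illustrative assumptions. An entry is either a bare sample name (whole sample) or "sample#haplotype" (one haplotype only).

```python
def hypothetical_path_sense(sample: str, haplotype: int, reference_specs: set[str]) -> str:
    """HYPOTHETICAL per-haplotype rule, not supported by vg/GBWT.

    A spec is either 'sample' (all haplotypes of the sample are references)
    or 'sample#haplotype' (only that haplotype is a reference).
    """
    if sample in reference_specs or f"{sample}#{haplotype}" in reference_specs:
        return "REFERENCE"
    return "HAPLOTYPE"


# Trio-binned cow: haplotype 1 is the reference, haplotype 2 is not.
specs = {"cow#1"}
print(hypothetical_path_sense("cow", 1, specs), hypothetical_path_sense("cow", 2, specs))
```

The lookup itself is trivial; the difficulty raised above is that downstream code (for example diploid sampling) would then have to cope with a sample whose paths have mixed senses.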

This is using vg v1.63.1.

Best,
Alex

@jltsiren
Contributor

Only entire samples can be set as references.

GBZ could in principle specify references at the level of individual paths, but there are no interfaces for passing that information to the graph: not on the command line, not in the API, and not in the file formats. Some other implementations determine the sense by parsing the path name, and in those, renaming the paths is the only way to switch between haplotype and reference senses. I'm not exactly sure about the rules. sample#sequence is clearly a reference and sample#haplotype#sequence#something is clearly a haplotype, but I'm not sure how PanSN names will be interpreted.
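A minimal sketch of name-based sense inference that encodes only the two cases stated above as unambiguous; the function name is hypothetical and this is not the actual rule set of any implementation. A plain PanSN name (sample#haplotype#sequence) is deliberately left as UNKNOWN, since how it is interpreted is the open question.

```python
def sense_from_name(name: str) -> str:
    """Classify a path name by its '#'-separated field count (illustrative only)."""
    parts = name.split("#")
    if len(parts) == 2:
        return "REFERENCE"   # sample#sequence
    if len(parts) == 4:
        return "HAPLOTYPE"   # sample#haplotype#sequence#something
    return "UNKNOWN"         # e.g. PanSN sample#haplotype#sequence: rules unclear


for name in ("cow#chr1", "cow#1#chr1#0", "cow#1#chr1"):
    print(name, sense_from_name(name))
```

In such a scheme, the sense is a property of the name alone, which is why switching a path between senses means renaming it.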
