Syntax of covrowinds and geneticrowinds #34

Open
came203 opened this issue Jan 19, 2023 · 1 comment

Comments

came203 commented Jan 19, 2023

As a follow-up to my last issue: constructing separate VCF or BED files for each phenotype is quite cumbersome, so the best approach would probably be to use the covrowinds and geneticrowinds arguments to select only the samples used in the current analysis. As I am by no means an expert in Julia, how would one implement this, similar to the snpinds argument, which works very nicely? Is there an easy way to construct covrowinds and geneticrowinds from participant IDs, or does one have to construct the indices manually? What is the syntax of the indices?
Any pointers on this would be highly appreciated.

Member

kose-y commented Jan 20, 2023

My go-to method would be to use dictionaries. Julia has an efficient built-in dictionary type (Dict), just like Python.

Here is an example using the bgen format (edited from https://github.com/kose-y/TrajGWAS-reproducibility/blob/main/ukbiobank/4_scoretest_script/scoretest.jl):

using BGEN

genetic_iids_subsample = String[] # placeholder: fill in the list of sample names to keep
data = Bgen(bgenfilename * ".bgen"; sample_path = samplefilename)
genetic_iids = samples(data) # assuming sample names are `String`s directly mapped to individual IDs. 

order_dict = Dict{String, Int}() # creating an empty dictionary with key type String, value type Int.
for (i, iid) in enumerate(genetic_iids)
    order_dict[iid] = i
end

sample_indicator = falses(length(genetic_iids))
for v in genetic_iids_subsample
    sample_indicator[order_dict[v]] = true
end

# then use sample_indicator as `geneticrowinds`
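
For anyone who wants to try the dictionary approach without a BGEN file at hand, here is a minimal self-contained sketch on toy data. The helper name `id_indicator` is hypothetical (it is not part of TrajGWAS or BGEN.jl), and the ID vectors are made up. It also collects IDs that are absent from the genetic file rather than throwing a `KeyError`, which the loop above would do for an unknown ID.

```julia
# Hypothetical helper (not part of TrajGWAS or BGEN.jl): build a Boolean mask
# marking which entries of `all_ids` appear in `keep_ids`.
function id_indicator(all_ids::AbstractVector{<:AbstractString},
                      keep_ids::AbstractVector{<:AbstractString})
    # Map each ID to its row position in the file.
    order_dict = Dict(id => i for (i, id) in enumerate(all_ids))

    indicator = falses(length(all_ids))
    missing_ids = String[]
    for id in keep_ids
        i = get(order_dict, id, 0)   # 0 signals "ID not present"
        if i == 0
            push!(missing_ids, id)   # record it instead of erroring
        else
            indicator[i] = true
        end
    end
    return indicator, missing_ids
end

# Toy data standing in for `samples(data)` and a phenotype-specific subset.
genetic_iids = ["id1", "id2", "id3", "id4"]
genetic_iids_subsample = ["id4", "id2", "id9"]

sample_indicator, not_found = id_indicator(genetic_iids, genetic_iids_subsample)
```

The same helper could presumably be applied to the ID column of the covariate table to build a mask for `covrowinds`, but that is an assumption about how your covariate file is laid out, so check that both masks select the same individuals in the same order.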

You can do something similar with the SNPs' RefSeq IDs when constructing snpinds.
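
As a rough illustration for the SNP case, a set-membership test gives the same kind of Boolean mask in one line when you do not need the positions themselves. The SNP IDs below are made up, and how you obtain the full ID list depends on your genotype file format, so treat this as a sketch rather than the package's method.

```julia
# Toy stand-ins; in practice `snp_ids` would be read from the genotype file.
snp_ids = ["rs1", "rs2", "rs3", "rs4", "rs5"]
wanted = Set(["rs2", "rs5"])   # SNPs to keep

# `in(wanted)` is the one-argument (curried) form of `in`;
# broadcasting it over `snp_ids` yields a BitVector mask.
snp_indicator = in(wanted).(snp_ids)
```

Using a `Set` keeps each lookup constant-time, as with the dictionary, while skipping the explicit loop.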

It would be a nice feature to have in our package, but it is tricky since it depends on the files you have, as you can see in the example that I linked.
