add option to use maskfile #12

zmaroti · 2023-10-19T10:27:52Z

Hi,

It would be nice if you could add a maskfile option as a parameter to either

hapBLOCK_chroms (so that no IBD is emitted from masked regions, since all relevant genomic coordinate information is available there)
or
filter_ibd_df plus the callers create_ind_ibd_df and ind_all_ibd_df (to filter IBD instead of, or in addition to, the SNP density parameter),

since this could be handled naturally in the base package.

(The individual IBD data in the output of hapBLOCK_chroms does not yet contain the genomic coordinates, and the map positions are on a different scale than the mask data (M vs. cM), so doing this with simple "shell magic" would be complex.)

With only a few samples, or when looking at IBD sharing between individual pairs, this is not an issue. However, when you work with several hundred individuals, the number of pairs (N*(N-1)/2) gets large, and at these genome locations almost everyone shares IBD with all other samples. As a result, a needlessly large portion of the output consists of these false-positive IBD segments rather than the randomly distributed true IBD.
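To make the request concrete, here is a minimal sketch of the kind of filtering I mean. It is not ancIBD code: the `Segment` fields, the function names, and the mask layout (chromosome, start in cM, end in cM) are all assumptions for illustration, and it uses plain Python rather than the package's data frames. It converts the mask to Morgan and drops every segment that overlaps a masked interval on the same chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One IBD segment. Field names are hypothetical; positions are in Morgan."""
    iid1: str
    iid2: str
    ch: int
    start_m: float
    end_m: float


def mask_to_morgan(mask_cm):
    """Convert assumed mask tuples (ch, start_cM, end_cM) to Morgan (1 M = 100 cM)."""
    return [(ch, start / 100.0, end / 100.0) for ch, start, end in mask_cm]


def overlaps_mask(seg, mask_m):
    """True if the segment overlaps any masked interval on its chromosome."""
    return any(
        ch == seg.ch and seg.start_m < end and start < seg.end_m
        for ch, start, end in mask_m
    )


def filter_segments(segments, mask_cm):
    """Keep only the segments that do not touch a masked region."""
    mask_m = mask_to_morgan(mask_cm)
    return [s for s in segments if not overlaps_mask(s, mask_m)]


# Toy data: one masked region on chromosome 1, from 50 cM to 55 cM.
mask_cm = [(1, 50.0, 55.0)]
segments = [
    Segment("A", "B", 1, 0.10, 0.30),  # 10-30 cM, outside the mask
    Segment("A", "C", 1, 0.48, 0.60),  # 48-60 cM, overlaps the mask
    Segment("B", "C", 2, 0.50, 0.54),  # same position, different chromosome
]
kept = filter_segments(segments, mask_cm)
```

Dropping any overlapping segment is only one possible policy; trimming segments at the mask boundary, or requiring a minimum overlap fraction, would be alternatives to consider.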

Thanks!

hringbauer · 2023-11-16T10:15:19Z

That is an excellent suggestion showing some deep competence. Thank you!

We will work on implementing it, as it could substantially speed up the post-processing for large datasets. I will leave the thread open until then.
