Processing VCF files #17
Comments
What was the reasoning behind truncating after 50 variants? Is it a gut-feeling that this limit will not inhibit users' ability to analyse their VCF data? Is it a necessary limit to avoid overloading the server? Do we have any data regarding the number of variants in submitted VCF files? In a way, we need to answer each of these questions to allow us to have an opinion on how to proceed. If server capacity is not an issue, we can chunk the processing of large VCF files and better support our customers.
50,000, sorry. Typo.
Yes, we don't want to overload the system and have to consider other customers. We have asynchronous scheduling, but huge jobs take up a lot of capacity. Also, how big a job can we realistically handle before we cannot email the results back? We used to cut off at 25,000 variants of output. Now we cut off at 50,000 variants of input, which for some genes can produce a huge output. I have had VCFs submitted containing anything from 1 variant up to several million.
When users submit VCFs containing several million variants, you have to wonder whether they understand what they are doing. Sounds like no filtering has been applied.
That's exactly what I was thinking. The tool is not for annotating huge VCFs the way VEP does. It is for extracting HGVS descriptions. I'm kind of thinking 50,000 variants is overly generous, but that doesn't mean we shouldn't consider upping it. I'm torn(ish).
Perhaps the vcf2hgvs entry page needs to spell out its purpose more clearly, to steer away those users who should be using VEP instead.
Also a valid statement.
We need to think about how VCF files are processed. Currently we truncate the job after 50 variants.
An alternative approach, which Teri will describe, would allow us to handle large VCFs by processing them in smaller chunks, which would be re-assembled and sent to the user once completed (as a single email this time, DOH!).
Pros - We can support larger VCFs, which may be useful to our customers.
Cons - Since we don't annotate VCFs as such, why offer bulk processing? Is this really what the tool is for? We also don't have the capacity of EBI. What are our limitations?
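
To make the chunking proposal concrete, here is a minimal sketch of how it could work. It is not the tool's actual code: the function names, the `chunk_size` of 1,000 and the `process_chunk` callback are all hypothetical, and `max_variants` simply mirrors the 50,000 input cap mentioned in the comments. Header lines are skipped, the input is capped, each chunk is processed separately, and the results are re-assembled in order so they can go out as one email.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_variant_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF data lines, skipping header lines (starting with '#') and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            yield line


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def process_in_chunks(
    lines: Iterable[str],
    process_chunk: Callable[[List[str]], List[str]],  # hypothetical per-chunk worker
    chunk_size: int = 1_000,      # assumed value, tune to server capacity
    max_variants: int = 50_000,   # input cap taken from the discussion above
) -> List[str]:
    """Cap the input, process it chunk by chunk, and return one combined result."""
    capped = islice(iter_variant_lines(lines), max_variants)
    results: List[str] = []
    for chunk in chunked(capped, chunk_size):
        # Each chunk could be scheduled as its own asynchronous job;
        # here the chunks run in sequence to keep the sketch simple.
        results.extend(process_chunk(chunk))
    return results
```

Because the input is read lazily, a file with several million variants is never held in memory in full, and the cap still protects the server. Whether each chunk should run as a separate asynchronous job, and how large a combined result can realistically be emailed, are the open questions above.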