-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically reset PCs based on var size #76
Conversation
BTW, |
Thanks for reviewing @nh3 . We just don't want the PCA tool to fail when we happen to supply a matrix that invalidates the hard-coded number of PCs. Analagous to the variable genes fix the other day, it should just knock the number of PCs down to the maximum allowed value (perhaps with a warning). |
Sounds good to me. Should I swap to 'n_comps'? |
Yep- see API doc for the function being called here: https://icb-scanpy.readthedocs-hosted.com/en/stable/api/scanpy.tl.pca.html |
I was looking here scanpy-scripts/scanpy_scripts/cmd_options.py Line 114 in 0340cea
I wondered why this was different to scanpy. |
scanpy-scripts/scanpy_scripts/cmd_options.py Line 160 in 0340cea
(sorry for earlier incorrect response). It is confusing, but PC-related arguments are used in a number of different places in Scanpy, not always for the same thing. You just picked the wrong one is all. |
Is this acceptable @nh3 ? An an example, Matt had a matrix with 33 genes. The default of 50 causes a nasty error becuase 50 >= floor(30/2). Seems more graceful to reduce to the maximum possible. |
I think the method of modifying When not specifying Also, the behaviour of reduce to largest number of PCs mathematically allowed seems more as expected than reduce to some not so well-known fractions. Just my two cents. |
Thanks @nh3 - yes, that appears to be the heuristic required (not my own invention, honest!). The 33 genes in the above example required no more than 15 PCs- it really isn't The need to apply the heuristics is that the PCA tool will die with small matrices if a default of 50 is left in place, killing the workflow, and we can't run the workflow unsupervised. Just doesn't seem graceful behaviour, and it's just nice to be able to say "50 when you can, otherwise as many as you can". I actually experienced an issue with the neighours step- it doesn't like the sparse matrices that come from using the expression matrix (was going to be the subject of a future PR)- so we're forcing to use the PCs for now. |
Have you tried more datasets, and values of |
@hewgreen will have to have a go- I'm on annual leave for 2 weeks very soon ;-). Here's the error trace:
... and here's the input object: |
Ahh, ths was merged on another PR after (#78). |
This is required for imaging omic datasets where the gene number (vars) is likely to be very low. Passing PC higher than genes would not be good.