-
Notifications
You must be signed in to change notification settings - Fork 9
Performance problems of multi process implementation of a block of code #28
Comments
Hi, thanks for the feedback, is much appreciated. Sorry about the poor performance. To be honest, I haven't touched this particular functionality in a long time! I don't think I ever did exhaustive performance checks (which is embarrassing) as it worked okay on the validation data I was running. I'm currently working on a new release, albeit slowly (I'm the only dev on this project). If you're happy I would like to test your solution and include it in the new release with attribution ofc. Or you can make a PR on the v2.1 branch and make these changes. Feel free to contact me at [email protected] for further discussion. Thanks, |
Hey Ross! Not sure if you've already addressed this, but wanted to chime in. I recently moved my implementation for finding points in a polygon from FlowKit to FlowUtils. Basically, all C extension based functions will go in FlowUtils from now on, and I have binary distributions on PyPI for FlowUtils for all the major platforms & recent Python versions. This should go a long way in helping those users without C compilers get the dependencies set up. The points in ellipsoid (supporting true n-dimensional data) function was moved to FlowUtils as well. Both of these are compatible with the GatingML definitions, i.e. how to handle points on the boundary, etc. On a side note, have you considered using FlowKit for CytoPy? CytoPy is a much more ambitious project in many ways. FlowKit aims to be an API for the more traditional flow concepts, and will never be or have a GUI in it's code base. I've toyed with the idea of an Electron based front-end but it would be a separate project and I need to finish the FlowJo workspace support in FlowKit first. Would be happy to discuss via email if you are interested. Regards, |
Hey Scott! Thanks for reaching out, I think if I remember rightly I have resolved this issue in the unpublished dev branch (v3.0) of CytoPy, but this is good to know. My implementation is still in Python so your C extension will probably be quicker. I haven't considered FlowKit, mostly because CytoPy is aimed more at high-dimensional cytometry data and biomarker discovery. Its designed around the assumption of the following workflow:
I do really admire what you have done with FlowKit however and I think it's major advantage is the GatingML compatibility which is a huge gap in CytoPy currently. The future for CytoPy, once I've finished writing my PhD thesis and have time to implement it, is a decoupled ecosystem. So in the future there will be CytoPy v3.0, which will depend on some smaller packages that can be used in isolation for those that don't need automated gating and a complex MongoDB database to track meta-data. There is probably a lot of overlap here in our work - I use your FlowUtils and FlowIO package extensively! Great job btw. If you ever need help maintaining it, please let me know, I owe you one for creating such great packages! As for future CytoPy work, I've manage to make a start by pulling out some of functionality into smaller packages: Perhaps there is some areas we can collaborate here. I'm of the same mind and have thought about an Electron frontend or perhaps Django and React (a bit like your ReFlow work) but again, it always comes down to time. |
In the "inside_polygon" function of "cytopy/data/geometry.py", the performance of multi process implementation is very poor. A 10000 line of data takes nearly 30 seconds, but it only takes less than 0.04 seconds to change to normal programming.For the time being, I simply handle it like this, adding a row count judgment.Maybe you have a better way to deal with it.
My configuration:
python: 3.8.7
Memory: 32g
CPU: i7-1165g7 (4 cores and 8 threads)
The text was updated successfully, but these errors were encountered: