This started out by looking at why some documentation didn't have short_descriptions on the collection index pages. Looking into it led to a long chain of behaviours that eventually arrived at that bug.

* I wanted to run pydantic validation and normalization in parallel for all of the plugins that need to be read in, but that step is CPU-bound, so I used EventLoop.run_in_executor() with a concurrent.futures.ProcessPoolExecutor. That way each worker is a separate process and can hopefully take better advantage of the CPUs on the system.
* Under the hood, ProcessPoolExecutor uses multiprocessing to hand off work to the worker processes. multiprocessing has to be able to pickle the Python objects that are sent to and received from the workers.
* It turns out that there are a few bugs in the Python pickle library that cause pydantic exceptions to fail in a specific way: they pickle fine, but they can't be unpickled.
* That means my code would encounter a validation error and raise a pydantic.ValidationError, the worker process would pickle that and send it to the parent process, and once on the parent process, unpickling the error would traceback.
* That traceback would be unexpected, so ProcessPoolExecutor would assume that things were in an unknown state and that it should cancel all of the pending tasks.
* So instead of getting a few nice error messages about the few plugins which had broken documentation on the remote side, 30-50% of the plugins were cancelled and gave back BrokenProcessPool, which wasn't informative of what the real problem was.

The workaround for this is very simple: catch the pydantic exception, extract the information we care about, and then reraise it as a different, picklable exception so that the asyncio framework can properly operate on it (see the sketch below).

Fixes ansible-community#86
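A minimal sketch of the workaround, assuming pydantic v1 and using hypothetical names (PluginDocSchema, validate_plugin_docs, PluginValidationError) rather than the actual antsibull code: the worker catches pydantic.ValidationError, extracts the message, and reraises a plain exception that multiprocessing can pickle and unpickle on both sides of the pool.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

import pydantic


class PluginValidationError(Exception):
    """Plain exception that pickles and unpickles cleanly across processes."""


class PluginDocSchema(pydantic.BaseModel):
    """Hypothetical stand-in for the real plugin documentation schema."""
    name: str
    short_description: str


def validate_plugin_docs(raw: dict) -> dict:
    """Runs in a worker process, so it must only raise picklable exceptions."""
    try:
        return PluginDocSchema(**raw).dict()
    except pydantic.ValidationError as exc:
        # pydantic.ValidationError pickles but cannot be unpickled in the
        # parent, which would make ProcessPoolExecutor consider the pool
        # broken and cancel the pending tasks.  Extract the message here and
        # reraise a plain, picklable exception instead.
        raise PluginValidationError(str(exc)) from None


async def validate_all(raw_docs):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        futures = [loop.run_in_executor(pool, validate_plugin_docs, raw)
                   for raw in raw_docs]
        # return_exceptions=True hands each PluginValidationError back as a
        # per-plugin result instead of aborting the whole gather().
        results = await asyncio.gather(*futures, return_exceptions=True)
    for raw, result in zip(raw_docs, results):
        if isinstance(result, Exception):
            print(f"{raw.get('name', '<unknown>')}: {result}")


if __name__ == '__main__':
    asyncio.run(validate_all([
        {'name': 'good_plugin', 'short_description': 'Works fine'},
        {'name': 'broken_plugin'},  # missing short_description -> validation error
    ]))
```

With the plain exception, the failed plugins come back as ordinary error results and the remaining tasks keep running, instead of the pool tearing itself down and reporting BrokenProcessPool for work that was never the problem.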