ThreadPool() is not supported on AWS Lambda using Python 3.7 and above #1

Open
mdw123 opened this issue May 13, 2022 · 2 comments

mdw123 commented May 13, 2022

We use DocSpring to fill out documents in our event-driven Python system, which runs in AWS Lambda functions.

Recently, after upgrading from Python 3.6, the client library started failing with the following traceback:

  File "/var/task/api/external/docspring_api.py", line 10, in __init__
    self._client = docspring.Client()
  File "/var/task/docspring/api/pdf_api.py", line 32, in __init__
    api_client = ApiClient()
  File "/var/task/docspring/api_client.py", line 68, in __init__
    self.pool = ThreadPool()
  File "/var/lang/lib/python3.8/multiprocessing/pool.py", line 925, in __init__
    Pool.__init__(self, processes, initializer, initargs)
  File "/var/lang/lib/python3.8/multiprocessing/pool.py", line 196, in __init__
    self._change_notifier = self._ctx.SimpleQueue()
  File "/var/lang/lib/python3.8/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/var/lang/lib/python3.8/multiprocessing/queues.py", line 336, in __init__
    self._rlock = ctx.Lock()
  File "/var/lang/lib/python3.8/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
  File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/var/lang/lib/python3.8/multiprocessing/synchronize.py", line 57, in __init__
    sl = self._semlock = _multiprocessing.SemLock(
OSError: [Errno 38] Function not implemented

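This occurs because the Lambda execution environment does not support shared memory, so the multiprocessing semaphore (SemLock) that ThreadPool() creates for its internal queue cannot be allocated.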

More: https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/
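For anyone who wants to confirm this in their own environment, here is a minimal sketch of a startup probe. The function name `multiprocessing_locks_available` is made up for illustration and is not part of the DocSpring client. It simply tries to create the same kind of lock that fails in the traceback above.

```python
import multiprocessing


def multiprocessing_locks_available() -> bool:
    """Return True if this environment can create multiprocessing locks.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        lock = multiprocessing.Lock()
    except (OSError, ImportError):
        # OSError [Errno 38] is the failure shown in the traceback above.
        # ImportError is raised on platforms without a working sem_open.
        return False
    # Acquire and release once to make sure the lock is usable.
    with lock:
        pass
    return True


if __name__ == "__main__":
    print("multiprocessing locks available:", multiprocessing_locks_available())
```

If this returns False, anything that constructs multiprocessing.pool.ThreadPool() at import or init time can be expected to fail the same way.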

@ndbroadbent
Member

Hello, sorry to hear about that! I’m not too familiar with the internals of our Python API library, or why it needs to use shared memory. We generate this client library using OpenAPI Generator. It’s possible that this issue is fixed in a later version, but I’m not sure, and updating can take a lot of effort since we have a few customizations that need to be merged in carefully.

The other difficulty is that this might just be a limitation of AWS Lambda, so unfortunately I’m not sure we can promise anything. But I will definitely put this on our todo list and see if there’s something we can do. In the meantime, it might be easier to call the API directly with plain HTTP requests.
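As a rough sketch of what calling an HTTP API directly could look like with only the standard library: the URL, the JSON payload, and the use of HTTP Basic auth with a token ID and secret are all placeholders assumed for illustration. None of them come from this thread, so check the API documentation for the real endpoint and authentication scheme.

```python
import base64
import json
import urllib.request


def build_json_request(url, token_id, token_secret, payload):
    """Build a POST request with a JSON body and a Basic auth header.

    Assumption: the API accepts HTTP Basic auth; adjust if it does not.
    """
    credentials = f"{token_id}:{token_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request, timeout=30):
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint and payload, for illustration only.
example = build_json_request(
    "https://api.example.com/v1/submissions",
    "TOKEN_ID",
    "TOKEN_SECRET",
    {"data": {"name": "Jane"}},
)
```

Nothing here touches multiprocessing, so it should not depend on shared memory being available.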

@scraimer

I found that someone else ran into a similar issue and solved it by using their own thread-safe (but not multiprocessing-safe) lock. The lock that Python's multiprocessing module provides is meant to be multiprocessing-safe, and it requires /dev/shm, which doesn't exist in Lambda.

ecederstrand/exchangelib@ea45d19
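In the same spirit, here is a minimal sketch (not a copy of the linked commit, and not DocSpring's code) of a client that creates its pool lazily and uses concurrent.futures.ThreadPoolExecutor plus a plain threading.Lock. The class and method names are hypothetical. The assumption is that ThreadPoolExecutor relies only on threading primitives and so avoids the SemLock call in the traceback; I have not verified this on Lambda itself.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyThreadPoolClient:
    """Hypothetical stand-in for a generated API client."""

    def __init__(self, max_workers=4):
        self._max_workers = max_workers
        self._pool = None
        # threading.Lock is thread-safe only and needs no shared memory.
        self._pool_lock = threading.Lock()

    @property
    def pool(self):
        # Create the pool on first use, so constructing the client
        # never allocates worker threads or locks it may not need.
        with self._pool_lock:
            if self._pool is None:
                self._pool = ThreadPoolExecutor(max_workers=self._max_workers)
            return self._pool

    def call_async(self, func, *args, **kwargs):
        """Run func in the pool and return a concurrent.futures.Future."""
        return self.pool.submit(func, *args, **kwargs)

    def close(self):
        """Shut the pool down if it was ever created."""
        with self._pool_lock:
            if self._pool is not None:
                self._pool.shutdown(wait=True)
                self._pool = None


if __name__ == "__main__":
    client = LazyThreadPoolClient()
    future = client.call_async(pow, 2, 10)
    print(future.result())
    client.close()
```

With lazy creation, callers that only make synchronous requests would never create a pool at all, which matters when the failure happens inside __init__ as it does here.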
