-
-
Notifications
You must be signed in to change notification settings - Fork 391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forbid await
in for
loop
#1600
Comments
What if you're using * Note, we're making the assumption that the code is also doing something other than fetch pages. |
Hi @sobolevn |
@shivang970 you would need to use And Thanks a lot for your help! |
@Sxderp there are better ways to do it. https://stackoverflow.com/questions/42231161/asyncio-gather-vs-asyncio-wait |
I understand how gather works, but there are workflows in which we don't want to "start off" 100 connection requests. Using gather would cause everything to initialize in fairly quick succession. I can see something like gathering in increments of four, but you are still waiting in a loop. Anyway, I think rate limiting is definitely an appropriate use. |
I can get behind a Rule that forbids await in list comprehensions or a single line for loop. I think in most circumstances that would be a mistake. The rate limiting could potentially still apply, but if you need rate limiting you're generally doing other things with the data (more than one line). |
Can you please provide a real-ish code sample with rate limiting in a loop? |
Here is a stripped sample of something similar to what I've used. I can also think of examples in IPC in which state is encoded in the response and you need to send part of the response back in order to perform your next request. Something like a Redis "SCAN" operation. I have no actual implements doing that with asyncio, but just a thought. import aiohttp
import asyncio
import random
from bs4 import BeautifulSoup
# Galleries would normally be fetched too. But for simplicity sake
# hardcode them.
GALLERIES = [
'https://example.com/gallery/1',
'https://example.com/gallery/2',
'https://example.com/gallery/3',
'https://example.com/gallery/4',
]
async def extract_image_srcs(session, gallery_url):
async with session.get(gallery_url) as gallery_response:
gallery_soup = BeautifulSoup(await gallery_response.text(), features='lxml')
return [img.attrs['src'] for img in gallery_soup.findAll('img')]
async def get_image_data(session, image_src):
async with session.get(image_url) as image_response:
return await image_response.read()
async def fetch_gallery(gallery_url):
session = aiohttp.ClientSession()
image_sources = await extract_image_srcs(session, gallery_url)
for image_src in image_sources:
# This could technically be "gathered" but await is used to artifically
# reduced the number of potential requests. We can't guarantee any
# specific number, but it'll definitely be less than doing everything
# at once.
image_data = await get_image_data(session, image_src)
with open(str(random.randint(1, 255)), 'wb') as fd:
fd.write(image_data)
# Optionally include additional waiting if we want to be extra nice.
await asyncio.sleep(random.randint(1, 10))
await session.close()
async def bootstrap():
await asyncio.gather(*[fetch_gallery(gallery_url) for gallery_url in GALLERIES])
asyncio.run(bootstrap()) |
You should use asyncio.Semaphore if you need to ratelimit your request. |
If you're rate-limiting with more than one I can agree. If it's one I'm not sure I see the point. |
https://github.com/Instagram/Fixit/blob/master/fixit/rules/gather_sequential_await.py
The text was updated successfully, but these errors were encountered: