Refactoring to prepare for gevent #9
@justcool393 ok, so I think I got everything worked out. I'm performing all archiving actions in parallel (it's wicked fast 🐎💨), but I apply rate limiting to the creation of archives of links pointing to reddit (I know this sounds weird). I also noticed we got banned from 4 subs, some of which are quite active, so I decided to automatically unsubscribe from subs we have been banned from. If you're ok with this, I'm going to test this new setup on the server.
@hidde-jan I don't know how I missed this message. This is excellent. Small thing: I believe Archive.is automagically ratelimits its own requests to reddit (ceddit is based off of some quirk in how reddit works, so its requests come from the users who are browsing the archive), so rate limiting may only be necessary for sites that do not ratelimit themselves. I do think this is a great idea. The reason I had it at five initially was because reddit was weird about requests when they came from archive.org. Further, I sent a message to the admins about maybe getting one of the services (that isn't currently in place) un-spamfiltered for more redundancy. Also, I'll ask again about archive.org when I get a response.
Looks very good. The only things that I saw were minor (I mentioned them in the other comments below), and they are more suggestions than issues.
I'm still not completely satisfied. It currently only handles one submission at a time. I want to set up some queue-based system that can handle multiple things at once. The xkcd transcriber bot has something like this. I've already started looking at it.
For non-reddit links, we create all archives in parallel. For reddit links, we rate limit the requests to the archive sites by creating only one archive every two seconds.
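A rough sketch of that split, for illustration only: it uses a stdlib thread pool rather than gevent, and `MinIntervalLimiter`, `is_reddit_link`, and `archive_all` are hypothetical names, not code from this PR. The archivers are assumed to be plain callables that take a URL.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


class MinIntervalLimiter:
    """Let at most one caller through per `interval` seconds, across threads."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside
        # it so waiting callers do not block each other's bookkeeping.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def is_reddit_link(url):
    # Hypothetical check: match reddit.com and its subdomains only.
    host = urlparse(url).hostname or ""
    return host == "reddit.com" or host.endswith(".reddit.com")


def archive_all(url, archivers, limiter):
    """Run every archiver on `url`; throttle only when it points at reddit."""

    def run(archiver):
        if is_reddit_link(url):
            limiter.wait()
        return archiver(url)

    with ThreadPoolExecutor(max_workers=max(1, len(archivers))) as pool:
        return list(pool.map(run, archivers))
```

With `MinIntervalLimiter(2.0)`, archives of a reddit link would start two seconds apart, while archives of any other link start immediately and run side by side.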
I'm gonna test this in our production setting this week :)
Standard PRAW rate limiting doesn’t work with gevent
Ok, I think I know why I abandoned this last year. PRAW is not thread safe. Its rate limit function depends on time.sleep, and even with patching that out there is no actual way to get this working. I'm still pretty happy with the refactoring in this PR, so I'm going to port the changes to the master branch and close this PR.
This is a WIP branch where I'm refactoring a bit. We currently perform all HTTP requests sequentially, which takes a lot of time per submission and limits our ability to add more subreddits to monitor. By using gevent (or some other solution) we can perform the requests (mostly) in parallel, or at least make them non-blocking. This will hopefully speed up the bot a great deal.
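To illustrate the difference, here is a minimal sketch that uses a stdlib thread pool as the "other solution" instead of gevent. `fake_request` is a made-up stand-in that sleeps rather than touching the network, and none of these function names come from the bot's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(url, latency=0.1):
    # Stand-in for a blocking HTTP call (assumption: roughly 0.1 s each).
    time.sleep(latency)
    return "archived:" + url


def fetch_sequential(urls):
    # Current behaviour: total time is about len(urls) * latency.
    return [fake_request(u) for u in urls]


def fetch_parallel(urls, max_workers=8):
    # Overlapping the waits: total time is about one latency per batch
    # of `max_workers` urls. Results keep the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_request, urls))
```

Both functions return the same results in the same order; only the wall-clock time differs, which is the speed-up this branch is after.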