Add per-client rate-limiting #1052
… seconds. Signed-off-by: DL6ER <[email protected]>
d96e324 to bf52156
I wonder, would a default of …
I guess we want this protection to be enabled by default. Use cases are clients going crazy because of some defect and/or DNS loops between a router and the Pi-hole. We can add a warning to the Pi-hole diagnosis system if you like, so users are aware of this. My first attempt was showing rate-limited queries in the Query Log; however, this would not effectively help against a DoS attack, as the memory needed to hold them would still quickly grow. We should rather stress in the changelog that this is something you can disable. The numbers I put in should not come close to being triggered by any correctly working client. 10,000 queries per minute still allow 14.4 million queries per client in 24 hours. We may even want to reduce this number but …
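The per-day figure is the per-interval count multiplied by the number of intervals in a day (1,440 one-minute intervals). A minimal sketch, assuming the interval divides 24 hours evenly; the function name is hypothetical and not part of FTL:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_queries_per_day(count: int, interval_s: int) -> int:
    """Upper bound on the queries one client can have answered in 24 hours
    when at most `count` queries are allowed per `interval_s` seconds."""
    if count <= 0 or interval_s <= 0:
        raise ValueError("0/0 disables rate-limiting, so there is no upper bound")
    if SECONDS_PER_DAY % interval_s:
        raise ValueError("this sketch assumes the interval divides a day evenly")
    return count * (SECONDS_PER_DAY // interval_s)


print(max_queries_per_day(10_000, 60))   # 14400000 (the 14.4 million above)
print(max_queries_per_day(1_000, 60))    # 1440000
print(max_queries_per_day(10_000, 600))  # 1440000
```

The same arithmetic gives the roughly 1.4 million queries a day quoted later for the final default of 1000 queries per 60 seconds.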
Would be nice if this new rate-limiting setting could be adjusted via the web GUI.
@LordSimal Why is your backup script getting rate-limited in the first place? And do you consider this healthy behavior?
I use rclone to sync my server data to PCloud. I can't change how rclone connects to the server. I guess every request, and therefore every file/folder, creates a new DNS query.
Rate-limiting is only measured and applied per client. You cannot set different levels for different devices. The proper fix for this behavior seems to be a local DNS cache on the server, like what modern Ubuntu ships with …
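Why a local cache helps: a caching stub resolver answers repeated lookups for the same name from memory until the record's TTL expires, so only the first lookup reaches the upstream server (here, the Pi-hole). The sketch below is a hypothetical, simplified TTL cache for illustration only; it is not how systemd-resolved is implemented, and the upstream function, domain and address are stand-ins:

```python
import time
from typing import Callable, Dict, Tuple

Upstream = Callable[[str], Tuple[str, float]]  # name -> (address, TTL in seconds)


class TTLCache:
    """Hypothetical caching stub resolver: reuse an answer until its TTL expires."""

    def __init__(self, upstream: Upstream, clock: Callable[[], float] = time.monotonic):
        self._upstream = upstream
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry time)
        self.upstream_queries = 0

    def lookup(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]                  # cache hit: nothing is sent upstream
        address, ttl = self._upstream(name)  # cache miss: ask the upstream server
        self.upstream_queries += 1
        self._entries[name] = (address, now + ttl)
        return address


# Stand-in upstream and a manually advanced clock, for illustration only
def fake_upstream(name: str) -> Tuple[str, float]:
    return "192.0.2.10", 300.0


fake_now = [0.0]
cache = TTLCache(fake_upstream, clock=lambda: fake_now[0])
for _ in range(1000):
    cache.lookup("files.example.com")
print(cache.upstream_queries)  # 1
fake_now[0] = 301.0            # the TTL has expired
cache.lookup("files.example.com")
print(cache.upstream_queries)  # 2
```

With a 300-second TTL, a thousand lookups of the same name inside that window cost one upstream query instead of a thousand, which is the difference between tripping a 1000/60 limit and staying well clear of it.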
Well, I am running Ubuntu 20, so I will look into that. Still, I think it would be nice if there were at least a notice in the backend when a client is being rate-limited (not only in debug mode). On the client side I only got "server misbehaving" (which is OK, of course), and in the pihole.log the query was logged with REFUSED. Can that at least be changed to REFUSED/RATELIMITED or something else that points to this feature?
"Or at least notify the user about that new feature when updating (which now is a bit late i would say)" In all of our release announcements, we have stressed the need to read the release notes for each new release. This new feature, in particular, was thoroughly discussed in the V5.7 release notes: https://pi-hole.net/2021/02/16/pi-hole-ftl-v5-7-and-web-v5-4-released/#page-content
I'm sorry, I didn't look at the blog/release notes 🙇🏻
Or/and first make a web GUI settings page and make it possible to set it per client, before you make such a drastic, abrupt change. So next time, make sure to have everything ready before you implement stuff. Routers, for example pfSense, use Pi-hole too, and the router sends a lot of DNS queries on behalf of other devices..... smart!!!
@anno006 Smart people read the release notes or blog posts before updating. We mentioned the option to disable rate-limiting, and you could have done so even before updating to ensure there would have been no downtime. In your case, your router (pfSense) seems to be configured incorrectly, which is the cause of this. When using ECS (EDNS0 Client Subnet), the Pi-hole can tell your clients apart even when the router makes all the queries on their behalf. We are preparing a pfSense + Pi-hole step-by-step guide to make this more obvious in the future. As you can see above, this feature was merged and released more than half a year ago. In the meantime, it has always turned out that there was an underlying problem (a rogue client) whenever rate-limiting was triggered, even for those where the router is the single point of contact. The default value is rather permissive: 1000 queries every 60 seconds equals up to 1.4 million queries a day. If your network is larger than this, you should clearly have your router configured accordingly to stop things like these from happening.
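For reference, the EDNS0 Client Subnet option defined in RFC 7871 is what lets a forwarder such as a router pass the original client's address along with each query it makes on that client's behalf. The sketch below encodes and decodes that option by hand to show its layout; it is illustrative only (not pfSense's or FTL's code), and the function names and addresses are hypothetical:

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS0 option code assigned to Client Subnet (RFC 7871)


def encode_ecs(client_ip: str, source_prefix: int) -> bytes:
    """Build the ECS option a forwarder could attach to a query made on a client's behalf."""
    ip = ipaddress.ip_address(client_ip)
    family = 1 if ip.version == 4 else 2  # address family numbers: 1 = IPv4, 2 = IPv6
    network = ipaddress.ip_network(f"{client_ip}/{source_prefix}", strict=False)
    # Only the bytes covered by the source prefix are sent
    address = network.network_address.packed[: (source_prefix + 7) // 8]
    # FAMILY (16 bits), SOURCE PREFIX-LENGTH (8), SCOPE PREFIX-LENGTH (8, zero in queries), ADDRESS
    payload = struct.pack("!HBB", family, source_prefix, 0) + address
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


def decode_ecs(option: bytes) -> str:
    """Recover the (zero-padded) client address, as the receiving server would see it."""
    code, length = struct.unpack("!HH", option[:4])
    if code != ECS_OPTION_CODE:
        raise ValueError("not an ECS option")
    family, _source_prefix, _scope_prefix = struct.unpack("!HBB", option[4:8])
    address = option[8 : 4 + length]
    width = 4 if family == 1 else 16
    return str(ipaddress.ip_address(address.ljust(width, b"\x00")))


print(encode_ecs("192.168.1.23", 32).hex())        # 0008000800012000c0a80117
print(decode_ecs(encode_ecs("192.168.1.23", 32)))  # 192.168.1.23
print(decode_ecs(encode_ecs("192.168.1.23", 24)))  # 192.168.1.0
```

A /32 source prefix carries the full client address, which is what allows the server to tell clients apart; a shorter prefix such as /24 only conveys the subnet, so every client on that subnet would still look the same.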
Really smart people accept their mistakes and learn from them instead of firing back. As I said, it is possible to exceed your limit. Next time, publish the step-by-step guide before activating...
It wasn't my intention to "backfire". Not in the slightest. Rereading my own post, I can see how you could have read it differently. You have my sincere apologies for that. I just wanted to point out that you can improve your router configuration, which would also influence rate-limiting. I can also assure you that we have no issues with undoing a change if it turns out to be incorrect. We have at least a few thousand Pi-hole users out there, and I've heard fewer than five complaints about it. Typically, users were even thankful because it revealed misbehaving clients and they were able to do something about it. And I have seen at least one report where it helped a user keep the rest of their network responsive when one client went nuts and started requesting millions of queries per minute. What exactly do you consider to be the mistake? Rate-limiting in the first place, or do you consider the default value too low? Would it still be too low in your network if pfSense were configured such that clients can be kept apart? Or is the problem rather that the value is not modifiable via the web dashboard? If the latter, please take into account that the vast majority of advanced settings are currently not editable in the web interface.
I did search for the terms you posted. But this is also pretty new. It has the same problem: not implemented yet, or at least only understandable/findable for the people involved in the development, not for people like me. Don't get me wrong, I don't dislike it, only the way you implemented it.
I do like what I read about EDNS and your explanation that it can help with misbehaving clients. I will look out for the step-by-step guide. And lastly, some background info about my system: I am running pfSense (on my own hardware), run some VMs for home automation and Pi-hole, and have about 200,000 queries in 24 hours.
The link to the documentation is right there on the web dashboard. Where to put the options is explained at the very top of the linked document:
I'm open to suggestions on how to improve the situation.
Yes. We've had a lot of discussions about the default value on our Discourse forum, and this is what we concluded. The motivation behind this is that Pi-hole is typically employed in a regular household with usually no more than a dozen devices. If Pi-hole runs on a Raspberry Pi and only one of the clients approaches the rate limit of 1.4 million queries a day, the Pi-hole will continue to work. However, if even two clients come close to this query rate, the Pi-hole will eventually stop working correctly because the memory will be used up. Hence, we didn't want to set the default even higher. Of course, there are much larger networks, and Pi-hole may be running on much beefier hardware with more than 1 GB of memory available, but this is an edge case rather than a default setup. This is just to give some background on why we chose this value.
You are not the first one asking for this, we're currently preparing a blog post about this. @dschaper might be able to give some more info about the current status of it (I do not use pfSense myself).
So this means roughly 140 queries per minute, about 7 times lower than the default limit. So we're talking about peak load here. We could easily change the default from 1,000 queries in 1 minute to, say, 10,000 queries in 10 minutes. Even though this would not change the upper limit, it would relax the peak-load issue. However, this would also mean that clients are rate-limited later and blocking them takes longer. This is the price to pay. @dschaper thoughts?
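The trade-off can be illustrated by counting queries per fixed window. This is a simplified sketch: it assumes windows measured from t=0, ignores any block carrying over into later windows, and uses a hypothetical workload of roughly the 140 queries per minute above plus one short burst:

```python
from collections import Counter


def limited_windows(timestamps: list[float], count: int, interval_s: float) -> list[int]:
    """Indices of the fixed windows (counted from t=0) in which more than `count`
    queries arrived, i.e. where a client would hit the rate limit."""
    per_window = Counter(int(t // interval_s) for t in timestamps)
    return sorted(w for w, n in per_window.items() if n > count)


# Hypothetical client: 140 queries/minute for 20 minutes, plus a burst of
# 3,000 queries spread over the two minutes starting at t=300 s
steady = [m * 60 + i * 60 / 140 for m in range(20) for i in range(140)]
burst = [300 + i * 120 / 3000 for i in range(3000)]
queries = steady + burst

print(limited_windows(queries, 1_000, 60))    # [5, 6]: both burst minutes exceed 1000/60
print(limited_windows(queries, 10_000, 600))  # []: 4,400 queries in the first 10 minutes
```

Both settings cap the same long-run average, but the longer window absorbs the burst. The flip side is the one mentioned above: a client that really does misbehave can send up to 10,000 queries before it is limited.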
I have most of the guide written for OPNsense. If pfSense uses … The basic idea is to add … as a …
@anno006 Please read https://pi-hole.net/2021/09/30/pi-hole-and-opnsense/ and see if this works on pfSense as well. Thanks!
I'd counter that with Arch being broken if it's querying 1000 different hosts in one minute to get ImageMagick libraries. At the very least, it should be caching some of the domain queries.
Please have a look at my reply to the initial request of enlightenment.
I disagree for the reasons already mentioned above:
Maybe we can add a step in the installer script that asks the user to specify a custom limit (or accept the default), so you can decide what would be appropriate for your setup.
I appreciate the level of care taken about system resources, especially in the context of the RPi's constrained RAM. However, I feel that the approach taken here is not the commonly accepted one. We have many well-known mechanisms in the OS to tackle the problem of lack of memory. If I don't have enough RAM, the OOM killer kicks in, and it is loud, it is known where to look for its signs, and it will probably yell at me as soon as I ssh into the RPi. If I am OK with some slowness, I add swap. If I want to limit RAM, I use cgroups. And those who care about system resources monitor them with Grafana or something else. Firefox does not refuse to open the 101st tab. A messenger does not refuse to accept the n-th message. A terminal does not refuse to execute the m-th command. They just try to do it. I still think that rate-limiting should be opt-in instead of opt-out.
The comparison isn't valid because Firefox spawns individual processes that can be killed independently without taking the entire application down. Nor do messengers and the like keep all their data in memory. Pi-hole keeps all queries in memory intentionally so you can do quick filtering and requesting. In the end, Pi-hole always pays attention to working fast even on low-end hardware, and this means we cannot do a ton of disk lookups over the slow SD interface. I also disagree about OOM being "loud". You will not notice a single bit of its action if you are not connected to a live terminal that shows "Killed." or if you are not used to reading the system logs manually. I would argue that Pi-hole's rate-limiting is a lot louder (for the inexperienced user), as it is shown prominently on the dashboard and in the log files, too. I perfectly see how power users want more control and that giving them control via a config option isn't obvious enough. This and
is why I suggested
This will explicitly ask you what to do. It will be neither opt-in nor opt-out. Instead, it will explicitly ask. This ensures we are not making any assumptions about what is good for the user.
I also just found out about this feature when restoring my last Firefox session with about 15 tabs. Reading this thread, I agree with the rate-limiting itself, even if it might be too conservative by default.
I meant how long a client stays limited.
It's all there in the docs.
Correct me if I'm wrong, but all I see there is the time window and the number of queries needed to get limited. But I want to set how long a client stays rate-limited. Even after 5 minutes I was unable to use anything on my PC because it was still being limited. Therefore, I would like to know how to configure how long a client stays in the "rate-limited" state, or at least how to lift the limit again, not the bounds that trigger the limit in the first place.
I agree it's not that obvious and we need to update the documentation. Since #1199
So far, there is no way to
It's always until the end of the interval, or until the end of the next interval if the limit is reached while being blocked. If a client continues to send queries, it will stay blocked forever.
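To make the interval semantics concrete, here is a hypothetical model of the behaviour described in this thread: fixed intervals counted from when FTL finished starting; a client that exceeds the limit is refused until the end of the interval, and stays blocked through the next interval if it reaches the limit while blocked. It is a sketch under those assumptions, not FTL's actual implementation; in particular, counting only the queries refused while blocked towards that extension is this sketch's interpretation:

```python
from dataclasses import dataclass, field


@dataclass
class ClientState:
    count: int = 0          # queries counted in the current interval before the block
    blocked_count: int = 0  # queries refused while already blocked in the current interval
    blocked: bool = False


@dataclass
class RateLimiter:
    """Hypothetical model of per-client, fixed-interval rate-limiting (not FTL's code)."""
    limit: int              # COUNT part of RATE_LIMIT=COUNT/INTERVAL
    interval: float         # INTERVAL part, in seconds
    start: float = 0.0      # when the daemon finished starting; intervals are relative to this
    window: int = 0
    clients: dict[str, ClientState] = field(default_factory=dict)

    def _roll_over(self, now: float) -> None:
        """Apply the end-of-interval rule for every boundary passed since the last query."""
        current = int((now - self.start) // self.interval)
        while self.window < current:
            for state in self.clients.values():
                # Lift the block unless the client reached the limit while blocked
                if state.blocked and state.blocked_count < self.limit:
                    state.blocked = False
                state.count = 0
                state.blocked_count = 0
            self.window += 1

    def allow(self, client: str, now: float) -> bool:
        """Return True to answer the query normally, False to reply REFUSED."""
        if self.limit == 0 or self.interval == 0:  # RATE_LIMIT=0/0 disables rate-limiting
            return True
        self._roll_over(now)
        state = self.clients.setdefault(client, ClientState())
        if state.blocked:
            state.blocked_count += 1
            return False
        state.count += 1
        if state.count > self.limit:
            state.blocked = True  # refused from this query until the end of the interval
            return False
        return True


rl = RateLimiter(limit=3, interval=60)
print([rl.allow("10.0.0.5", t) for t in (1, 2, 3, 4)])  # [True, True, True, False]
print(rl.allow("10.0.0.6", 5))   # True: other clients are unaffected
print(rl.allow("10.0.0.5", 61))  # True: block lifted at the interval boundary
```

Here a client limited to 3 queries per 60 s is refused on its fourth query, is released at the next boundary if it stayed quiet, and would remain blocked for a further interval had it kept sending at least 3 queries while blocked.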
Ah, I see, thanks. So systemd-resolve likely needs to be reconfigured not to retry so often.
Edit: Maybe instead of blocking all requests altogether, couldn't we just reject the requests above the limit?
If you surpass the limit at any point, the client gets blocked. This means it was able to do 1000 queries within a minute and was only blocked thereafter. A client is unblocked as soon as it has made fewer than 1000 queries per minute.
Here your client was only slightly above the limit and almost got de-limited. The client could immediately have done 1000 queries thereafter.
Because this doesn't look like a proper solution. Imagine the typical use case for rate-limiting: a DNS loop between, e.g., the router and the Pi-hole, or a client going rogue. Allowing 1000 queries per minute would still put significant load on the Pi-hole; consider also the ever-growing query database eventually eating up all space on disk.
I adjusted my answer above slightly as it was a bit incorrect regarding the interval. The interval steps correspond to the configured rate-limiting interval. They are also not aligned to full minutes on the clock but are relative to when FTL finished starting (so start of the daemon + possible delay by DELAY_STARTUP).
This was the behavior before #1199
I route all our clients to our in-house DNS server, which then has the Pi-hole set up as its first forwarder address. So in my case the in-house DNS server, serving 150+ clients, can easily hit 1000 requests in 60 seconds and does so often. I increased my rate limit in the config file to 2000/60 and it rarely hits that, but it would be nice if this were a GUI setting, as others have mentioned.
Feedback from user land:
My take-away here is that this is a suboptimal failsafe for a problem that, granted, I barely understand, but it keeps off-the-rack software from well-known vendors from working as designed, AND discoverability of the source of the issue is super low, AND removing the blocker is super complicated.
@MarkusNiGit Sorry for the inconvenience this feature caused. We've been discussing ways to improve the user experience, but this is difficult. For instance, the jumping triangle used to be larger and jump more heavily in a previous version, but users complained it was too eye-catching, so it was reduced in size to be less obtrusive. This may or may not be the reason why you didn't see it before. My initial idea to add a dialogue to the installer that explicitly asks you to configure the rate limit was discussed but then dismissed because it had a couple of drawbacks, the main one being that there are no dialogues on upgrades, only on fresh installs, as the former is a semi-automated process that shouldn't require user interaction. So I meant to add a dedicated section to the dashboard's settings page but, you know, family duties took over and, eventually, I forgot about it, so I'll use this as a reminder to hopefully get this done soonish.
I know you said that a few times, but I disagree on this bit. Let me say why: Synology might be a bigger company (about 600 employees according to Wikipedia), but this doesn't mean they are a global player like Google and similar. While the latter surely have a quality assurance center, I don't think this is true for Synology, at least not for their software or, at least, not for this part of it. Why do I say this? Because opening a dedicated connection per file (which is what I deduce from your explanation) is not good software design by any means. Imagine you are synchronizing a large number of really small files. The TCP three-way handshake may easily be more traffic than the synchronization itself, even more so when TLS (HTTPS) is also in use. On top of that come the thousands of DNS queries that, in a typical setup without a Pi-hole (i.e., without a caching DNS resolver), bounce back and forth between you and, say, Google DNS and cause additional delays and traffic all over the place. None of this tells me that this is good software. [snip] Concerning your three points: I disagree on the quality of the software and explained why (point 1); point 2 is unfortunate, but it seems we cannot have a solution that suits all users; point 3: I'll work on this as said above.
I used the word "professional" and not "quality" to describe the Synology solution deliberately. I cannot judge the quality of the software, though even I can tell that multiple DNS requests per transferred file might appear inefficient; but there might be very good reasons, such as restrictions on the API put in place by the cloud provider. My point is that you cannot judge either whether this is "good software", and this vendor created facts by selling the solution as it is currently designed. Based on your answer, you seem to be equally concerned about "rogue clients" and about enforcing quality on the software of vendors that sell professional solutions.
This "feature" cost me quite a bit of time to track down... In the future, it would be nice if features like these were asked about during an upgrade instead of being silently enabled, or else only enabled on new installations. I understand that 1000 queries per minute seems like a reasonable limit, but Pi-hole has enough power users and/or quirky setups that can run into issues.
"it would be nice if features like these would be asked about during an upgrade instead of being silently enabled." This is why we write and publish detailed release notes, and in our release post we remind readers to read the notes prior to upgrading. https://pi-hole.net/blog/2021/02/16/pi-hole-ftl-v5-7-and-web-v5-4-released/ At user request, activation of rate limits was added to the diagnostic messages, and this change was also covered in release notes: https://pi-hole.net/blog/2021/09/11/pi-hole-ftl-v5-9-web-v5-6-and-core-v5-4-released/ A later release did some tweaking to rate limits: https://pi-hole.net/blog/2021/10/23/pi-hole-ftl-v5-11-web-v5-8-and-core-v5-6-released/
I know, and I did read them. I just never assumed I would hit the limit because at first glance it seemed reasonable. Only after a bunch of random issues over the last few days did I notice that Pi-hole was the culprit. That's the issue with these kinds of changes: everything might seem fine and dandy initially, but things break down the line at some seemingly random moment when some service or cron job suddenly does a burst of requests.
bf52156 seems to have altered the default query rate limit for Pi-hole. If this should be opened as a new issue, I can do that, but on what basis was that number chosen? I can't see how this limit would even support a normal network for a household of four. This is probably the same concern others have expressed in this pull request.
The commit you linked did not change any rate limit but introduced it in the first place. Before, there was no rate limit. This is also not a new commit, but over one year old. No need to open a new issue about it.
Just chiming in: I think this should be an option in the web GUI, or opt-in. I was pulling my hair out over why my network was grinding to a halt, though if I rebooted things it would work for a while and then stop. Well, it turns out the corporate VPN I am using (I work from home, no way around it) likes to randomly spam thousands of DNS requests for literally every network endpoint at random times, which gets me rate-limited and blocks all further connections from my main router. I ended up disabling rate-limiting on both of my Pi-holes to resolve it, but it would be nice if there were just a toggle to turn it on/off, so I could turn it back on after my VPN is finished having its fit.
"turns out the corporate VPN I am using (I work from home, no way around it) likes to randomly spam thousands of DNS requests for literally every network endpoint at random times" It seems unusual that a corporate VPN would send DNS queries to a local DNS server, and not send the DNS through the VPN. Consider discussing this with corporate IT and moving the DNS traffic to the tunnel.
I agree - this has happened before, but our network team could not figure out what was going on. We have split tunneling enabled for our VPN client, so internet traffic goes through our local connection but internal traffic goes through the VPN, and in this case, the DNS queries start 'bleeding' to local, causing the massive spike in queries. This morning there were over 260k requests after I removed rate-limiting, so now I will re-enable it and hope it doesn't happen again - but still, it would be nice if there were a GUI option! If you're curious, we are using Global Protect by Palo Alto.
Apparently …
Not sure why we abandoned #1468, but maybe we could revive it at some point when all the other PRs are merged.
Bumping - I recently ran into the rate limit while doing nothing special, with nothing in particular misbehaving. I mentioned this to a friend who also uses a Pi-hole, and on closer inspection he realized he was also hitting the limit. I suggest bumping up the default by 50% until #1468 can be implemented. I suspect this is affecting many users without them realizing.
How familiar are you with the codebase?: 10
Add per-client rate-limiting. Rate-limited queries are answered with a REFUSED reply and not processed further by FTL. Even when they are logged in pihole.log, they will not contribute to the overall statistics nor enter the Query Log or the database. This serves the purpose of a real rate limit and ensures that abnormally behaving clients hammering FTL with thousands of queries per second cannot lead to a denial-of-service failure.
Rate-limiting is customizable; it defaults to allowing no more than 1000 queries in 60 seconds. Both numbers can be changed by the user.
It is important to note that rate-limiting is happening on a per-client basis. Other clients can continue to use FTL while rate-limited clients are short-circuited at the same time.
Rate-limiting can be disabled by setting RATE_LIMIT=0/0.
One might argue that rate-limiting is best realized with a firewall. However, we do not want to touch user firewalls, and this effectively does the same thing (arguably better, because we don't simply drop queries but reply with a proper REFUSED message).
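As a sketch of the value format described above: a RATE_LIMIT setting is COUNT/INTERVAL (queries per seconds), and 0/0 disables rate-limiting. In Pi-hole v5 the setting goes into /etc/pihole/pihole-FTL.conf. The parser below is hypothetical (it is not FTL's config parser); treating any zero count as "disabled", letting the last RATE_LIMIT line win, and falling back to the 1000/60 default when the key is absent are assumptions of this sketch:

```python
from typing import Iterable, NamedTuple, Optional


class RateLimit(NamedTuple):
    count: int     # maximum number of queries per client ...
    interval: int  # ... within this many seconds


DEFAULT = RateLimit(1000, 60)  # default described in this PR


def parse_rate_limit(value: str) -> Optional[RateLimit]:
    """Parse "COUNT/INTERVAL"; return None when rate-limiting is disabled."""
    count_str, sep, interval_str = value.strip().partition("/")
    if not sep:
        raise ValueError(f"expected COUNT/INTERVAL, got {value!r}")
    count, interval = int(count_str), int(interval_str)
    if count < 0 or interval < 0:
        raise ValueError("COUNT and INTERVAL must not be negative")
    if count == 0:
        return None  # 0/0 disables rate-limiting (this sketch treats any zero count the same)
    if interval == 0:
        raise ValueError("a non-zero COUNT needs a non-zero INTERVAL")
    return RateLimit(count, interval)


def rate_limit_from_conf(lines: Iterable[str]) -> Optional[RateLimit]:
    """Scan KEY=VALUE lines (e.g. from pihole-FTL.conf); the last RATE_LIMIT wins."""
    result: Optional[RateLimit] = DEFAULT
    for line in lines:
        key, sep, value = line.strip().partition("=")
        # Commented-out lines start with '#', so their key never equals RATE_LIMIT
        if sep and key.strip() == "RATE_LIMIT":
            result = parse_rate_limit(value)
    return result


print(rate_limit_from_conf(["RATE_LIMIT=2000/60"]))  # RateLimit(count=2000, interval=60)
print(rate_limit_from_conf(["RATE_LIMIT=0/0"]))      # None
print(rate_limit_from_conf([]))                      # RateLimit(count=1000, interval=60)
```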