Improve insertion of clients, domains and DNS cache records #2095
Conversation
Signed-off-by: DL6ER <[email protected]>
… to speed up insertion Signed-off-by: DL6ER <[email protected]>
Signed-off-by: DL6ER <[email protected]>
On which step would one see an improvement here? Looking between:
- development
- this branch, first start
- consecutive restart
How do you interpret those results? What does it mean that no clients were recycled? What happens when the list is full?
Hmm, yes, this is interesting, I would have expected much more than:
I am unable to test on "real" slow hardware, so I configured a Pi-hole v6.0 container to be "low-end":

```yaml
services:
  pi-hole-slow:
    image: pihole/pihole:local-v6
    [...]
    deploy:
      resources:
        limits:
          # maximum the container can get
          cpus: '0.001'
          memory: 100M
        reservations:
          # minimum that needs to be guaranteed for the container
          cpus: '0.0001'
          memory: 50M
```

and saw improvements by a factor of 3-4x. Your numbers above show something closer to 2x, at least when comparing the last with the
There are two possible causes here:
In both cases, the code to create new clients will immediately append the new clients at the end of the list without wasting time searching for possible gaps. The situation is the same for the
There are 113 domains that have been recycled. Whenever the next new domain is created, FTL takes the last ID from this list and can create the domain at this location. There is no need to search for an empty spot. As the binary search algorithm ensures monotonicity (strict ordering) in the lookup table, domains can be found equally fast regardless of whether they are created at the beginning, somewhere in the middle, or towards the end of the list.
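To make the "take the last ID from this list" step concrete, here is a minimal sketch assuming a hypothetical fixed-capacity recycler list (the names and sizes are illustrative, not the actual FTL code):

```c
// Illustrative sketch of O(1) ID reuse -- not the real FTL data structure.
#include <stdbool.h>

#define RECYCLER_CAPACITY 256u

struct recycler {
    unsigned int count;                 // number of recycled IDs currently stored
    unsigned int id[RECYCLER_CAPACITY]; // the recycled IDs themselves
};

// Hand out a previously recycled ID if one is available. This is O(1):
// we take the last entry of the known-length list, so no linear scan
// over the domains table is needed.
static bool recycler_pop(struct recycler *r, unsigned int *id)
{
    if(r->count == 0)
        return false; // nothing recycled -> caller appends at the end of the table
    *id = r->id[--r->count];
    return true;
}
```

Because the lookup table stays strictly ordered, a domain created in such a reused slot is found by binary search just as quickly as one appended at the end.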
It means that there are no clients that have gone unseen for 24 consecutive hours and, hence, have been recycled. This is rather unlikely to happen in small home networks with always the same clients. You are much more likely to see this with domains of pages you visited yesterday but never since.
In this case, the list stops being filled with new IDs. Yes, it means we "forget" where some empty slots are and, if many new domains/clients/DNS cache records are created, this can eventually lead to "wasting" some memory. There are, in general, two obvious mitigations possible:
Both can also be combined.
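For the "list is full" case described above, the matching recording side could look like this (again purely illustrative, repeating the same hypothetical layout so the snippet is self-contained):

```c
// Illustrative sketch -- hypothetical layout, not the real FTL code.
#define RECYCLER_CAPACITY 256u

struct recycler {
    unsigned int count;                 // number of recycled IDs currently stored
    unsigned int id[RECYCLER_CAPACITY]; // the recycled IDs themselves
};

// Record a freshly recycled ID. Once the list is full, the ID is dropped:
// that empty slot is "forgotten" and new records are appended at the end
// of the table instead, possibly wasting a bit of memory.
static void recycler_push(struct recycler *r, const unsigned int id)
{
    if(r->count >= RECYCLER_CAPACITY)
        return; // list full -> stop filling it with new IDs
    r->id[r->count++] = id;
}
```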
Signed-off-by: DL6ER <[email protected]>
How can the stats be empty, but it says it recycled some domains and cache entries just a few lines above?
There were two domains and five DNS cache records recycled:
as also seen in the summary:
Then, new queries arrived and created four new domains:
out of which the first two have reused exactly the two IDs recycled above (732 and 733), and also nine new DNS cache records:
out of which the first five have reused the recycled records from above, and then four new ones were created.
You are seeing zero available recycled IDs here as all of them have already been used before you've sent
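For illustration only, a toy trace of that ordering (the exact reuse order of 732/733 and the position of the next free slot are assumptions, not taken from the log):

```c
// Toy illustration of "recycled IDs are consumed first, then new IDs are
// appended at the end of the table". The IDs 732/733 come from the excerpt
// above; the ordering details and the value 734 are assumed.
#include <stdio.h>

int main(void)
{
    unsigned int recycled[] = { 732, 733 }; // the two recycled domain IDs
    unsigned int n_recycled = 2;
    unsigned int next_new_id = 734;         // assumed next free slot at the table's end

    // Four new domains arrive: the first two consume the recycled IDs,
    // the remaining two are appended at the end of the table.
    for(unsigned int i = 1; i <= 4; i++)
    {
        const unsigned int id = n_recycled > 0 ? recycled[--n_recycled] : next_new_id++;
        printf("new domain #%u gets ID %u\n", i, id);
    }
    return 0;
}
```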
Ok. I think I misunderstood how to interpret the logs.
What does this implement/fix?
Following up on #2084, this PR seeks to improve the creation speed of new records (domains, clients, and DNS cache records). Previously, we used linear scanning whenever a new entry was to be added. Usually, there are some "holes" poked into the tables by preceding recycling activity, but nothing guarantees (a) that anything has been recycled at all or (b) that such records are "early" in the table. Hence, linear searching through the tables usually scans the entire table just to find that there are no gaps and that we have to create a new record at the very end of the table.
This PR improves on this by adding a new shared memory object which remembers whether there are any recycled entries in these tables. If so, we reuse them right away without having to search for them. If there are none, we know that we can immediately create a new entry at the end of the table.
In any case, we avoid the (possibly lengthy) linear search altogether. Thanks to a list-with-known-length implementation, no memory copying or moving is ever required, making this extremely fast.
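As a rough sketch of what such a shared memory object could look like (hypothetical names and sizes; the real struct differs), the lists are fixed-size arrays with an explicit length, so recording or reusing an ID touches only a single array element and a counter:

```c
// Hypothetical layout of the "recycler" shared memory object -- illustrative only.
#define RECYCLE_ARRAY_SIZE 1024u

// One fixed-capacity list of recycled IDs with a known length.
struct recycle_list {
    unsigned int count;
    unsigned int id[RECYCLE_ARRAY_SIZE];
};

// One list per table whose insertions this PR speeds up.
struct recycler_shm {
    struct recycle_list clients;
    struct recycle_list domains;
    struct recycle_list dns_cache;
};

// Conceptually, one such object lives in shared memory so that all forks see
// the same state, e.g. along the lines of
//   struct recycler_shm *r = mmap(NULL, sizeof(*r), PROT_READ | PROT_WRITE,
//                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
// (how FTL actually allocates its shared memory is not shown here).
```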
Furthermore, this PR avoids creating DNS cache records when we don't actually need them during database importing.
Both changes improve the overall speed of FTL during DNS operation but are most notable during startup on low-end hardware, as history importing gets roughly 5-20% faster depending on what the limiting factor is on your device. If it is your CPU that is slow, you may see a very notable impact. If, however, your disk/SD card, etc. is slow, then this change may have much less of an impact for you.
Related issue or feature (if applicable): N/A
Pull request in docs with documentation (if applicable): N/A