Ability to utilize StopForumSpam database in an offline way #7
Comments
Hello @matchboxbananasynergy and thank you for the request! I absolutely agree with what you are saying here, and by the looks of it StopForumSpam do offer their data for download. We would have to refactor the extension logic so it can work either online or offline, and set up a scheduled task to download updated data on a regular basis, but I think this would be achievable fairly easily. I'm pretty tied up for the next few days at least, but I will certainly tackle this as soon as I have the time 👍
@matchboxbananasynergy thanks for sharing the proposal. I think it would be a great improvement. I picked up that proposal (https://discuss.flarum.org/d/33802-improve-privacy-of-fofanti-spam-by-making-stopforumspam-calls-offline) to run alongside other improvements that have already been suggested to further improve anti-spam, and combined them in a bounty (https://discuss.flarum.org/d/33803-big-fofanti-spam-improvement-bounty) to get them implemented more quickly.
We're currently implementing this with nginx, and in the process we figured out some details about how this should be implemented.

StopForumSpam has separate downloads for data from the past 1 day, 7 days, 30 days, 90 days, 180 days and 365 days. The merged IPv4 + IPv6 data is the most relevant and there isn't much reason to use the others. They have a list with the IPs and a full summary which has the number of reports for each. The report counts are potentially useful and are likely how they calculate their scores, but they don't really have the data required to calculate a good score: they have no data on registrations by non-spammers, so they can't work out what percentage of registrations from a given IP may be spammers.

They allow updating most of the data 2 times daily, but the 1 day, 7 day and 30 day data is generated hourly. That means when configured to use longer than 1 day, the best approach is downloading the 1 day data hourly and merging it with the longer time period. This is now what we're doing for discuss.grapheneos.org. They also have a list of toxic IP ranges which may as well always be blocked even if the IPs haven't been seen recently.

Update script:
nginx geo configuration, which loads the data:
nginx location block configuration for
Daily systemd unit setup: https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-daily.service
Hourly systemd unit setup: https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-hourly.service

Hopefully this is helpful in terms of ideas on how to do this well. In the past, before Flarum had per-email rate limiting for email confirmations and forgot password, we used to implement that via nginx. It would be possible to do username and email bans via nginx in a similar way and we might do that.
We lack experience with Flarum and PHP so it's hard for us to implement things for it and we prefer not having downstream modifications which could delay updates.
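A minimal sketch of the merge step described above (hourly 1 day data combined with a longer period), plus writing the result as entries for an nginx `geo` block. It assumes each downloaded list holds one address or CIDR block per line; the function names and sample data are hypothetical, and this is not the linked update script.

```python
import ipaddress


def parse_ip_list(text):
    """Parse one address or CIDR block per line (assumed format).

    Blank lines and entries that are not valid addresses are skipped.
    """
    networks = set()
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue
        try:
            # strict=False accepts a host address with a prefix, e.g. 198.51.100.5/24
            networks.add(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            continue
    return networks


def merge_lists(*lists):
    """Union any number of parsed lists; duplicates collapse automatically."""
    merged = set()
    for item in lists:
        merged |= item
    return merged


def render_geo_entries(networks, value="1"):
    """Render 'network value;' lines, the entry syntax used inside an nginx geo block."""
    ordered = sorted(
        networks,
        key=lambda n: (n.version, int(n.network_address), n.prefixlen),
    )
    return "".join(f"{net} {value};\n" for net in ordered)


# Hypothetical sample data standing in for the longer-period and 1 day downloads.
longer_period = parse_ip_list("203.0.113.7\n198.51.100.0/24\n")
last_day = parse_ip_list("203.0.113.7\n192.0.2.1\nnot-an-ip\n")
geo_entries = render_geo_entries(merge_lists(longer_period, last_day))
```

The rendered text could be written to a file pulled into a `geo` block with `include`, with the longer-period list refreshed on its own schedule and the 1 day list refreshed hourly.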
This is where we removed our custom per-email rate limiting for email confirmation and forgot password, which we implemented at the nginx layer: GrapheneOS/discuss.grapheneos.org@974629b The general approach is still useful for other similar limits. It's easier to do it that way when under attack than to do it more cleanly/properly inside Flarum. We might temporarily use a similar approach for the username/email data from Stop Forum Spam.
IPv6 data only has /64 blocks which I think is too broad, so I think it's worth considering only supporting IPv4 for people who are going to use Stop Forum Spam blocking. There will be too many false positives via IPv6. IPv4 will also have more and more false positives due to CGNAT so there aren't really any great answers.
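To illustrate the trade-off, here is a small sketch of an IPv4-only lookup. The `is_listed` helper and the sample entries are hypothetical; the point is only that a single /64 entry matches every address in that block, which is where the false positives come from.

```python
import ipaddress


def is_listed(address, networks, ipv4_only=True):
    """Return True if the address falls inside any listed network.

    With ipv4_only=True, IPv6 addresses are never treated as listed.
    """
    ip = ipaddress.ip_address(address)
    if ipv4_only and ip.version != 4:
        return False
    return any(ip.version == net.version and ip in net for net in networks)


# Hypothetical list entries: one IPv6 /64 block and one IPv4 address.
listed = {
    ipaddress.ip_network("2001:db8:1:2::/64"),
    ipaddress.ip_network("203.0.113.7/32"),
}

# A /64 entry covers 2**64 addresses, not one host.
block_size = ipaddress.ip_network("2001:db8:1:2::/64").num_addresses
```

With `ipv4_only=False`, any host in `2001:db8:1:2::/64` would match the single listed entry, whether or not that particular host was ever reported.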
Feature Request
Is your feature request related to a problem? Please describe.
As far as I can tell, in order to be able to check new registrations against the StopForumSpam database, you currently have to send that sign-up information to a server and check. This poses a privacy issue and makes it a non-workable solution for our community (discuss.grapheneos.org).
Describe the solution you'd like
Would it be possible for the checks to be done in an offline manner where the database is downloaded and checked against instead of sending information to an online service?
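As a rough sketch of what such an offline check could look like, assuming the downloaded data has been saved locally with one entry per line (the helper names and sample values below are hypothetical, and the real download format may differ):

```python
def load_blocklist(lines):
    """Build an in-memory set from an iterable of lines, one entry per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_blocked(value, blocklist):
    """Check a sign-up value locally; nothing is sent to an online service."""
    return value.strip().lower() in blocklist


# Hypothetical sample standing in for the contents of a downloaded file.
sample_lines = ["spammer@example.com\n", "\n", "Bad.Actor@example.org\n"]
blocklist = load_blocklist(sample_lines)
```

In practice the lines would come from a local file opened with `open(path)`, refreshed by a scheduled job, so the registration details never leave the server.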
Describe alternatives you've considered
The current way we do things is by banning specific keywords that are commonly used by spam accounts and manually getting rid of them, but being able to benefit from the database would be a tremendous help to our community.