Ability to utilize StopForumSpam database in an offline way #7

matchboxbananasynergy · 2023-12-13T19:24:00Z

Feature Request

Is your feature request related to a problem? Please describe.

As far as I can tell, in order to be able to check new registrations against the StopForumSpam database, you currently have to send that sign-up information to a server and check. This poses a privacy issue and makes it a non-workable solution for our community (discuss.grapheneos.org).

Describe the solution you'd like

Would it be possible for the checks to be done in an offline manner where the database is downloaded and checked against instead of sending information to an online service?

Describe alternatives you've considered

The current way we do things is by banning specific keywords that are commonly used by spam accounts and manually getting rid of them, but being able to benefit from the database would be a tremendous help to our community.

imorland · 2023-12-14T09:01:48Z

Hello @matchboxbananasynergy and thank you for the request!

I absolutely agree with what you are saying here, and by the looks of it StopForumSpam do offer their data for download.

We would have to refactor the extension logic to work either online or offline, and set up a scheduled task to download updated data on a regular basis, but I think this would be achievable fairly easily.

I'm pretty tied up for the next few days at least, but I will certainly tackle this as soon as I have the time 👍

GreXXL · 2023-12-14T10:30:29Z

@matchboxbananasynergy thanks for sharing the proposal. I think it would be a great improvement. I picked up that proposal (https://discuss.flarum.org/d/33802-improve-privacy-of-fofanti-spam-by-making-stopforumspam-calls-offline) to run alongside other improvements that have already been suggested for anti-spam, and combined them in a bounty (https://discuss.flarum.org/d/33803-big-fofanti-spam-improvement-bounty) to get them implemented more quickly.

thestinger · 2023-12-19T17:30:03Z

We're currently implementing this with nginx, and along the way we worked out some details about how it should be done. StopForumSpam has separate downloads for data from the past 1 day, 7 days, 30 days, 90 days, 180 days and 365 days. The merged IPv4 + IPv6 data is the most relevant and there isn't much reason to use the others. They have a plain list of IPs and a full summary which includes the number of reports for each IP. The report counts are potentially useful and are likely how they calculate their scores, but they don't really have the data required to calculate a good score: they have no data on registrations by non-spammers, so they can't work out what percentage of registrations from a given IP are spam.

They allow updating most of the data 2 times daily, but the 1 day, 7 day and 30 day data is generated hourly. That means when configured to use a period longer than 1 day, the best approach is downloading the 1 day data hourly and merging it with the data for the longer period. This is now what we're doing for discuss.grapheneos.org.
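A minimal sketch of the merge step described above, in Python rather than the linked update script. It assumes each downloaded list has already been decompressed to text with one IP address per line; the function names are hypothetical. The output uses nginx `geo` entry syntax (`address value;`), but the value `1` and the file layout are assumptions, not taken from the linked configuration.

```python
import ipaddress


def parse_ip_list(text):
    """Parse one IP address per line, skipping blank and malformed lines."""
    ips = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            ips.add(ipaddress.ip_address(line))
        except ValueError:
            # Ignore anything that is not a single valid address.
            continue
    return ips


def merge_lists(long_period_text, one_day_text):
    """Union of the long-period list and the hourly-refreshed 1 day list."""
    return parse_ip_list(long_period_text) | parse_ip_list(one_day_text)


def to_geo_lines(ips):
    """Render addresses as nginx geo entries, IPv4 first, in numeric order."""
    ordered = sorted(ips, key=lambda ip: (ip.version, int(ip)))
    return [f"{ip} 1;" for ip in ordered]


if __name__ == "__main__":
    long_period = "192.0.2.1\n198.51.100.7\n"
    one_day = "192.0.2.1\n203.0.113.9\n"
    for entry in to_geo_lines(merge_lists(long_period, one_day)):
        print(entry)
```

Because the result is a set union, an address that appears in both lists is emitted once, so the hourly run can simply regenerate the whole file and reload nginx.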

They also have a list of toxic IP ranges which may as well always be blocked even if IPs haven't been seen recently.
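As a rough illustration of "always blocked" ranges combined with recently seen addresses, the sketch below checks an address against both. It assumes the toxic ranges are available as CIDR strings, one per line, and that `#` starts a comment line; both the input format and the function names are assumptions for illustration only.

```python
import ipaddress


def load_networks(lines):
    """Parse CIDR ranges, skipping blank lines and '#' comment lines."""
    networks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def is_blocked(addr, recent_ips, toxic_networks):
    """True if addr was recently listed or falls inside a toxic range."""
    ip = ipaddress.ip_address(addr)
    if ip in recent_ips:
        return True
    # Only compare against ranges of the same IP version.
    return any(
        ip.version == net.version and ip in net for net in toxic_networks
    )


if __name__ == "__main__":
    toxic = load_networks(["203.0.113.0/24"])
    recent = {ipaddress.ip_address("192.0.2.1")}
    print(is_blocked("203.0.113.77", recent, toxic))
    print(is_blocked("198.51.100.1", recent, toxic))
```

The linear scan over ranges is fine for a short list; a server handling many lookups would more likely load the ranges into something like the nginx `geo` module, as the linked configuration does.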

Update script:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/stopforumspam-update

nginx geo configuration, which loads the data:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/nginx/nginx.conf#L143-L148

nginx location block configuration for /register:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/nginx/nginx.conf#L366-L368

Daily systemd unit setup:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-daily.service
https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-daily.timer

Hourly systemd unit setup:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-hourly.service
https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-hourly.service

Hopefully this is helpful in terms of the ideas on how to do this well.

In the past, before Flarum had per-email rate limiting for email confirmations and forgot password, we used to implement that via nginx. It would be possible to do username and email bans via nginx in a similar way and we might do that. We lack experience with Flarum and PHP so it's hard for us to implement things for it, and we prefer not having downstream modifications which could delay updates.

thestinger · 2023-12-19T17:32:03Z

This is where we removed our custom per-email rate limiting for email confirmation and forgot password which we implemented at the nginx layer:

GrapheneOS/discuss.grapheneos.org@974629b

The general approach is still useful for doing other similar limits. It's easier to do it that way when under attack compared to doing it more cleanly/properly inside Flarum.
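For readers unfamiliar with per-key limits, here is a small sliding-window sketch of the idea in Python. It is not the nginx configuration from the linked commit and does not reproduce nginx's own rate limiting algorithm; the class name, the limits and the use of an email address as the key are all assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class PerKeyRateLimiter:
    """Allow at most max_events per key within a sliding time window."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self._max = max_events
        self._window = window_seconds
        self._clock = clock
        self._events = defaultdict(deque)

    def allow(self, key):
        """Record an event for key and return True if it is within the limit."""
        now = self._clock()
        events = self._events[key]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max:
            return False
        events.append(now)
        return True


if __name__ == "__main__":
    limiter = PerKeyRateLimiter(max_events=2, window_seconds=3600)
    for attempt in range(3):
        print(attempt, limiter.allow("user@example.com"))
```

Each key is tracked independently, so one address hitting its limit does not affect requests for any other address.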

We might temporarily use a similar approach for using the username/email data from Stop Forum Spam.

thestinger · 2023-12-22T22:07:27Z

IPv6 data only has /64 blocks which I think is too broad, so I think it's worth considering only supporting IPv4 for people who are going to use Stop Forum Spam blocking. There will be too many false positives via IPv6. IPv4 will also have more and more false positives due to CGNAT so there aren't really any great answers.
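To make the breadth of a /64 concrete, and to show one way of keeping only IPv4 entries, here is a short Python sketch. It assumes entries arrive as address or CIDR strings; the function names are hypothetical.

```python
import ipaddress


def addresses_covered(entry):
    """Number of addresses a single list entry would block."""
    return ipaddress.ip_network(entry.strip(), strict=False).num_addresses


def ipv4_only(entries):
    """Keep IPv4 entries, normalised to CIDR form, and drop IPv6 entries."""
    kept = []
    for entry in entries:
        network = ipaddress.ip_network(entry.strip(), strict=False)
        if network.version == 4:
            kept.append(str(network))
    return kept


if __name__ == "__main__":
    print(addresses_covered("192.0.2.1"))
    print(addresses_covered("2001:db8:1:2::/64"))
    print(ipv4_only(["192.0.2.1", "2001:db8:1:2::/64"]))
```

A single /64 entry covers 2^64 addresses, whereas an IPv4 entry here covers one. How many distinct users share a given /64 depends on the network, which is where the false positive concern comes from.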

matchboxbananasynergy added the enhancement New feature or request label Dec 13, 2023

matchboxbananasynergy changed the title ~~Ability to utilize abuse list in an offline way~~ Ability to utilize StopForumSpam database in an offline way Dec 13, 2023

imorland self-assigned this Dec 14, 2023
