Ability to utilize StopForumSpam database in an offline way #7

Open
matchboxbananasynergy opened this issue Dec 13, 2023 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@matchboxbananasynergy

matchboxbananasynergy commented Dec 13, 2023

Feature Request

Is your feature request related to a problem? Please describe.

As far as I can tell, checking new registrations against the StopForumSpam database currently requires sending the sign-up information to a remote server. This poses a privacy issue and makes it unworkable for our community (discuss.grapheneos.org).

Describe the solution you'd like

Would it be possible to perform the checks offline, against a downloaded copy of the database, instead of sending information to an online service?
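
To illustrate what an offline check could look like, here is a minimal Python sketch. It assumes the downloaded data has already been extracted to plain text with one IP address per line; that format, and the function names `load_listed_ips` and `is_listed`, are assumptions made for illustration and are not taken from the extension or from StopForumSpam's documentation.

```python
import ipaddress


def load_listed_ips(lines):
    """Parse one IP address per line, skipping blanks and malformed entries."""
    listed = set()
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            listed.add(ipaddress.ip_address(text))
        except ValueError:
            # Not a valid IPv4/IPv6 address; ignore it.
            continue
    return listed


def is_listed(ip, listed):
    """Check a registration IP locally, without contacting any remote service."""
    return ipaddress.ip_address(ip) in listed


# Hypothetical sample data using documentation-reserved addresses.
sample = ["192.0.2.10", "", "not-an-ip", "198.51.100.7"]
listed = load_listed_ips(sample)
print(is_listed("192.0.2.10", listed))
print(is_listed("203.0.113.5", listed))
```

The lookup is a set membership test, so no sign-up information leaves the server.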

Describe alternatives you've considered

The current way we do things is by banning specific keywords that are commonly used by spam accounts and manually getting rid of them, but being able to benefit from the database would be a tremendous help to our community.

@matchboxbananasynergy matchboxbananasynergy added the enhancement New feature or request label Dec 13, 2023
@matchboxbananasynergy matchboxbananasynergy changed the title Ability to utilize abuse list in an offline way Ability to utilize StopForumSpam database in an offline way Dec 13, 2023
@imorland
Member

Hello @matchboxbananasynergy and thank you for the request!

I absolutely agree with what you are saying here, and by the looks of it StopForumSpam do offer their data for download.

We would have to refactor the extension logic to work either online or offline, and set up a scheduled task to download updated data on a regular basis, but I think this would be achievable fairly easily.

I'm pretty tied up for the next few days at least, but I will certainly tackle this as soon as I have the time 👍

@imorland imorland self-assigned this Dec 14, 2023
@GreXXL

GreXXL commented Dec 14, 2023

@matchboxbananasynergy thanks for sharing the proposal. I think it would be a great improvement. I picked up that proposal (https://discuss.flarum.org/d/33802-improve-privacy-of-fofanti-spam-by-making-stopforumspam-calls-offline) alongside other improvements that have already been suggested for anti-spam, and combined them in a bounty (https://discuss.flarum.org/d/33803-big-fofanti-spam-improvement-bounty) to get them implemented more quickly.

@thestinger

thestinger commented Dec 19, 2023

We're currently implementing this with nginx, and along the way we worked out some details about how it should be done. StopForumSpam has separate downloads for data from the past 1 day, 7 days, 30 days, 90 days, 180 days and 365 days. The merged IPv4 + IPv6 data is the most relevant and there isn't much reason to use the others. They have a list with just the IPs and a full summary which also has the number of reports for each. The report counts are potentially useful and are likely how they calculate their scores, but they don't really have the data required to calculate a good score: they have no data on registrations of non-spammers, so they can't tell what percentage of registrations from a given IP are spammers.

They allow updating most of the data 2 times daily, but the 1 day, 7 day and 30 day data is generated hourly. That means when configured to use longer than 1 day, the best approach is downloading 1 day hourly and merging it with the longer time period. This is now what we're doing for discuss.grapheneos.org.
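
The merge step described above can be sketched in Python as follows. This is only an illustration, not the linked update script: it assumes both downloads have been extracted to one IP per line, and it assumes the output is an include file for an nginx `geo` block where each listed address maps to the value `1`. The function names are hypothetical.

```python
import ipaddress


def parse_ips(lines):
    """Return the set of valid IP addresses found, one per line."""
    ips = set()
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            ips.add(ipaddress.ip_address(text))
        except ValueError:
            continue
    return ips


def merge_lists(long_period_lines, one_day_lines):
    """Union the twice-daily long-period list with the hourly 1-day list."""
    merged = parse_ips(long_period_lines) | parse_ips(one_day_lines)
    # IPv4 and IPv6 objects cannot be compared directly, so sort by
    # (version, integer value) to get a stable, deduplicated order.
    return sorted(merged, key=lambda ip: (ip.version, int(ip)))


def render_geo_entries(ips):
    """Render entries for inclusion in an nginx geo block (value 1 is assumed)."""
    return "".join(f"{ip} 1;\n" for ip in ips)


# Hypothetical sample data using documentation-reserved addresses.
long_period = ["198.51.100.7", "192.0.2.10"]
one_day = ["192.0.2.10", "203.0.113.5"]
geo_text = render_geo_entries(merge_lists(long_period, one_day))
print(geo_text, end="")
```

Running the merge hourly keeps the long-period list current between its own refreshes, since any IP reported in the last day appears in the 1-day file.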

They also have a list of toxic IP ranges which may as well always be blocked even if IPs haven't been seen recently.
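
Checking an address against such ranges could look like the sketch below. The file format is an assumption (one CIDR range per line, with `#` comments allowed); the actual StopForumSpam download may be laid out differently, and the function names are hypothetical.

```python
import ipaddress


def load_ranges(lines):
    """Parse one CIDR range per line, skipping blanks, comments and bad entries."""
    ranges = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        try:
            # strict=False accepts entries whose host bits are set.
            ranges.append(ipaddress.ip_network(text, strict=False))
        except ValueError:
            continue
    return ranges


def in_any_range(ip, ranges):
    """Return True if the address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in ranges)


# Hypothetical sample data using documentation-reserved ranges.
toxic = load_ranges(["192.0.2.0/24", "# comment", "2001:db8::/32"])
print(in_any_range("192.0.2.99", toxic))
print(in_any_range("198.51.100.1", toxic))
```

A linear scan is fine for a short list; a server such as nginx does the equivalent lookup natively when the ranges are loaded into a `geo` block.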

Update script:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/stopforumspam-update

nginx geo configuration, which loads the data:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/nginx/nginx.conf#L143-L148

nginx location block configuration for /register:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/nginx/nginx.conf#L366-L368

Daily systemd unit setup:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-daily.service
https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-daily.timer

Hourly systemd unit setup:

https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-hourly.service
https://github.com/GrapheneOS/discuss.grapheneos.org/blob/405af15a069f2567c15f7c7bf71fed8f9ade7b08/systemd/system/stopforumspam-update-hourly.service

Hopefully this is helpful in terms of the ideas on how to do this well.

In the past, before Flarum had per-email rate limiting for email confirmation and forgot password requests, we implemented that via nginx. It would be possible to do username and email bans via nginx in a similar way, and we might do that. We lack experience with Flarum and PHP, so it's hard for us to implement things for it, and we prefer not to carry downstream modifications which could delay updates.

@thestinger

This is where we removed our custom per-email rate limiting for email confirmation and forgot password which we implemented at the nginx layer:

GrapheneOS/discuss.grapheneos.org@974629b

The general approach is still useful for doing other similar limits. It's easier to do it that way when under attack than to do it more cleanly/properly inside Flarum.

We might temporarily use a similar approach for using the username/email data from Stop Forum Spam.

@thestinger

IPv6 data only has /64 blocks which I think is too broad, so I think it's worth considering only supporting IPv4 for people who are going to use Stop Forum Spam blocking. There will be too many false positives via IPv6. IPv4 will also have more and more false positives due to CGNAT so there aren't really any great answers.
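
To make the scale concrete, here is a small Python sketch that shows how many addresses a single /64 covers and filters a mixed list down to IPv4 entries only. It assumes entries are plain addresses or CIDR blocks, one per line; `ipv4_only` is a hypothetical helper name.

```python
import ipaddress


def ipv4_only(lines):
    """Keep only entries that parse as IPv4 addresses or IPv4 networks."""
    kept = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            net = ipaddress.ip_network(text, strict=False)
        except ValueError:
            continue
        if net.version == 4:
            kept.append(text)
    return kept


# A single /64 block spans 2**64 addresses, so one listing blocks all of them.
block = ipaddress.ip_network("2001:db8:0:1::/64")
print(block.num_addresses == 2**64)

# Hypothetical sample data using documentation-reserved addresses.
mixed = ["192.0.2.10", "2001:db8:0:1::/64", "198.51.100.7"]
print(ipv4_only(mixed))
```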


4 participants