- cross-posted to:
- lemmy_integrations@lemmy.dbzer0.com
cross-posted from: https://lemmy.dbzer0.com/post/15112791
Hey y’all. I’ve been working on this little project ever since the recent spam wave started. This is a very basic Python automoderator bot which monitors the comments and posts federated into your instance for specific regex patterns and then automatically takes action on them (report, delete, ban, etc.).
Setting up the bot is very simple, as you can just chuck its docker-compose entry into your existing Lemmy one. You just need to fill in the relevant environment variables.
The bot works by constantly polling your incoming reports, posts and comments, and matching them against provided regex.
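To give a rough idea of the matching step, here is a minimal sketch. It is not the bot's actual code: `Filter` and `first_match` are hypothetical names invented for illustration, and the polling of the Lemmy API is left out entirely.

```python
import re
from dataclasses import dataclass


@dataclass
class Filter:
    """Hypothetical filter record: a compiled regex plus what to do on a hit."""
    pattern: re.Pattern
    action: str
    reason: str


def first_match(text, filters):
    """Return the first filter whose regex is found anywhere in text, else None."""
    for f in filters:
        if f.pattern.search(text):
            return f
    return None


# Example filter, mirroring the "trial period" filter from the post.
filters = [
    Filter(re.compile(r"trial period", re.IGNORECASE), "REMOVE", "Spam comment"),
]

hit = first_match("Start your free Trial Period today!", filters)
if hit is not None:
    print(f"{hit.action}: {hit.reason}")
```

In a real loop, each newly fetched post or comment body would be passed through something like `first_match`, and the resulting action handed off to whatever performs the report, removal or ban.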
I wanted to keep things simple for admins, so the bot configuration happens via a simple PM syntax. The README goes into detail on this, but you basically send a message like this to the bot to add a new filter:
threativore add comment filter: `trial period` reason: `Spam comment` action: `REMOVE` description: `Known spam string`
All bot controls work the same way. Eventually I want to add a UI to it.
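As an illustration of how a message in that shape could be picked apart, here is a small sketch. It only assumes the format of the example above (a regex in backticks followed by `key:` fields with backticked values). `parse_add_filter` and both regexes are made up for this example; the README remains the authority on the real syntax.

```python
import re

# Assumed shape: threativore add <kind> filter: `<regex>` key: `value` ...
COMMAND_RE = re.compile(
    r"^threativore add (?P<kind>\w+) filter:\s*`(?P<regex>[^`]+)`(?P<rest>.*)$",
    re.DOTALL,
)
# Each trailing field looks like: key: `value`
FIELD_RE = re.compile(r"(?P<key>\w+):\s*`(?P<value>[^`]*)`")


def parse_add_filter(message):
    """Parse an 'add filter' PM into a dict, or return None if it doesn't match."""
    m = COMMAND_RE.match(message.strip())
    if m is None:
        return None
    parsed = {"kind": m.group("kind"), "regex": m.group("regex")}
    for field in FIELD_RE.finditer(m.group("rest")):
        parsed[field.group("key")] = field.group("value")
    return parsed


pm = (
    "threativore add comment filter: `trial period` reason: `Spam comment` "
    "action: `REMOVE` description: `Known spam string`"
)
print(parse_add_filter(pm))
```

Keeping every value inside backticks means the regex itself can contain spaces and colons without confusing the parser, which is presumably part of why the syntax looks the way it does.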
The bot is built with collaboration in mind. So you can add more people to help you maintain your filters (even if they’re not admins), you can add users whose reports will be treated more seriously, and you can even mark users as “ham” (i.e. known not spammers) to prevent them ever being filtered.
This is just the very first release and I have a lot of ideas to improve it in the future. Here’s some stuff on my roadmap which should make Threativore a much more collaborative/crowdsourced process between multiple instance admins and the larger userbase. Stay tuned.
PRs and suggestions are welcome.
PS: The bot is already active on https://lemmy.dbzer0.com, so you can check the modlog for its actions.
Feel free to submit a PR for these ideas. For post similarity, machine learning techniques can be used to calculate the “distance” between two posts, but I don’t know if that would work computation-wise with an increasing amount of spam, especially if spammers start using their own generative AI engines.
That’s why I was suggesting such a simple approach: it doesn’t require AI or machine learning except in the most basic sense. If you want to try applying fancier stuff, you could use those basic word-based filters as a first pass to reduce the cost.
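One possible reading of a cheap word-based first pass is sketched below, using plain Jaccard similarity over word sets. This is my own illustration rather than anything from the bot: `words`, `jaccard_similarity` and `first_pass` are hypothetical names, and the 0.5 threshold is an arbitrary assumption you would need to tune.

```python
import re


def words(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_similarity(a, b):
    """Size of the shared word set divided by the size of the combined word set."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0  # two empty texts are treated as identical
    return len(wa & wb) / len(wa | wb)


def first_pass(post, known_spam, threshold=0.5):
    """Return the known spam texts that look similar enough to deserve a closer look."""
    return [s for s in known_spam if jaccard_similarity(post, s) >= threshold]


known_spam = ["Get your free trial period now"]
candidates = first_pass("get your FREE trial period today", known_spam)
print(candidates)
```

Only posts that come back with candidates would then be handed to a more expensive similarity model, so the costly step runs on a small fraction of the traffic.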
There are likely a lot of anti-spam tactics we can employ. I hope people will help improve it.