Handling the increase in LLM/bot spam posts from new users 🤖

Hi @beryl @dkayiwa @ibacher @jayasanka @suubi7 @wodpachua, (posting this publicly so other folks worried about forum spam can see what we’re doing/thinking)

I think we’ve all noticed an increase in spam-looking Talk posts that aren’t caught by the usual spam-filter mechanisms, e.g.:

  • Suspiciously Half-Legit: Accounts whose posts look real, but whose profiles look suspiciously bot-generated and seem designed to build trust first (a seemingly normal account, an AI-generated photo, a few harmless, decent posts to build credibility). The concern is that they’ll wait until mods stop paying close attention, then try to post disguised spam, send spammy DMs, and even add malicious/spammy links. Examples here / here, and here.
  • LLM-Looking Blah-Blah: New posts from new users that look like either LLM-generated cruft or the work of well-intentioned newcomers trying to build their OpenMRS portfolio ahead of GSoC 2026. Examples here and here.

We knew this day was coming due to the explosion of AI/LLM tooling. (I’m surprised it took this long TBH.)

Proposed Strategy to Deal with These Types of Posts

  • My proposal:
    • Keep deleting/silencing/suspending accounts with obvious spam (thank you to Daniel and Ian, who are usually so on top of this that by the time I see the report they’ve already handled it)
    • For ones where we’re unsure, we can Lock the Trust Level to 0 in the User Admin settings. This seems like the only way we can “flag” accounts without completely silencing or suspending them.

Questions

  • Any other ideas for ways to “orange-flag” accounts?
  • Do you think we should just be more aggressive? Maybe I’m giving accounts more benefit of the doubt than I should.
4 Likes

I’ve got mixed feelings here. I’m generally pretty lenient towards the type of posts you’ve labelled “Suspiciously Half-Legit”. We’ve had posts like that in the past that didn’t seem completely like spam (and I even moved that suspicious job thing to the job board, which was probably not the right call). For those, I’m inclined to take a “wait-and-see” approach. After all, we aim to be a global community, and even though this forum is primarily in English, I understand that non-native speakers may find it helpful to use AI to help them post here (though I would ask people not to include their name in post titles, as I don’t think it helps anyone).

For posts that appear to consist entirely of LLM-generated content, I have long been in favour of nuking them (and I have done so for those linked; for those curious what they look like, open up any LLM chat application and ask it a question about OpenMRS; the output will be basically the same). However, I usually don’t ban accounts that post such things unless they spam multiple messages.

I think the other thing we can do, in addition to your great suggestions here, is kindly ask the members of our community to please hit the flag button for anything they feel detracts from the experience of the forum. It should be right next to the little “link” button under each post:

[Screenshot: post buttons]

When you flag a post, you can indicate why you’re doing so and even write a little message. While doing so won’t guarantee that we’ll take a post down (anyone who moderates this forum should do so with their best judgement!), I am certainly more inclined to take aggressive action (including blocking accounts) on posts that I think are very low-value and that another member of the community also flags as such.

I’d also like to give a shout-out to @mksd who I frequently see handle flags.

4 Likes

Thank you, Ian - great point re: encouraging folks to use the flag button.

(And, thank you @mksd!! :person_bowing: :folded_hands: :trophy: )

Jayasanka pointed out to me today that the Name-in-Title thing is, in his experience, usually a sign it’s a bot.

As a non-native English speaker, I often rely on AI tools to help me express myself more clearly and ensure that my intentions are properly understood. I believe AI is a double-edged sword: on one side, it can greatly enhance productivity and support positive contributions; on the other, it can be misused to cause harm.

Using AI to generate spam or disrupt communities should not be tolerated. It’s important that we promote responsible use of these tools so that they empower rather than undermine open, respectful collaboration.

Maybe we could implement something like CAPTCHA during signup, or even for a user’s first few posts, to verify the user is human. Combined with rate limits and strong moderation, this could help reduce misuse while still supporting responsible AI use for clarity and communication.

Example CAPTCHA from Steam: [image]
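To make the rate-limit idea concrete, here’s a rough sketch in Python (purely illustrative: the class name and thresholds are invented for this example, and Discourse already ships its own configurable rate limits) of a sliding-window limiter throttling a brand-new account’s first posts:

```python
import time
from collections import defaultdict, deque

class PostRateLimiter:
    """Sliding-window limiter: at most `limit` posts per `window`
    seconds per account. Names and numbers here are made up for
    illustration, not taken from Discourse."""

    def __init__(self, limit=3, window=3600.0):
        self.limit = limit
        self.window = window
        self.events = defaultdict(deque)

    def allow(self, user_id):
        now = time.monotonic()
        q = self.events[user_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: hold the post for review
        q.append(now)
        return True

limiter = PostRateLimiter(limit=3, window=3600)
print(limiter.allow("new_user_42"))  # True until the 4th post in an hour
```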

1 Like

I tried the current CAPTCHA system myself, and it seems to be the basic “I am not a robot” checkbox. From what I’ve read, a lot of modern bots can bypass that quite easily. So I looked into it a bit and wanted to share a few ideas that could complement the existing systems like trust levels and manual flagging:

| Method | Benefit |
| --- | --- |
| Harder CAPTCHA (e.g., hCaptcha, image-based puzzles) | Slows down low-effort bots |
| Phone verification | Adds friction to account creation and deters automated signups |
| Spam detection (e.g., Akismet) | Flags or blocks posts with known spam patterns or links |
| Honeypots | Hidden form fields that bots tend to fill out but real users don’t |
| Behavioral analysis | Detects unusual activity patterns (e.g., mass posting, rapid link sharing) |
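To illustrate the honeypot row, here’s a minimal sketch in Python (the `website_url` field name is made up for this example): the form renders an input that CSS hides from humans, so any submission that fills it in is almost certainly automated:

```python
# Minimal honeypot check. The signup form renders "website_url"
# as an <input> hidden with CSS, so real users leave it empty,
# while naive bots that auto-fill every field do not.
def looks_like_bot(form_data: dict) -> bool:
    return bool(form_data.get("website_url", "").strip())

# Quick check of both cases:
assert not looks_like_bot({"username": "alice", "website_url": ""})
assert looks_like_bot({"username": "b0t", "website_url": "http://spam.example"})
```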
3 Likes

Harder CAPTCHA is a good idea. It looks like Discourse just recently added core support for hCaptcha: Discourse hCaptcha - Plugin - Discourse Meta

And 19 days ago they added it into Discourse core. @burke, I think this may have been included in the recent Discourse default plugins you added last week, because I see it here: pre-installed, just not enabled. Would you be up for configuring/setting up hCaptcha?

2 Likes

@grace I think, as @backloguy indicated, we’d need to do that on our Keycloak since no one logs in through Discourse directly. That does seem doable, though.

2 Likes

We’ve already got reCAPTCHA integrated into Keycloak. While it looks like just the “I’m not a robot” tick, it’s actually doing quite a bit under the hood: it checks dozens of factors, things like mouse movement, click timing, browser history, IP reputation, and even how you’ve interacted with the page before clicking. Based on that, it decides whether to let you through instantly or show you an extra image challenge. So even though it seems simple on the surface, it’s pretty good at filtering out a lot of bot traffic already.
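For anyone curious what that check looks like server-side, verification ultimately boils down to posting the widget’s token to Google’s siteverify endpoint. Keycloak’s built-in reCAPTCHA registration action handles this internally; the Python sketch below is just to show the mechanics (the secret is a placeholder):

```python
import requests  # third-party: pip install requests

# Placeholder secret; in Keycloak this lives in the registration-flow
# config, never in source code.
RECAPTCHA_SECRET = "your-secret-key"

def verify_recaptcha(client_token, remote_ip=None):
    """POST the token produced by the browser widget to Google's
    siteverify endpoint and return whether Google accepted it."""
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET,
            "response": client_token,
            "remoteip": remote_ip,  # optional; omitted if None
        },
        timeout=10,
    )
    result = resp.json()
    # The response also carries "challenge_ts" and "hostname";
    # reCAPTCHA v3 adds a 0.0-1.0 "score" for risk-based decisions.
    return bool(result.get("success"))
```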

2 Likes

Wow, that’s really insightful — though it seems they might still have some limitations, given that we’re still discussing them here. Thank you for sharing your knowledge, @jayasanka.

1 Like