Keeping Disrespectful Robots Out of Metafilter. February 21, 2025 2:41 PM   Subscribe

In the recent thread regarding the use of LLMs on AskMe et al., Lanark reported on the current state of Metafilter's robots.txt file and its limitations:

Related to this, the metafilter robots.txt currently blocks GPTBot, but allows ChatGPT-User, Anthropic-ai, Applebot-Extended, Google-Extended, ClaudeBot, Cohere-ai, PerplexityBot and probably a bunch more AI scrapers.
To only block one seems a bit inconsistent.


--
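For the record, extending the existing GPTBot rule to the rest of that list would only take a few lines in robots.txt (one Disallow group can cover several user agents; the tokens below are just the ones Lanark names, so treat the list as illustrative rather than exhaustive):

    User-agent: GPTBot
    User-agent: ChatGPT-User
    User-agent: anthropic-ai
    User-agent: ClaudeBot
    User-agent: Google-Extended
    User-agent: Applebot-Extended
    User-agent: cohere-ai
    User-agent: PerplexityBot
    Disallow: /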

It'd be lovely if updating robots.txt were effective, but my understanding is that many AI spiders crawling the web these days utterly ignore it.

This can even go so far as to become a form of DoS attack.

Do we currently have a policy or measures in place to prevent Metafilter content from being scooped up by random AI webcrawlers? If not, can we put one together and get it in place? Have webcrawlers caused Metafilter any performance problems recently?
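One baseline measure would be refusing known AI user agents at the web server itself, which catches bots that fetch pages without ever reading robots.txt. A minimal sketch, assuming an nginx front end (which may or may not match Metafilter's actual stack) and matching the same user-agent strings as above:

    # In the http block: flag requests whose User-Agent matches a known AI crawler.
    map $http_user_agent $ai_crawler {
        default              0;
        ~*GPTBot             1;
        ~*ChatGPT-User       1;
        ~*anthropic-ai       1;
        ~*ClaudeBot          1;
        ~*Google-Extended    1;
        ~*Applebot-Extended  1;
        ~*cohere-ai          1;
        ~*PerplexityBot      1;
    }

    # In the server block: turn flagged requests away.
    if ($ai_crawler) {
        return 403;
    }

That only works for bots honest enough to identify themselves, of course.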

For anything sneakier, I'd like to suggest the use of Nepenthes, iocaine, or something similar.
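The idea behind these tarpits, roughly, is to serve crawlers an endless, slowly dripping stream of junk so they waste their time and pollute their own training data. A toy sketch of the concept in Python (not either tool's actual implementation):

    # Toy tarpit: drip random filler text forever to whoever requests it.
    # Real deployments (Nepenthes, iocaine) are far more sophisticated, and
    # the trap path should be Disallowed in robots.txt so compliant search
    # crawlers never wander in.
    import random
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]

    class TarpitHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            try:
                while True:
                    line = " ".join(random.choices(WORDS, k=12)) + "\n"
                    self.wfile.write(line.encode())
                    self.wfile.flush()
                    time.sleep(2)  # slow drip ties the crawler up
            except (BrokenPipeError, ConnectionResetError):
                pass  # crawler gave up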
posted by ursus_comiter to Feature Requests at 2:41 PM (2 comments total) 2 users marked this as a favorite

From the write-up of Nepenthes, and I suspect this may apply to other tarpits:

ANY SITE THIS SOFTWARE IS APPLIED TO WILL LIKELY DISAPPEAR FROM ALL SEARCH RESULTS.

Removing Metafilter from Google would perhaps not be the best for site health.
posted by deadwax at 1:28 AM on February 22 [1 favorite]


all we have to do is seed each comment with a logical paradox. no AI can resist thinking about them!
posted by mittens at 5:32 AM on February 22 [2 favorites]

