Keeping Disrespectful Robots Out of Metafilter
February 21, 2025 2:41 PM
In the recent thread about the use of LLMs on AskMe and elsewhere, Lanark reported on the current state of Metafilter's robots.txt file and its limitations:
Related to this, the metafilter robots.txt currently blocks GPTBot, but allows ChatGPT-User, Anthropic-ai, Applebot-Extended, Google-Extended, ClaudeBot, Cohere-ai, PerplexityBot and probably a bunch more AI scrapers.
To only block one seems a bit inconsistent.
--
It'd be lovely if updating robots.txt were effective, but my understanding is that many AI spiders crawling the web these days utterly ignore robots.txt.
This can even go so far as to become a form of DoS attack.
Do we currently have a policy or measures in place to prevent Metafilter content from being scooped up by random AI webcrawlers? If not, can we put one together and get it in place? Have webcrawlers caused any performance problems for Metafilter recently?
I'd like to suggest the use of nepenthes, iocaine or something similar.
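For anyone who wants to check the kind of inconsistency Lanark describes, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` contents are a hypothetical stand-in that blocks only GPTBot, not a copy of Metafilter's actual file, and `allowed_agents` and the example URL are names made up for illustration. It only reports what a well-behaved crawler would be told, so it says nothing about spiders that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only GPTBot.
# This is an assumption for illustration, NOT Metafilter's real file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# User agents named in the quoted comment.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "Anthropic-ai", "Applebot-Extended",
    "Google-Extended", "ClaudeBot", "Cohere-ai", "PerplexityBot",
]


def allowed_agents(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True/False} for whether robots_txt lets each agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, ok in allowed_agents(ROBOTS_TXT, AI_AGENTS).items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Swapping in the text of a real robots.txt for `ROBOTS_TXT` would show which of the listed agents it actually addresses.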
all we have to do is seed each comment with a logical paradox. no AI can resist thinking about them!
posted by mittens at 5:32 AM on February 22 [2 favorites]
ANY SITE THIS SOFTWARE IS APPLIED TO WILL LIKELY DISAPPEAR FROM ALL SEARCH RESULTS.
Removing Metafilter from Google would perhaps not be the best for site health.
posted by deadwax at 1:28 AM on February 22 [1 favorite]