Now that we know AI bots will ignore robots.txt and churn residential IP addresses to scrape websites, does anyone know of a method to block them that doesn’t entail handing over your website to Cloudflare?
If I’m reading your link right, they identify themselves with user agents. Granted, there are a lot of them. Maybe you could whitelist the user agents you approve of? Or one of the commenters had a list that you could block. Nginx would be able to handle that.
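For what it's worth, the blocklist idea boils down to a substring check on the User-Agent header. Here is a rough Python sketch of that logic, not an Nginx config. The function name `is_blocked` and the token list are my own assumptions for illustration, and the list is nowhere near complete. It only stops bots that identify themselves honestly.

```python
# Illustrative, not exhaustive: substrings that some AI crawlers send
# in their User-Agent header. Treat this list as an assumption.
BLOCKED_UA_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


def is_blocked(user_agent, tokens=BLOCKED_UA_TOKENS):
    """Return True if the User-Agent header contains any blocked token."""
    # A missing header is treated as an empty string, so it is not blocked here.
    ua = (user_agent or "").lower()
    # Case-insensitive substring match against each token.
    return any(token.lower() in ua for token in tokens)
```

You would call `is_blocked(request_headers.get("User-Agent"))` in whatever sits in front of your site and return a 403 when it is true.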
They just fake their user agents if you block them.
Thank you for the reply, but at least one commenter claims they’ll impersonate Chrome UAs.
You can read more here:
https://pod.geraspora.de/posts/17342163
Except it’s not denying service, so it’s just a D.
In the Hacker News comments for that geraspora link, people discussed websites shutting down due to hosting costs, which may be attributed in part to the overly aggressive crawling. So maybe it’s just a different form of DDoS than we’re used to.