DDoS from Anthropic AI

Going by Twitter, I'm not the only one who's seeing a huge amount of bot traffic coming from AI company Anthropic. Their bot ("ClaudeBot") doesn't respect robots.txt or ai.txt, and they appear to be using hundreds of Amazon AWS servers to scrape content, enough to overwhelm my servers last night.

Just in case anyone else is having issues and wants to block them at the firewall level, these are the ranges the company is using based on my server logs:

3.12.0.0/16
3.14.0.0/15
3.20.0.0/14
3.128.0.0/15
3.132.0.0/14
3.136.0.0/13
3.144.0.0/13
13.58.0.0/15
18.116.0.0/14
18.188.0.0/16
18.189.0.0/16
18.190.0.0/16
18.191.0.0/16
18.216.0.0/14
18.220.0.0/14
18.224.0.0/14
52.14.0.0/16
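
If you manage the firewall on the host itself, one quick way to drop all of those ranges is ipset plus iptables. This is just a sketch assuming a Linux server with the ipset tool installed, and the set name is arbitrary; adapt it to nftables or whatever firewall tooling you actually use:

# Create a set holding the offending ranges, then drop anything matching it
ipset create claudebot hash:net
ipset add claudebot 3.12.0.0/16
ipset add claudebot 3.14.0.0/15
# ...repeat "ipset add" for the remaining ranges listed above...
iptables -I INPUT -m set --match-set claudebot src -j DROP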

5 Replies

Although it can definitely be frustrating to deal with performance issues caused by bad actors, this seems like the perfect use case for our Cloud Firewall. Since it is external to your Linode, the firewall rules/processing are handled by our system instead of using an individual Linode's finite virtual resources.

Since you've already identified the source IP ranges, those subnets can easily be added as block rules to prevent that traffic from reaching your Linode. For more information about configuring and applying Cloud Firewalls to existing Linodes, be sure to check out the Cloud Firewall guides in our Docs.
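
For reference, a block rule for those ranges looks roughly like the following in the Firewall's inbound rules (shown here as the JSON shape used by the Linode API; treat this as an illustrative sketch and double-check field names against the current API docs):

{
  "label": "drop-claudebot",
  "action": "DROP",
  "protocol": "TCP",
  "ports": "80,443",
  "addresses": { "ipv4": ["3.12.0.0/16", "3.14.0.0/15"] }
}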

Otherwise, although it doesn't apply in this instance, if you ever detect malicious activity originating from another Linode, please be sure to submit a report through our Abuse Portal. This helps streamline our Trust & Safety Team's ability to review activity and take action as needed to help protect customers on our platform and users of non-Linode systems worldwide.

There's some more background on this here -- Amazon have poured money into the company, and seemingly this is being used to spin up hundreds of AWS servers to grab content from websites.
https://www.theguardian.com/technology/2024/mar/27/anthropic-amazon-ai-startup

In the end, I saw around 700 different IP addresses hammering my websites over a couple of hours, all with a user agent string containing "ClaudeBot/1.0" and all within the ranges above. There are suggestions online to use a robots.txt file to block the bot, but based on the URLs it requested I can confirm that it ignores anything in the file.
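
For reference, the suggested robots.txt rule looks like this, though as noted it made no difference in my case:

User-agent: ClaudeBot
Disallow: /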

Assuming you don't want a multi-billion-dollar company stealing your content and giving you nothing back in return, it's definitely worth looking at options to configure your web server software to block requests based on the user agent string. I've combined that with fail2ban to generate a firewall-level block for each IP address used by the bot.

I'm having the same issue. Because it's effectively a DDoS, it's not easy to block. I'm currently looking for a way to block it via the user agent or some other string.

I added this to my .htaccess file; it redirects the bot's requests back to Anthropic's own site:

<IfModule mod_rewrite.c>
RewriteEngine On
# Match any request whose user agent string contains "ClaudeBot"
RewriteCond %{HTTP_USER_AGENT} ClaudeBot
# ...but still allow it to fetch robots.txt
RewriteCond %{REQUEST_URI} !robots\.txt
# Redirect everything else back to Anthropic
RewriteRule .* http://www.anthropic.com [R,L]
</IfModule>

My setup is a little more complex and uses a combination of nginx and fail2ban.

For nginx, I've got a .conf file containing "bad bots" which is loaded as part of the overall web server config:

# Goes in the http { } context; the "~" prefix makes each entry a regex match
map $http_user_agent $blocked_bots {
  default 0;
  "~Claude-Web" 1;
  "~ClaudeBot" 1;
}

…you can add extra lines to block other bots based on whatever is unique in their user agent string. Within individual server { ... } sections, you can then choose to do something if the requester is a bad bot, e.g. to return no response at all to the request:

if ($blocked_bots = 1) { return 444; }

As an alternative to returning a 444, you could return a 403 response to let the bot know that you're forbidding access to the requested URL.
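
For example, the same if block with the response code swapped:

if ($blocked_bots = 1) { return 403; }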

You can then create a new fail2ban jail that monitors the nginx access log for requests that resulted in a 444 response, and blocks the IP at the server firewall level for a specific period of time. You could either write a regex that matches the response code (444 here, if you want to block all bad bots) or one that matches specific user agent strings.
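
As a rough sketch (the file names below are the conventional fail2ban locations, and the regex assumes nginx's default combined access log format, so adjust both to your setup), the filter could go in /etc/fail2ban/filter.d/nginx-badbots.conf:

[Definition]
# Match any request that nginx answered with a 444 status
failregex = ^<HOST> -.*" 444 \d+
ignoreregex =

…with a matching jail in /etc/fail2ban/jail.local:

[nginx-badbots]
enabled  = true
port     = http,https
filter   = nginx-badbots
logpath  = /var/log/nginx/access.log
maxretry = 1
bantime  = 86400

Setting maxretry = 1 is what produces the one-strike behaviour described below.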

The benefit of the above setup is that the bot only gets the one chance to make a request before its IP address is blocked on the firewall.

The following single line command is a useful way of monitoring which IP addresses are currently blocked by fail2ban:

fail2ban-client status | grep "Jail list:" | sed "s/ //g" | awk '{split($2,a,",");for(i in a) system("fail2ban-client status " a[i])}' | grep "Status\|IP list"
