Let that be a lesson to me!

Earlier this month I was shocked and astounded to see that I'd used 1% of my bandwidth in 1 day. WTF? I don't get that much email and don't serve that many pages! Hmm… Now it's the 9th and I've only used 2%. OK… so what happened?

Wow, my personal web site has 1500 hits already. I migrated the site to the Linode on 16th May, and for that half of the month there were only 800 hits in total.

More research… Ah hah… 3 or 4 search engines decided to reindex my site at the beginning of the month, and I have 3 CGI scripts that together generate 2300 links to data views…
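One way to do this kind of digging is to total the response bytes per user agent in the web server's access log, looking only at the CGI URLs. Here's a minimal sketch, assuming the common Apache/nginx "combined" log format and that the scripts live under a hypothetical `/cgi-bin/` prefix; the function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# "combined" log format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines, path_prefix="/cgi-bin/"):
    """Sum response bytes per user agent for requests whose path starts with path_prefix."""
    totals = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        # Skip lines that don't parse and requests outside the CGI prefix.
        if not m or not m.group("path").startswith(path_prefix):
            continue
        size = m.group("bytes")
        # A "-" byte count means no body was sent; count it as zero.
        totals[m.group("agent")] += 0 if size == "-" else int(size)
    return totals


# Hypothetical log lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [01/Jun/2013:03:12:45 +0000] "GET /cgi-bin/view.cgi?id=17 HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '192.0.2.20 - - [01/Jun/2013:03:12:46 +0000] "GET /cgi-bin/view.cgi?id=18 HTTP/1.1" 200 2048 "-" "bingbot/2.0"',
    '192.0.2.10 - - [01/Jun/2013:03:12:47 +0000] "GET /cgi-bin/list.cgi?page=3 HTTP/1.1" 200 1000 "-" "Googlebot/2.1"',
    '198.51.100.5 - - [01/Jun/2013:09:00:00 +0000] "GET /index.html HTTP/1.1" 200 300 "-" "Mozilla/5.0"',
]

for agent, total in bytes_by_agent(SAMPLE).most_common():
    print(f"{total:>8} bytes  {agent}")
```

Sorting by `most_common()` puts the heaviest crawler at the top, which is usually enough to spot a reindexing spree.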

I think I need a robots.txt file to keep the crawlers out of those CGI scripts!
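A robots.txt for this could be as small as a single `Disallow` rule, served from the root of the site (`/robots.txt`). The sketch below assumes the scripts live under a hypothetical `/cgi-bin/` path and uses Python's standard `urllib.robotparser` to check that the rule blocks the CGI URLs while leaving ordinary pages crawlable; `example.com` stands in for the real site. Note that robots.txt only asks well-behaved crawlers not to fetch those URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler ("*") is asked to skip /cgi-bin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def make_parser(text):
    """Build a RobotFileParser from robots.txt text instead of fetching it over HTTP."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


rp = make_parser(ROBOTS_TXT)
for url in ("https://example.com/cgi-bin/view.cgi?id=17",
            "https://example.com/index.html"):
    # can_fetch() applies the rules that match the given user agent.
    print(url, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```

Checking the file locally like this before uploading it is a cheap way to catch a typo that would make crawlers ignore the rule.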

Let that be a lesson to me :-)

2 Replies

I've been told by reputable software engineers that the robots.txt file is ignored by most search engines' web crawlers.

Search engines will ignore an improperly constructed, improperly located or otherwise unusable robots.txt file. Google, Bing, DuckDuckGo and Yahoo! all honor them (if they can).

As far as I can tell, the Russian (Yandex) and Chinese (Baidu) crawlers do not honor them. A good reason to put all their networks in blacklists…

The response from @normangdrum78 was spam. I reported it.

-- sw
