Seeing a ton of Apache requests that don't belong to me

Over the past 24 hours, I've been seeing a ton of requests on my Linode that are not for my site; they have URLs like:

http://www.puboclic.com/cpm.php?eid=1374110870&tbn=1

http://tag.contextweb.com/TagPublish/GetAd.aspx?tagver=1&ca=VIEWAD&cp=540626&ct=110603&cn=1&epid=&esid=&cf=300X250&rq=1&dw=1413&cwu=http%3A%2F%2Fwww.economying.com%2F&cwr=http%3A%2F%2Fwww.bing.com%2Fsearch%3Fq%3Dpayment%2Bgateway%2Bmerchant%2Baccount%26form%3DMSNH14%26qs%3Dn%26x%3D136%26y%3D13%26first%3D141%26FORM%3DPORE&mrnd=27084348&if=0&tl=1&pxy=60,998&cxy=1413,1013&dxy=1413,1013&tz=480&ln=en-us,en-us,en-us

etc

I'm also seeing requests for bing.com, yahoo.com, and a bunch of other sites that I obviously do not host.

Overall, I'm seeing about 4,000 - 6,000 requests per minute.

These are all going to the 'default' vhost in my Apache configuration, which I have now set to 'deny from all'. However, quite a few requests still appear to be answered with an HTTP status code of 200! My vhost config contains:

Options FollowSymLinks

AllowOverride None

deny from all
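
To see which of these requests are still answered with a 200, one option is to tally status codes for the proxy-style entries in the access log (those whose request line carries an absolute URL, like the examples above). This is a minimal sketch, assuming the log uses Apache's common or combined format; the function name and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it
# in Apache common/combined log format (assumed format).
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<proto>[^"]*)" (?P<status>\d{3}) '
)


def tally_proxy_statuses(lines):
    """Count status codes for requests whose target is an absolute URL."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        if m.group("target").lower().startswith(("http://", "https://")):
            counts[m.group("status")] += 1
    return counts


# Hypothetical sample lines, invented for illustration only.
SAMPLE = [
    '203.0.113.5 - - [18/Jul/2013:10:00:00 +0000] "GET http://www.bing.com/ HTTP/1.1" 403 202 "-" "-"',
    '203.0.113.6 - - [18/Jul/2013:10:00:01 +0000] "GET http://www.yahoo.com/ HTTP/1.0" 200 1500 "-" "-"',
    '198.51.100.7 - - [18/Jul/2013:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "-"',
]

if __name__ == "__main__":
    print(dict(tally_proxy_statuses(SAMPLE)))
```

Feeding it the real log (for example `tally_proxy_statuses(open(path))`, with `path` pointing at wherever the access log lives) would show how many of the foreign requests get a 200 versus a 403.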

My questions are:

  • Why are there still requests coming through that are not 403s?

  • I assume these requests are coming to my server due to a DNS misconfiguration or a deliberate attack; is there anything better that I could be doing to deny these requests or reduce their occurrence?

1 Reply

If the traffic contains "GET /robots.txt HTTP/1.1" in the request, then those requests are from search engine crawlers trying to index the content of your site. See

http://en.wikipedia.org/wiki/Robots.txt

or

http://www.robotstxt.org/robotstxt.html for more details on what to do with it.

If the traffic contains "GET /favicon.ico HTTP/1.1", then that is IE and other browsers looking for the icon that can be shown next to the URL when bookmarking your site or adding it to favorites:

http://en.wikipedia.org/wiki/Favicon

Unfortunately, you will always get other "background noise" from worms looking for vulnerable software. [For me recently, there seems to be an increase in scans for vulnerable phpMyAdmin sites.]

Just make sure that all your admin sites are secured/restricted/firewalled and that none of them use easily guessable passwords. A friend once had a script "locate" his admin pages within 1 hour of starting to configure a site (before he had a chance to change the default credentials).

When I last calculated it, the background noise was ~4% of my bandwidth.
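
A rough way to estimate that share from an access log is to add up response sizes for requests that look like noise and divide by the total. This is only a sketch: it assumes Apache's common or combined log format, and what counts as "noise" here (proxy-style absolute URLs and paths containing "phpmyadmin") is an illustrative assumption, not a definition. The function names and sample lines are made up.

```python
import re

# Request target, status, and response size in Apache common/combined
# log format (assumed format). The size field is "-" when no body was sent.
LOG_RE = re.compile(r'"[A-Z]+ (?P<target>\S+) [^"]*" \d{3} (?P<size>\d+|-)')


def is_noise(target):
    """Assumed noise rule: proxy-style URLs or phpMyAdmin probes."""
    t = target.lower()
    return t.startswith(("http://", "https://")) or "phpmyadmin" in t


def noise_share(lines):
    """Return the fraction of logged response bytes spent on noise."""
    total = 0
    noise = 0
    for line in lines:
        m = LOG_RE.search(line)
        if m is None:
            continue  # not an access-log entry
        size = 0 if m.group("size") == "-" else int(m.group("size"))
        total += size
        if is_noise(m.group("target")):
            noise += size
    return noise / total if total else 0.0


# Hypothetical sample lines, invented for illustration only.
SAMPLE = [
    '203.0.113.5 - - [18/Jul/2013:10:00:00 +0000] "GET /phpmyadmin/index.php HTTP/1.1" 404 250 "-" "-"',
    '203.0.113.6 - - [18/Jul/2013:10:00:01 +0000] "GET http://www.yahoo.com/ HTTP/1.0" 403 250 "-" "-"',
    '198.51.100.7 - - [18/Jul/2013:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 9500 "-" "-"',
]

if __name__ == "__main__":
    print(f"{noise_share(SAMPLE):.1%}")
```

The logged size normally excludes response headers, so the result is an approximation of bandwidth rather than an exact figure.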
