Refer(r)er spam and visitor statistics

This is what the usage statistics (July 1-17) of a specialised, Pagerank 5 weblog on European politics look like after an attack of “referer spam”. Of the top 45 referers*, only 2 are genuine.

Refer(r)er spam
Referer spamming means making page requests to a target site with a fake referer URL that points to a spam site, with the sole purpose of promoting that spam site into the target's list of referring websites.
Some sites, especially blogs, publish a list of referring websites. By targeting thousands of websites and blogs, the spammer hopes to gain visibility on at least a few of them.
Note: in the context of a referring web page, the misspelled “referer” is often used instead of “referrer” (the HTTP request header itself is spelled “Referer”).
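
To make the mechanism concrete, here is a minimal sketch of how a log-based report ends up with spam in its referer list. It assumes the webserver writes the common Apache/NCSA "combined" log format; the sample lines, the `spam.example` and `friendly-blog.example` hosts, and the function name `top_referers` are all made up for illustration, and this is not how Webalizer itself is implemented.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def top_referers(lines, limit=10):
    """Count referer URLs the way a naive log report would."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        referer = match.group("referer")
        if referer and referer != "-":  # "-" means no referer was sent
            counts[referer] += 1
    return counts.most_common(limit)


# Invented sample data: two spam requests, one genuine link, one feed reader.
SAMPLE_LOG = [
    '192.0.2.10 - - [17/Jul/2005:10:00:00 +0200] "GET / HTTP/1.1" 200 5120 "http://spam.example/pills" "Mozilla/4.0"',
    '192.0.2.11 - - [17/Jul/2005:10:00:05 +0200] "GET /about HTTP/1.1" 200 2048 "http://spam.example/pills" "Mozilla/4.0"',
    '198.51.100.7 - - [17/Jul/2005:10:01:00 +0200] "GET / HTTP/1.1" 200 5120 "http://friendly-blog.example/links" "Mozilla/5.0"',
    '203.0.113.5 - - [17/Jul/2005:10:02:00 +0200] "GET /feed HTTP/1.1" 200 900 "-" "FeedReader/1.0"',
]

if __name__ == "__main__":
    for url, hits in top_referers(SAMPLE_LOG):
        print(hits, url)
```

Because the referer is simply whatever the client claims, two scripted requests are enough to push the spam URL above the one genuine referring page in this sample.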

Reports based on the webserver logs, such as those produced by Webalizer or AWStats, become harder to interpret. Referer spam is not the only issue: the number of machines hitting your webserver is growing faster than the number of human visitors. Think of the multitude of feed-polling RSS clients and RSS indexers, search bots/spiders and even shameless content harvesters.

A good log analyzer will let you distinguish human “visitors” (producing a click trail of consecutive page views, or at least a page load followed by image loads) from isolated page views (that’s what most bona fide search bots produce). You might also enter the arms race with referer spammers by blacklisting them in your log analyzer configuration.
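
Both ideas can be sketched in a few lines. The example below assumes the log has already been reduced to `(host, path, referer)` tuples; the blacklist entry, the image suffixes, the labels and the rule "more than one page, or a page plus an image" are illustrative assumptions, not the behaviour of any particular log analyzer.

```python
from collections import defaultdict
from urllib.parse import urlparse

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")
REFERER_BLACKLIST = {"spam.example"}  # hypothetical spam domain


def is_blacklisted(referer):
    """True if the referer's host is a blacklisted domain or a subdomain of one."""
    host = urlparse(referer).hostname or ""
    return any(host == bad or host.endswith("." + bad) for bad in REFERER_BLACKLIST)


def classify_hosts(hits):
    """Label each client host as a 'visitor' (click trail) or 'isolated' hit."""
    pages = defaultdict(int)
    images = defaultdict(int)
    for host, path, referer in hits:
        if is_blacklisted(referer):
            continue  # drop requests carrying a known spam referer
        if path.lower().endswith(IMAGE_SUFFIXES):
            images[host] += 1
        else:
            pages[host] += 1
    result = {}
    for host in set(pages) | set(images):
        # Assumed heuristic: several pages, or a page plus its images.
        looks_human = pages[host] > 1 or (pages[host] >= 1 and images[host] >= 1)
        result[host] = "visitor" if looks_human else "isolated"
    return result


# Invented sample data.
SAMPLE_HITS = [
    ("198.51.100.7", "/", "-"),
    ("198.51.100.7", "/logo.png", "http://myblog.example/"),
    ("203.0.113.5", "/", "-"),
    ("192.0.2.10", "/", "http://spam.example/pills"),
]

if __name__ == "__main__":
    print(classify_hosts(SAMPLE_HITS))
```

In this sample the first host counts as a visitor, the second as an isolated page view, and the spammer's request disappears from the report altogether. The weak point is the blacklist itself: it has to be updated every time the spammer moves to a new domain, which is the arms race mentioned above.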

However, there’s another way to distinguish machines from humans: instead of logging the served pages, track visits with a combination of javascript and dynamically generated images. Most bots/harvesters/spammers don’t download images or interpret javascript, whereas (almost all) browsers used by humans (normally) do. (Several statcounter services are based on that idea, but you’ll have to put at least a logo and link on the webpages you want to track, and the free editions of these services are limited in time or detail.) More explanation follows in the next post, on phpOpenTracker, which shows how to set up your own alternative tracking system running on your own host.
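
As a rough illustration of the javascript-plus-image idea (and not of how phpOpenTracker or any statcounter service actually works), the sketch below shows the two halves: a snippet to paste into each tracked page, and the server-side function that answers the image request with a 1x1 GIF while recording the hit. The names `make_snippet` and `record_hit`, the query parameters `page`, `ref` and `r`, and the URL `/track.gif` are all hypothetical.

```python
import base64
from urllib.parse import parse_qs

# A commonly used 1x1 transparent GIF, decoded once at import time.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# Javascript for the tracked page: only clients that run scripts AND
# fetch images will ever request the pixel.
SNIPPET_TEMPLATE = """<script type="text/javascript">
document.write('<img width="1" height="1" alt="" src="PIXEL_URL?page='
  + encodeURIComponent(location.pathname)
  + '&ref=' + encodeURIComponent(document.referrer)
  + '&r=' + Math.random() + '">');
</script>"""


def make_snippet(pixel_url):
    """Return the HTML/javascript to embed, pointing at our pixel URL."""
    return SNIPPET_TEMPLATE.replace("PIXEL_URL", pixel_url)


def record_hit(query_string, user_agent, hit_log):
    """Store one page view and return (headers, body) for the image response."""
    params = parse_qs(query_string)
    hit_log.append({
        "page": params.get("page", [""])[0],
        "referer": params.get("ref", [""])[0],
        "agent": user_agent,
    })
    # "no-store" asks browsers not to cache the pixel, so repeat views count.
    headers = {"Content-Type": "image/gif", "Cache-Control": "no-store"}
    return headers, PIXEL_GIF


if __name__ == "__main__":
    log = []
    print(make_snippet("/track.gif"))
    record_hit("page=%2Fabout&ref=http%3A%2F%2Ffriendly-blog.example%2F&r=0.42",
               "Mozilla/5.0", log)
    print(log)
```

A referer spammer who only fires off page requests never executes the script, so nothing reaches `record_hit`. The referer recorded here is the one the visitor's browser reports through `document.referrer`, not a header sent by a script, although a sufficiently determined spammer could still call the pixel URL directly.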

Webalizer usage statistics savaged by referer spam

2 Responses to “Refer(r)er spam and visitor statistics”

  1. WP-ShortStat plugin broken after upgrading to Wordpress 2.0.2 Says:

    […] It’s a very quick solution if you want to diagnose a blog problem and need to monitor all behaviour on your website (searchbots, feedtraffic and browsers).  But if you want really meaningful information on human visitors, you’ll have to switch to a webbug-based system (see these postings on refer spam and phpopentracker)  […]

  2. Casey Says:

    I DO NOT like WebAlizer.
    Main reasons:
    1) Not very correct stats
    2) Refspam through webalizer logs

    Refspam is popular in my country, and if they do it more often, a site running WebAlizer may be DDoSed.

    Thats what i think