<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Notes, links and conversation &#187; Spamblogs</title>
	<atom:link href="http://pascal.vanhecke.info/category/spamblogs/feed/" rel="self" type="application/rss+xml" />
	<link>http://pascal.vanhecke.info</link>
	<description></description>
	<lastBuildDate>Thu, 11 Aug 2011 14:38:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Kailash Nadh, Splogspot.com and Antisplog.net</title>
		<link>http://pascal.vanhecke.info/2005/09/27/kailash-nadh-splogspotcom-and-antisplognet/</link>
		<comments>http://pascal.vanhecke.info/2005/09/27/kailash-nadh-splogspotcom-and-antisplognet/#comments</comments>
		<pubDate>Mon, 26 Sep 2005 22:31:55 +0000</pubDate>
		<dc:creator>Pascal Van Hecke</dc:creator>
				<category><![CDATA[Conversation]]></category>
		<category><![CDATA[Spamblogs]]></category>

		<guid isPermaLink="false">http://pascal.vanhecke.info/2005/09/27/kailash-nadh-splogspotcom-and-antisplognet/</guid>
		<description><![CDATA[Kailash Nadh is probably someone we&#8217;ll hear from more. He&#8217;s 18, and the author of BoastMachine, a PHP/MySQL blogging engine. BoastMachine, though maybe a bit less sophisticated than WordPress, is very nice and easy to install. It takes some ideas from bulletin boards: threaded comments, BBCode. His other projects are a news [...]<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/09/27/kailash-nadh-splogspotcom-and-antisplognet/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=112" width="100" height="15" style="border:0;" /></a></div>]]></description>
			<content:encoded><![CDATA[<p>Kailash Nadh is probably someone we&#8217;ll hear from more.<br />
He&#8217;s <a href="http://kailashnadh.name/pages/about">18</a>, and the author of <a href="http://boastology.com/">BoastMachine</a>, a PHP/MySQL blogging engine.  BoastMachine, though maybe a bit less sophisticated than WordPress, is very nice and easy to install.  It takes some ideas from bulletin boards: threaded comments, BBCode.  His other projects are a <a href="http://newzpile.com/pages/about">news aggregator</a> and <a href="http://www.pingoat.com/">Pingoat</a>, a <a href="http://pingomatic.com/">Ping-o-matic</a> clone. <span id="more-112"></span></p>
<p>Some time ago he wrote a <a href="http://www.kailashnadh.name/docs/spam_blog/spamblog_hypothesis.html">paper</a> on spamblogs, which he has apparently been putting into practice since: hence <a href="http://splogspot.com/">Splogspot</a>, &#8220;The web&#8217;s only Spam search engine&#8221;, which he <a href="http://boastology.com/blog/post/index/161/SplogSpotcom">announced</a> yesterday.  It&#8217;s a bit of a funny way to generate publicity for what should become a public blacklist of splogs, accessible via an API you can check any URL against (the API apparently doesn&#8217;t work yet; see my <a href="http://boastology.com/blog/post/index/161/SplogSpotcom#cmt">comments</a> on his blog post).  The database has been fed by the 1 million blog pings he has received so far for Pingoat, of which only about 6% passed through his filter.</p>
<p>An analogous (and, as far as I can judge, more advanced) project is <a href="http://www.antisplog.net/">Antisplog.net</a>, featuring the simplest URL-append API imaginable: this blog returns a <a href="http://www.antisplog.net/check/pascal.vanhecke.info">0</a> and is apparently not a spamblog.  They&#8217;ve actively <a href="http://www.phpmagazine.net/28_antisplog_network/archive/999_antisplog_index_feeded_with_2_milion_blogs_.html">spidered about two million blogs</a> to test the algorithm.  Judging by this <a href="http://aixtal.blogspot.com/2005/09/splogs-antisplognet-system.html">review</a> (a test run on a sample of 42 spam and normal blogs), it is both fast and relatively accurate.  To be continued!</p>
<h3 id="toc-update">Update</h3>
<p>The <a href="http://splogspot.com/pages/api_help">API help page</a> for Splogspot has been fixed: you can test it by appending a URL to &#8220;http://splogspot.com/api?url=&#8221;. This blog returns <a href="http://splogspot.com/api?url=http://pascal.vanhecke.info">false</a>.</p>
<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/09/27/kailash-nadh-splogspotcom-and-antisplognet/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=112" width="100" height="15" style="border:0;" /></a></div>]]></content:encoded>
			<wfw:commentRss>http://pascal.vanhecke.info/2005/09/27/kailash-nadh-splogspotcom-and-antisplognet/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Google experiments with &#8220;Remove Result&#8221; in personalised search</title>
		<link>http://pascal.vanhecke.info/2005/09/24/google-experiments-with-remove-result-and-report-spam-in-personalised-search/</link>
		<comments>http://pascal.vanhecke.info/2005/09/24/google-experiments-with-remove-result-and-report-spam-in-personalised-search/#comments</comments>
		<pubDate>Sat, 24 Sep 2005 14:25:44 +0000</pubDate>
		<dc:creator>Pascal Van Hecke</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Spamblogs]]></category>

		<guid isPermaLink="false">http://pascal.vanhecke.info/2005/09/24/google-experiments-with-remove-result-and-report-spam-in-personalised-search/</guid>
		<description><![CDATA[Google&#8217;s response to requests to block unwanted sites (the discussion on tagging spam blogs being one example). The full post is at Google engineer Matt Cutts&#8217;s blog, but here is a summary: In a one-click action, you can remove a result for a specific search query; with an additional click-and-submit you can remove the page or an entire [...]<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/09/24/google-experiments-with-remove-result-and-report-spam-in-personalised-search/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=111" width="100" height="15" style="border:0;" /></a></div>]]></description>
			<content:encoded><![CDATA[<p>Google&#8217;s response to requests to block unwanted sites (the discussion on <a href="http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/#updatejuly13">tagging spam blogs</a> being one example). The full post is at Google engineer <a href="http://www.mattcutts.com/blog/remove-result/">Matt Cutts&#8217;s blog</a>, but here is a summary:<span id="more-111"></span></p>
<ul>
<li>In a <a href="http://www.mattcutts.com/images/remove1.gif">one-click action</a>, you can remove a result for a specific search query; with an <a href="http://www.mattcutts.com/images/remove3.gif">additional click-and-submit</a> you can remove the page or an entire domain from all future searches.</li>
<li>Requirement: Google&#8217;s <a href="http://www.google.com/psearch">personalised search</a>, for which you need to be logged in with your Google account. Some kind of account is unavoidable if you want your ban list to be persistent.</li>
<li>Privacy concerns: you can switch off the &#8220;<a href="http://www.google.com/searchhistory/">search history</a>&#8221; feature of personalised search if you don&#8217;t want Google to record and store your search requests all day (which are, sort of, the footprints of your thoughts..)</li>
<li>For the time being, removing pages from your search results only affects your own future personalised searches; it is not a collaborative spam tagging effort yet. I assume Google will investigate whether it can be used as one, but obviously they&#8217;ll have to tackle possible abuse first (such as blacklisting competitors etc&#8230;)</li>
</ul>
<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/09/24/google-experiments-with-remove-result-and-report-spam-in-personalised-search/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=111" width="100" height="15" style="border:0;" /></a></div>]]></content:encoded>
			<wfw:commentRss>http://pascal.vanhecke.info/2005/09/24/google-experiments-with-remove-result-and-report-spam-in-personalised-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hunting spamblogs continued: splogs and splogreporter</title>
		<link>http://pascal.vanhecke.info/2005/08/22/hunting-spamblogs-continued-splogs-and-splogreporter/</link>
		<comments>http://pascal.vanhecke.info/2005/08/22/hunting-spamblogs-continued-splogs-and-splogreporter/#comments</comments>
		<pubDate>Mon, 22 Aug 2005 21:18:34 +0000</pubDate>
		<dc:creator>Pascal Van Hecke</dc:creator>
				<category><![CDATA[Spamblogs]]></category>
		<category><![CDATA[Technorati]]></category>
		<category><![CDATA[WebWatch]]></category>
		<category><![CDATA[blogsearch]]></category>
		<category><![CDATA[icerocket]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[splog]]></category>
		<category><![CDATA[vigilantism]]></category>

		<guid isPermaLink="false">http://pascal.vanhecke.info/2005/08/22/hunting-spamblogs-continued-splogs-and-splogreporter/</guid>
		<description><![CDATA[&#8220;Spamblog&#8221; has been rebranded as splog. Mark Cuban (from IceRocket, a worthy competitor to Technorati) suggested email verification. &#8220;Somewhat Frank&#8221; came up with SplogReporter.com, a centralised public blacklist of spamblogs, similar to my (failed) suggestion of tagging spamblogs using services like furl or del.icio.us. I still don&#8217;t think it will work, however. Apart from the fact that [...]<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/08/22/hunting-spamblogs-continued-splogs-and-splogreporter/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=86" width="100" height="15" style="border:0;" /></a></div>]]></description>
			<content:encoded><![CDATA[<p>&#8220;Spamblog&#8221; has been rebranded as <a href="http://en.wikipedia.org/wiki/Splog" rel = "tag">splog</a>.  Mark Cuban (from <a href="http://www.icerocket.com/">IceRocket</a>, a worthy competitor to <a href="http://www.technorati.com/">Technorati</a>) <a href="http://www.blogmaverick.com/entry/1234000870054492/">suggested email verification</a>. &#8220;Somewhat Frank&#8221; <a href="http://www.somewhatfrank.com/2005/08/what_is_splog_a.html">came up with</a> <a href="http://www.splogreporter.com/">SplogReporter.com</a>, a centralised public blacklist of spamblogs, similar to my (failed) <a href="http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/">suggestion of tagging spamblogs</a> using services like furl or del.icio.us.  I still don&#8217;t think it will work, however.<span id="more-86"></span></p>
<p>Apart from the fact that this will lead to just another arms race with black hat bots &#8220;<a href="http://seoblackhat.com/2005/08/22/splog-splogs/">flagging multiple random blogs as splogs</a>&#8221; (a counterattack you&#8217;ll have to ward off with <a href="http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/#comment-22">some intelligent reputation algorithms</a>), I think the idea won&#8217;t work for the same reasons tagging didn&#8217;t work:
<ul>
<li>there is no <em>immediate reward</em> for the user denouncing a blog as a splog, as long as blog search engines don&#8217;t integrate this in their interface and leave out the splogged results immediately (for that user at least)</li>
<li>Frank’s <a href="http://www.somewhatfrank.com/2005/08/what_is_splog_a.html">suggestion</a> that the blog search engines link to Splogreporter is nice, but I’m afraid none of the services (Technorati, Feedster, Pubsub, Yahoo Myweb2.0,…) will be motivated to share their results (the ability to effectively filter out spam being a major competitive advantage!).</li>
</ul>
<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/08/22/hunting-spamblogs-continued-splogs-and-splogreporter/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=86" width="100" height="15" style="border:0;" /></a></div>]]></content:encoded>
			<wfw:commentRss>http://pascal.vanhecke.info/2005/08/22/hunting-spamblogs-continued-splogs-and-splogreporter/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hunting spamblogs</title>
		<link>http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/</link>
		<comments>http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/#comments</comments>
		<pubDate>Tue, 31 May 2005 23:57:24 +0000</pubDate>
		<dc:creator>Pascal Van Hecke</dc:creator>
				<category><![CDATA[del.icio.us]]></category>
		<category><![CDATA[SocialSoftware]]></category>
		<category><![CDATA[Spamblogs]]></category>
		<category><![CDATA[Technorati]]></category>

		<guid isPermaLink="false">http://pascal.vanhecke.info/2005/05/20/hunting-spamblogs/</guid>
		<description><![CDATA[Several people have been complaining about Spamblogs and suggesting remedies: measures by weblog providers, such as Blogger (&#8220;Incorporated Subversion&#8221;) cyber vigilantism (&#8220;Simple Thoughts&#8221;) reputation-based filtering (Ross Mayfield, he links to an interview with a spammer btw!) search engine countermeasures (&#8220;Micro Persuasion&#8221;) It&#8217;s probably search engines that hold the key here. Spamblogs are annoying, not because [...]<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=39" width="100" height="15" style="border:0;" /></a></div>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.loiclemeur.com/english/2005/05/brands_blog_spa.html">Several</a> <a href="http://www.psfk.com/2005/03/could_spamblogs.html">people</a> have been complaining about <a href="http://pascal.vanhecke.info/2005/05/30/spamblogs/">Spamblogs</a> and suggesting remedies: </p>
<ul>
<li><a href="http://incsub.org/blog/?p=475">measures by weblog providers</a>, such as Blogger (&#8220;Incorporated Subversion&#8221;)</li>
<li><a href="http://blog.taragana.com/index.php/archive/new-horizons-in-spamming-aka/">cyber vigilantism</a> (&#8220;Simple Thoughts&#8221;)</li>
<li><a href="http://ross.typepad.com/blog/2005/04/persistent_spam.html">reputation-based filtering</a> (Ross Mayfield, he links to an interview with a spammer btw!)</li>
<li><a href="http://www.micropersuasion.com/2005/03/technorati_sees.html">search engine countermeasures</a> (&#8220;Micro Persuasion&#8221;)</li>
</ul>
<p>It&#8217;s probably search engines that hold the key here.<span id="more-39"></span>  Spamblogs are annoying, not because of their mere existence, but because you notice them: they show up in search results.  Technorati is very much aware of it and tries to do its best (<a href="http://www.sifry.com/alerts/archives/000298.html">David Sifry&#8217;s March report</a>): </p>
<blockquote><p>&#8220;Part of the growth of new weblogs (30,000 &#8211; 40,000 each day)  created each day is due to an increase in spam blogs &#8211; fake blogs that are&#8230;&#8221;</p></blockquote>
<blockquote><p>&#8220;&#8230;we feel that we&#8217;ve been able to capture and identify most of the spam out there, but one should note that there is definitely blog spam that we don&#8217;t catch&#8230;&#8221;</p></blockquote>
<h3 id="toc-why-not-lend-technorati-et-al-a-hand">Why not lend Technorati (et al) a hand&#8230;</h3>
<p>&#8230; and tag the spamblogs that slipped through the net.  If you get annoyed by a Spamblog showing up in your <a href="http://www.technorati.com/members/">Technorati Watchlist</a> (or <a href="http://www.googlealert.com/faqs.php#intro">Google Alert</a>, or any other <a href="http://libraryclips.blogsome.com/2005/05/20/rss-filter-and-re-mix/">search feed</a>), report it, tag it: &#8220;<a href="http://www.technorati.com/tag/spamblog">spamblog</a>&#8221; * !  You could use <a href="http://del.icio.us/">del.icio.us</a> or <a href="http://www.furl.net/">furl</a>, the services that are being <a href="http://www.technorati.com/help/tags.html">aggregated</a> by Technorati.  </p>
<p>Yes, I know, there are obvious objections.  If your furl/del.icio.us RSS feeds are being republished, you increase the Spamblog&#8217;s exposure, and without <a href="http://blog.searchenginewatch.com/blog/050118-204728">rel=&#8217;nofollow&#8217;</a>, even its PageRank, which is exactly what the spammer is aiming for.  You might keep the &#8220;bad&#8221; links in a different account, or a different bookmarking service, from your normal link collection.  </p>
<p>Still, hunting spamblogs by tagging them as spam is a tool users have available at this very moment.  And it is simple and easy to use**.  For the spambusters, using the spamblog tag RSS feeds in (semi-)automatic blacklisting is a piece of cake.  And yes, again, there&#8217;s the danger of having controversial, opinionated blogs ostracised by adversaries &#8211; but Technorati et al can easily weigh spamtags against &#8220;affirmative&#8221; tags and incoming links.  So <a href="http://www.furl.net/furled.jsp?topic=spamblog" rel = "nofollow">let&#8217;s</a> <a href="http://del.icio.us/tag/spamblog/"  rel = "nofollow">start</a>!</p>
<p>* <em>as suggested in my <a href="http://pascal.vanhecke.info/2005/05/30/spamblogs/">introductory post</a>, I would limit the tag &#8220;spamblog&#8221; to machine-generated blogs &#8211; and distinguish them from &#8220;fake&#8221; or &#8220;character&#8221; blogs&#8230;</em><br />
** <em>unlike e.g. the <a href="http://developers.technorati.com/wiki/VoteLinks">Votelinks</a> concept, which hasn&#8217;t taken off so far</em></p>
<h3 id="toc-update-20-00h-june-1">Update 20.00h, June 1</h3>
<p><em>Continuing on &#8220;weighing spamtags against &#8216;affirmative&#8217; tags and incoming links&#8221;</em>:  Pubsub, a service similar to Technorati, already has a system called &#8220;<a href="http://www.pubsub.com/linkranks_about.php">Linkranks</a>&#8221; that they use to filter search feeds: see the discussion at <a href="http://hyku.com/blog/archives/000251.html">hyku.com</a>.  Technorati probably has something similar.<br />
Without any doubt, blog search services like Pubsub and Technorati should also include in their own interfaces an easy way for users to report spamblogs (the way you can report spam in Yahoo or Gmail), but centralised, independent &#8220;blogspam reservoirs&#8221; such as <a href="http://del.icio.us/tag/spamblog"  rel = "nofollow">http://del.icio.us/tag/spamblog</a> or <a href="http://www.furl.net/furled.jsp?topic=spamblog"  rel = "nofollow">http://www.furl.net/furled.jsp?topic=spamblog</a> could help all of them and would definitely be a step forward from sending an email to feedback@, which is the procedure now.</p>
<h3 id="toc-update-june-25">Update June 25</h3>
<p>AdSense now has an easy way to report spamblogs that are running AdSense ads.  From <a href="http://www.jensense.com/archives/2005/06/matt_cutts_anno.html">Jensense</a>:</p>
<blockquote><p>&#8220;If you see a site violating the AdSense terms, you can now file an anonymous spam report that will get to the quality team for checking. To file a report, you simply go to the page that is showing AdSense ads and click on the &#8220;Ads by Google&#8221; (or &#8220;Ads by Goooooogle&#8221;) link. In the form on the next page, include the term &#8220;spamreport&#8221; and put in a short reason about why you feel the site is violating the AdSense terms or policies. You can also enter your own email address, if you wish, then click submit.&#8221;</p></blockquote>
<h3 id="toc-update-july-13"><span id="updatejuly13">Update July 13</span></h3>
<p>Apparently, the idea of building a public blacklist of spamblogs on del.icio.us or other bookmarking services hasn&#8217;t taken off.<br />
If I had given it more thought then, I would have seen why:  there is no incentive for denouncing a blog as a spamblog as long as blog search engines don’t use the data (a chicken-and-egg problem: they won&#8217;t do this before some substantial amount of data has been collected).<br />
And even then, there&#8217;s no <em>immediate</em> reward for doing so.  What the user wants (as I suggested at <a href="http://johnaugust.com/archives/2005/annoying-trend-watch-technorati-spam-blogs">other</a> <a href="http://blog.forret.com/blog/2005/07/amy-cross-spamming-technorati.html">complaints</a> about spam blogs and Technorati spam) is a simple “report as spam” button or link in the (web/email/RSS) interface itself (comparable to the email spam buttons in Gmail, Yahoo Mail, Hotmail etc…) so that annoying blogs or feeds are filtered from the result set <em>immediately</em>.  Definitely something to look forward to in a <a href="http://www.sifry.com/alerts/archives/000314.html">next Technorati release</a>; see <a href="http://blogsurvey.backbonemedia.com/archives/2005/07/will_technorati.html">David Sifry&#8217;s comments on this blog search wish list</a>.<br />
So much for the idea of having a <em>public</em> blacklist: I&#8217;m afraid none of the services (Technorati, Feedster, Pubsub, Yahoo Myweb2.0,&#8230;) will be motivated to share their results (the ability to effectively filter out spam being a major competitive advantage!)&#8230;</p>
<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=39" width="100" height="15" style="border:0;" /></a></div>]]></content:encoded>
			<wfw:commentRss>http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Spamblogs in Action</title>
		<link>http://pascal.vanhecke.info/2005/05/31/spamblogs-in-action/</link>
		<comments>http://pascal.vanhecke.info/2005/05/31/spamblogs-in-action/#comments</comments>
		<pubDate>Mon, 30 May 2005 23:26:51 +0000</pubDate>
		<dc:creator>Pascal Van Hecke</dc:creator>
				<category><![CDATA[Spamblogs]]></category>
		<category><![CDATA[Technorati]]></category>

		<guid isPermaLink="false">http://pascal.vanhecke.info/2005/05/30/spamblogs-in-action/</guid>
		<description><![CDATA[While browsing through some recent greasemonkey blog buzz I came across this Dutch-language post on a blog (moretheterrier.co.uk / giftbasket/) that seemed to be entirely in English further on. Its content had been grabbed from (the Feedburner RSS feed containing) the original posting at Hans Mestrum&#8217;s weblog. I was really wondering how the entries for [...]<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/05/31/spamblogs-in-action/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=41" width="100" height="15" style="border:0;" /></a></div>]]></description>
			<content:encoded><![CDATA[<p>While browsing through some  recent <a href="http://www.technorati.com/cosmos/search.html?sub=toolsearch&#038;url=greasemonkey">greasemonkey blog buzz</a> I came across this  <a href="http://moretheterrier.co.uk/giftbasket/?p=76" rel = "nofollow">Dutch-language post</a> on a blog (<a href="http://moretheterrier.co.uk/giftbasket/" rel = "nofollow">moretheterrier.co.uk / giftbasket/</a>) that seemed to be entirely in English further on.  Its content had been grabbed from (the Feedburner RSS feed containing)  the <a href="http://hmestrum.blogs.com/my_weblog/2005/05/bloggen_past_in.html">original posting</a> at <a href="http://hmestrum.blogs.com/my_weblog/">Hans Mestrum&#8217;s weblog</a>.  <span id="more-41"></span></p>
<p><a href="http://pascal.vanhecke.info/wp-content/upload_images/200530530_spamblog_technorati_giftbasket.png"><img src='http://pascal.vanhecke.info/wp-content/upload_images/thumb-200530530_spamblog_technorati_giftbasket.png' alt='Technorati search on giftbasket' align = 'right'/></a><a href="http://pascal.vanhecke.info/wp-content/upload_images/200530530_spamblog_gone_wild.png"><img src='http://pascal.vanhecke.info/wp-content/upload_images/thumb-200530530_spamblog_gone_wild.png' alt='thousands of cron job results on a blog' align = 'left'/></a>I was really wondering how the entries for this apparent <a href="http://pascal.vanhecke.info/2005/05/30/spamblogs/">SpamBlog</a> were being collected, so I did a quick <a href="http://www.technorati.com/cosmos/search.html?rank=&#038;url=giftbasket">Technorati search on &#8220;giftbasket&#8221;</a> (right):<br />
The results were hilarious and revealed this (probably <a href="http://mu.wordpress.org/">multi-user</a>) WordPress blog at <a href="http://www.moretheterrier.co.uk/moretheterrier/" rel = "nofollow">www.moretheterrier.co.uk / moretheterrier/</a>, containing not only <a href="http://moretheterrier.co.uk/moretheterrier/?p=1262" rel = "nofollow">the odd stolen blog posting</a> but also thousands of output results from cron (= scheduler on Unix) jobs (<a href="http://pascal.vanhecke.info/wp-content/upload_images/200530530_spamblog_gone_wild.png">click thumbnail</a> left).</p>
<p>In fact, it turns out this installation is driving other spamblogs such as the ones at <a href="http://moretheterrier.co.uk/giftbasket/"  rel = "nofollow">moretheterrier.co.uk / giftbasket/</a>, <a href="http://moretheterrier.co.uk/clock/"  rel = "nofollow">moretheterrier.co.uk / clock/</a> and <a href="http://moretheterrier.co.uk/dolphin/"  rel = "nofollow">moretheterrier.co.uk / dolphin/</a>.  All of these point to fake directory sites such as <a href="http://www.tattooaftercareaffiliates.co.uk/" rel = "nofollow">www.tattooaftercareaffiliates.co.uk</a>, <a href="http://www.moretheterrier.co.uk/" rel = "nofollow">www.moretheterrier.co.uk</a>, <a href="http://www.dr-rock.co.uk/" rel = "nofollow">www.dr-rock.co.uk</a> and <a href="http://www.inkcarts.biz/">www.inkcarts.biz</a>.</p>
<p><a href="http://pascal.vanhecke.info/wp-content/upload_images/200530530_spamblog_mail.png"><img src='http://pascal.vanhecke.info/wp-content/upload_images/thumb-200530530_spamblog_mail.png' alt='mail a spam blog!' align='left' /></a>  <a href="http://pascal.vanhecke.info/wp-content/upload_images/200530530_spamblog_mail_result_cron.png"> <img src='http://pascal.vanhecke.info/wp-content/upload_images/thumb-200530530_spamblog_mail_result_cron.png' alt='trigger the script!!'  align='right' /></a>The content is being fetched from several mailboxes (giftblog@moretheterrier.co.uk, dolphinblog@&#8230;, clockblog@&#8230;) by a (scheduled) <a href="http://wordpress.org/support/topic/6956">WordPress script</a> that enables users to post to their blog via email.  I actually don&#8217;t have a clue what mechanism generates the mails. The nice thing, however, is&#8230; you can do this yourself: mail one of the addresses (screenshot left), then run the corresponding script (e.g. <a href="http://moretheterrier.co.uk/giftbasket/wp-mail.php">moretheterrier.co.uk / giftbasket / wp-mail.php</a>, screenshot right), and this is the result: </p>
<p><a href="http://pascal.vanhecke.info/wp-content/upload_images/200530530_spamblog_mail_result_online.png"><img src='http://pascal.vanhecke.info/wp-content/upload_images/thumb-200530530_spamblog_mail_result_online.png' alt='the result online!' /></a></p>
<p>Oh, by the way, there&#8217;s even more&#8230;  a Technorati search for <a href="http://www.technorati.com/cosmos/search.html?rank=&#038;url=http%3A%2F%2Fwww.moretheterrier.co.uk%2F">moretheterrier.co.uk</a> turned up some other (earlier) test setups on the same subject: </p>
<p><a href="http://moretheterrier2.blogspot.com/"  rel = "nofollow">http://moretheterrier2.blogspot.com/</a><br />
<a href="http://puppypottytraining1.blogspot.com/"  rel = "nofollow">http://puppypottytraining1.blogspot.com/</a></p>
<p>And there is also this (still earlier) setup at <a href="http://www.my-resource-site.com/"  rel = "nofollow" >http://www.my-resource-site.com/</a> using a &#8220;real <a href="http://www.my-resource-site.com/pets/seoautomator.php"  rel = "nofollow">SEO automator</a>&#8221; that, if I understand correctly, turns <a href="http://www.my-resource-site.com/pets/keywords1.txt"  rel = "nofollow">keywords</a> into <a href="http://www.my-resource-site.com/pets/puppies/"  rel = "nofollow">a list of</a> <a href="http://www.my-resource-site.com/pets/puppies/housebreaking-tips-for-puppies.php"  rel = "nofollow">keyword-optimized pages</a> linking to what may be the <a href="http://www.pottytrainyourpuppyin7days.com/"  rel = "nofollow">actual SEO customer</a>&#8230; </p>
<p>Apparently WordPress turned out to be a more flexible system to build Spamblogs?</p>
<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/05/31/spamblogs-in-action/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=41" width="100" height="15" style="border:0;" /></a></div>]]></content:encoded>
			<wfw:commentRss>http://pascal.vanhecke.info/2005/05/31/spamblogs-in-action/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Spamblogs</title>
		<link>http://pascal.vanhecke.info/2005/05/30/spamblogs/</link>
		<comments>http://pascal.vanhecke.info/2005/05/30/spamblogs/#comments</comments>
		<pubDate>Mon, 30 May 2005 19:46:26 +0000</pubDate>
		<dc:creator>Pascal Van Hecke</dc:creator>
				<category><![CDATA[SocialSoftware]]></category>
		<category><![CDATA[Spamblogs]]></category>
		<category><![CDATA[WebWatch]]></category>

		<guid isPermaLink="false">http://pascal.vanhecke.info/2005/05/30/spamblogs/</guid>
		<description><![CDATA[Social Software is, by its open nature, &#8220;stuff that gets spammed&#8221; according to Clay Shirky, who coined the term. Blogs are no exception. Comment spam is one of the results of the automated scripts spammers use to target standardized blog software. Entire machine-generated blogs, &#8220;Spamblogs&#8221;, are another. Why would anyone set up a blog that&#8217;s [...]<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/05/30/spamblogs/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=42" width="100" height="15" style="border:0;" /></a></div>]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Social_software">Social Software</a> is, by its open nature,  &#8220;<a href="http://www.corante.com/many/archives/2003/11/16/jenn_theater_social_spam.php">stuff that gets spammed</a>&#8221; according to <a href="http://www.shirky.com/">Clay Shirky</a>, who coined the term.  </p>
<p>Blogs are no exception.  <a href="http://en.wikipedia.org/wiki/Comment_spam">Comment spam</a> is one of the results of the automated scripts spammers use to target standardized blog software.  Entire machine-generated blogs, &#8220;<a href="http://www.technorati.com/tag/spamblogs">Spamblogs</a>&#8221;, are another.<span id="more-42"></span>  </p>
<p>Why would anyone set up a blog that&#8217;s being written by automated scripts?</p>
<ul>
<li>scraping contextual advertising money, examples: <a href="http://cheapcarauctions4u.blogspot.com/" rel= "nofollow">Cheap Car Auctions 4U</a> and  <a href="http://cost-per-click-advertising.blogspot.com/" rel= "nofollow">Cost Per Click Advertising</a> (and yes, I did use the  <a href="http://blog.searchenginewatch.com/blog/050118-204728">rel = &#8216;nofollow&#8217;</a> attribute :-) )</li>
<li>boosting the pagerank of another (e-commerce) site: a special case of <a href="http://en.wikipedia.org/wiki/Link_farm">Link Farming</a>, a form of <a href="http://en.wikipedia.org/wiki/Spamdexing">search engine spam</a>*</li>
<li>doing both, such as  this <a href='http://www.moretheterrier.co.uk/giftbasket/' rel= 'nofollow'>gift basket blog</a>, both promoting a <a href="http://www.businessbrainwaves.com/" rel = "nofollow">Get Rich Quick</a> scheme and another <a href="http://www.tattooaftercareaffiliates.co.uk/Gift%20Baskets/unique-personalized-online-corporate-gift-basket-certificate.html" rel = "nofollow">AdSense Honeypot</a> on online gifts</li>
</ul>
<h3 id="toc-how-does-it-work">How does it work?  </h3>
<p>You start with some keyword sets and variations referring to expensive merchandise: cars, insurance, jewelry&#8230;  Then you use some <a href="http://backstage.bbc.co.uk/data/SearchApI?v=9w7">search engine APIs</a>** to fetch content being written on this subject, and pump fresh results into the fake blog at regular intervals, using the <a href="http://www.blogger.com/developers/api/1_docs/" >Blogger API</a>**&#8230;  and voilà!</p>
<p>By the way, I would suggest reserving the word &#8220;<a href="http://www.technorati.com/tag/spamblogs">spamblogs</a>&#8221; for machine-generated blogs in the strict sense, and referring to <a href="http://socialsoftware.weblogsinc.com/entry/1234000330039694/">fictitious blogs set up by marketeers</a> as &#8220;<a href='http://www.technorati.com/tag/character+blogs'>Character Blogs</a>&#8221;, or, when they&#8217;re in <a href="http://www.micropersuasion.com/2005/04/here_comes_anot.html">really bad taste</a> ***,  &#8220;<a href="http://www.technorati.com/tag/fake+blogs">Fake Blogs</a>&#8221; :-).</p>
<p>Some more on <a href="http://pascal.vanhecke.info/2005/05/31/spamblogs-in-action/">Spamblogs in Action</a> and <a href="http://pascal.vanhecke.info/2005/06/01/hunting-spamblogs/">Hunting Spamblogs</a> now.</p>
<p><em>* one could argue a search engine like Google is social software too, since its <a href="http://en.wikipedia.org/wiki/PageRank">main algorithm</a> tries to produce relevant results for your search term by weighing the votes (hyperlinks) of all  participants on the web  (webmasters, bloggers&#8230;).</em><br />
<em>** an API is a way to do things programmatically: search engine APIs let you fetch and process search results using your own software; with the Blogger API, which is supported by most blogging software, you can use other software to post and edit content on blogs</em><br />
<em>*** see the discussion in the comments of  the linked <a href="http://www.micropersuasion.com/2005/04/here_comes_anot.html">Micropersuasion blog entry</a> on when to call a blog either &#8220;fake&#8221; or &#8220;character&#8221;.  I would use &#8220;fake&#8221; for blogs that are desperately trying to look authentic ;-)</em></p>
<div class="tantan-getcomments"><a href="http://pascal.vanhecke.info/2005/05/30/spamblogs/#comments"><img src="http://pascal.vanhecke.info/wp-content/plugins/tantan/get-comments.php?p=42" width="100" height="15" style="border:0;" /></a></div>]]></content:encoded>
			<wfw:commentRss>http://pascal.vanhecke.info/2005/05/30/spamblogs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

