Hunting spamblogs

Several people have been complaining about Spamblogs and suggesting remedies:

It’s probably search engines that hold the key here. Spamblogs are annoying, not because of their mere existence, but because you notice them: they show up in search results. Technorati is very much aware of it and tries to do its best (David Sifry’s March report):

“Part of the growth of new weblogs (30,000 – 40,000 each day) created each day is due to an increase in spam blogs – fake blogs that are…”

“…we feel that we’ve been able to capture and identify most of the spam out there, but one should note that there is definitely blog spam that we don’t catch…”

Why not lend Technorati (et al) a hand…

… and tag the spamblogs that slipped through the net. If you get annoyed by a Spamblog showing up in your Technorati Watchlist (or Google Alert, or any other search feed), report it, tag it: “spamblog” * ! You could use del.icio.us or furl, the services that are being aggregated by Technorati.

Yes, I know, there are obvious objections. If your furl/del.icio.us RSS feeds are being republished, you increase the spamblog’s exposure and, without rel=’nofollow’, even its PageRank, which is exactly what the spammer is aiming for. You might keep the “bad” links in a different account, or in a different bookmarking service, from your normal link collection.

Still, hunting spamblogs by tagging them as spam is a tool users have available at this very moment. And it is simple and easy to use**. For the spambusters, using the spamblog tag RSS feeds for (semi-)automatic blacklisting is a piece of cake. And yes again, there’s the danger of having controversial, opinionated blogs ostracised by adversaries – but Technorati et al can easily outweigh spamtags against “affirmative” tags and incoming links. So let’s start!

* as suggested in my introductory post, I would limit the tag “spamblog” to machine-generated blogs – and distinguish them from “fake” or “character” blogs…
** unlike e.g. the Votelinks concept, which hasn’t taken off so far
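To make the "(semi-)automatic blacklisting" step a bit more concrete, here is a minimal Python sketch. It assumes the "spamblog" tag feed is available as plain RSS 2.0 with one item per bookmarked link; the sample feed, its hostnames and the function name candidate_hosts are all made up for illustration, and the real feeds of del.icio.us or furl may be structured differently. The output is a list of candidates for a human to review, not a finished blacklist.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Made-up sample standing in for a "spamblog" tag feed (assumed RSS 2.0).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Links tagged spamblog (made-up sample)</title>
    <item>
      <title>Cheap pills blog</title>
      <link>http://spam-one.example.com/2005/06/post.html</link>
    </item>
    <item>
      <title>Same blog, other post</title>
      <link>http://spam-one.example.com/2005/06/other.html</link>
    </item>
    <item>
      <title>Scraped content</title>
      <link>http://Spam-Two.example.org/</link>
    </item>
  </channel>
</rss>"""


def candidate_hosts(feed_xml):
    """Return the set of hostnames bookmarked in the feed's items."""
    root = ET.fromstring(feed_xml)
    hosts = set()
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        host = urlparse(link).hostname  # lower-cased by urlparse, None if missing
        if host:
            hosts.add(host)
    return hosts


if __name__ == "__main__":
    for host in sorted(candidate_hosts(SAMPLE_FEED)):
        print(host)
```

Collapsing links to hostnames means several bookmarks of the same spamblog count once. Whether a whole host should be blacklisted on that basis is a separate decision, which is why this only produces candidates.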

Update 20.00h, June 1

Continuing on “outweighing spamtags against ‘affirmative’ tags and incoming links”: Pubsub, a service similar to Technorati, already has a system called “Linkranks” that it uses to filter search feeds: see the discussion at: hyku.com. Technorati probably has something similar.
Blog search services like Pubsub and Technorati should definitely also include in their own interfaces an easy way for their users to report spamblogs (like you can report spam in Yahoo or Gmail). But centralised, independent “blogspam reservoirs” such as http://del.icio.us/tag/spamblog or http://www.furl.net/furled.jsp?topic=spamblog could help all of them, and would be a step forward from sending an email to feedback@, which is the procedure now.
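As a rough illustration of what weighing spamtags against "affirmative" signals could look like, here is a small Python sketch. Everything in it is an assumption made for the example: the function name, the weight given to an incoming link and the threshold are invented, and it says nothing about how Linkranks or Technorati's own ranking actually work.

```python
def looks_like_spamblog(spam_tags, affirmative_tags, incoming_links,
                        link_weight=0.5, threshold=2.0):
    """Flag a blog only when spam tags clearly outweigh positive signals.

    link_weight and threshold are illustrative values, not tuned ones.
    """
    support = affirmative_tags + link_weight * incoming_links
    return spam_tags - support >= threshold


if __name__ == "__main__":
    # One hostile tag against a well-linked blog: not flagged.
    print(looks_like_spamblog(spam_tags=1, affirmative_tags=3, incoming_links=40))
    # Several spam tags and no positive signals at all: flagged.
    print(looks_like_spamblog(spam_tags=5, affirmative_tags=0, incoming_links=0))
```

With a threshold above one, a single spam tag is never enough on its own, which addresses the worry about adversaries ostracising an opinionated blog. It does not help against spammers who manufacture incoming links, so a real service would need a more robust measure of support.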

Update June 25

AdSense now has an easy way to report spamblogs that are running AdSense ads. From Jensense:

“If you see a site violating the AdSense terms, you can now file an anonymous spam report that will get to the quality team for checking. To file a report, you simply go to the page that is showing AdSense ads and click on the “Ads by Google” (or “Ads by Goooooogle”) link. In the form on the next page, include the term “spamreport” and put in a short reason about why you feel the site is violating the AdSense terms or policies. You can also enter your own email address, if you wish, then click submit.”

Update July 13

Apparently, the idea of building a public blacklist of spamblogs on del.icio.us or other bookmarking services hasn’t taken off.
If I had given it more thought then, I would have seen why: there is no incentive for denouncing a blog as a spamblog as long as blog search engines don’t use the tags (a chicken-and-egg problem: they won’t do this before a substantial amount of data has been collected).
And even then, there’s no immediate reward for doing so. What the user wants (as I suggested in response to other complaints on spam blogs and Technorati spam) is a simple “report as spam” button or link in the (web/email/rss) interface itself (comparable with the email spam buttons in Gmail, Yahoo Mail, Hotmail etc…) so that annoying blogs or feeds are filtered from the result set immediately. Definitely something to look forward to in a future Technorati release; see David Sifry’s comments on this blog search wish list.
So much for the idea of having a public blacklist: I’m afraid none of the services (Technorati, Feedster, Pubsub, Yahoo Myweb2.0,…) will be motivated to share its results (the ability to effectively filter out spam being a major competitive advantage!)…

19 Responses to “Hunting spamblogs”

  1. James Farmer Says:

    Thanks for the follow up Pascal.

    I like your idea but don’t think that my readers would necessarily be too happy, part of my concern is that my content is obviously being used to generate $ without my permission… it’s a breach of my CC license & Blogger et al should deal with it, or at least have a system in place, dontchathink?

    It needs a systematic approach too… or is that ever possible with spam?

  2. Hans on Experience Says:

    Thanks Pascal. I think it is a good idea to start with the tagging thing. We could hunt them by using the del.icio.us tag (it would be good if Sifry and Loic also knew about this). It is important that more people start using it. So spread the word to all the people you mentioned who are concerned.

  3. Hans on Experience Says:

    Hunting SPAMblogs: spread the word!

    Pascal Vanhecke did a great job in describing what spamblogs are and how they work. Now he has a posting on how to hunt these spamblogs. The idea is very simple. When you find one, please save it under a spamblog del.icio.us tag. In this way they got kno…

  4. Hans on Experience Says:

    Pascal, I saw you also did a great job in letting the people involved know about your idea. One question for now. I post my deliciouspostings on my blog daily. How can I leave these spamblogs out of this?

  5. Hans on Experience Says:

    A question from a reader of my blog: What if I tag your blog with that tag? Is there a way out of the tag.

  6. Pascal Says:

    @ James

    I like your idea but don’t think that my readers would necessarily be too happy

    & @Hans:
    I post my deliciouspostings on my blog daily. How can I leave these spamblogs out of this?

    There’s no solution for this. If you republish your del.icio.us link feed on your blog, you probably do not want to store your spamblog-tagged links in the same del.icio.us account. The same goes if you have subscribers to your link feed and you don’t want to annoy them. Unfortunately it’s not yet possible to exclude the links with a certain tag, although there has been some brainstorming on the del.icio.us mailing list: http://lists.del.icio.us/pipermail/discuss/2004-September/000976.html (and I reformulated the question at http://lists.del.icio.us/pipermail/discuss/2005-June/003287.html). There are workarounds, although neither is really handy:

    1. a second del.icio.us account (hint: if you’re using two different browsers, you can stay logged in to each of the accounts and use e.g. the “view in IE” Firefox extension to switch quickly: http://ieview.mozdev.org/)

    2. a furl account if you’re mainly using del.icio.us, or vice versa. In fact, that’s what I just did: I created an extra Furl account. Furl.net has a bit of a bloated interface for my taste, but it has the handy “Fast – Furl!!” bookmarklet (under “My Tools” when logged in). If you then set the default category to “spamblog” in your preferences, you can tag a spamblog with a single click, no further input required (see http://www.furl.net/furled.jsp?topic=spamblog and http://www.technorati.com/tag/spamblog for the result).
    I do admit however that creating a second account takes away my main argument for tagging: using a simple tool people already use.

  7. Pascal Says:

    @ James:

    it’s a breach of my CC license & Blogger et al should deal with it, or at least have a system in place, dontchathink?

    Sure! But even a better anti-spam policy at Blogger (they could be helped by user-tagging as well ;-) ) will not solve the problem of spamblogs… they’re just going to move to other, more lax services or fall back on self-hosted solutions, like multi-user WordPress, such as the (rather clumsy) setup I described in a previous post, “spamblogs in action”.

  8. Pascal Says:

    @ Hans:

    What if I tag your blog with that tag? Is there a way out of the tag.

    No, there’s no way out. If someone doesn’t like you, he can tag your blog as “spamblog”. Several people could even do so. However, if Technorati, Pubsub and similar services started to use these spamblog tags, they would only be an extra input to a system they already have. So for example, if I tag your blog as “spamblog”, Technorati’s algorithm would compare my negative “vote” with “positive” votes by other bloggers linking to your blog. Or compare it with your blog’s Google PageRank – and decide that this one negative opinion doesn’t really count (that’s what I meant with “outweigh spamtags against ‘affirmative’ tags and incoming links” in my post).
    “Real spamblogs” (guess that’s an oxymoron) typically do not have PageRank or “real” incoming links – so they could be filtered out successfully.

  9. Martin Says:

    If they are pulling content off other web sites and the actual sites are identifiable (I suspect in some areas all of it is copied, repeated, etc. so much that this is not possible), it might be possible to compare blog postings with pages from a site (complearn http://www.complearn.org/ would be my first choice): if a blog sits between all of them but has no content that is unique to it, declare it spam. Pair this with the sentence-parsing sections of a natural language parsing program to detect random text and it might be possible to catch some of these automatically.

    Alternatively, extract (feature extraction algorithm of some sort) some of the keywords that are used regularly in the blog, feed them through $SEARCH_ENGINE, if it comes up in the top 20, report it for human moderation. You’d only catch successful spam blogs, but if that’s hitting them where it hurts…

    Or perhaps I’ve entirely misunderstood the problem…

    hi Pascal BTW.
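The first suggestion in comment 9, comparing a blog post with pages it may have been copied from, can be sketched with a compression-based similarity measure. The sketch below is not CompLearn itself: it is a hand-rolled normalized compression distance using Python's standard zlib module, and the three sample texts are invented for the example. Values close to 0 suggest one text is largely contained in the other; values close to 1 suggest unrelated texts.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression)."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two texts."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Invented sample texts, for illustration only.
ORIGINAL = (
    "Technorati tracks millions of weblogs and tries to filter out the fake "
    "ones. Search feeds and watchlists are only useful when the results come "
    "from real authors, so every spamblog that slips through makes the "
    "service a little less valuable."
)
SCRAPED = "Buy cheap watches now! " + ORIGINAL + " Click here for great deals."
UNRELATED = (
    "My grandmother kept bees behind the orchard, and each July we helped "
    "her jar the honey. The kitchen smelled of wax and clover for days, and "
    "the neighbours queued up at the gate with empty pots."
)

if __name__ == "__main__":
    print("original vs scraped:  ", round(ncd(ORIGINAL, SCRAPED), 2))
    print("original vs unrelated:", round(ncd(ORIGINAL, UNRELATED), 2))
```

On texts this short the absolute numbers are noisy, so only the comparison between the two distances is meaningful. A real detector would also need a candidate set of source pages to compare against, which is the hard part the comment already points out.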

  10. Surf The Mind » If spam sites in your search results are getting you down… Says:

    […] can also enter your own email address, if you wish, then click submit. In a similar vein, Pascal Vanhecke has a post suggesting methods of reporting spamblogs using social netwo […]

  11. Online Idea Buzz Says:

    Kick spam — use Technorati

    Hunting spamblogs suggests: … and tag the spamblogs that slipped through the net. If you get annoyed by a Spamblog showing up in your Technorati Watchlist (or Google Alert, or any other search feed), report it, tag it: “spamblog”

  12. Backbone Blogging Survey Says:

    Will Technorati.com Remove the Spammers?

    In my work helping several clients on blog relations strategies for Backbone Media I’ve been using Technorati.com more often than ever before in the last few months. For eight months I was Director of Marketing for 48hourprint.com where I was…

  13. John Cass Says:

    I entirely agree with your statement “the ability to effectively filter out spam being a major competitive advantage.” One of the reasons I suggested giving people the ability to sort posts by the number of links to a site was because I thought that would push the legit content to the top. Though one of my blogging/SEO colleagues tells me many of the spammers have a number of links anyway. Whether that’s because they set up multiple blogs or because people are linking to their typically ripped content, we did not research.

    Any other ideas on how the feed search engines can sort their content?

  14. Pascal Says:

    @John on counting incoming links:
    Every search engine and feed search engine will have or need to have its own way to rank “trust”, “authority”, “relevance” or whatever you’ll call it.
    They will not reveal how it works of course: competitive advantage etc…

    As you indicated yourself, the number of incoming links is probably not a good indication by itself. An algorithm that builds an index of “trusted” sources by following links, starting from a number of trustworthy, human-verified websites, was published a while ago: you can find a summary here:

    Another path is the one followed by Yahoo MyWeb 2.0 http://myweb2.search.yahoo.com/ (building further upon the idea of del.icio.us and other social bookmark services): let your network of friends filter your search results, in much the same way as online social networks filter out interesting new contacts:
    http://www.corante.com/many/archives/2005/06/28/yahoo_social_search_act_ii.php

  15. Kailash Nadh Says:

    I receive thousands of spam blog pings every day at Pingoat.com ;( [ Even after all the antispam measures that have been employed ].

    A public black list, or a certification etc. just won’t work. It’s very difficult to create a black list of a million spam blogs (growing steeply) or to certify ten million blogs.

    I am constantly thinking and thinking. For now, I am working on an algorithm that’ll predict whether a blog is spam or not. (It won’t be the perfect solution, but it should work to an extent)

    Regards,
    Kailash Nadh

  16. Splog Fighter Says:

    I for one have started a splog fighting blog and I will be going after splogs one by one. Current splogs tend to be AdSense policy violators, and that makes it a bit easier to curb their growth. I do believe the hard core email spammers will eventually move into splog territory. That’s when things are going to get ugly.

  17. Hunting spamblogs continued: splogs and splogreporter Says:

    […] ith SplogReporter.com, a centralised public blacklist of spamblogs, similar to my (failed) suggestion of tagging spamblogs using services like furl or del.icio.us. I still don’ […]

  18. a feminist, lost in the patriarchy Says:

    Spam Blogs Putting A Spanner In The Works…

    I’m officially on my Easter academic break right now, and what better way to spend a lazy Saturday afternoon than browsing the blogsphere?
    The tool of choice has lately been Technorati. I tend to randomly put words into their search engine and s…

  19. Quaequam Blog! » Blog Archive » The rise of the spamblog Says:

    […] the spamblog. I’m not sure if that is the correct term for them (although I notice at least one other person refer to them as such), but they are those weblogs, apparently entirely bot created which do […]