Legal aspects of screen scraping

Is it fair use to extract images and their captions from an online newspaper and republish them? Some considerations that come into play:

  • the right to quote is often used to justify screenscraping. But am I quoting the newspaper… or republishing a full work (cartoon and caption) in its own right?
    And am I “republishing” the images (which are copyrighted by ZaZa) if all I do is link to the cartoon images in their original location, without copying or modifying them (see the second bullet)?
    And is “quoting systematically” still considered quoting, or is it “data theft” (see also the third bullet)?
  • the nature of linking with respect to copyright. The web community generally does not consider deep linking a violation of copyright, whereas inline linking or “hot” linking is met with outcry. The distinction is… rather blurry, since whether a linked image shows up inline or as a plain link depends on the browser you’re using.
    • Modern browsers will show the Feedburner feed with images, but if you omit just one line from the feed (the link to the Feedburner stylesheet, which can also be switched off in the Feedburner settings, by the way), the feed looks like this. The result in an RSS reader (which is just another way of browsing the web) will still be the same regardless of the stylesheet: with images.
    • And what about this Greasemonkey userscript, which tells Firefox to show linked images on the linking page itself? (Try it on these del.icio.us links to GIF files.)
    • As a side note: one can ask oneself whether this “historical” aversion to hotlinking is still relevant in times of abundant bandwidth. It strikes me that the web tends to be a lot more tolerant of the more recent phenomenon of remotely playable mp3 files; see for example how del.icio.us and Bloglines let you listen on their sites to mp3s they don’t host themselves.
  • the legal protection of databases. Screenscraping relies on recurring patterns in HTML output, with content that is pulled from a database. Screenscraping and publishing hyperlinks (with titles and summaries) therefore boils down to reproducing (part of) a database, which may count as an infringement even if every single hyperlink-with-quote could be considered fair use.
    • If Belgian newspapers were to take Google to court because of news.google.be (see this story on the conflict), this law would probably be their main argument. The similar kranten.com case in 2000, however, was lost by the newspapers. If you find its argumentation an interesting read, you might also be interested in this (Dutch-language) article on deep linking and related legal issues.
    • And yet another intriguing idea: what if a link collection is published not by one individual, but as a collaborative effort? As an example: Nivi suggested using del.icio.us to create a Ricky Gervais video podcast by letting individuals save the video file locations with a specific tag (that was before there was an official video podcast on RickyGervais.com)**.
      If many different people use that same tag for many different URLs, who is responsible for the resulting link collection, aka podcast? Neither any individual user nor the platform (del.icio.us), it seems to me… Spamming the system would be an easy countermeasure, however :-).
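
To make the “recurring patterns” point from the third bullet concrete, here is a minimal sketch of what such a scraper boils down to. It is not the script behind my feed: the class names "cartoon" and "caption", the example.org URLs and the function names are all made up for illustration, and a real page would need its own pattern.

```python
from html.parser import HTMLParser


class CartoonScraper(HTMLParser):
    """Collects (image URL, caption) pairs from a repeating HTML pattern.

    Assumed (hypothetical) pattern: an <img class="cartoon"> followed by
    a <p class="caption"> for every cartoon on the page.
    """

    def __init__(self):
        super().__init__()
        self.items = []            # finished (src, caption) pairs
        self._src = None           # image URL waiting for its caption
        self._in_caption = False   # True while inside a caption paragraph
        self._caption = []         # text fragments of the current caption

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("class") == "cartoon":
            self._src = attrs.get("src")
        elif tag == "p" and attrs.get("class") == "caption":
            self._in_caption = True
            self._caption = []

    def handle_data(self, data):
        if self._in_caption:
            self._caption.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_caption:
            self._in_caption = False
            if self._src is not None:
                caption = "".join(self._caption).strip()
                self.items.append((self._src, caption))
                self._src = None


def scrape(html):
    """Return the list of (image URL, caption) pairs found in html."""
    parser = CartoonScraper()
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up sample page with two records that follow the same pattern.
SAMPLE = """
<div><img class="cartoon" src="http://example.org/img/1.gif">
<p class="caption">First caption</p></div>
<div><img class="cartoon" src="http://example.org/img/2.gif">
<p class="caption">Second caption</p></div>
"""

if __name__ == "__main__":
    for src, caption in scrape(SAMPLE):
        print(src, "|", caption)
```

Note that the scraper only records where the images live; it never downloads them. The output is a list of structured records, one per cartoon, which is why the result looks so much like a partial copy of the underlying database.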

So, what do you think? Do you know of any other court cases in which a ruling was handed down? Leave a comment or trackback…

** Technically, however, it wouldn’t work as things stand now, because del.icio.us doesn’t create enclosures for video or images (yet) as it does for mp3 files. Compare the .mov feed with the .mp3 feed. Passing the feed through Feedburner seems to be a remedy for .mov, but not for images; see my improvised del.icio.us .mov vodcast as an illustration.
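
For the curious, this is roughly what “creating enclosures” amounts to. It is only a sketch under assumptions: it expects a plain, namespace-free RSS 2.0 feed (not necessarily what del.icio.us or Feedburner actually emit), and the extension-to-MIME-type table and the function name add_enclosures are my own choices, not anyone’s real API.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from file extension to MIME type; extend as needed.
MEDIA_TYPES = {
    ".mp3": "audio/mpeg",
    ".mov": "video/quicktime",
    ".gif": "image/gif",
}


def add_enclosures(rss_xml):
    """Add an <enclosure> to every item whose <link> points at a media file.

    Assumes a plain RSS 2.0 document without XML namespaces.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        if item.find("enclosure") is not None:
            continue  # this item already has an enclosure
        link = item.findtext("link", default="").strip()
        for ext, mime in MEDIA_TYPES.items():
            if link.lower().endswith(ext):
                # RSS 2.0 wants url, length and type. The real file size
                # is unknown here, so "0" is only a placeholder.
                ET.SubElement(
                    item,
                    "enclosure",
                    {"url": link, "length": "0", "type": mime},
                )
                break
    return ET.tostring(root, encoding="unicode")
```

Podcast clients look for that enclosure element rather than the ordinary link, which is why a feed full of .mov links without enclosures does not behave like a vodcast.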

3 Responses to “Legal aspects of screen scraping”

  1. Zaza Cartoons RSS feed (screenscraped from www.standaard.be) Says:

    […] Notes, links and conversation « Digg Citation Search bookmarklet Legal aspects of screen scraping » […]

  2. Belgian court ruling on Google News both confusing and a dangerous precedent Says:

    […] These points are combined with arguments that traditionally have been used against news aggregators and deep linking (so far unsuccessfully, see the Kranten.com case as discussed in a previous posting): the news aggregator disintermediates news publishers by making people skip the home page, reducing the stickiness of the news site, etc… […]

  3. Anonymous Says:

    It’s an interesting point of view.
    So, what about when you scrape a public web site? The information is public, so are there rights over this information?