Hosted screenscraping: HTML to RSS with Dapper
There are several screen-scraping services out there, but Dapper is one that’s both versatile and visual. With a bit of trial and error, anyone can transform HTML web pages (or more precisely, changes in web pages) into email notifications, a start page widget, RSS or another syndication format. Take this example:
Jan, whom I happen to know as the guy who wants to be the first “Jan” in Google, has a blog where he writes about a variety of subjects. I would like to subscribe to his Ruby postings, but there’s no tag feed. I am sure he could come up with one if I asked him, but with Dapper it took only 4 minutes to create an RSS feed from the tag page:
The screencast is simple and straightforward and bypasses most of the features; my only goal was to create a simple “Dapp” with this feed as a result.
More on Dapper
If this makes you curious and you want to learn more about Dapper, have a look at the more comprehensive introductory demo, parameterizable Dapps or refining your Dapps. You can also browse the Dapps published by other users. There is more Dapper coverage on TechCrunch, Mashable and ReadWriteWeb.
Other screen-scraping services
From my del.icio.us selection:
- Feed43 (Feed For Free): Convert any web page to news feed on the fly
  “converts free-form HTML or XML documents to valid RSS feeds by extracting snippets of text or HTML by means of applying search patterns, and then joining these snippets together using output templates to form user-friendly content of feed’s items.”: searches by patterns in the HTML
- Feedity - RSS Web Feed Generator for Web Pages without Syndication
  “Another html to rss screenscraping service”: similar to Feed43
- openkapow
  “Combination of 1. RoboMaker, a desktop visual scripting tool, with which you define screenscraping scripts. 2. OpenKapow, an online service where you can host and share your scripts as REST, RSS, ATOM or HTML services”: a more heavy-duty editing environment and hosting service for developers
- Page2RSS - Create an RSS feed from any web site
  “It is a service that helps you monitor web sites that do not publish feeds. It will pull the updates from any site and deliver them right to your favorite RSS reader.”: produces a notification feed for any changes on a page
- ChangeNotes.com - Monitor web site changes
- WatchThatPage - Monitor web pages extract new information
The last two are email notification services that monitor changes on your selected pages.
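To make the "search patterns plus output templates" idea from the Feed43 description a bit more concrete, here is a minimal Python sketch. It is not how Feed43 or Dapper actually work internally; the sample HTML, the regular expression, the item template and the `html_to_rss` function name are all assumptions made up for illustration. Real pages are messier, and a regex that works today can break as soon as the site changes its markup.

```python
import html
import re
from xml.sax.saxutils import escape

# Hypothetical sample of a tag page; a real page would be fetched over HTTP.
SAMPLE_HTML = """
<div class="post"><h2><a href="/2007/05/ruby-tips">Ruby tips</a></h2></div>
<div class="post"><h2><a href="/2007/04/rails-notes">Rails &amp; me</a></h2></div>
"""

# Search pattern (assumed markup): one match per post, capturing link and title.
ITEM_PATTERN = re.compile(r'<h2><a href="([^"]+)">(.*?)</a></h2>', re.S)

# Output template for a single feed item.
ITEM_TEMPLATE = "<item><title>{title}</title><link>{link}</link></item>"


def html_to_rss(page_html, base_url, feed_title):
    """Extract (link, title) snippets and join them into an RSS 2.0 document."""
    items = []
    for href, raw_title in ITEM_PATTERN.findall(page_html):
        # Decode HTML entities first, then escape again for XML output.
        title = escape(html.unescape(raw_title).strip())
        link = escape(base_url + href)
        items.append(ITEM_TEMPLATE.format(title=title, link=link))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<rss version="2.0"><channel>'
        f"<title>{escape(feed_title)}</title>"
        f"<link>{escape(base_url)}</link>"
        f"<description>{escape(feed_title)}</description>"
        + "".join(items)
        + "</channel></rss>"
    )


if __name__ == "__main__":
    print(html_to_rss(SAMPLE_HTML, "http://example.com", "Ruby posts"))
```

The hosted services add the parts this sketch leaves out: fetching the page on a schedule, letting you pick the patterns visually or through a form, and serving the resulting feed at a stable URL.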
If you know of other services, post them in the comments.
May 18th, 2007 at 15:38
[...] Van Hecke has a new post up about Dapper.net. It’s an excellent service that allows you to create feeds out of html [...]
May 21st, 2007 at 06:11
I like Feedity better than Dapper. It is much simpler!
May 6th, 2008 at 12:30
I tried using Dapper but didn’t have much success. I think it tries to be too user-friendly and so loses the ability to strip information out accurately.
May 6th, 2008 at 13:50
You might be better off with Yahoo Pipes nowadays, especially if you’re familiar with html markup. I’ve been planning to write more on that, just lack time.
June 4th, 2008 at 01:50
[...] a while ago I showed a journalist friend my screencast about Dapper HTML-to-RSS screen scraping. He is a heavy Netvibes user, but curses all too often when an organisation in 2008 [...]
July 15th, 2008 at 03:45
I would like to suggest a great new site that organizes your RSS feeds. It employs a Bayesian filter for RSS feeds where you can train the filter what you like and what you don’t like. It’s free, try it at http://www.filteredrss.com.