It was somewhat annoying when Google started to offer free email notifications for their search results - that's pretty much what searchalert.net was doing. Searchalert was a weekend project and I didn't expect it to really become a company, but that event pretty much put an end to the whole notifications-as-business idea.
Recently I've started looking at what pubsub.com is doing - their recent performance numbers sounded intriguing (something like 2.4M 'matches' a second).
In order to understand what all these have in common, I've tried to apply REST architectural concepts - specifically, resource modelling - and from there I'll try to predict where these will go next and who else will get into the game.
So let's start with pubsub.com - they provide subscription and notification technology for notifying users about web logs, newsgroups and EDGAR filings. Now let's compare with searchalert.net - they provide subscription and notification technology for notifying users about web search results. And lastly, Google - they provide subscription and notification technology for notifying users about web search results. (Notice that searchalert.net and Google are annoyingly similar.)
Even though pubsub.com talks about publish/subscribe technology, there isn't any 'publish' in their technology. This isn't a bad thing; the Web is full of publishing technologies to choose from.
Let's break things down into subscriptions and notifications. In the area of subscribing, pubsub.com supports subscriptions over three sets of data - blogs, newsgroups and EDGAR financial filings. Both searchalert.net and Google support subscriptions over two sets of data - general web search results and web search results focused on 'news'.
Subscription management varies across these three providers. pubsub.com will list your subscriptions and give you edit and delete capabilities. searchalert.net will also list your subscriptions and provide edit and delete capabilities. Google has no way of listing subscriptions or modifying them, but you can cancel a subscription from a link within the email notification itself.
For notifications, pubsub.com supports Jabber-based IM and an RSS or Atom feed hosted on their site. searchalert.net supports daily or immediate notifications via email as well as Web notifications - sending XML to a Web address of your choice (this includes Atom-formatted XML to your blog, Weblogger API calls, etc). Searchalert also has search results in RSS, but it isn't a public feature, and handling the load of a public feed would be too much at this point. Google supports daily email notifications.
So how do REST and resource modelling fit into all this, and how RESTful are these different approaches?
In each of these systems, a user provides search terms and the system sends notifications about the search results - basically a 'saved search'. There are two resources - the search and the collection of search results. It's the collection of search results that is key. The notification system pays attention to this resource - items added to this collection would generate a notification. Theoretically, items removed or re-sequenced could also generate notifications. Essentially, each of these systems is a large search index. The resource that is the collection of search results has several representations. For Google, an HTML representation is their first choice (and how they became rich and famous). For pubsub.com, it's Atom-flavored XML.
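To make the 'collection of search results' idea concrete, here is a minimal Python sketch of how a notifier might watch that resource. It is not how any of these services actually work; the SavedSearch class, the poll function and the example.org URIs are all made up for illustration, and re-sequencing is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SavedSearch:
    """Hypothetical saved search: the terms plus the result URIs seen last time."""
    terms: str
    seen: List[str] = field(default_factory=list)


def poll(search: SavedSearch, current_results: List[str]) -> Tuple[List[str], List[str]]:
    """Compare the latest results with the previous snapshot.

    Returns (added, removed). Added items are the obvious notification
    trigger; removed items could generate notifications as well.
    """
    previous = set(search.seen)
    current = set(current_results)
    added = [uri for uri in current_results if uri not in previous]
    removed = [uri for uri in search.seen if uri not in current]
    # Remember this snapshot for the next poll.
    search.seen = list(current_results)
    return added, removed


if __name__ == "__main__":
    s = SavedSearch("rest notifications")
    poll(s, ["http://example.org/a", "http://example.org/b"])
    print(poll(s, ["http://example.org/b", "http://example.org/c"]))
```

The search itself stays fixed; only the collection changes between polls, which is why the collection is the resource worth watching.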
The resource that is the search results is interesting, because it is so similar to what a blog is - a collection of items. An RSS or Atom feed is essentially a format for a list of items. (Simple HTML with unordered-lists and list-items could do that, but where is the glory in that?) Blogs are generated manually by an author or editor, while feeds can be generated automatically - from a search index, for example. Search results are lists of items - that's what pubsub.com does for blogs and Google does for the Web and for news. And it's what Amazon does with popular products. All of these are just Web resources that are search results generated from a large search index.
I think pubsub.com is in a losing game because creating, hosting and serving up very large search indices is the core application of Google, Amazon and others. There are companies such as Technorati that do this just for blogs. It would be straightforward for Technorati or Google to provide search results in RSS and Atom. Instant competition.
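Since a feed is just a list of items, turning search results into one is mostly mechanical. The sketch below is only an illustration, assuming results arrive as (title, link) pairs; the results_to_atom function is made up, it uses the Atom 1.0 namespace, and it leaves out elements a valid feed needs (id, updated, author).

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace; a real feed needs more elements than this sketch emits.
ATOM_NS = "http://www.w3.org/2005/Atom"


def results_to_atom(title, results):
    """Render (item_title, link) pairs as a minimal Atom-style feed string."""
    # Serialize Atom elements without a prefix (default namespace).
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title
    for item_title, link in results:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = item_title
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=link)
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(results_to_atom(
        "Search results for: rest",
        [("First hit", "http://example.org/1"),
         ("Second hit", "http://example.org/2")],
    ))
```

The same list could just as easily be written out as an HTML unordered list; only the representation changes, not the resource.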
So how RESTful are these approaches?
Unfortunately, I don't have the time to do a full analysis, but here's what I've found so far:
- Google supports creating a resource for search results in one step - merely put the search terms in the URI. Both RESTful and useful.
- Technorati supports creating a resource for search results in one step - merely put the search terms in the URI. Both RESTful and useful. They should add a 'view as Atom' button (easy) or a 'tell me when this changes' button (harder) and they'd be in the pub/sub business too.
- searchalert.net and pubsub.com require a two-step process - submit the search terms and a magic URI is created. RESTful but not the most useful. (Shame on me for doing it this way... I see another weekend project coming up.)
- pubsub.com provides an XML representation of search results, but they also use client-side stylesheets to display the results as HTML in a browser. Nicely RESTful and extra credit for using client-side processing in a standards-compliant way.
- Google's email notifications have a URI to cancel your subscription, and merely visiting the page cancels the subscription. Very not RESTful and double plus ungood. A utility that automatically pre-fetches pages referenced in your email would auto-cancel a lot of things.
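The last point is about GET being 'safe': fetching a page shouldn't change anything on the server. A rough Python sketch of a cancel link that behaves itself might look like the following. The subscriptions table, the token and the handle_unsubscribe function are all hypothetical, and this is not how Google or anyone else implements it.

```python
# Hypothetical in-memory store: unsubscribe token -> search terms.
subscriptions = {"abc123": "rest notifications"}


def handle_unsubscribe(method, token):
    """Return (status, body). GET only describes; POST does the cancelling."""
    if token not in subscriptions:
        return 404, "No such subscription."
    if method == "GET":
        # Safe: a pre-fetching mail client can hit this without side effects.
        terms = subscriptions[token]
        return 200, f"Cancel alerts for '{terms}'? Submit the form to confirm."
    if method == "POST":
        # The state change happens only on an explicit, unsafe request.
        del subscriptions[token]
        return 200, "Subscription cancelled."
    return 405, "Method not allowed."


if __name__ == "__main__":
    print(handle_unsubscribe("GET", "abc123"))
    print(handle_unsubscribe("POST", "abc123"))
```

The link in the email would lead to a confirmation page, and the button on that page would do the POST. One extra click, and nothing gets cancelled by accident.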