August 27, 2004

Coral Content distribution network - DNS/HTTP based P2P

From Gojomo - this is very cool:
To take advantage of CoralCDN, a content publisher, user, or some third party posting to a high-traffic portal, simply appends .nyud.net:8090 to the hostname in a URL. For example:

http://news.google.com/ --> http://news.google.com.nyud.net:8090/

Through DNS redirection, oblivious clients with unmodified web browsers are transparently redirected to nearby Coral web caches. These caches cooperate to transfer data from nearby peers whenever possible, minimizing the load on the origin web server and possibly reducing client latency.
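
A minimal sketch of that URL rewrite in Python. The function name coralize and the choice to drop any explicit origin port are my own assumptions for illustration, not part of CoralCDN itself:

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix quoted in the CoralCDN description
CORAL_PORT = 8090           # port quoted in the CoralCDN description


def coralize(url: str) -> str:
    """Append the Coral suffix and port to the hostname of an http URL."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    if host.endswith(CORAL_SUFFIX):
        return url  # already rewritten, leave it alone
    # Assumption: an explicit origin port is simply dropped in this sketch.
    netloc = f"{host}{CORAL_SUFFIX}:{CORAL_PORT}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(coralize("http://news.google.com/"))
```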


Wicked cool and finally a REST based scalable p2p network. I wonder how I could use that at Amazon...

August 25, 2004

What's XMPP?

Phil Wilson mentioned XMPP as being used to deliver Atom feeds. While I don't know much about XMPP (must research...), I'm disappointed that so many people require so many hoops to publish XML messages in near real-time, when HTTP is sufficient.

More fragmentation of the network and address space used for exchanging messages. Remember - "There can be only one".

August 24, 2004

Flickr REST Service not so RESTful

Lucas points out that the Flickr service's "REST" interface isn't very RESTful at all. It's such a shame, because using HTTP directly is so simple and there are plenty of examples and discussions. You know, it's okay to get a little help from your friends...


August 22, 2004

Blogs, pubsub and Web Collections

I've been looking at publish/subscribe and Web based notifications for quite some time. A few years ago I built searchalert.net in order to learn directly what it might take to do Web scale notifications. Of course, I haven't done anything near large scale, but it still gives me a chance to write software to do something fun, while my day jobs drift farther from directly coding cool stuff.

It was somewhat annoying that Google started to offer free email notifications for their search results - that's pretty much what searchalert.net was doing. As a weekend project, I didn't expect searchalert to really become a company, but that event pretty much put an end to the whole notifications-as-business idea.

Recently I've started looking at what pubsub.com is doing - their recent performance numbers sounded intriguing (something like 2.4M 'matches' a second).

In order to understand what all these have in common, I've tried to apply REST architectural concepts - specifically, resource modelling - and from there I'll try to predict where these will go next and who else will get into the game.

So let's start with pubsub.com - they provide subscription and notification technology for notifying users about web logs, newsgroups and EDGAR filings. Now let's compare with searchalert.net - they provide subscription and notification technology for notifying users about web search results. And lastly, Google - they provide subscription and notification technology for notifying users about web search results. (Notice that searchalert.net and Google are annoyingly similar.)

Even though pubsub.com talks about publish/subscribe technology, there isn't any 'publish' in their technology. This isn't a bad thing; the Web is full of publishing technologies to choose from.

Let's break things down into subscriptions and notifications. In the area of subscribing, pubsub.com supports subscriptions over three sets of data - blogs, newsgroups and EDGAR financial filings. Both searchalert.net and Google support subscriptions over two sets of data - general web search results and web search results focused on 'news'.

Subscription management varies across these three providers. pubsub.com will list your subscriptions and give you edit and delete capabilities. searchalert.net will also list your subscriptions and provide edit and delete capabilities. Google has no way of listing subscriptions or modifying them, but you can cancel a subscription from a link within the email notification itself.

For notifications, pubsub.com supports Jabber-based IM and an RSS or Atom feed hosted on their site. searchalert.net supports daily or immediate notifications via email as well as Web notifications - sending XML to a Web address of your choice (this includes Atom-formatted XML to your blog, Weblogger API calls, etc). Searchalert also has search results in RSS, but it isn't a public feature and trying to scale the load would be too much at this point. Google supports daily email notifications.

So how do REST and resource modelling fit into all this, and how RESTful are these different approaches?

In each of these systems, a user provides search terms and the system sends notifications about the search results - basically a 'saved search'. There are two resources - the search and the collection of search results. It's the collection of search results that is key. The notification system pays attention to this resource - items added to this collection would generate a notification. Theoretically, items removed or re-sequenced could also generate notifications. Essentially, these systems are a large search index. The resource that is the collection of search results has several representations. For Google, an HTML representation is their first choice (and how they became rich and famous). For pubsub.com, it's Atom flavored XML.
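
Here is a rough sketch of that two-resource model in Python: a saved search that remembers which result URIs it has already seen and reports only the additions. The class and method names are hypothetical, and this says nothing about how any of the three services actually implement it:

```python
from dataclasses import dataclass, field


@dataclass
class SavedSearch:
    """The 'search' resource plus the collection of results seen so far."""
    terms: str
    seen: set[str] = field(default_factory=set)

    def update(self, current_results: list[str]) -> list[str]:
        """Return result URIs that are new to the collection, in order.

        Each returned URI would trigger one notification. Removed or
        re-sequenced items are ignored in this sketch.
        """
        new_items = []
        for uri in current_results:
            if uri not in self.seen:
                self.seen.add(uri)
                new_items.append(uri)
        return new_items


search = SavedSearch("rest pubsub")
print(search.update(["http://example.org/a", "http://example.org/b"]))
print(search.update(["http://example.org/b", "http://example.org/c"]))
```

The second call reports only the item that was added since the first poll, which is the event the notification system cares about.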

The resource that is the search results is interesting, because it is so similar to what a blog is - a collection of items. An RSS or Atom feed is essentially a format for a list of items. (Simple HTML with unordered-lists and list-items could do that, but where is the glory in that?) Blogs are generated manually by an author or editor; feeds are generated automatically, for example from a search index. Search results are lists of items - that's what pubsub.com does for blogs and Google does for the Web and for news. And it's what Amazon does with popular products. All of these are just Web resources that are search results generated from a large search index.

I think pubsub.com is in a losing game because creating, hosting and serving up very large search indices is the core application from Google, Amazon and others. There are companies such as Technorati that do this just for blogs. It would be straightforward for Technorati or Google to provide search results in RSS and Atom. Instant competition.

So how RESTful are these approaches?

Unfortunately, I don't have the time to do a full analysis, but here's what I've found so far:

  • Google supports creating a resource for search results in one step - merely put the search terms in the URI. Both RESTful and useful.
  • Technorati supports creating a resource for search results in one step - merely put the search terms in the URI. Both RESTful and useful. They should add a 'view as Atom' button (easy) or a 'tell me when this changes' button (harder) and they'd be in the pub/sub business too.
  • searchalert.net and pubsub.com require a two step process - submit the search terms and a magic URI is created. RESTful but not the most useful. (Shame on me for doing it this way... I see another weekend project coming up)
  • pubsub.com provides an XML representation of search results, but they also use client-side stylesheets to display the results as HTML in a browser. Nicely RESTful and extra credit for using client-side processing in a standards-compliant way.
  • Google's email notifications have a URI to cancel your subscription, and merely visiting the page cancels the subscription. Very not RESTful and double plus ungood. A utility that automatically pre-fetches pages referenced in your email would auto-cancel a lot of things.
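
To make the last point concrete, here is a minimal sketch of a cancel link that keeps GET safe. The /subscriptions/ path and the handle function are assumptions invented for illustration; they are not any provider's real interface:

```python
def handle(method: str, path: str, subscriptions: set[str]) -> tuple[int, str]:
    """Dispatch a request against a hypothetical /subscriptions/<id> resource."""
    prefix = "/subscriptions/"
    if not path.startswith(prefix):
        return 404, "not found"
    sub_id = path[len(prefix):]
    if sub_id not in subscriptions:
        return 404, "no such subscription"
    if method == "GET":
        # Safe: a pre-fetcher or crawler visiting the link changes nothing.
        return 200, f"subscription {sub_id} is active; submit the form to cancel"
    if method in ("DELETE", "POST"):
        # Only an explicit unsafe request (e.g. a confirmation form) cancels.
        subscriptions.discard(sub_id)
        return 200, f"subscription {sub_id} cancelled"
    return 405, "method not allowed"


subs = {"42"}
print(handle("GET", "/subscriptions/42", subs), subs)
print(handle("DELETE", "/subscriptions/42", subs), subs)
```

The link in the email can still point straight at the subscription; it just lands on a page with a cancel button instead of cancelling on arrival.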


August 18, 2004

Information finds you

A long time ago I worked on building an enterprise information portal product. At the end of my stint I designed and implemented the subscription and notification system. My mantra at the time was "You don't have to find the information, the information finds you." This is now more and more possible with a wide variety of services. Google monitors the Web and has email notifications for search results, pubsub.com monitors blogs and provides RSS feeds of the results, and I finally have my old SearchAlert.net service running again.

What SearchAlert does is monitor searches and send notifications when new results are found. One of the advanced features of SearchAlert is to let a user choose between searches against news items or the full web. A new feature added recently is to send Web notifications in addition to email notifications. The Web notifications send XML (or whatever) to whatever receiver you choose, and pre-defined templates support the Atom API, the Weblogger API and others.
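
A Web notification of this kind might look roughly like the Python sketch below. It is not SearchAlert's actual code: the function names and the receiver URL are made up, the Atom 1.0 namespace is an assumption (the Atom drafts current in 2004 used a different one), and the request is only constructed, never sent:

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # assumption: Atom 1.0 namespace


def build_entry(title: str, link: str, updated: str) -> bytes:
    """Render one new search result as an Atom entry document."""
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = link
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=link)
    return ET.tostring(entry, encoding="utf-8")


def build_notification(receiver_url: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying the entry."""
    return urllib.request.Request(
        receiver_url,
        data=body,
        headers={"Content-Type": "application/atom+xml"},
        method="POST",
    )


body = build_entry("New result", "http://example.org/a", "2004-08-18T00:00:00Z")
request = build_notification("http://example.org/inbox", body)  # hypothetical receiver
print(request.get_method(), request.full_url)
print(body.decode("utf-8"))
```

Passing the prepared request to urllib.request.urlopen would deliver it; authentication and retries are left out.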

August 17, 2004

PubSub growing up

The pubsub.com company sure has gotten more polished and grown up looking. They also claim to perform 3 billion matches per second. That's a lot. I wonder if they will go to a distributed system - subscriptions, notifications, etc. Distributed outside the pubsub.com firewall that is.

August 16, 2004

Teepee - internet scale event notification

Rohit is on a roll again, via The Now Economy:

One of the primary thrusts of our work at CN Labs will be a new kind of internet-scale event notification service: an application-layer router. Just like there's an IP packet format at the network layer, there ought to be a new standard that unifies the welter of application-layer protocols: smTP, htTP, fTP, nnTP, and more.

TP, a Transfer Protocol, merely provides a best-effort delivery service for named, MIME-typed bags of bits. Rather than using IP addresses, those names are the endpoints that identify multiple services.


This sounds like a protocol geared to one-to-many delivery.
(Although the example of 'delta > 5' seems like a bad choice as it is susceptible to the sampling frequency of the event source.)
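
A quick Python sketch of what I mean, assuming 'delta > 5' compares each sample with the previous one. The ramp signal and the count_triggers helper are invented for illustration:

```python
from typing import Callable


def count_triggers(signal: Callable[[int], float], interval: int,
                   duration: int, threshold: float = 5.0) -> int:
    """Count how often consecutive samples differ by more than threshold."""
    samples = [signal(t) for t in range(0, duration + 1, interval)]
    return sum(
        1 for prev, cur in zip(samples, samples[1:])
        if abs(cur - prev) > threshold
    )


def ramp(t: int) -> float:
    """A value that rises by exactly 1 unit per second."""
    return float(t)


# Same underlying signal, same rule, different sampling intervals.
print(count_triggers(ramp, interval=1, duration=60))   # sampled every second
print(count_triggers(ramp, interval=10, duration=60))  # sampled every 10 seconds
```

Sampled every second the rule never fires; sampled every ten seconds it fires on every sample, even though the source behaved identically.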

Which are you?

Most excellent and objective analysis from Mr Pilgrim: Why specs matter

Most developers are morons, and the rest are assholes.

New links for metaweblog api

NewsIsFree: Yahoo! Group: MetaWeblog API: Information About This ...

August 14, 2004

Microsoft PM has Web 'Aha' moment

Dare has an interesting Aha moment - directly and indirectly courtesy of The Web.

Put more succinctly, a technology doesn't have to solve every problem just enough problems to be useful. Two examples come to mind which hammered this home to me; Tim Berners-Lee's World Wide Web and collaborative filtering which sites like Amazon use.


I love how with one simple phrase the global, open-ended, all-encompassing Web is minimized by assigning ownership to TimBL. I predict still more Aha moments ahead for the young Redmond Jedi.

A JavaScript XmlSerializer

Hey look, a JavaScript XmlSerializer - now why does that sound familiar... oh, yeah, I did that at KnowNow three years ago. Here's an example mod-pubsub app that uses SOAP-encoded messages to demo an offer/bid system within a browser. The specific SOAP encoder/decoder JavaScript library is also available (it's composed of several individual files - take a look at the source if interested).

August 13, 2004

Middleware Matters: ESBs vs. decentralization

Vinoski says:
"Ah," you say, "but connecting endpoints directly to each other brings us back to the N^2 connection problem." Hypothetically, yes. In practice, no. In practice, nobody ever needs to connect everything to everything else. The N^2 problem just doesn't arise. Installing a bus to solve the hypothetical problem and trading away all possibilities for speed, efficiency, and flexibility in the process is often an extremely poor choice.

August 07, 2004

Vogels at Amazon!

Good Gawd, Werner Vogels is going to Amazon - this is so cool. I can't wait to get back from vacation and find out what's up.

It is my distinct pleasure to announce that I have accepted a position as Director of Systems Research at Amazon.com starting in September 2004. [...] If you are an architect or a seasoned developer who wants to work on very advanced, complex systems, this is the moment to start considering Amazon.com.

Unanticipated benefits - streaming to you know

Jon Udell's article on Prime-Time Hypermedia highlights something that highly technical Web enthusiasts have known about (and promoted to some degree). It can also serve as an example of how a decent foundational protocol like HTTP can foster 'unanticipated benefits'.
Something that might not be obvious after reading Jon's article is that a resource-modelling viewpoint provides for a nearly infinite number of resources synthesized from one concrete media file. For example, if an intermediary provided a URI with query terms for segments of an underlying audio file, there could be an infinite number of segments - some might exist and some might not (like if they were off the end of the file). A resource is just a logical concept mapped to some concrete set of (dynamic or static) data.
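
As a small Python sketch of that idea: the start and end query parameters, the fixed bytes-per-second rate, and the resolve_segment name are all assumptions of mine, not anything from Jon's article:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def resolve_segment(uri: str, media: bytes, bytes_per_second: int) -> Optional[bytes]:
    """Map a URI like ...?start=2&end=5 (seconds) onto a slice of one file.

    Returns None when the segment resource does not exist, for example
    when it lies off the end of the file or the query is malformed.
    """
    query = parse_qs(urlsplit(uri).query)
    try:
        start = float(query["start"][0])
        end = float(query["end"][0])
    except (KeyError, ValueError):
        return None
    if start < 0 or end <= start:
        return None
    first = int(start * bytes_per_second)
    last = int(end * bytes_per_second)
    if first >= len(media):
        return None  # off the end of the file: no such resource
    return media[first:min(last, len(media))]


media = bytes(range(10))  # stand-in for a 10-second file at 1 byte per second
print(resolve_segment("http://example.org/talk.mp3?start=2&end=5", media, 1))
print(resolve_segment("http://example.org/talk.mp3?start=20&end=25", media, 1))
```

Every start/end pair names a distinct resource, yet only one concrete file sits behind them all.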

Bosworth resumes blogging

Adam Bosworth's KISS and The Mom Factor is a welcome sight - an indicator that he has surfaced from what must have been many months of angst and searching. I'm very glad to see him resume his industry-shaping trajectory. It would have been nicer if I'd known and could have connected him with Amazon rather than Google, but oh well.