April 28, 2006

Microsoft is building a Google cluster

From Greg Linden's blog, a quote from Microsoft:

"the people who could build a viable [Web] services infrastructure of scale are companies that have both the will and the capacity to invest staggering amounts of money."

Hmm. I suppose that would exclude Google, BitTorrent and the Web itself. Unless they meant "the people that need to catch up will need staggering amounts of money, otherwise they lose".

dojo.storage: Client-Side Storage

From Ajaxian, here is a post about offline access and client-side storage via Dojo. We are getting close to the tipping point for disconnected use and client-side state - and with the strength of the REST buzz (I can't believe how many people are actually applying REST!) this really bodes well for the next few years of Internet-scale application development. If nothing else, at least it's yet another way to route around the damage that is the Win32 API.

April 24, 2006

Pubsub vs. polling - again

This is a great read from Bob Wyman on Dave Winer's comments about polling compared to notification:
From As I May Think...: Dave Winer: Show me that mathematical proof!

Dave's lack of understanding of the issues related to scaling can be seen in the history of the weblogs.com site that he struggled to build and maintain for so long. That site takes "pings" from blogs and then consolidates them into tremendous "change lists" which must be polled. Essentially, this site converts an efficient push-based update notification system (pinging) into an inefficient polling based system. Weblogs.com, as Dave built it, didn't even support common methods like eTags or RFC3229+feed to improve polling efficiency and scaling. The result was that it simply didn't scale and was frequently incapable of providing the service levels that people expected. Only now that Verisign has taken over the site and dedicated much better engineering staff and much more hardware, has the weblogs.com service begun to be somewhat useful again. However, since weblogs.com is still based on the terribly inefficient polling of change-lists that Dave supports, it is still a far cry from being what it might be.

Ouch. But I did like the nod to KnowNow and mod-pubsub. Not sure if mod-pubsub is still alive and kicking though.

April 19, 2006

Best Google Image

The latest home page graphic from Google is my absolute favorite -

Delta feeds - RFC3229+feed

About a year and a half ago, Bob Wyman was instrumental in defining an approach to greatly reduce the load and bandwidth used by applications that polled for changes to RSS/Atom feeds.
The other week, he noted that Microsoft will support the RFC3229+feed approach as well - which is good.
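For reference, here is a minimal sketch of what a delta poll looks like from the client side, as I understand RFC3229+feed: the client sends `A-IM: feed` together with the ETag it last saw, and a server that supports the scheme answers `226 IM Used` with only the new entries. The function names are hypothetical, and the response headers are assumed to arrive as a plain dict keyed exactly by `IM`.

```python
def delta_request_headers(last_etag=None):
    """Build request headers for a poll that asks only for entries newer than last_etag."""
    headers = {"A-IM": "feed"}  # tell the server we accept the "feed" instance manipulation
    if last_etag is not None:
        headers["If-None-Match"] = last_etag  # the ETag from the previous response
    return headers


def interpret_response(status, response_headers):
    """Classify a poll response as 'unchanged', 'delta', or 'full'."""
    if status == 304:
        return "unchanged"  # nothing new since last_etag
    if status == 226 and "feed" in response_headers.get("IM", ""):
        return "delta"  # body holds only the entries added since last_etag
    return "full"  # server ignored A-IM and sent the whole feed


print(delta_request_headers('"abc"'))
print(interpret_response(226, {"IM": "feed"}))
```

A server that doesn't know about `A-IM` just returns the full feed with a 200, so the client has to handle both cases anyway.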

The only problem I have with this approach is that I think it is simpler to use hyperlinks, and I haven't seen a real comparison between the two. Both approaches have the client application maintain state of what data was last retrieved, but using hyperlinks has more chances for pre-existing caching servers to work without modification. I think the Atom protocol has defined something like this, but I couldn't follow the email threads.

To use hyperlinks, the data returned in a feed would have a link to the 'next' (more recently changed) posts. The client would then follow that link, which would either be empty (and optionally have a cache-control header to indicate how long to wait before checking again) or have more data - along with another 'next' link. The client just keeps following the links. The client would have two URIs - the original, well-known location that new readers start from, and the changing one which is the set of data most recently retrieved by that particular client. The server decides what the 'next' link is and what it contains - the data would be very cacheable across all clients, merely by checking the URI.
The downside of this approach is the need to put the link within the content of the response - or add a response header for that location if the content isn't easily extended.
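A rough sketch of the client loop described above, using an in-memory dict in place of a real server. The URIs, the `entries`/`next` field names and the `fetch` helper are all made up for illustration; a real client would do an HTTP GET and honor any cache-control header on the empty page.

```python
# Hypothetical "server": each URI maps to a page of entries plus a 'next' link.
PAGES = {
    "/messages": {"entries": ["m1", "m2"], "next": "/messages?after=m2"},
    "/messages?after=m2": {"entries": ["m3"], "next": "/messages?after=m3"},
    "/messages?after=m3": {"entries": [], "next": "/messages?after=m3"},
}


def fetch(uri):
    """Stand-in for an HTTP GET against the feed server."""
    return PAGES[uri]


def poll(bookmark, fetch=fetch):
    """Follow 'next' links from bookmark until an empty page is reached.

    Returns the new entries and the URI to start from on the next poll.
    """
    entries = []
    while True:
        page = fetch(bookmark)
        if not page["entries"]:
            return entries, bookmark  # caught up; keep this URI as the client's state
        entries.extend(page["entries"])
        bookmark = page["next"]


new_entries, bookmark = poll("/messages")  # a new reader starts at the well-known URI
print(new_entries, bookmark)
```

The only state the client keeps is the bookmark URI, and every client that is caught up to the same point asks for the same URI, which is what makes the responses easy for an intermediary to cache.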

I put together a sample application that shows how this works - a simple HTML/JavaScript chat client, along with a link to the list of messages.

April 12, 2006

Broken as Designed

Sith Obasanjo in Broken as Designed can't decide which way is away from the dark side -
"However I do think some Web/REST advocates need to look around and realize what's happening on the Web instead of arguing from an 'ideal' or 'theoretical' perspective."

The Web advocates need to realize what's happening on the Web??

Okay, we all know some resources have broken content-type headers when you retrieve them. Others don't. Some clients never use content-type correctly, others do and many use heuristics. That's okay. Just do your homework - and part of that is to read the TAG finding and use it where appropriate - and be a good web citizen. That's simply practical advice based on practice. The "don't use content-type" position is a theory that is unproven in practice.

Update - it looks like Dare found breakage in Cookies as well, but after further investigation it turned out to be a problem with the data returned by the server. But if the bloggers using RSS can't correctly control this Cookie response header - would we have no choice but to drop support for that header as well, as suggested for the Content-Type header? I mean, the theory of Cookies is all well and good, but if you look around at what the major players like MSN are actually doing, who are we to stick with something broken as designed merely because of theory?

April 11, 2006

Persistent Search and OpenSearch

It looks like there's more interest (again) in saved searches and search alerts -
unto.net - Persistent Search and OpenSearch

Hopefully things have changed since I last reviewed this space in 2004.

Content-Type is dead

What a simply stellar idea here - Hixie's Natural Log: Content-Type is dead - browsers were broken, server configs are broken, so there can't possibly be any reason to use this header. The browser can't use it, so nothing else should either.
"I think it may be time to retire the Content-Type header, putting to sleep the myth that it is in any way authoritative, and instead have well-defined content-sniffing rules for Web content."

You shouldn't throw away the Content-Type header even if server configs aren't easily controllable by the author. Go ahead and do the right thing - use the document context and tags as a hint on how to handle the content, use the content-type along with the content itself. There's nothing wrong with an application that retrieves a resource referenced by an img tag assuming that the retrieved content is an image.
The only arguments people may have are when there is little context available when retrieving content (no hypertext source document) and the retrieved content could be interpreted in several ways.
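To make that concrete, here is a small sketch of one way a client could combine the three sources of information: the context of the referencing document, the declared Content-Type, and the content itself. The precedence rule is my own assumption, not something from the TAG finding, and `resolve_type`, `sniff` and the short signature table are hypothetical names for illustration.

```python
# A few well-known file signatures; a real sniffer would know many more.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff(body):
    """Guess a media type from the leading bytes, or None if nothing matches."""
    for prefix, media_type in MAGIC:
        if body.startswith(prefix):
            return media_type
    return None


def resolve_type(declared, body, context=None):
    """Pick a media type from the header, the content, and the document context.

    context is the top-level type the referencing markup implies,
    e.g. "image" for something referenced by an img tag.
    """
    # Drop parameters such as charset and normalize case.
    declared = (declared or "").split(";")[0].strip().lower() or None
    sniffed = sniff(body)
    if context:
        # Prefer whichever source agrees with the context, header first.
        for candidate in (declared, sniffed):
            if candidate and candidate.startswith(context + "/"):
                return candidate
    # Little or no context: trust the header, then the content, then give up.
    return declared or sniffed or "application/octet-stream"


png = b"\x89PNG\r\n\x1a\n" + b"rest of the file"
print(resolve_type("text/plain", png, context="image"))
print(resolve_type("text/html; charset=utf-8", b"<html>"))
```

The header still wins whenever it is consistent with the context; sniffing only steps in when a misconfigured server contradicts what the referencing document expects.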

April 09, 2006

Telomolecular Nanocircles

This description of telomere technology from Telomolecular looks interesting:

Synthetic DNA Nanocircles are a biomedical nanotechnology invented by Dr. Eric Kool and colleagues of Stanford University. These nanometer-sized circular DNAs have been shown to elongate chromosomal telomeres in vitro. They consist of DNA bases arranged in a sequence that templates the lengthening of telomeres by repeated addition of new TTAGGG sequences. Nanocircles have shown promise in telomere elongation in human tissues. By combining nanocircles with new proprietary gene therapy and delivery technologies, Telomolecular believes that nanocircles might work efficiently in living animals.

April 06, 2006

Google Base Storms Into Europe

From webpronews.com:

"The search advertising company will have a lot of catching up to do in Europe, where Amazon has relationships with businesses like UK-based Marks and Spencer. Google may have to pitch something beyond its online capabilities, though, according to the FT report:
One big UK retailer with no online presence said on Wednesday that Google's retail offer would be of interest if the internet company could also arrange for distribution. This potentially huge task has raised doubts about the long-term business models of other online retailers such as Amazon.com.

Doubts over Amazon's distribution don't have much grounding in reality. The company has built up its distribution network over the past decade and does have some knowledge in the area. Google's expertise at distribution probably does not go beyond making a change in a router's access control list and opening up a website."

I like that last bit - Google's expertise at distribution probably does not go beyond making a change in a router's access control list and opening up a website.

The article goes on to suggest that Google team up with UPS for the distribution center. I wonder if that would work.

Google Base - Attributes

From a post by Mark Baker, I started looking at Google Base - they have a lot of interesting features - easy creation, bulk upload, extensible attributes and a predefined set of attributes.
I've only looked at the tab-delimited bulk upload, but it looks pretty reasonable so far.

For listing products, the process seems pretty easy - http://base.google.com/base/a/1214338/9143072239359280355 . Their payment system is also integrated, so Google now has a real offer listing system with decent search. I imagine the next step is to syndicate the content as XML, wrap the data in a skin and provide good indexed search.

April 01, 2006

Service Oriented What?

Maybe I'm misreading this, but I almost fell over laughing at the post on Service Oriented Enterprise:

3. REST, as the principle foundation of the Web, has proven that it too can scale - but the areas where it has proven to scale the best are related to the movement of unstructured HTML documents using a constrained set of verbs.

The "but" part cracked me up - from my viewpoint, it's the movement of unstructured documents and using a constrained set of verbs that allow for the scaling demonstrated by the Web.