April 19, 2006

Delta feeds - RFC3229+feed

About a year and a half ago, Bob Wyman was instrumental in defining an approach to greatly reduce the load and bandwidth used by applications that polled for changes to RSS/Atom feeds.
The other week, he noted that Microsoft will support the RFC3229+feed approach as well - which is good.

The only problem I have with this approach is that I think it is simpler to use hyperlinks, and I haven't seen a real comparison between the two. Both approaches require the client application to maintain state about what data was last retrieved, but hyperlinks give pre-existing caching servers a better chance of working without modification. I think the Atom protocol has defined something like this, but I couldn't follow the email threads.

To use hyperlinks, the data returned in a feed would include a link to the 'next' (more recently changed) posts. The client would follow that link, which would either be empty (optionally with a cache-control header indicating how long to wait before checking again) or contain more data - along with another 'next' link. The client just keeps following the links. Each client would track two URIs: the original, well-known location that new readers start from, and a changing one pointing at the set of data that particular client most recently retrieved. The server decides what the 'next' link is and what it contains, so the data would be highly cacheable across all clients, merely by checking the URI.
The downside of this approach is the need to put the link within the content of the response - or, if the content isn't easily extended, to add a response header giving that location.
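The polling loop described above can be sketched as follows. This is a minimal illustration, not the sample application mentioned below: the page layout, URIs, and field names are all hypothetical, and the feed is modeled as an in-memory dict rather than fetched over HTTP so the sketch is self-contained.

```python
# Hypothetical server-side store: each page has some entries plus a
# 'next' URI pointing at more recently changed posts. The last page is
# empty, meaning the client has caught up.
PAGES = {
    "/feed": {"entries": ["post-1", "post-2"], "next": "/feed?after=2"},
    "/feed?after=2": {"entries": ["post-3"], "next": "/feed?after=3"},
    "/feed?after=3": {"entries": [], "next": "/feed?after=3"},  # nothing newer yet
}

def fetch(uri):
    """Stand-in for an HTTP GET of the given URI."""
    return PAGES[uri]

def poll(start_uri):
    """Follow 'next' links from start_uri, collecting entries until an
    empty page is reached. Returns the new entries plus the URI this
    client should resume from on its next poll."""
    entries = []
    uri = start_uri
    while True:
        page = fetch(uri)
        if not page["entries"]:
            break  # caught up; keep this URI and check again later
        entries.extend(page["entries"])
        uri = page["next"]
    return entries, uri

new_entries, resume_uri = poll("/feed")
```

A new reader starts from the well-known URI ("/feed" here) and ends up holding its own resume URI; every client that polls the same page URIs gets byte-identical responses, which is what lets ordinary caches help.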

I put together a sample application that shows how this works: a simple HTML/JavaScript chat client, along with a link to the list of messages.

2 comments:

Bob said...

"Simple" isn't always the goal. Also, it is necessary to ask "Simple for whom?"

The link approach is viewed by many as simpler for the feed generator; however, it introduces greater complexity for feed readers and increases network load in unfortunate ways.

If the link approach is used, then "following the links" will often increase the number of HTTP requests made to the server. Some of the impact of these multiple requests can be reduced by using persistent connections; even so, the number of round-trip packets needed to read a particular set of entries is higher than with RFC3229+feed. The result is a larger number of network packets, increased log file sizes, etc. If persistent connections are not used, then each request will establish a new TCP/IP session. Each session will be subject to TCP/IP "slow start", will clutter server connection tables, and will generally waste bandwidth through session-establishment overhead as well as increased packet counts.
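For comparison, the RFC3229+feed case needs only one conditional request per poll. A sketch of building that request's headers, assuming the client saved the ETag from its previous fetch (the ETag value here is made up):

```python
def delta_request_headers(last_etag):
    """Build headers for an RFC 3229 'feed' delta request. The client
    asks for only the entries added since the instance identified by
    last_etag; a capable server replies '226 IM Used' with just those
    entries (or '304 Not Modified' if nothing has changed), while an
    unaware server simply returns the full feed with '200 OK'."""
    return {
        "A-IM": "feed",              # accept the 'feed' instance manipulation
        "If-None-Match": last_etag,  # ETag seen on this client's last poll
    }

headers = delta_request_headers('"abc123"')
```

Graceful degradation is the point: the same request works against servers that have never heard of RFC 3229.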

In any case, client code will need to be more complex with a link-based approach, since clients will need to implement algorithms to determine how "far back" in the chain of links to follow and will then need to reconstruct "feeds" from the parts they obtain. Some "simple" clients that are currently in use won't be useful - the ones that pull an entire feed and then present it to the user, often after an XSLT transformation to make it readable. (This is often done on small or mobile client devices in order to reduce client complexity.) Forcing the client to pick up and reassemble pieces will be a burden.

It should also be noted that the need to determine "how far back" in the list of links to go would probably force servers to maintain strict "when published" order in the feeds they generate, since clients will generally follow back links only until they find an item they consider a duplicate of something they have seen before. This means that updates or modifications to previously published items *must* be put at the "head" of the list of entries; otherwise they will never be seen by the clients. There are a variety of reasons that feed developers would not be pleased with this requirement to strictly order the contents of their feeds.

The above notes are just a few considerations that should be weighed when arguing that link-based approaches are "simpler."

bob wyman

Mike said...

Bob,
Thank you very much for taking the time to write down this very good comparison. I've read it briefly and definitely agree that both producers and consumers of feeds need to be considered. I will look into it more closely in the next week and respond as I can.