April 25, 2007

Sparkly things factory

Oh, this quote from Performancing.com is beautiful!

"Snap's preview anywhere gizmo is ruining the reading experience for millions of people. Its intrusive, obstructive and unuseful in almost every respect and use case. The fact that so many big blogs are using it, big well respected blogs, does not mean that it's useful, it just means that they, like most bloggers, have all the self restraint of a magpie in a sparkly things factory."

April 24, 2007

Amazon.com Widgets and hypertext

It looks like Typepad and Amazon are collaborating to help bloggers link to Amazon products. They have three widgets outlined, and one of them - the quick linker widget - uses custom attributes on an HTML anchor tag to make it easier to reference a set of products. The 'old school' would just use a URI, but those are hard to construct, hard to type and the wizards slow people down, blah blah blah. This not-quite-microformats approach is more understandable and more forgiving for hand-crafted markup. They define a new 'type' attribute for the anchor with the value "amzn". Then there are several more attributes like 'search' or 'category', or you can create a direct link via an 'asin' attribute. I assume some snippet of JavaScript scans the page after it has loaded, constructs the URI on the fly and sets the href attribute on these anchors.
This is a creative solution to the "how do you construct a URI" problem, but it does leave spiders out in the cold, breaking the hyperlinking which defines the Web.

However, there is a simple approach they could use that would keep the document fully declarative and locally described while still allowing auto-discovery of hyperlinks: add a 'meta' tag to the head of the document with the URI template that corresponds to the 'type' attribute. I think the URI template proposal may need a bit of work on optional or conditional patterns, or the URI template could stay static and the type="amzn" could change to type="amzn-direct-link" or some other more qualified value.
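
Here is a rough sketch of how a spider might resolve such links without running any script. Everything specific in it is my own assumption for illustration: the 'uri-template.<type>' naming convention for the meta tag, the shop.example.com address (not Amazon's real URL scheme) and the simple {name} substitution, which is far less than the URI template proposal covers.

```python
import re
from html.parser import HTMLParser
from urllib.parse import quote


def expand(template, values):
    """Fill each {name} in the template with the percent-encoded attribute value."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(values.get(m.group(1)) or "", safe=""),
        template,
    )


class TemplateLinkResolver(HTMLParser):
    """Collects URI templates from <meta> tags and resolves typed anchors."""

    PREFIX = "uri-template."  # hypothetical naming convention, not a standard

    def __init__(self):
        super().__init__()
        self.templates = {}  # anchor 'type' value -> URI template
        self.links = []      # resolved URIs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if tag == "meta" and name.startswith(self.PREFIX):
            self.templates[name[len(self.PREFIX):]] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("type") in self.templates:
            self.links.append(expand(self.templates[attrs["type"]], attrs))


# Hypothetical page: the template lives in the head, the anchor carries only data.
page = """
<html><head>
<meta name="uri-template.amzn-direct-link"
      content="http://shop.example.com/item/{asin}">
</head><body>
<a type="amzn-direct-link" asin="B000 123">a book</a>
</body></html>
"""

resolver = TemplateLinkResolver()
resolver.feed(page)
print(resolver.links)
```

Because the template sits in the document itself, a crawler needs nothing beyond the page to recover the hyperlink, which is the whole point of keeping it declarative.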

April 21, 2007

Chocolate. Real chocolate.

The other day I was lucky to tag along with Jordan to the ESIF (Early Stage Investment Forum) in Seattle. The event was held by NWEN (Northwest Entrepreneur Network) and is sort of a 'graduation' milestone for local entrepreneurs and startups. We weren't the only software startup, but there were many other types of companies represented - bio-tech, high tech, textiles and gourmet foods. Not just any gourmet food - gourmet, hand-crafted and wonderfully made chocolate. Real chocolate. Not the burn-your-throat, over-sugared, waxy poo that comes from Pennsylvania. I had heard rumors of a coconut curry chocolate, so during a lull in booth duty I wandered around to score some chocolate. Several years ago a friend visited Mexico and brought home some chocolate that was flavored with chili powder and granulated sugar, and the taste was fantastic, so I assumed this curry chocolate would be just as good. I found the booth with Theo Chocolate, talked with two very friendly folks and asked about the curry chocolate. Naturally they offered a taste and I must say it was great - spicy but smooth and rich with chocolate. I told them that I'm always on the lookout for new chocolate because my wife really likes good chocolate, especially Dutch dark chocolate. Surprisingly, they started pulling out bars and explaining each - this is 91% cacao, this is 65% cacao and so on. Their flavors are intriguing: not only do they have coconut curry chocolate, but also Chai Tea chocolate and exotic chocolate using cacao beans from Ghana, Madagascar and Venezuela. I have only tried the Venezuela chocolate (91% cacao) and it's really becoming addictive - crisp chocolate with a dusky bitterness without being unpleasantly bitter. I hope the 'Special Limited Edition' label doesn't mean it will be hard to get in the future.



(Oh, and I just found this useful Flickr tool to upload photos directly from your desktop with a right-click menu. Quite a time saver.)

April 20, 2007

Tim Bray stole my flowers

I regularly read Tim Bray's posts, but admit that I skip the details of a few posts about hardware now and again... however his recent flurry of spring flowers made me laugh because it's like he's talking about my backyard.
It first started with a star magnolia; his was blooming a bit after ours had started flowering - here's a photo of ours.



Then it was a rhododendron, which is fairly common in the Pacific Northwest. We have a white with pink tinge and half a dozen more that haven't bloomed yet - they come out at different times of the year each with different colors, very cheerful.


Then there was the trillium which is my favorite flower. My wife bought a bulb and planted it out back by a little pond because she knew I loved them so much. Three came up this year and I missed the chance to take a good photo, but Tim's will do very nicely.


And there are others - tulips, more rhodies, daffodils. The grape hyacinths are pretty, but their fragrance is the best. Put these near your front door and you'll really know when Spring arrives. You might need to check the particular kind of grape hyacinth - ours looks a bit different.

If Tim next publishes photos of hibiscus and sunflowers, I'm going to get suspicious.

April 17, 2007

CSS - new style, same old sheet

I've been working heavily in the world of HTML and browsers over the past six months and for the most part enjoy it. It's much, much better than the scene six years ago when I was at KnowNow writing fairly advanced JavaScript client code that was supposed to work on Netscape Navigator and Internet Explorer - what a nightmare. Today's browser world is infinitely better. Now people only complain about box models being a pixel or two off. Well, and there's that z-index bug Microsoft hasn't fixed and probably doesn't even know about, even though everyone else does...

Anyway, my most recent learning experience has been with information extraction from Web pages - essentially extracting meaningful keywords from HTML. I must say, there's a lot of room for learning to take place here. There are several research papers I've found that are really educational, especially those that talk about extraction in the absence of a large body of other documents (corpus) to measure relevance.

As I was going through some experiments I realized that doing a decent job of extracting text from HTML requires knowledge of what 'markup' is and what the particular elements of HTML are defined to mean. Extracting meaningful phrases from markup means ignoring the markup and getting to the underlying text which was marked up. But then I began to notice something - in all the advanced HTML pages that use the latest CSS to accomplish 'semantic HTML' (a phrase that I've heard tossed around pretty loosely) something is going wrong. The underlying text that is marked up is becoming gibberish. This is due to using CSS for layout while ignoring the effect of the tags on the text. For example, a span tag applied to text is considered an 'inline' element - the underlying text is not meant to be fragmented and split apart, and any extraction tool (especially a naive one like the one I was experimenting with) should merge the text fragments before, within and after the span element with no whitespace. But often designers will add layout and margins to the span tag in order to visually separate the text - yet the underlying markup indicates the text fragments are contiguous. How annoying. There is a simple solution - tag the text as it is intended to be read and understand the difference between 'inline' and 'block' semantics for narrative text.
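
To make the inline versus block point concrete, here is a minimal sketch of a naive extractor. It is not the tool I was experimenting with; the tag list is a partial one I picked for illustration, and the markup in the example is made up. Block elements introduce a break, inline elements such as span are merged with their neighbors with no whitespace added.

```python
from html.parser import HTMLParser

# Partial, illustrative list of block-level elements (an assumption, not exhaustive).
BLOCK_TAGS = {
    "p", "div", "li", "ul", "ol", "table", "tr", "td", "th", "br",
    "blockquote", "h1", "h2", "h3", "h4", "h5", "h6",
}
SKIP_TAGS = {"script", "style"}  # their contents are never narrative text


class TextExtractor(HTMLParser):
    """Naive extractor: block tags break the text, inline tags do not."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).split("\n"))
        return "\n".join(line for line in lines if line)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# A designer separates the two words visually with a margin on the span,
# but the markup says they are contiguous.
sample = '<p>Seattle<span class="gap">Chocolate</span> bars</p><p>Next paragraph</p>'
print(extract_text(sample))
```

The first line comes out as "SeattleChocolate bars", which is exactly the gibberish described above: the extractor is following the markup faithfully, and the markup is what is wrong.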

April 16, 2007

Inertial electrostatic fusion

I haven't heard any news recently about inertial electrostatic (confinement) fusion, so I was pleased to read a note (on the Google Research blog of all places) mentioning Robert Bussard (former Asst. Director of the AEC) talking about inertial electrostatic fusion. Apparently it's a video - I hate video. I'd rather read about it, but this should be good. I tried to get some information from EMC2 (Bussard's company) a while back, but they didn't answer emails. Hopefully this video will give some interesting info on what they've been doing.

Wow - this is very, very cool. Bussard has been working for over ten years on a Navy contract to research and build a spherically confined fusion reaction - and they've done it. The funding from the Navy dried up, his company has the patent and now the next step is to get funding to build a full scale prototype. After that - cheap and limitless energy for the world. Wow.

DoubleClick and Google

There's been a lot of industry angst over Google's pending acquisition of DoubleClick. I'm fairly new to the game of online advertising so I don't know enough to predict the fallout.
Here are two good posts to read - the first is from Pulse360 Blog and is a sky-is-falling viewpoint. The second is better and is from Jordan's blog (the CEO of the startup I work for) and gives a good analysis of the value to Google and why they made the acquisition.

There are a lot of terms that I wasn't familiar with several months ago, so here is my cheat sheet of terms:

  • banner advertising - wide advertisement usually at the top of a page
  • inventory - this is what web page publishers have to offer, the space on their pages and the audience that will view those pages
  • remnant inventory - areas of a website that are not very popular and the web page publisher cannot charge lots of money for ad placement (because there are no viewers)
  • ad network - provider of ad listings
  • eCPM - effective cost-per-mille (the revenue earned per thousand ad impressions)
  • creative - the 'creative content' that shows up in the ad listing. Google made big bucks doing just text and links and left the annoying flashing ads to others.
  • AdSense network - the Google system that provides ad listings to folks publishing web pages. It's supposed to provide an ad listing that is highly relevant to the page it is injected into, but that sometimes doesn't work.
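
For the eCPM entry, the arithmetic is just revenue divided by impressions, scaled to a thousand. A tiny sketch, with made-up numbers:

```python
def ecpm(revenue, impressions):
    """Effective cost-per-mille: revenue earned per thousand ad impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return revenue * 1000 / impressions


# Hypothetical figures: $50 earned from 20,000 impressions.
print(ecpm(50.0, 20000))
```

It lets a publisher compare ads that are paid per click against ads that are paid per impression on the same scale.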


(more words here)

It's interesting that the space on web pages is called 'inventory' - coming from Amazon I had a different view of 'inventory'. There are a lot of recurring themes between the world of product catalogs and inventory management that I am familiar with and the new world of online advertising. Maybe it'll all make sense eventually.

April 13, 2007

Sinterklaas Boot

And now for something completely different...
Over on my music blog I posted about a great techno dance song called Boten Anna - but this Sinterklaas Boot video which is a spoof from Holland (those crafty Dutch!) is the real gem!

April 11, 2007

Space Conference

Here's something interesting from Wired: "Private Launches, New Tech ... This Isn't Your Parents' Space Age"
This week an estimated 7,000 government officials, corporate representatives and space enthusiasts will converge at the annual National Space Symposium here to hash out the technological, cultural and political issues surrounding the next decade's push for manned exploration of space.


I hadn't heard about the Constellation program from NASA, but it's very exciting to see the money put into space resulting in good work.

Here are a couple of space blogs with a little news: Space Politics and Space Report.

April 08, 2007

How to (Teach how to) Write a Spelling Corrector

Through Bill de hÓra's Bzzt Questions blog post, I found this post from Peter Norvig on How to Write a Spelling Corrector. The spelling corrector post was interesting initially because I've been doing a little text processing recently and his code echoed the simplicity of approach I needed to squeeze the algorithm into JavaScript for use in a widget. But it wasn't strictly the code that really struck me - it was the multi-faceted learning opportunity that the post represents. On one hand, there is the lesson that thinking well before coding is a Really Good Thing. In this case, thinking of the problem in terms of lists, sets and maps clarifies the tasks that the software should perform. On the other hand, demonstrating the simplicity and expressiveness of Python shows that the actual tool used can be important. Especially a tool that removes obstacles between theory and practice. And on the third hand, Peter Norvig's post is a great example of education. It educates readers about processing natural language in the wild, it educates programmers on how programming languages reflect the mental model of the developer, it educates designers on how theory can and should influence the practice of software development and at a higher level it educates everyone on what real engineering looks like.
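
To show what thinking in lists, sets and maps looks like, here is a pared-down sketch in the spirit of that post. It is my own simplified version, not Norvig's code: it only considers words one edit away, and the word counts come from a toy sentence I made up rather than a real corpus.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """List of lowercase words in the text."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """Set of all strings one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word


# Map of word -> frequency, built from a toy training text (an assumption).
counts = Counter(words("the spelling corrector checks the spelling of each word"))
print(correct("speling", counts))
```

The list is the training text, the set is the candidate edits and the map is the word counts; once the problem is stated that way the code nearly writes itself.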

Oh, and check this out
Fortunately, Google has released a database of word counts for sequences of up to five word sequences, gathered from a corpus of a trillion words.

April 05, 2007

Spring in the Northwest

Spring has definitely arrived here in Kirkland. One of the rhododendrons and the Star Magnolia tree are in bloom, brilliant white blossoms finally brightening the yard as well as adding their perfume to the air. There is something about the warming of the forest at the beginning of the year that gives a smell of the outdoors that charges me up. I've been very lucky to be able to work from home over the past six months, and I took a break from work this afternoon to split some wood from trees that had fallen in our winter storm. (If you ever get tired of spinning your wheels and want to do something with visible results, come on over, pick up an axe and split some wood!)
Being outside is so wonderful - the sound of birds (especially the Northern Flickers), the random bumbling path of a bumble bee and even the first butterfly I've seen this year all make me grateful for the place I live. There was even a pileated woodpecker hopping around the lower part of the Doug Fir outside my office window. If my cat hadn't been asleep on my lap he would have gone nuts!