November 18, 2009

The Fables of Aesop

I like to collect folk tales and old books - especially ones with good illustrations. Here are some scans from a book originally copyright in 1894 (the edition I have was printed in 1917).

Posted via email from Kinetic

November 17, 2009

A drawing that my daughter made

My daughter received a bunch of art supplies from her friends for her birthday and did a little sketch of a tree on a hill just for fun.

November 16, 2009

Making the Web faster - SPDY

Those crafty people at Google are doing some cool work to "make the Web faster". When I first heard of this initiative, it turned out to be about making "pages" faster - a decent thing, but fairly well known. But recently some folks over there have started to look at the actual underlying issues with the gears grinding out the Web - mainly networking latency. Trying to improve the network protocol of the Web is a tricky thing - lots of people (and egos) can get involved. Surprisingly, their effort seems to be off to a good start: everybody is taking it at face value, being supportive, and questioning things in a positive way.

One really cool thing mentioned in their whitepaper isn't a direct 'latency' thing - it's about 'server push'. If they can really make this happen, a whole new world of application development would open up.

To enable the server to initiate communications with the client and push data to the client whenever possible.

November 06, 2009

IE and heinous "operation aborted" error

We ran into a heinous bug in IE regarding using Javascript to modify the DOM while the page is loading. It turns out that IE6 and IE7 will show a modal error dialog and then clear the page when the user dismisses the error message. On IE8 it was fixed to merely stop rendering the page at that point. How helpful.

You can find out more here on an MSDN blog.

If you are unable to defer Javascript execution until after the page finishes loading, the following snippet may work in your use case.


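// n is the DOM node to insert, created elsewhere (for example with document.createElement).
var tags = document.getElementsByTagName("*");
// The last element parsed so far is normally the running script tag; append to its immediate parent rather than to a more distant, still-open ancestor such as document.body.
tags[tags.length-1].parentNode.appendChild(n);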
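
When deferring is an option, one way to do it is to wait for the window's load event before touching the DOM. The sketch below is only an illustration: runAfterLoad is a hypothetical helper name, and the win argument is assumed to be the browser's window object (it is passed in so the function can also be exercised outside a browser).

```javascript
// Hypothetical helper: run fn once the page has finished loading.
// win is assumed to be the browser's window object.
function runAfterLoad(win, fn) {
  if (win.addEventListener) {
    // Standards-based browsers.
    win.addEventListener("load", fn, false);
  } else if (win.attachEvent) {
    // IE6 through IE8 use attachEvent with an "on" prefix.
    win.attachEvent("onload", fn);
  } else {
    // Last resort: chain onto any existing onload handler.
    var previous = win.onload;
    win.onload = function () {
      if (typeof previous === "function") {
        previous.apply(this, arguments);
      }
      fn.apply(this, arguments);
    };
  }
}

// In a page this might be used as:
//   runAfterLoad(window, function () { document.body.appendChild(n); });
```

By the time the load event fires the parser has closed every container, so appending to document.body should no longer trigger the error.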

October 24, 2009

Zombie Ramone

My daughter is going to Redmond for a Thriller zombie dance movie thing. The only redeeming thing about a Michael Jackson related event is that she found a Ramones t-shirt at Value Village to wear. (but she cut it up, omg)

October 20, 2009

Coffee at Zokas in Kirkland

Haven't been to the new coffee place in Kirkland before. They have the largest table cut from a single block of wood that I've ever seen.

October 12, 2009

The UnderTown in Pt Townsend

This past weekend we headed out of town to visit Pt. Townsend on the Peninsula. The weather couldn't have been better for this time of year - blue sky and sunny from the time we arrived to when we left on Sunday. We did a little walking around the beaches and forests of Ft Worden doing some geocaching, and after dinner we were looking for a cool place to hang out. Rinneke spotted this brightly lit stairway going down underground into who knows where. We could hear music drifting up so we went down. It turned out to be the UnderTown, a coffee/wine bar, and they had live music on Saturday night.
It was a great way to relax, have a warm drink and spend some time together. If you are ever in Pt Townsend check it out.

September 29, 2009

Tokyo Tyrant tuning parameters

We've been working with Tokyo Tyrant for some large scale key-value lookups and the performance has been very nice, but has degraded over time. I've been poking around the various options to try to improve the performance, and although the options are documented, the pages are hard to read and it is hard to figure out what's what. So I thought I'd collect them here for reference. I'll describe the results of tuning and tweaking in a future post.

The most recent authoritative references are here:


Tokyo Tyrant (actually Tokyo Cabinet – the storage engine) supports various types of storage – B+ Tree indexing, hash index, etc. This is configured by setting the filename or file extension to a particular value:
  • If the name is "*", the database will be an in-memory hash database.
  • If it is "+", the database will be an in-memory tree database.
  • If its suffix is ".tch", the database will be a hash database.
  • If its suffix is ".tcb", the database will be a B+ tree database.
  • If its suffix is ".tcf", the database will be a fixed-length database.
  • If its suffix is ".tct", the database will be a table database.
Each has its own set of options and while different flavors of storage may accept the same option name (like bnum), the optimal value likely should be different across storage types.
Tuning parameters can trail the filename, separated by "#". Each parameter is composed of the name and the value, separated by "=". For example, "casket.tch#bnum=1000000#opts=ld" means that the name of the database file is "casket.tch", and the bucket array size is 1000000, and the options are large and deflate.
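
To make the naming convention concrete, here is a minimal sketch that splits such a string into file name, storage type and tuning parameters. It only illustrates the format described above; it is not Tokyo Cabinet's own parser, and parseDatabaseName and SUFFIX_TYPES are hypothetical names introduced for this example.

```javascript
// Suffix-to-storage-type mapping, taken from the list above.
var SUFFIX_TYPES = {
  ".tch": "hash",
  ".tcb": "B+ tree",
  ".tcf": "fixed-length",
  ".tct": "table"
};

// Hypothetical helper: split "file#name=value#name=value" into its parts.
function parseDatabaseName(spec) {
  var parts = spec.split("#");
  var name = parts[0];
  var params = {};
  for (var i = 1; i < parts.length; i++) {
    var eq = parts[i].indexOf("=");
    if (eq > 0) {
      // Values are kept as strings; callers convert as needed.
      params[parts[i].slice(0, eq)] = parts[i].slice(eq + 1);
    }
  }
  var type = "unknown";
  if (name === "*") {
    type = "in-memory hash";
  } else if (name === "+") {
    type = "in-memory tree";
  } else {
    var suffix = name.slice(-4);
    if (SUFFIX_TYPES.hasOwnProperty(suffix)) {
      type = SUFFIX_TYPES[suffix];
    }
  }
  return { name: name, type: type, params: params };
}

var parsed = parseDatabaseName("casket.tch#bnum=1000000#opts=ld");
// parsed.name is "casket.tch", parsed.type is "hash",
// parsed.params is { bnum: "1000000", opts: "ld" }
```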

For disk-based storage, several tuning parameters specify the on-disk layout while others specify memory and caching settings. Changing the on-disk layout requires scanning and re-writing the database data file which requires exclusive access to the file – which means taking the database offline. This scanning and re-writing process is done via tools provided with the distribution (ex: tchmgr and tcbmgr). Changing the memory and caching settings only requires a restart of Tokyo Tyrant.

We've been working only with on-disk storage via the hash and B+ Tree database engines. For a hash database the tuning parameters for the on-disk layout are limited to the size of the bucket array and the size of an element in the bucket array (choosing 'large' gets you 64-bit addressing and addressable data greater than 2GB). When a hash database file is first created, space is allocated on disk for the full bucket array. For example, a database with a bucket array of 100M elements and the 'large' option would start out at around 800MB. This region of the data file is accessed via memory mapped IO. There is an additional 'extra mapped memory' setting which defaults to 64MB – I'm not sure what this is used for, but for performance more memory is always better.
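
The 800MB figure is just the bucket count multiplied by the element size. A small sketch of that arithmetic follows; bucketArrayBytes is a hypothetical name, and the 4-byte element size for the non-'large' case is my assumption (only the 64-bit 'large' size is stated above). File headers and record data are ignored.

```javascript
// Hypothetical helper: approximate initial size of the bucket array region.
function bucketArrayBytes(bnum, large) {
  // 8 bytes per element with the 'large' option (64-bit addressing).
  // 4 bytes per element otherwise is an assumption, not from the docs quoted here.
  var bytesPerElement = large ? 8 : 4;
  return bnum * bytesPerElement;
}

var initialBytes = bucketArrayBytes(100000000, true);
// 100,000,000 buckets * 8 bytes = 800,000,000 bytes, roughly 800MB
```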

For a B+ Tree database, there are additional tuning parameters for the structure of the B+ Tree – how many members (links to child nodes) in an interior non-leaf node and how many members in a leaf node. Records are not stored in the B-Tree leaf nodes, but within 'pages'. The leaf nodes point to these pages and each page holds multiple records and is accessed via an internal hash database (and since this is a B+ Tree the records within a page are of course stored in sorted order). There is also a parameter for the bucket size of this internal hash database. One subtle detail is that the bucket size for a B+Tree database is the number of pages, not the number of elements (records) being stored – so this would likely be a smaller number than a hash database for the same number of records.
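
As a rough illustration of why the B+ Tree bucket number ends up smaller, the sketch below estimates the page count from the record count and applies the "1 to 4 times the number of pages" guideline from the B-tree parameter list further down. suggestBtreeBnum is a hypothetical name, and the estimate assumes every leaf page is completely full, which a real tree will not be, so treat the result as a lower bound.

```javascript
// Hypothetical helper: estimate a bnum range for a B+ Tree database.
function suggestBtreeBnum(recordCount, lmemb) {
  // Fall back to the documented default of 128 members per leaf page.
  var membersPerLeaf = lmemb > 0 ? lmemb : 128;
  // Assumption: leaf pages are completely full.
  var pages = Math.ceil(recordCount / membersPerLeaf);
  return { pages: pages, min: pages, max: pages * 4 };
}

var btree = suggestBtreeBnum(100000000, 128);
// btree.pages is 781250, so bnum would fall between 781250 and 3125000,
// far fewer buckets than a hash database holding the same 100M records.
```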

I've not yet figured out how the dfunit tuning parameter works or what impact that has on a running server, but it looks interesting.


In memory hash
bnum
the number of buckets
capnum
the capacity number of records
capsiz
the maximum amount of memory to use. Note - when the capacity is exceeded, records are removed in the order they were stored (oldest first).



In memory tree
capnum
the capacity number of records
capsiz
the maximum amount of memory to use. Note - when the capacity is exceeded, records are removed in the order they were stored (oldest first).


Hash
opts
"l" of large option (the size of the database can be larger than 2GB by using 64-bit bucket array.), "d" of Deflate option (each record is compressed with Deflate encoding), "b" of BZIP2 option, "t" of TCBS option
bnum
number of elements of the bucket array. If it is not more than 0, the default value is used. The default value is 131071 (128K). The suggested size of the bucket array is about 0.5 to 4 times the number of all records to be stored.
rcnum
maximum number of records to be cached. If it is not more than 0, the record cache is disabled. It is disabled by default.
xmsiz
size of the extra mapped memory. If it is not more than 0, the extra mapped memory is disabled. The default size is 67108864 (64MB).
apow
size of record alignment by power of 2. If it is negative, the default value is specified. The default value is 4 standing for 2^4=16.
fpow
maximum number of elements of the free block pool by power of 2. If it is negative, the default value is specified. The default value is 10 standing for 2^10=1024.
dfunit
unit step number of auto defragmentation. If it is not more than 0, the auto defragmentation is disabled. It is disabled by default.
mode
"w" of writer, "r" of reader,"c" of creating,"t" of truncating ,"e" of no locking,"f" of non-blocking lock


B-tree
opts
"l" of large option,"d" of Deflate option,"b" of BZIP2 option,"t" of TCBS option
bnum
number of elements of the bucket array. If it is not more than 0, the default value is used. The default value is 32749 (32K). The suggested size of the bucket array is about 1 to 4 times the number of all pages to be stored.
nmemb
number of members in each non-leaf page. If it is not more than 0, the default value is specified. The default value is 256.
ncnum
maximum number of non-leaf nodes to be cached. If it is not more than 0, the default value is specified. The default value is 512.
lmemb
number of members in each leaf page. If it is not more than 0, the default value is specified. The default value is 128.
lcnum
maximum number of leaf nodes to be cached. If it is not more than 0, the default value is specified. The default value is 1024.
apow
size of record alignment by power of 2. If it is negative, the default value is specified. The default value is 8 standing for 2^8=256.
fpow
maximum number of elements of the free block pool by power of 2. If it is negative, the default value is specified. The default value is 10 standing for 2^10=1024.
xmsiz
size of the extra mapped memory. If it is not more than 0, the extra mapped memory is disabled. It is disabled by default.
dfunit
unit step number of auto defragmentation. If it is not more than 0, the auto defragmentation is disabled. It is disabled by default.
mode
"w" of writer, "r" of reader,"c" of creating,"t" of truncating ,"e" of no locking,"f" of non-blocking lock


Fixed-length
width
width of the value of each record. If it is not more than 0, the default value is specified. The default value is 255.
limsiz
limit size of the database file. If it is not more than 0, the default value is specified. The default value is 268435456 (256MB).
mode
"w" of writer, "r" of reader,"c" of creating,"t" of truncating ,"e" of no locking,"f" of non-blocking lock



Table
opts
"l" of large option,"d" of Deflate option,"b" of BZIP2 option,"t" of TCBS option
idx
specifies the column name of an index and its type separated by ":"
bnum
number of elements of the bucket array. If it is not more than 0, the default value is used. The default value is 131071. The suggested size of the bucket array is about 0.5 to 4 times the number of all records to be stored.
rcnum
maximum number of records to be cached. If it is not more than 0, the record cache is disabled. It is disabled by default.
lcnum
maximum number of leaf nodes to be cached. If it is not more than 0, the default value is specified. The default value is 4096.
ncnum
maximum number of non-leaf nodes to be cached. If it is not more than 0, the default value is specified. The default value is 512.
xmsiz
size of the extra mapped memory. If it is not more than 0, the extra mapped memory is disabled. The default size is 67108864.
apow
size of record alignment by power of 2. If it is negative, the default value is specified. The default value is 4 standing for 2^4=16.
fpow
maximum number of elements of the free block pool by power of 2. If it is negative, the default value is specified. The default value is 10 standing for 2^10=1024.
dfunit
unit step number of auto defragmentation. If it is not more than 0, the auto defragmentation is disabled. It is disabled by default.
mode
"w" of writer, "r" of reader,"c" of creating,"t" of truncating ,"e" of no locking,"f" of non-blocking lock

September 26, 2009

At Sixty Acres for Stephan's soccer game

Great game so far and weather is sunny now.
Stephan's team dominated the first half and are having a good second half so far

September 25, 2009

Working hard is overrated

Very insightful post about startups and hard work from someone who has been there.

We agreed that a lot of what we then considered "working hard" was actually "freaking out". Freaking out included panicking, working on things just to be working on something, not knowing what we were doing, fearing failure, worrying about things we needn't have worried about, thinking about fund raising rather than product building, building too many features, getting distracted by competitors, being at the office since just being there seemed productive even if it wasn't -- and other time-consuming activities.


Much more important than working hard is knowing how to find the right thing to work on. Paying attention to what is going on in the world. Seeing patterns. Seeing things as they are rather than how you want them to be. Being able to read what people want. Putting yourself in the right place where information is flowing freely and interesting new juxtapositions can be seen. But you can save yourself a lot of time by working on the right thing. Working hard, even, if that's what you like to do.

Korean BBQ with Rubicon team

This past Wednesday several of the Rubicon Project engineering team went out for dinner at a Korean BBQ in LA. It was a good mix of a working meeting - talking about engineering practices and development in general - and good food and drink. The meat was all very tasty and only at the end did I find out what some of it was. I had never had beef tongue before - I always swore I wouldn't taste anything that could taste me back - but it was all really good, especially the soju (a lot like vodka).

September 20, 2009

PubSubHubBub - feed futures

Cool - Bob Wyman is involved in the PubSubHubBub discussion group. In this post he hints at content-based routing - not just topic based routing - being possible in the future with PSHB. It's time to find some excuse to use this new PSHB technology at my day job.

For instance, while today we think mostly about "topic-based" distribution -- i.e. subscribing to known feeds by name, in the future, people might like to subscribe to "concepts" or "words" that appear in the content of updates. Rather than saying "Tell me whenever Tom's feed changes!", you might like to say: "Tell me whenever any feed mentions PSHB." In that case, down stream systems are going to want to have the content (not just a notification of change) in order to match updates to subscriptions.

September 18, 2009

Real-time web, take 2

Bernard Lunn has a good post over on ReadWriteWeb putting the recent PubSubHubBub/RSSCloud news into context. Very funny that he calls KnowNow a "blow out", but I think he correctly identified their issue as being a focus on the enterprise market (when that market had fairly established solutions).

Wish I hadn't been so busy over the past two years and could have worked on helping build PubSubHubBub-style technology.