April 08, 2007

How to (Teach how to) Write a Spelling Corrector

Through Bill de hÓra's Bzzt Questions blog post, I found this post from Peter Norvig on How to Write a Spelling Corrector. The spelling corrector post was interesting initially because I've been doing a little text processing recently, and his code echoed the simplicity of approach I needed in order to squeeze the algorithm into JavaScript for use in a widget. But it wasn't strictly the code that really struck me - it was the multi-faceted learning opportunity that the post represents. On one hand, there is the lesson that thinking well before coding is a Really Good Thing. In this case, thinking of the problem in terms of lists, sets and maps clarifies the tasks that the software should perform. On the other hand, demonstrating the simplicity and expressiveness of Python shows that the actual tool used can be important, especially a tool that removes obstacles between theory and practice. And on the third hand, Peter Norvig's post is a great example of education. It educates readers about processing natural language in the wild, it educates programmers on how programming languages reflect the mental model of the developer, it educates designers on how theory can and should influence the practice of software development, and at a higher level it educates everyone on what real engineering looks like.
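
To make the "lists, sets and maps" point concrete, here is a minimal Python sketch in the spirit of that approach. It is my own illustration, not Norvig's code: the tiny CORPUS string and the helper names (words, edits1, known, correct) are assumptions made for this example, and it only looks one edit away from the input word.

```python
import re
from collections import Counter

# Tiny stand-in corpus (an assumption for illustration; a real corrector
# would be trained on a much larger body of text).
CORPUS = "the quick brown fox jumps over the lazy dog and the spelling is correct"

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    """List: the lowercase word tokens of a text, in order."""
    return re.findall(r"[a-z]+", text.lower())


# Map: word -> number of times it occurs in the corpus.
COUNTS = Counter(words(CORPUS))


def edits1(word):
    """Set: every string one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(candidates):
    """Set: the candidates that actually appear in the corpus."""
    return {w for w in candidates if w in COUNTS}


def correct(word):
    """Return the most frequent known word at edit distance 0 or 1,
    falling back to the input word when nothing known is close."""
    candidates = known([word]) or known(edits1(word)) or {word}
    return max(candidates, key=lambda w: COUNTS[w])
```

With this toy corpus, correct("speling") returns "spelling" and correct("teh") returns "the", while a word with no known neighbour, such as "zzzz", comes back unchanged. The list tokenizes the text, the map holds the word frequencies, and the set holds the candidate corrections; once the problem is framed that way, the code is little more than a transcription of the idea.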

Oh, and check this out:
Fortunately, Google has released a database of word counts for sequences of up to five words, gathered from a corpus of a trillion words.
