The reason this is needed is that figuring out mime types is hard enough that it requires specialists, and the specialists need to be able to draw on resources too big to be packaged as software. You can't package up a big database and related CPU to travel around the net. This is a job for a web service.
They have other responsibilities. The creator of a video has to know about cinematography and editing. The server operator has to know about routing and caching. The browser user has to know about their own job, whatever that might be. They have their specialties, I have mine. How can we work together?
I don't think that the question "How can we work together?" should be answered at the architectural level. I also believe that content should be tagged with its content-type - which implies that it travels in a response or in a PUT or POST request. I think the problem is not one of architecture but lies in the realm of software tools, server configurations and application design (both client-based and server-based applications).
I'm not sure I understand this statement:
The client and server need to be able to defer to my web app to find out what type of object it is, meaning that it has to be possible for me to coerce mime types of external objects.
From what little I know of WebJay (pretty neat idea), the server provides links to media which the client then retrieves - presumably to play in a media player. The issue seems to be that the 'industry' (the collection of low budget multimedia hosting parties) hasn't figured out multimedia and mime types - which is similar to the HTML scene ten years ago. Everybody knows to send 'text/html' nowadays, but back then client apps tended to 'sniff' the content to guess at what should happen. Microsoft liked doing that because they know better than the content owners. They forget that the community learns and their software stays dumb.
Another wrinkle (I'm guessing) is that the client used with WebJay maintains a mapping of mime types to helper applications. This mapping can't deal with wrong or unknown mime types - although Internet Explorer probably still does some content sniffing and prompts the user with a best guess.
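To make the mapping problem concrete, here is a rough sketch in Python of what I imagine such a lookup looks like. The table contents and the helper names (`mp3-player` and so on) are made up for illustration; I don't know what any real browser's table contains.

```python
from typing import Optional

# Hypothetical table of mime type -> helper application.
# The helper names are invented for this example.
HELPERS = {
    "audio/mpeg": "mp3-player",
    "audio/ogg": "ogg-player",
    "audio/x-mpegurl": "playlist-player",
}


def pick_helper(content_type: Optional[str]) -> Optional[str]:
    """Return the helper for a Content-Type header value, or None."""
    if not content_type:
        return None
    # Drop parameters such as "; charset=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return HELPERS.get(media_type)
```

An MP3 served as 'text/plain' simply falls out of the table: `pick_helper("text/plain")` returns `None`, and at that point the client either gives up, shows the bytes as text, or starts guessing.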
The point of 'coercing the mime type' is to bypass or foil this client (i.e. web browser) behavior, which is a losing battle since that is an area of security concern - and we all know how much Microsoft focuses on security in its browsers.
Anyway - the answer, while staying within the architectural boundaries, is to clean up all the multimedia hosting servers and have the multimedia clients do some content sniffing. Hmmm, perhaps a new mime type of 'audio/guess' could be used: have everybody use it and let the client do the guessing...
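For what it's worth, client-side sniffing for the common audio formats doesn't have to be elaborate. Below is a minimal sketch in Python, assuming the client trusts the declared type unless it is missing or one of a few catch-all values. The function names and the `UNTRUSTED` set are my own invention, 'audio/guess' is only the joke type from above, and 'audio/x-wav' is just one of several labels in use for WAV files.

```python
from typing import Optional

# Declared types treated as "the server didn't really know".
# This set is an assumption for the example, not any standard.
UNTRUSTED = {"", "audio/guess", "application/octet-stream", "text/plain"}


def sniff_audio_type(head: bytes) -> Optional[str]:
    """Guess an audio type from the first bytes of the content."""
    if head.startswith(b"ID3"):
        return "audio/mpeg"  # MP3 with an ID3v2 tag in front
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "audio/mpeg"  # MPEG audio frame sync bits
    if head.startswith(b"OggS"):
        return "audio/ogg"  # Ogg page header
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "audio/x-wav"  # RIFF container holding WAVE data
    return None


def effective_type(declared: Optional[str], head: bytes) -> str:
    """Keep a plausible declared type; otherwise fall back to sniffing."""
    media_type = (declared or "").split(";", 1)[0].strip().lower()
    if media_type not in UNTRUSTED:
        return media_type
    return sniff_audio_type(head) or "application/octet-stream"
```

So `effective_type("audio/guess", data)` sniffs, while a server that bothers to say 'audio/mpeg' is taken at its word. Only looking at the bytes when the label is missing or generic keeps well-configured servers in charge of their own content.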