News

The Road to CoreObject Part 2: Why Bother?

Posted on 30 July 2007 by David Chisnall

Since the last post, a lot of people have asked me 'why are you doing this? What advantage does it actually give?' In this post, I'll try to explain.

One Abstraction, Two Uses

What is a file? Over the last year, I've asked a number of people that, from computer scientists to technophobes. None has managed to give me a clear answer. The next question I asked is 'What is a document?' Everyone I asked gave me a clear answer.

From a user interface perspective, it's clear that a document is a better abstraction than a file. A file is a very convenient abstraction for operating systems; it's basically a virtualised block device with a simple text key (the path/filename) that can be used to uniquely identify it. It is not a good abstraction for users.

Files are used for two things:

Storing a document.
Publishing a document.

From a user interface perspective, these are very different tasks. Storing a document is not something that should ever need to be done explicitly. Raskin's first law states:

A program shall not harm a user's data, or through inaction allow the user's data to come to harm.

Everything I do to a document should automatically be stored if possible. In some situations, such as sudden power failure, some data loss is inevitable, but the program should do everything it can to minimise the chance of avoidable data loss. A simple corollary to this is that versioning information should also be stored. If I hit select all, delete, then I don't want the stored form of my document to be overwritten with an empty document. I want an undo feature, and I don't want this to be contingent on keeping the document in memory (select all, delete, {autosave}, power failure, panic).

CoreObject's serialisation function does this. You don't need to explicitly save a document. From the time an application tells CoreObject to manage the object graph representing the document model, you have the ability to replay every single change you've made to it (this actually works in the version in /trunk now, although it needs more testing).
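To make the replay idea concrete, here is a minimal sketch in Python. It is not CoreObject's API: the `RecordedDocument` class, its methods, and the JSON log format are all assumptions made for illustration. The point it shows is that if every change is recorded as a serialised invocation before it is applied, any earlier state can be rebuilt by replaying a prefix of the log.

```python
import json


class RecordedDocument:
    """Hypothetical stand-in for a managed document model: a list of text lines."""

    def __init__(self):
        self.lines = []
        self.log = []  # every change, in order, as a JSON string

    def _record(self, name, *args):
        # Record the invocation first, then apply it to the in-memory model.
        self.log.append(json.dumps({"op": name, "args": list(args)}))
        getattr(self, "_do_" + name)(*args)

    def append_line(self, text):
        self._record("append_line", text)

    def delete_all(self):
        self._record("delete_all")

    def _do_append_line(self, text):
        self.lines.append(text)

    def _do_delete_all(self):
        self.lines.clear()


def replay(log, upto=None):
    """Rebuild the document state after the first `upto` changes (all if None)."""
    doc = RecordedDocument()
    for entry in log[:upto]:
        change = json.loads(entry)
        getattr(doc, "_do_" + change["op"])(*change["args"])
    return doc.lines


doc = RecordedDocument()
doc.append_line("Chapter 1")
doc.append_line("It was a dark and stormy night.")
doc.delete_all()  # the 'select all, delete' accident

print(replay(doc.log))          # state after every change
print(replay(doc.log, upto=2))  # state just before the accident
```

In a real system the log would live on disk rather than in a Python list, so the second call would still work after a power failure.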

While you don't have to save a document explicitly, you might want to tag it with some metadata. Some of this will be created automatically for all objects (creation dates, modification dates, etc). Some will be created automatically for certain object types (e.g. colour depth, word count, table of contents). Some can be specified manually. This will be indexed by the higher layers of CoreObject. These tags can either be assigned to a specific version, or to the latest version. You might tag a book you are working on with the book title, and also tag the version you sent to the proof readers, so you can jump back to that one to compare with the comments they gave you.
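The difference between tagging a specific version and tagging the latest version can be sketched as follows. The `TagIndex` class and its method names are hypothetical, and a real index would store far more than a version counter.

```python
class TagIndex:
    """Hypothetical index mapping tags to versions of one document."""

    LATEST = None  # sentinel: the tag follows the newest version

    def __init__(self):
        self.version_count = 0
        self.tags = {}  # tag name -> pinned version number, or LATEST

    def commit(self):
        """Record that a new version of the document now exists."""
        self.version_count += 1
        return self.version_count

    def tag(self, name, version=LATEST):
        """Attach a tag to a fixed version, or to the latest version by default."""
        self.tags[name] = version

    def resolve(self, name):
        """Return the version number a tag currently points at."""
        pinned = self.tags[name]
        return self.version_count if pinned is None else pinned


index = TagIndex()
for _ in range(3):
    index.commit()

index.tag("My Book")                           # follows the latest version
index.tag("sent to proof readers", version=3)  # pinned to version 3

index.commit()
index.commit()

print(index.resolve("My Book"))                # the newest version
print(index.resolve("sent to proof readers"))  # still the pinned version
```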

Publishing is a very different problem. When you publish a document, you typically don't want to include revision information; you want a snapshot. A few government agencies have been embarrassed in recent years by forgetting that Word documents are intended for storing, not publishing, and that they include a lot of revision information.

How does CoreObject help with publishing? Well, the current implementation doesn't (yet), but the plan is to integrate something like Apple's UTI (or, more likely, UTI itself). This is a type hierarchy supporting multiple inheritance that is orthogonal to the object hierarchy. Each compliant object will publish a number of types that it inherits from, such as rich text, or image. It will also support exporting its contents as each of these. For complex compound documents, the root document will simply query the enclosed components, and assemble a composite of images, text, etc. Each object only needs to be able to export to something one layer up the type hierarchy. For example, a word processor might export as rich text, and the system would then convert this to text using a shared component.
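The 'one layer up' export rule can be sketched like this. Nothing here is the real UTI API: the type names, the `PARENTS` and `CONVERTERS` tables, and both functions are assumptions chosen to illustrate the idea. Each type lists its direct parents (more than one is allowed), each edge has one converter, and exporting to a distant ancestor just chains the single-step converters.

```python
# Hypothetical type graph: each type lists its direct parents.
# The names are illustrative and are not real UTI identifiers.
PARENTS = {
    "word-processor-document": ["rich-text"],
    "captioned-photo": ["image", "text"],  # multiple inheritance
    "rich-text": ["text"],
    "image": [],
    "text": [],
}

# One converter per edge: turns a value of the child type into the parent type.
CONVERTERS = {
    ("word-processor-document", "rich-text"): lambda doc: doc["body"],
    ("captioned-photo", "image"): lambda photo: photo["pixels"],
    ("captioned-photo", "text"): lambda photo: photo["caption"],
    ("rich-text", "text"): lambda runs: "".join(text for text, _style in runs),
}


def conforms_to(type_name):
    """Return every type reachable by walking up from type_name, itself included."""
    seen = []
    stack = [type_name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(PARENTS[current])
    return seen


def export(value, from_type, to_type):
    """Convert value to to_type by chaining one-layer-up converters."""
    if from_type == to_type:
        return value
    for parent in PARENTS[from_type]:
        if to_type in conforms_to(parent):
            step = CONVERTERS[(from_type, parent)]
            return export(step(value), parent, to_type)
    raise ValueError(f"{from_type} does not conform to {to_type}")


# The word processor only knows how to produce rich text (styled runs);
# the shared rich-text-to-text converter does the rest.
document = {"body": [("Hello, ", "plain"), ("world", "bold")]}
print(export(document, "word-processor-document", "text"))
```

The word processor never needs its own plain-text exporter; it gets one as soon as anything can convert rich text to text.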

What About My Friends

The other important feature of CoreObject is collaboration, which is central to the Étoilé vision. CoreObject's serialisation of invocations allows these to be sent across any kind of network connection. In 0.3, there will be an XML-over-XMPP system for this. This will stream changes between two (or, in theory, more) users as they are made. Some systems exist for doing this in very specific cases, such as SubEthaEdit for text and a few whiteboarding solutions for images. CoreObject will allow us to do this in the general case. Any document that works with CoreObject will be able to be shared in this way.

Because it only sends the deltas, this approach will scale to relatively large object types. Imagine something like a raw digital photograph. These can easily be several tens of megabytes. The changes made to them, however, are usually of the form 'alter the brightness level by 5%,' or 'apply this filter with these parameters.' These are not very big, and so once the photograph is initially shared, it can be tweaked in a collaborative fashion easily.
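A rough sketch of why this scales, in Python. The `Photo` class and `make_delta` function are hypothetical, JSON stands in for whatever wire format is actually used, and the network is reduced to passing a string to each peer. The large pixel data is shared once; after that, each edit is a message of a few dozen bytes.

```python
import json


class Photo:
    """Hypothetical shared object: raw pixel bytes plus a list of adjustments."""

    def __init__(self, pixels):
        self.pixels = pixels   # large, transferred once when sharing starts
        self.adjustments = []  # small, non-destructive edits

    def apply(self, delta):
        """Apply one serialised change received from any peer."""
        self.adjustments.append(json.loads(delta))


def make_delta(operation, **parameters):
    """Serialise one change as a small message (JSON here, for brevity)."""
    return json.dumps({"op": operation, "params": parameters})


raw = bytes(20 * 1024 * 1024)        # stand-in for a 20 MB raw photograph
alice, bob = Photo(raw), Photo(raw)  # initial share: both hold the full data

delta = make_delta("brightness", percent=5)
for peer in (alice, bob):
    peer.apply(delta)

print(len(raw), "bytes shared once;", len(delta), "bytes per change")
```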

This is even true of video editing. Something like Apple's Final Cut does non-destructive editing. While the source footage is often tens of gigabytes, the project file is very small, since all it contains are instructions like 'insert ten seconds from source file x at y in the timeline,' and 'cross fade for 10 seconds.' With CoreObject, we get this kind of non-destructive editing for free, and we also get the ability to collaborate on documents like this for free. We could have two people editing the same video on their own machines and having the changes automatically kept in sync. Once it's done, they export it as something like MPEG-4, and anyone can view it irrespective of whether they're using Étoilé.

Labels: CoreObject
