SC 34 meetings, Copenhagen

by Alex Brown 27. June 2009 14:57

This week I attended 4 days of meetings of SC 34 working groups. WG 4 (OOXML maintenance) and WG 5 (OOXML/ODF interoperability). Last year I predicted that OOXML would get boring and, on the whole, the content of these meetings fulfilled that prophecy (while noting, of course, that for us markup geeks and standards wonks “boring” is actually rather exciting). There was however, one hot issue, which I’ll come to later …

Defects

Since the publication of OOXML in November last year, the work of WG 4 has been almost exclusively focussed on defect correction. To date over 200 defects have been submitted (the UK leading the way having submitted 38% of them). Anybody interested in what kinds of thing are being fixed can consult the material on the WG 4 web site for a quick overview. During the Copenhagen meeting WG 4 closed 53 issues meaning that 71% of all submitted defects submitted have now been resolved. By JTC 1 standards that is impressively rapid. The defects will now go forward to be approved by JTC 1 National Bodies before they can become official Amendments and Corrigenda to the base Standard. Among the many more minor fixes, a couple of agreed changes are noteworthy:

  • In response to a defect report from Switzerland, for the Strict version (only) of IS 29500, the XML Namespace has been changed, so that consumers can know unambiguously whether they are consuming a Strict or Transitional document without any risk of silent data loss. This is (editorially) a lot of work, but the results will be I think worthwhile.
  • As I wrote following the Prague meeting, there was a move afoot to re-instate the values “on” and “off” as permissible Boolean values (alongside “true”, “1”, “false” and “0”) so that Transitional documents would accurately reflect the existing corpus of Office documents, in accord with the stated scope of the standard. This change has now been agreed by the WG.

The ISO Date Issue

The “hot issue” I referred to earlier is ISO dates. What better way to illustrate the problem we face than by using one of Denmark’s most famous inventions, the LEGO® brick …


f*cked-up lego brick
OOXML Transitional imagined as a LEGO® brick

More precisely, the problem is about date representation in spreadsheet cells. One of the innovations of the BRM was to introduce ISO 8601 date representation into OOXML. However the details of how this have been done are problematic:

  1. Despite the text of the original resolution declaring that ISO 8601 dates should live alongside serial dates (for compatibility with older documents), one possible reading of the text today is that all spreadsheet cell dates have to be in ISO 8601 format
  2. Having any such dates in ISO 8601 format is particularly problematic for Transitional OOXML, which is meant to represent the existing corpus of office documents. Since none of these existing documents contain ISO 8601 dates having them here makes no sense
  3. Even more seriously, if people start creating “Transitional” OOXML documents which contain 8601 dates, then a huge installed base of software expecting Ecma-376 documents will silently corrupt that data and/or produce surprising results. (My concern here is more for things like big ERP and Financial systems, rather than desktop apps like MS Office). Hence the odd LEGO brick above: like those bricks, such files would embody an interoperability disaster
  4. Even the idea of using ISO 8601 is pretty daft unless it is profiled (currently it is not). ISO 8601 is a big complex standards with many options irrelevant to office documents: it would be far more sensible for OOXML to target a tiny subset of ISO 8601, such as that specified by W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes
  5. Many date/time functions declared in spreadsheetML appear to be predicated on date/time values being represented as serial values and not ISO 8601 dates. It is not clear if the Standard as written makes any sense when ISO 8601 dates are used.

The solution?

Opinions vary about how the ISO date problem might best be solved. My own strong preference would be to see the Standard clarified so that users were sure that Transitional documents were guaranteed to correspond to the existing document reality – in other words that Transitional documents only contain serial date representations, and not “ISO 8601” date representations. In my view the ISO dates should be for use in the Strict variant of OOXML only.

If a major implementation (Excel 2010, say) appears which starts pumping incompatible, ISO 8601-flavoured Transitional documents into circulation, then that would be an interop disaster. The standards process would be rightly criticised for producing a dangerous format; users would be at risk of corrupted data; and guilty vendors too would be blamed accordingly.

It is imperative that this problem is fixed, and fixed soon.

Impressions of Copenhagen

Our meetings took place at the height of midsummer, and every day was glorious (all meetings started with a sad closing of the curtains). Something of a pall was cast over proceeding by thefts, in two separate incidents, of delegates’ laptop computers; but there is no doubt Copenhagen is a wondeful city blessed with excellent food, tasty beer, and an abundance of good-looking women. Dansk Standard provided most civilised meeting facilities, and entertainment chief Jesper Lund Stocholm worked hard to ensure everyone enjoyed Copenhagen to the full, especially the midsummer night witch-burning festivities! Next up is the SC 34 Plenary in Seattle in September; I’m sure there will be many more defect reports on OOXML to consider by then, and that WG 4's tireless convenor Murata-san will be keeping the pressure on to mantain the pace of fixes  …


Jesper Among the Beers
Jesper Lund Stocholm

ODF Forensics

by Alex Brown 14. June 2009 19:35

The other day I had a phone call from Michiel Leenaars of the NLnet Foundation, who is busy gearing up for this week's ODF plugfest in the Hague. Michiel had seen my blog post on ODF validation using pipelines, and was interested in whether I could come up with something quick and dirty for providing forensic information about pairs of ODF documents, so they could be assessed before and after they are used by a tool. This could help users check if anything has been incorrectly added or taken away during a round-trip. Here's what I came up with …

Reaching for XProc

Yes again, I am going to use an XProc pipeline to do the processing. The basic plan of attack is:

  1. take two documents
  2. generate a “fingerprint” for each of them
  3. compare the fingerprints
  4. display the result in a meaningful, human-readable form

Fingerprinting XML

For a basic comparison between document I chose simply to compare the elements used, and the number of them. This obviously leaves out quite a bit which might also be compared (attributes, text) etc – but is a useful smoke test about whether major structures have been added or lost during a round-trip.

So the overall pipeline will look like this:

<?xml version="1.0"?>
<pipeline name="get-opc-rels" xmlns="http://www.w3.org/ns/xproc"
xmlns:xo="http://xmlopen.org/odf-fingerprint"
xmlns:mf="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
xmlns:cx="http://xmlcalabash.com/ns/extensions">
<import href="extensions.xpl"/>
<!-- the URLs of the ODF documents to be processed -->
<option name="package-a" required="true"/>
<option name="package-b" required="true"/>
<!-- get the first fingerprint ... -->
<xo:make-fingerprint name="finger-a">
<with-option name="package-url" select="$package-a"/>
</xo:make-fingerprint>
<!-- ... and the second ... -->
<xo:make-fingerprint name="finger-b">
<with-option name="package-url" select="$package-b"/>
</xo:make-fingerprint>
<!-- combine them into a single document for input into an XSLT -->
<wrap-sequence wrapper="fingerprint-pair">
<input port="source">
<pipe step="finger-a" port="result"/>
<pipe step="finger-b" port="result"/>
</input>
</wrap-sequence>
<!-- style into an HTML report of differences -->
<xslt name="transform-to-html">
<input port="stylesheet">
<document href="style-diffs.xsl"/>
</input>
</xslt>
</pipeline>

A number of things are of note:

  • The ODF packages are interrogated using the JAR URI mechanism I described here
  • We’re using a custom step <xo:make-fingerprint>, which takes as its input the URL of an ODF document (“package-url”), and which emits a “fingerprint” as an XML document. Obviously this step is not built into any XProc processor, so we’ll need to write it ourselves
  • We using XProc’s wrap-sequence step to combine the two “fingerprint” documents into a single document
  • We’ll be relying on an XSLT transform to turn this combined document into a nice report, which will be the end result of the pipeline.

Writing the fingerprinting pipeline

To define our custom pipeline <xo:make-fingerprint> we simply author a new pipeline, and give it the type “xo:make-fingerprint”. This can then be invoked as a step. Here’s what this sub-pipeline looks like:

<pipeline type="xo:make-fingerprint">
<!-- the URL of the ODF file to fingerprint -->
<option name="package-url" required="true"/>
<!-- load its manifest -->
<load>
<with-option name="href" select="concat('jar:',$package-url,'!/META-INF/manifest.xml')"/>
</load>
<!-- visit each entry in the manifest which refs an XML resource -->
<viewport name="handle"
match="mf:file-entry[@mf:media-type='text/xml'
and not(starts-with(@mf:full-path,'META-INF'))]">
<cx:message message="Loading item ..."/>
<!-- load the entry -->
<load name="load-item">
<with-option name="href" select="concat('jar:',$package-url,'!/',/*/@mf:full-path)"/>
</load>
<!-- accumulate everything in a <wrapper> document -->
<wrap-sequence wrapper="wrapper">
<input port="source">
<pipe step="load-item" port="result"/>
</input>
</wrap-sequence>
</viewport>
<!-- transform the accumulated mass into a fingerprint -->
<xslt name="transform-to-fingerprint">
<input port="stylesheet">
<document href="make-fingerprint.xsl"/>
</input>
</xslt>
<!-- label it with the package URL, as an attribute on the root element-->
<add-attribute match="/*" attribute-name="package-url">
<with-option name="attribute-value" select="$package-url"/>
</add-attribute>
</pipeline>

Things to notice here:

  • We iterate through the ODF manifest looking for XML documents
  • All of the XML in the entire package is retrieved and combined into a single mega-document wrapped in an element named<wrapper>
  • We’re relying on an XSLT transform, “make-fingerprint.xsl” to do the heavy lifting and turn our mega-document into meaningful (and smaller) “fingerprint” document
  • We add the URL of the ODF document to the fingerprint using XProc’s nifty add-attribute step

The Heavy Lifting: XSLT

The XSLT to boil a document down into a fingerprint can be seen here. What it produces is a summary of the elements used in each of the namespaces the document mentions. This extract gives a flavour of the kind of result it produces:

<namespace name="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
<element name="file-entry" count="1"/>
</namespace>
<namespace name="urn:oasis:names:tc:opendocument:xmlns:meta:1.0">
<element name="generator" count="1"/>
</namespace>
<namespace name="urn:oasis:names:tc:opendocument:xmlns:office:1.0">
<element name="automatic-styles" count="1"/> 
<element name="body" count="1"/>
<element name="document-content" count="1"/> 
<element name="document-meta" count="1"/>
<element name="document-styles" count="1"/>
<element name="font-face-decls" count="2"/>
<element name="meta" count="1"/>
<element name="spreadsheet" count="1"/>
<element name="styles" count="1"/>
</namespace>

Returning now to our main pipeline, we can see it makes two calls to (or should that be “sucks on”) the sub-pipeline to generate two fingerprints. These are then wrapped with a wrap-sequence step, and we have all we need to generate the final report. Again, an XSLT transform is used to do a comparison operation and the result is emitted as an HTML document intended for human consumption. An example of what this looks like (comparing the OpenOffice and Google Docs versions of Maya’s wedding planner) can be found here.

Putting it to use

The results of this process need to be interpreted on a case-by-case basis. Just because two applications represent notionally the same document with different XML is not necessarily a fault (though I’d like to know why Maya’s Wedding Planner has 2,000 spreadsheet cells according to Google Docs, and only 51 according to OO.o).

The most useful application of this pipeline is to check for untoward data loss when a document is processed by an application – and I understand this is a particular concern of the Dutch government. With this in mind it is possible to take this pipeline further still, checking attribute differences and even textual differences. Though there comes a point when diff'ing XML that it is best to use a specialist tool such as the excellent DeltaXML (I have no association with this product, except knowing it is well-respected through its use among clients). Many an unsuspecting programmer has come to grief under-estimating the complexities of comparing XML documents.

Online ODF Validation

Michiel also asked whether it would be possible to make the ODF validation pipeline I blogged about previously available as an online service. Coincidentally this was something I was working on anyway, though using Java rather than XML pipelines. The result is now available here. Enjoy …

The Cam at Dawn

by Alex Brown 14. June 2009 18:24
At Granchester
At Granchester
Originally uploaded by alexbrn

Another in my ongoing attempt to photograph the Cam at Grantchester in the early morning. Here is was at 05:17 this morning. One day I might catch "it" right..

No one supports ISO ODF today?

by Alex Brown 10. June 2009 11:09

IBM employee and ODF TC co-chair Rob Weir’s latest blog entry seeks to rebut what he terms a “disinformation campaign being waged against ODF”. The writing is curiously disjointed, and while at first I thought Rob’s famously fluent pen had been constipated by his distaste at having to descend further into the ad hominem gutter, on closer inspection I think it is perhaps a tell of Rob’s discomfort about his own past statements.

In particular, Rob takes issue with a statement that he condemns as “Microsoft FUD […] laundered via intermediaries”:

There is no software that currently implements ODF as approved by the ISO

Now Rob Weir is a great blogger, a much-praised committee chair, and somebody who can, on occasion, fearlessly produce the blunt truth like a rabbit from a hat. For this reason, I know his blog entry, “Toy Soldiers” of July 2008 has enjoyed quite some exposure in standards meetings around the world, most particularly for its assertions about ODF. He wrote:

  1. No one supports ODF 1.0 today. All of the major vendors have moved on to ODF 1.1, and will be moving on to ODF 1.2 soon.
  2. No one supports OOXML 1.0 today, not even Microsoft.
  3. No one supports interoperability via translation, not Sun in their Plugin, not Novell in their OOXML support, and not Microsoft in their announced ODF support in Office 2007 SP2.

While the anti-MS line here represents the kind of robust corporate knockabout stuff we might expect, it is Rob’s statement that “no one supports ODF 1.0 today. All of the major vendors have moved on […]” which has particularly resonated for users. A pronouncement on adoption from a committee chair about his own committee’s standard is significant. And naturally, it has deeply concerned some of the National Bodies who have adopted ODF 1.0 (which is ISO/IEC 26300) as a National standard, and who now find they have adopted something which, apparently, “no one supports”.

So, far from being “Microsoft FUD”, the idea that “No one supports ODF 1.0” is in fact Rob Weir’s own statement. And it was taken up and repeated by Andy Updegrove, Groklaw and Boycott Novell, those well-known vehicles of Microsoft’s corporate will.

Today however, this appears to have become an inconvenient truth. The rabbit that was pulled out of the hat in the interest of last summer’s spin, now needs to be put into the boiler. Consequently we find Rob’s blog entry of July 2008 has been silently amended so that it now states:

  1. Few applications today support exclusively ODF 1.0 and only ODF 1.0. Most of the major vendors also support ODF 1.1, one (OpenOffice 3.x), now supports draft ODF 1.2 as well.
  2. No one supports OOXML 1.0 today, not even Microsoft.
  3. No one supports interoperability via translation, not Sun in their Plugin, not Novell in their OOXML support, and not Microsoft in their announced ODF support in Office 2007 SP2.

The pertinent change is to item 1 on this list, which now has a weasel-worded (and tellingly tautological) assertion that might make the unsuspecting reader think that ODF 1.0 was somehow supported by the major vendors. Well, is it? Who is right, the Rob Weir of 2008 or the Rob Weir of 2009? Maybe I’ve missed something, but personally I’m unaware of an upsurge in ODF 1.0 support during the last 11 months. My money is on the former Rob being right here.

Okay, I use OpenOffice.org 1.1.5 (despite its Secunia level 4 advisory) out of a kind of loyalty to ISO/IEC 26300 (ODF 1.0), but I’m often teased about being the only person on the planet who must be doing this, and onlookers wonder what the .swx (etc) files I produce really are.

Blog Etiquette

As a general rule, when making substantive retrospective changes to blog entries, especially controversial blog entries, it is honest dealing to draw attention to this by striking-through removed text and prominently labelling the new text as “updated”. Failing to do this can lead to the suspicion that an attempt to re-write history is underway …

Dew Daisy

by Alex Brown 25. May 2009 19:09
Dew Daisy
Dew Daisy
Originally uploaded by alexbrn

Really enjoying the capabilities of the AF-S DX NIKKOR 35mm f/1.8G. Thoroughly recommended for all Nikon DX shooters.

Tags: , ,

Chasing the Light

by Alex Brown 23. May 2009 11:16

Before dawn things didn't look auspicious in Oxfordshire this morning - it seemed likely to be one of those days which start with a succession of gradually lighter grays. But suddenly, past five o'clock the sun blazed out a lovely yellow. There was mist in the valleys and everything looked resplendent.

I hopped in the car and drove looking for a vantage point, but failed to find one. It was as if one could be in the beauy, but never on top of it. Not for the first time I thought a big step ladder would help to photograph the English countryside, whose hedgrerows are often in just the wrong places. Still, walking round in the early morning with a ladder is likely to look a bit suspicious.

Finally, I stopped in Churchill to photograph the church; and behind, a glimpse of the light in the next valley. Next time I will need to plan better.


Tags: , , ,

Notes on Document Conformance and Portability #4

by Alex Brown 16. May 2009 19:50

In my last post I wrote about the reaction to Microsoft's ODF support in the recent service pack released for their Office 2007 product, and in particular how claims of its "non-conformance" seemed ill-founded. Now, to look a little deeper at the conformance question, I will use an XML pipeline to validate some would-be ODF documents, to get a clear-sighted and spin-free look at what the state of ODF conformance really is.

XML Pipelines: The Next Big Thing

For many years pipelines have been recognised as something the XML community badly needed. Eager markup geeks would seek out Sean McGrath or Uche Ogbuji to hear miraculous tales of how XML pipelines could be put to work; some bold experimenters would try to coerce technologies like Apache Ant into action, and some pioneers would even specify and implement their own pipelining languages – witness, for example Eric van der Vlist's xvif, or maybe XPL, which happily sits at the heart of the awesome Orbeon Forms framework.

Now however, the W3C is on the cusp of finalising its XProc language and this looks set to bring pipelines into the mainstream. I am convinced that XProc is the most significant specification from the W3C since XSL, and fully expect it to become as pervasive in all XML shops.

So what are pipelines? Well, as we know XML processing models can be described as conforming to the model: "in; out; shake it all about". The "in" bit is catered for by XML storage technologies (eXist maybe), and the "out" bit is catered for by web servers; XProc is for the "shake it all about" bit, where, with XSLT it will become the engine of many an XML process. XSLT is great for transforms but less convenient for a number of day-to-day things we routinely want to do with XML: validating, stripping element, renaming attributes, glomming together, splitting up ...  Essentially, pipelines are for doing stuff to XML in a step-by-step way, but without the overhead of a full-on programming language, since XProc pipelines are written using nice, declarative XML.

Pipelines and Office Documents

One of these typical "day to day" tasks is validating XML inside ZIPs. Both ODF and OOXML resources are not simply XML documents, but "packages" (ZIP archives) of content which include several XML documents. So to perform a full validation, we need to visit the XML resources in the package and validate them all against their governing schemas to get an overall validation result. This is exactly the sort of scenario where XML pipelines can help.

A Walk Through

I am going to describe an XML pipeline for performing ODF validation using Calabash, a FOSS (GPL v2)  implementation of XProc for the JVM written by Norm Walsh (the XProc WG chair). I'm not going to cover the absolute basics, for those (and more) consult some of the excellent material on XProc already appearing on the web such as:

We start, immediately after the root element, with a couple of "option" elements. These allow values to be passed in from the outside. In our case, we need the name of the package we want to validate ...

<?xml version="1.0"?>
<pipeline name="validate-odf" xmlns="http://www.w3.org/ns/xproc"
  xmlns:cx="http://xmlcalabash.com/ns/extensions"
  xmlns:mf="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
  xmlns:o="urn:oasis:names:tc:opendocument:xmlns:office:1.0">

  <!-- the URL of the package to be validated must be supplied by the caller -->
  <option name="package-url" required="true"/>

  <!-- whether to enforce use of the IEC/ISO 26300 schema -->
  <option name="force-26300-validation" select="'false'"/>

Next we import some extensions. Like XSLT, XProc is designed to be extensible and already additional sets of functions are becoming available. Calabash ships with a handy function for ZIP extraction which we are going to need.

  <!-- we use the Calabash extension in this library for looking inside ZIP files -->
  <import href="extensions.xpl"/>

Now we start the processing proper. This next step uses the ZIP extraction mechanism to pull the "manifest.xml" document out of the archive and outputs that XML for onward processing

  <!-- emits the package manifest -->
  <cx:unzip file="META-INF/manifest.xml">
    <with-option name="href" select="$package-url"/>
  </cx:unzip>

As a sanity check, we are going to make sure that this manifest actually conforms to the ODF manifest schema. I made this schema by manually extracting it from the ODF 1.1 specification (here referred to as "odf-manifest.rng"). As you can see, XProc makes this kind of document validation a cinch:

  <!-- validate the manifest against the manifest schema -->
  <cx:message message="Validating manifest ..."/>
  <validate-with-relax-ng assert-valid="false">    
    <input port="schema">
      <document href="odf-manifest.rng"/>
    </input>
  </validate-with-relax-ng>

[Update: I have added an @assert-valid="false" attribute here, as this is just a 'sanity check']

Now we start to visit the individual documents in the package referenced by the manifest. This is done here using the viewport step, which offers a kind of "keyhole surgery" option allowing us to isolate bits of a document. Here we're interested in all the <file-entry> elements in the manifest which (1) have a media type of "text/xml" and (2) aren't residing in the "META-INF" folder itself.

  <!-- visit each file entry in the manifest which targets an XML resource -->
  <viewport name="handle"
    match="mf:file-entry[@mf:media-type='text/xml'
    and not(starts-with(@mf:full-path,'META-INF'))]">

For each of these <file-entry> elements, a @full-path attribute specifies the name of an XML resource in the ZIP, again we use the unzip step to pull each of these XML documents from the archive:

    <!-- assume paths are relative to package base, and extract the XML resource -->
    <cx:unzip name="get-validation-candidate">
      <with-option name="href" select="$package-url"/>
      <with-option name="file" select="/*/@mf:full-path"/>
    </cx:unzip>

Once we've grabbed an XML resource, we need to work out which schema to use to validate it. Generally this can be done by looking at a @version attribute on the root element. However, ODF does not make this mandatory and so implementations are free to omit it. ODF specifies no fall-back rules, so we need to invent our own. What I've done here is to use the version specified, but fall back to the most recent published standard (1.1) when it is not specified.

    <!-- emits the schema RELAX NG that corresponds to the ODF version -->
    <choose name="get-relax-ng-schema">
      <when test="$force-26300-validation='true' or /*/@o:version='1.0'">
        <cx:message message="Validating with v1.0 schema ..."/>
        <load href="OpenDocument-schema-v1.0.rng"/>
      </when>
      <when test="/*/@o:version='1.2'">
        <cx:message message="Validating with draft v1.2 schema ..."/>
        <load href="OpenDocument-schema-v1.2-cd01-rev05.rng"/>
      </when>
      <otherwise>
        <cx:message message="Validating with v1.1 schema ..."/>        
        <load href="OpenDocument-schema-v1.1.rng"/>        
      </otherwise>
    </choose>
    <identity name="the-schema"/>

So now we have the document to validate, and the schema to use. We simply need to apply one to the other:

    <!-- and: validates the candidate against the schema -->
    <validate-with-relax-ng>
      <input port="schema">
        <pipe step="the-schema" port="result"/>
      </input>
      <input port="source">
        <pipe step="get-validation-candidate" port="result"/>
      </input>
    </validate-with-relax-ng>

  </viewport>
</pipeline>

Et voilà, a complete pipeline for validating ODF instances. Running it against packages which contain invalid XML will cause the pipeline processor to halt and report a dynamic error, for that is the default behaviour of the validate-with-relax-ng step.

Since ODF is clear that invalid XML signals non-conformance to the spec, we know that any package which fails this pipeline is, beyond argument, non-conformant.

Running It

Rob Weir helpfully provided a ZIP of the spreadsheets used for his Maya's Wedding Planner piece. Consult his blog entry for details of how these documents were produced. Putting these 7 test files through our pipeline we get this result:

Producer                   FAIL    PASS
---------------------------------------
Google                      X
KSpread                              X
Symphony                    X
OpenOffice                  ? *
Sun Plugin                  ? *
CleverAge                            X
MS Office 2007 SP2                   X
---------------------------------------
* See update below

So, Why the Failures?

  • Google failed because for some bizarre reason the manifest.xml document in its package specified a document type declaration referring to a non-existent "Manifest.dtd"; the processor cannot find this DTD and aborts with an IO Exception.
  • Symphony failed because its styles.xml document contained a date-value of "0-00-00". This fails to match the datatyping rules the ODF 1.1 schema uses to police date values.
  • OpenOffice failed because its manifest was not valid to the 1.1 schema. Now, this is an odd result as the manifest claims to be valid to version "1.2" of the ODF schema, yet consulting the latest drafts of ODF 1.2 it appears the manifest schema is not defined there, but has been planned for being specified in a new "Part 3" of ODF. I cannot find Part 3 of ODF in draft – maybe the OOo code has been written, but the standards text not fitted to it yet. If somebody can point me to a public draft of this schema, I'd like to re-run this test. [Update: I have now been pointed at the draft of Part 3 of ODF 1.2, and it does indeed contain a new schema. This draft is unfinished and contains non conformance clause, so it is not really possible to know for sure whether a package conforms to it. However, the OOo package here is invalid to the schema. I am going to assume that Part 3 will mirror the draft of Part 1 of ODF 1.2, and so will require schema validity. On that (reasonable) basis this OOo package is non-conformant; but of course the draft might change tomorrow. We do not know quite what version of the spec is being targetted here ...]
  • The Sun Plugin also failed because its manifest uses a @manifest:version attribute which the 1.1 schema does not declare. Again, maybe this is valid to some draft schema I have not seen, but it certainly does not conform to any published version of ODF. As above, if I can get a new schema I can re-run the test. [Update: see bullet above, it's the same here]

Conclusions

There had been a lot of spin in the blogosphere about who is, and who is not, supporting ODF at the moment. This validation test focusses on a small but important area of that discussion: conformance. One of the reasons it is important is that it is testable. From the test above we have the hard fact that most of the mainstream ODF applications are failing to emit standards-conformant ODF, even for a case as simple as "Maya's Wedding Planner". Surprisingly when assessing conformance it appears KOffice, Microsoft and CleverAge are leading the conformance pack; while Sun, Google and IBM have fallen behind.

To me this merely goes to confirm one of the fundamental dynamics of standardisation; done right, standards wrench "ownership" from those who thought they owned them, and distributes that ownership through the community at large. We, as users, should be applauding the widening adoption of ODF - and should be keeping the pressure on those vendors that seem to have been left behind, to raise their games.

Notes on Document Conformance and Portability #3

by Alex Brown 10. May 2009 13:20

Now that the furore about Microsoft’s implementation of ODF spreadsheet formulas in Office SP2 has died down a little, it is perhaps worth taking a little time to have a calm look at some of the issues involved.

Clearly, this is an area where strong commercial interests are in play, not to mention an element of sometimes irrational zeal from those who consider themselves pro or anti (mostly anti) Microsoft.

One question is whether Microsoft did “The Right Thing” by users in choosing to implement formulas the way they did. This is certainly a fair question and one over which we can expect there to be some argument.

The fact is that Microsoft’s implementation decision means that, on the face of it, they have produced an implementation of ODF which does not interoperate with other available implementations. Thus IBM blogger Rob Weir can produce a simple (possibly simplistic) spreadsheet, “Maya’s Wedding Planner” and used it to illustrate, with helpful red boxes for the slow-witted, that Microsoft’s implementation is a “FAIL” attributable to “malice or incompetence”. For good measure he also takes a side-swipe at Sun for their non-interoperable implementation. In this view, interoperability aligning with IBM’s Symphony implementation is – unsurprisingly – presented as optimal (in fact, you can hear the sales pitch from IBM now: “well, Mr government procurement officer, looks like Sun and MS are not interoperable, you won’t want these other small-fry implementations, and Google’s web-based approach isn’t suitable – so looks like Symphony is the only choice …”)

Microsoft have argued back, of course, most strikingly in Doug Mahugh’s 1 + 2 = 1? blog posting, which appears to present some real problems with basic spreadsheet interoperability among ODF products using undocumented extensions. The MS argument is that practical ODF interoperability is a myth anyway, and so supporting it meaningfully is not possible (in fact, you can hear the sales pitch from MS now: “well, Mr government procurement officer, looks like ODF is dangerously non-interoperable: here, let me show you how IBM and Sun can’t even agree on basic features; but look, we’ve implemented ISO standard formulas, so we alone avoid that – and you can assess whether we’re doing what we claim – looks like MS Office is the only choice …”)

Personally, I think MS have been disappointingly petty in abandoning the “convention” that the other suites more or less use. I accept that these ODF implementations have limited interoperability and are unsafe for any mission-critical data, but for the benefit of the “Maya’s Wedding Planner” type of scenario, where ODF implementations can actually cut it, I think MS should have included this legacy support as an option, even if they did have to qualify that support with warning dialogs about data loss and interoperability issues.

But - vendors are vendors; it is their very purpose to compete in order to maximise their long-term profits. Users don’t always benefit from this. We really shouldn’t be surprised that we have IBM, Sun and Microsoft in disagreement at this point.

What we should be surprised about is how this interoperability fiasco has been allowed to happen within the context of a standard. To borrow Rick Jelliffe’s colourfully reported words, the whole purpose of shoving an international standard up a vendor’s backside it to get them to behave better in the interests of the users. What has gone wrong here is in the nature of the standard itself. ODF offers an extremely weak promise of interoperability, and the omission of a spreadsheet formula specification in ODF 1.1 is merely one of the more glaring facets of this problem. As XML guru James Clark wrote in 2005:

I really hope I'm missing something, because, frankly, I'm speechless. You cannot be serious. You have virtually zero interoperability for spreadsheet documents.

To put this spec out as is would be a bit like putting out the XSLT spec without doing the XPath spec. How useful would that be?

It is essential that in all contexts that allow expressions the spec precisely define the syntax and semantics of the allowed expressions.

These words were prophetic, for now we do indeed face a present zero interoperability reality.

The good news is that work is underway to fix this problem: ODF 1.2 promises, when it eventually appears, to specify formulas using the new OpenFormula specification. When that is published vendors will cease to have an excuse to create non-interoperable implementations, at least in this area.

Is SP2 conformant?

Whether Microsoft’s approach to ODF was the wisest is something over which people may disagree in good faith. Whether their approach conforms to ODF should be a neutral fact we can determine with certainty.

In a follow-up posting to his initial blast, Rob Weir sets out to show that Microsoft’s approach is non-conformant, subsequent to his previous statement that “SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard”. After quoting a few selected extracts from the standard, a list is presented showing how various implementations represent a formula:

  • Symphony 1.3: =[.E12]+[.C13]-[.D13]
  • Microsoft/CleverAge 3.0: =[.E12]+[.C13]-[.D13]
  • KSpread 1.6.3: =[.E12]+[.C13]-[.D13]
  • Google Spreadsheets: =[.E12]+[.C13]-[.D13]
  • OpenOffice 3.01: =[.E12]+[.C13]-[.D13]
  • Sun Plugin 3.0: [.E12]+[.C13]-[.D13]
  • Excel 2007 SP2: =E12+C13-D13

Rob writes, “I'll leave it as an exercise to the reader to determine which one of these seven is wrong and does not conform to the ODF 1.1 standard.”

Again, this is clearly aimed at the slow witted. One can imagine even the most hesitant pupil raising their hand, “please Mr Weir, is it Excel 2007 SP2?” Rob however, is too smart to avoid answering the question himself, and anybody who knows anything of ODF will know that, in fact, this is a tricky question.

Accordingly, Dennis Hamilton (ODF TC member and secretary of the ODF Interoperability and Conformance TC) soon chipped in among the blog comments to point out that ODF’s description of formulas is governed by the word “Typically”, rendering it arguably just a guideline. And, as I pointed out in my last post, it is certainly possible to read ODF as a whole as nothing more than a guideline.

(I am glad to be able to report that the word “typically” has been stripped from the draft of ODF 1.2, indicating its existence was problematic.)

Curious readers might like to look for themselves at the (normative) schema for further guidance. Here, we find the formal schema definition for formulas, with a telling comment:

<define name="formula">
  <!-- A formula should start with a namespace prefix, -->
  <!-- but has no restrictions-->
  <data type="string"/>
</define>

Which is yet another confirmation that there are no certain rules about formulas in ODF.

So I believe Rob’s statement that “SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard” is mistaken on this point. This might be his personal interpretation of the standard, but it is based on an ingenious reading (argued around the meaning of comma placement, and privileging certain statements over other), and should certainly give no grounds for complacency about the sufficiency of the ODF specification.

As an ODF supporter I am keen to see defects, such as conformance loopholes, fixed in the next published ODF standard. I urge all other true supporters to read the drafts and give feedback to make ODF better for the benefit of everyone, next time around.

Powered by BlogEngine.NET 1.4.5.0
Theme by Mads Kristensen

About …

Alex Brown


Links


Legal

The author's views contained in this weblog are his, and not necessarily of any organisation. Third-party contributions are the responsibility of the contributor.

This weblog’s written content is governed by a Creative Commons Licence.

Creative Commons License      Use OpenDNS  

Quotable

If you're lucky, you will never have heard of Web services, a manifestation of Service Oriented Architecture, snake-oil devised to further the dependency of organisations on vendors and their tools.

Paul Downey

RecentComments

Comment RSS