SC34 meetings in Seoul: DTLL 
2006-05-28, 00:17
I am in Seoul for SC34 meetings, and in particular to work on DTLL, the Datatype Library Language — or ISO/IEC 19757 Part 5, as it hopes to be known.

DTLL is the brainchild of Jeni Tennison and was first described by her at the XMLOpen Conference in 2004, to general applause. As editor of Part 5 my job is to take her vision and turn it into an International Standard – this requires a particular kind of preparation of the text and committee work, alongside the various rounds of international voting which constitute the ISO JTC1 process.

I am also, in parallel, preparing an implementation of DTLL in Java, not least because I believe there's nothing like having to code a spec to flush out any gotchas. This will be released on SourceForge under an open source licence when I can be confident that the language is close to its final form, and that my code works :-)

The core idea of DTLL is the use of an expression language (regular expressions, say) to split text content up in such a way that it is represented as a tiny XML document. Various tests can then be performed on this XML to determine whether a value conforms to the rules for a particular datatype.

So to take a timely example, today's date in ISO 8601 format is

2006-05-28

and this might be decomposed into year, month and day parts with the following regex:

(?[YYYY][0-9]{4})-?(?[MM][0-9]{2})-?(?[DD][0-9]{2})

Note the use of named sub-expressions here (i.e. the '?[YYYY]', '?[MM]' and '?[DD]' at the beginning of the group-matching patterns) – these 'name' the matched groups so that, if this expression were applied to our example date, we'd get:

YYYY - 2006
MM - 05
DD - 28
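
For anyone who wants to try the decomposition without a DTLL processor, here is a minimal sketch in Python. It assumes Python's own spelling of named groups, (?P<name>...), in place of the '?[name]' form above; the names DATE_PATTERN and split_date are mine, purely for illustration.

```python
import re

# Python writes named groups as (?P<name>...) rather than DTLL's (?[name]...).
DATE_PATTERN = re.compile(
    r"(?P<YYYY>[0-9]{4})-?(?P<MM>[0-9]{2})-?(?P<DD>[0-9]{2})"
)


def split_date(text):
    """Return the named parts as a dict, or None if the whole text does not match."""
    match = DATE_PATTERN.fullmatch(text)
    return match.groupdict() if match else None


parts = split_date("2006-05-28")
print(parts)
```

Using fullmatch rather than search means trailing junk after the day part causes a non-match, which seems the right behaviour for a datatype test.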


A DTLL processor will represent these matched groups as an XML document with a document element named to match the name we've given to our datatype. So, if we'd called this datatype 'date' we'd get (with Namespaces omitted for brevity):
<date>
 <YYYY>2006</YYYY>
 <MM>05</MM>
 <DD>28</DD>
</date>
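
The step from matched groups to a tiny XML document can be sketched roughly as follows. This is only an illustration in Python using the standard xml.etree.ElementTree module, not how a real DTLL processor is specified to work; the function name to_xml is hypothetical and, as above, Namespaces are left out.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as in the text, written with Python's (?P<name>...) group syntax.
DATE_PATTERN = re.compile(
    r"(?P<YYYY>[0-9]{4})-?(?P<MM>[0-9]{2})-?(?P<DD>[0-9]{2})"
)


def to_xml(datatype_name, pattern, text):
    """Build an element named after the datatype, with one child per named group.

    Returns None when the text does not match the pattern at all.
    """
    match = pattern.fullmatch(text)
    if match is None:
        return None
    root = ET.Element(datatype_name)
    for name, value in match.groupdict().items():
        ET.SubElement(root, name).text = value
    return root


doc = to_xml("date", DATE_PATTERN, "2006-05-28")
print(ET.tostring(doc, encoding="unicode"))
```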

If the content we're targeting doesn't match the regex, then we know it's not valid to our rule. But if it does, then we can get to work with XPath to validate the data further. So, to check that the day part of our date is between 1 and 31 we can say

<condition test='/date/DD >= 1 and /date/DD &lt;= 31'/>

And to specify tests for correct values depending on the month, and take account of leap years we can say (here's a complete datatype definition):

<datatype name='date'>

<parse>
 <regex>(?[YYYY][0-9]{4})-?(?[MM][0-9]{2})-?(?[DD][0-9]{2})</regex>
</parse>

<condition test='/date/DD >= 1 and /date/DD &lt;= 31'/>
<condition test='/date/MM >= 1 and /date/MM &lt;= 12'/>
<condition test='(/date/MM = 1 or /date/MM = 3 or /date/MM = 5 or
 /date/MM = 7 or /date/MM = 8 or /date/MM = 10 or
 /date/MM = 12) or /date/DD &lt;= 30'/>
<condition test='/date/MM != 2 or
 /date/DD &lt;= 28 or
 (/date/DD = 29 and
 (/date/YYYY mod 400 = 0 or
 (/date/YYYY mod 4 = 0 and
 not(/date/YYYY mod 100 = 0))))'/>

</datatype>


Et voilà, a test for ISO 8601 dates (which is, incidentally, more conformant than the test specified by W3C XML Schema since, unlike there, the '-' separator between the parts of the date is optional here).
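
To sanity-check the logic of the four conditions, here is a hand translation into Python. It is a sketch only: the XPath tests are rewritten as plain Python expressions rather than evaluated by an XPath engine, the regex uses Python's (?P<name>...) syntax, and is_valid_date is a name of my own invention.

```python
import re

DATE_PATTERN = re.compile(
    r"(?P<YYYY>[0-9]{4})-?(?P<MM>[0-9]{2})-?(?P<DD>[0-9]{2})"
)

# Months that have 31 days, as listed in the third condition.
LONG_MONTHS = (1, 3, 5, 7, 8, 10, 12)


def is_valid_date(text):
    """Apply the parse step, then each condition of the 'date' datatype in turn."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return False
    yyyy, mm, dd = (int(match.group(name)) for name in ("YYYY", "MM", "DD"))
    is_leap = yyyy % 400 == 0 or (yyyy % 4 == 0 and yyyy % 100 != 0)
    conditions = [
        1 <= dd <= 31,                                    # day in range
        1 <= mm <= 12,                                    # month in range
        mm in LONG_MONTHS or dd <= 30,                    # short months stop at 30
        mm != 2 or dd <= 28 or (dd == 29 and is_leap),    # February and leap years
    ]
    return all(conditions)


for sample in ("2006-05-28", "20060528", "2006-02-29", "2004-02-29", "2006-04-31"):
    print(sample, is_valid_date(sample))
```

Every condition has to hold for the value to be valid, which is why they are collected in a list and combined with all().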

The language also has features for typed variables, which in practice make definitions more concise and modular, but I hope this gives a flavour.

Jeni is speaking on DTLL at the Extreme Markup 2006 Conference, by which time I'm hoping the language itself will have stabilised.

The latest official version of the DTLL document is always available from the DSDL homepage, but if you're interested in a more up-to-date status report, please feel free to contact me.

Debating Gender Difference 
2006-05-20, 10:57 - General
Long discussion this morning with Sarah about gender difference, following the appointment by Cambridge University's Faculty of English of four young men to four vacant lecturing positions.

Of course these four young men might well have been the four best candidates for the posts, but with a field evenly split between the sexes there's a thought that some kind of unconscious bias plays a part.

Relatedly, a very interesting debate between Marc D. Hauser and Elizabeth Spelke contains the startling claim that academics, when presented with a vita, will rate it higher if they are told that the candidate is male.

The wider question of the debate, on the relative abilities of men and women, was however left in the balance ...

XML:UK Are Calling for Participation 
2006-05-20, 08:20 - Announcements


XML:UK are running a “Member Presentation Day” on 27 June in Reading.

They're looking for short (15-20 minute) presentations on any XML-related topic and give as examples:

- a survey of the uses of XML within your organisation
- a current or recent project
- a burning issue that you feel needs more attention from the XML community.

This is a great chance for members to learn about what's going on. If it's anything like the last event we can expect tales of triumph and woe in equal measure ...

