ontologies

2020 - Week 49

Logicifying the procedure model

Week 49 started as week 48 ended. At least we think it did. Weeks feel like something we used to have, being more difficult to define these days. Anyway, young Robert and Michael staggered from Monday morning showers and coffee to once more fire up pixels and chat all things logical. Our design notes are now reordered, rewritten and inclusive of recent thoughts on logic gates in the form of words, pictures and truth tables. Should our loyal reader find a moment in their busy social calendar, we do hope they’ll make a click and take a read. Feedback and pull requests are, as ever, more than welcome.

Design notes tidied, Librarian Jayne together with computational experts Robert and Michael spent part of Tuesday morning looking once more at the legislation citation blobs on our newly logical Proposed Negative Statutory Instrument procedure map. Jayne had made a first stab at attaching legislation citations but, following careful consideration, some umming and indeed some ahing, all blobs have now been placed on logic gate steps. Which, given the legislation often specifies conditionals, seems to make sense. We anticipate this is a pattern we might well see repeated across all of our procedure maps.

Away from the meta work, Librarian Jayne and Michael set about taking the previous week’s scribblings on our classically logical PNSI procedure and entering them into the machines. At which point, following some gentle prodding by our Jianhan, the machines kindly spat out their own version of our map, looking not unlike a line-drawn, spatchcocked chicken. Next week we plan to take what we drew and what the computers drew and compare notes. Probably with more coffee and some marker pens. Being most careful not to get ink on the floors. We’re forced to work on our own floors these days and we have to be more careful. In the meantime, we think having our first fully logical procedure map within sight of silicon is really rather cool. But possibly you have to be us.
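For the curious, the sketch below shows roughly how one might turn a pile of steps and routes into a machine-drawn picture. To be clear, the Step and Route structures and the use of Graphviz DOT are our own illustration here, not a description of the actual tooling Jianhan prods.

```ruby
# A rough sketch of turning steps and routes into a drawable map.
# The Step / Route structures and Graphviz DOT output are assumptions for
# illustration, not the real procedure tooling.
Step  = Struct.new(:id, :name, :type) # type: "business step", "AND", "OR", "NOT"
Route = Struct.new(:from_id, :to_id)

def to_dot(steps, routes)
  lines = ["digraph pnsi {"]
  steps.each do |step|
    # Draw logic gates as diamonds and business steps as boxes (purely illustrative)
    shape = %w[AND OR NOT].include?(step.type) ? "diamond" : "box"
    lines << %(  #{step.id} [label="#{step.name}", shape=#{shape}];)
  end
  routes.each { |route| lines << "  #{route.from_id} -> #{route.to_id};" }
  lines << "}"
  lines.join("\n")
end

# Hypothetical steps and routes, purely to show the shape of the thing
steps = [
  Step.new("s1", "Instrument laid", "business step"),
  Step.new("s2", "AND", "AND"),
  Step.new("s3", "Instrument proceeds", "business step")
]
routes = [Route.new("s1", "s2"), Route.new("s2", "s3")]

puts to_dot(steps, routes) # pipe to `dot -Tpng` for your own spatchcocked chicken
```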

Orders being standing

As we entered Advent calendar season, our first window popped open to the kerching of Heroku tokens topping up. Nestled behind the door was our work to date on making standing orders addressable. Our regular reader will recall we have something of an issue with the citation format of such things being based on the position of an order in a list. Positions which can and do change over time. So what is currently House of Commons public Standing Order 14 was once Standing Order 4. Then 5. Then 6. Then 13. Before finally becoming 14 on the 20th March 1997. Who knows what number it might have a decade from now. Whilst even our primitive monkey brains can learn to cope with such changes, the machines we’re forced to work with are more primitive still. The conflation of label, identifier and list position makes it harder than it should be to construct persistent identifiers for orders and without persistent identifiers we can’t generate persistent URIs. And we all know the sacrifices a URI has to make to be considered cool these days.
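To make the conflation problem a touch more concrete, here’s a minimal sketch of the separation we’re after: the order keeps a single persistent identifier, and its position in the list is recorded per revision set. Every name below is hypothetical rather than the actual schema.

```ruby
# A minimal sketch of separating identity from list position.
# All identifiers, titles and URIs here are hypothetical, not the actual schema.
Order     = Struct.new(:persistent_id, :title)
Numbering = Struct.new(:order_id, :revision_set_id, :position)

order = Order.new("order-abc123", "the order currently numbered 14")

numberings = [
  Numbering.new("order-abc123", "revision-set-1997-03-20", 14),
  Numbering.new("order-abc123", "some-earlier-revision-set", 13)
]

# The persistent URI hangs off the identifier, never off the position
def order_uri(order)
  "https://example.parliament.uk/standing-orders/#{order.persistent_id}"
end

puts order_uri(order)                   # stable, whatever the list does
puts numberings.map(&:position).inspect # [14, 13] - the bit that keeps changing
```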

Luckily our friends in Oxford have been working on a project to codify standing orders for assorted legislatures. Orders covering public business in the House of Commons being one such dataset. We’ve taken their data and reshaped it a little using a Rake script that now takes about two hours to run and causes the fan on Michael’s computer to do a reasonable impression of Apocalypse Now. We’ve also changed the nomenclature a little. What the Parlrules folk call articles we call orders, what they call sub-articles we call fragments and what they call versions we call revision sets. All this language still needs to be checked with clerkly colleagues and it’s more than probable that Anna, David, Paul, Martyn and Matt will hear from us shortly. It’s a different way of working for us. Usually we start with an empty whiteboard and a kindly clerk, which means we tend to get the language established fairly early on. But starting by poking at someone else’s dataset is also kinda fun and saves a lot of work on data entry. Thanks friends in Oxford.
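For the sake of illustration, something like the following would do the renaming part of the reshaping, assuming the Parlrules data arrives as JSON with article, sub_article and version keys. The actual field names, and indeed our actual Rake task, will differ.

```ruby
# A minimal sketch of the nomenclature shuffle, assuming JSON input with
# hypothetical 'article', 'sub_article' and 'version' keys.
require 'json'

RENAMES = {
  "article"     => "order",
  "sub_article" => "fragment",
  "version"     => "revision_set"
}.freeze

# Walk the structure, renaming keys wherever they appear
def relabel(value)
  case value
  when Hash
    value.each_with_object({}) { |(k, v), out| out[RENAMES.fetch(k, k)] = relabel(v) }
  when Array
    value.map { |item| relabel(item) }
  else
    value
  end
end

parlrules_record = JSON.parse(
  '{"version": "1997-03-20", "article": {"number": 14, "sub_article": [{"text": "..."}]}}'
)
puts JSON.pretty_generate(relabel(parlrules_record))
```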

So far we have made lists of standing orders at all points in time at which some kind of revision happened. You can anchor link to the order as numbered in the traditional style by appending #order-{order-number} to the URL. And you can anchor link to a fragment of an order by appending #order-{order-number}-{paragraph-number}. Better still you can link to either an order or a fragment of an order quite independently of any list positioning. And see all versions of that order or fragment over time. This week Michael added a new table to capture where a new revision set had introduced a revision to a fragment and set about comparing the text of individual fragments to their direct predecessors. Having confirmed with David and Paul that no revision that merely changes the case of the text could be considered substantive, we now have two types of revision. Those where only the casing changes are marked as minor; those where the wording changes are considered major. At the level of fragment and order, lists of all revisions, major revisions and minor revisions are now available. At least anecdotally, most of the minor revisions appear to have taken place in the early 1900s. We suspect a clerk at the time did not like capitalisation. That or their shift key was broken.
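For anyone wanting the flavour of it, here’s a minimal sketch of both the anchor construction and the minor / major test. The method names are made up; the case-only comparison is the bit that matters.

```ruby
# A minimal sketch of the anchors and the minor / major revision test.
# Method names are hypothetical.
def order_anchor(order_number)
  "#order-#{order_number}"
end

def fragment_anchor(order_number, paragraph_number)
  "#order-#{order_number}-#{paragraph_number}"
end

# A revision that only changes casing is minor; any change of wording is major
def revision_type(previous_text, current_text)
  return :none  if previous_text == current_text
  return :minor if previous_text.downcase == current_text.downcase
  :major
end

p order_anchor(14)       # => "#order-14"
p fragment_anchor(14, 2) # => "#order-14-2"
p revision_type("That this house do now adjourn", "That this House do now adjourn") # => :minor
p revision_type("That this House do now adjourn", "That this House do now rise")    # => :major
```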

Michael also spent some considerable time writing code to chase standing orders back through time to build a simple node and edge graph of position changes over time. His intention was to spit out JSON and load it into some form of Sankey diagram to get a sense of when major edits to standing orders had happened. An intention that only half worked in practice. The application does indeed now output Sankey-compliant node / edge JSON but loading the visualisation takes upwards of three minutes. And then crashes his browser. More work is needed here.
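Should our reader wish to picture the shape of that JSON, here’s a rough sketch in the nodes and links form that d3-sankey expects. The input structure is a guess at what chasing orders back through time yields, not the actual output of Michael’s code.

```ruby
# A rough sketch of Sankey-ish node / link JSON for position changes over time.
# The revision set labels are hypothetical; the numbers come from the SO 14 story above.
require 'json'

# Each hash: an order's number within a given revision set
positions = [
  { order: "order-abc123", revision_set: "A", number: 4 },
  { order: "order-abc123", revision_set: "B", number: 5 },
  { order: "order-abc123", revision_set: "C", number: 6 }
]

nodes = positions.map { |pos| { name: "#{pos[:revision_set]} / SO #{pos[:number]}" } }
node_index = nodes.each_with_index.to_h { |node, i| [node[:name], i] }

# Link each appearance of an order to its appearance in the next revision set
links = positions.each_cons(2).map do |earlier, later|
  {
    source: node_index["#{earlier[:revision_set]} / SO #{earlier[:number]}"],
    target: node_index["#{later[:revision_set]} / SO #{later[:number]}"],
    value: 1
  }
end

puts JSON.pretty_generate({ nodes: nodes, links: links })
```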

Finally, although the data model we’ve generated is still subject to change and the language used needs more work, Michael has made a start on translating our relational model to something more ontological. So far it’s just a picture. Turtle and HTML to follow next week we hope.

Your weekly egg timer update

Once again we have no real updates. Though this week did see the arrival of another of those start date / end date Word tables that JO Jane must churn out. Librarian Jayne did some testing against the dates set out by JO Jane and declared both brain and code to be in agreement. So that’s good. We plan to wait for a recess to check that our dates still agree for instruments laid when benches are empty before we roll out the egg timer further. Should God ever grant us a recess.

All about the collaboration

Anya and Michael met with John and a colleague on Tuesday. Back in the day, when we still left our houses, John was a denizen of Newspeak House and we spent many a happy hour chatting data on their smoking balcony. These days John has ventured to America and re-entered academia. He’s currently looking at how journalism shapes debate within Parliament, working on a project to investigate if and how the language used in the kind of journalism that’s shared widely on Facebook is reflected back in Hansard. Some time back, Anya supplied him with a session’s worth of Hansard, annotated with the subject indexing her team applies. This data proved useful and now John would like more. Much more. A request for more sessions and more annotations is in the pipes.

One problem John faces is a problem we share. How to take material subject-indexed at a granular level and use something more nuanced than simple transitivity to apply higher-level categorisation. Anya and Michael pointed to the ongoing work to interlink the Parliament Thesauri with Wikidata, talked a little about the BBC project to use Wikipedia categorisation hierarchies to generate high-level categories and, of course, wondered out loud what bringing in additional data from Wikidata might help achieve. We intend to stay in touch.

Documenting the documentation

Regardless of eventual implementation, we like to document our models as RDF Turtle. It allows us to put the model, the comments, the examples and the annotations into one place so we don’t have to waste time on the horrors of separate data dictionaries and suchlike. We use a tool called LODE to turn fairly human friendly Turtle into very human friendly HTML. But, as good as LODE is, it doesn’t always pick up on everything we include. Diagrams, for example, are included as foaf:depictions, contributors as foaf:makers and links to additional documentation as rdfs:seeAlsos. All of which are ignored by LODE, which means we’re forced to hand-edit HTML. Like it’s 1989. So this week Robert and Michael spent a good hour populating a brand new Trello board with all the things they’d like to see from their own Turtle to HTML parser. With a fair wind we hope Robert might make a start on such a thing in due course. Though in fairness he has probably made at least ten starts in at least six languages so far. Still, eleventh time lucky as we like to say.
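For the record, the sort of thing such a parser would need to pick up looks a little like this, using the rdf and rdf-turtle gems. Consider it a guess at a starting point rather than whatever Robert eventually builds, in whichever of his six languages.

```ruby
# A rough sketch of pulling out the annotations LODE ignores, using the
# rdf and rdf-turtle gems. The filename is a placeholder.
require 'rdf'
require 'rdf/turtle'

FOAF_DEPICTION = RDF::URI("http://xmlns.com/foaf/0.1/depiction")
FOAF_MAKER     = RDF::URI("http://xmlns.com/foaf/0.1/maker")
RDFS_SEE_ALSO  = RDF::URI("http://www.w3.org/2000/01/rdf-schema#seeAlso")

graph = RDF::Graph.load("ontology.ttl")

{ "Diagrams" => FOAF_DEPICTION, "Contributors" => FOAF_MAKER, "See also" => RDFS_SEE_ALSO }.each do |label, predicate|
  puts "#{label}:"
  graph.query([nil, predicate, nil]).each do |statement|
    puts "  #{statement.subject} -> #{statement.object}"
  end
end
```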