ontologies

2025 - Week 40

A possible misnomer / starting from first principles

Our dear reader will be aware that for the past several years we have been attempting to build an open data platform. Please believe us when we add, it’s felt longer on the inside. A series of meetings this week caused us to question not only how we were going about that but also to what purpose. These are some notes on that.

Unlike for many organisations, the open part of the equation has never been a difficulty for us. Parliament, like everywhere else, has non-open data in the form of HR, finance and facilities. That’s not the flavour of data we’re talking about. Our concerns relate to the procedural and the procedurally adjacent. The openness is helped by the existence of the Open Parliament Licence - think of the Open Government Licence, with the addition of a portcullis or two. Which itself is not so different from the Creative Commons Attribution licence. In our context, openness is a solved problem.

In many ways, having a platform to publish that open data is also a solved problem. Parliament’s Developer Hub makes available a set of open APIs, all published under the aforementioned Open Parliament Licence. It could be said, then, that Parliament already has an Open Data Platform, at which point you may be wondering why we’d want to build another one. Well …

The problem we’re battling is not openness but Conway’s Law. Parliament’s approach to internet-connected computers has tended to concentrate on digitising existing office functions. An example: the Journal Offices have always been responsible for the receipt of papers laid. Back in the olden days, this took the form of bundles of papers being transported down Whitehall and into Westminster. Happily, the process has been digitised, meaning a lot less shoe leather is used up. All good stuff.

Less good is that the system still only deals with that one function. It neither cares about nor implements any description of the procedures that commence once a paper is laid. The same is true of a myriad of other systems. Hansard knows what was said and who said it, but carries no semantics about the debate. It does not know that some contributions formed part of a second reading debate, nor which bill that debate was on. Motions may also be tabled electronically, but there is no data connecting those motions to any debate that ensued. Where a motion is disposed of by division, the division system has no data describing the motion being divided on, nor the debate that led to the decision. Committee reports published along the way carry no data about what was being reported on.

Anybody attempting to follow the passage of a piece of legislation or a treaty through Parliament would typically need some idea of which committees might take an interest, what days those committees tend to meet and where they publish their reports. This is quite an overhead for people inside Parliament, and worse still for the uninitiated.

What we’ve ended up with is a set of systems, but not a system of systems. Nowhere do our systems combine to become greater than the sum of their parts, as our systems thinking friends might say. If you’ve ever clicked around the Parliament website, you’ve probably spotted the problem. Things that are linked in the real world are not linked by hypertext, at least not in a way that’s amenable to either computers or people.

Happily, our crack team of librarians have spent more than 20 years taking all - or at least most - of that parliamentary material and not only subject indexing it but also procedurally indexing it, tying together items in a way that reflects procedural reality. Unfortunately, all this work happens on boxes inside the parliamentary network. The job, then, is to port the work of our librarians to a box that the internet can see. At which point, we can start making more things from it.

Given everyone’s now fallen in love with machine learning, it should be pointed out that there is no magic in that box, nor could there be. Entity extraction might be possible, if only Parliament would learn to stay still. But that seems unlikely. As for interlinking, the source systems fail to share either identifiers or models. Not so much implementations of bounded contexts as implementations of bondaged contexts, as our domain driven design friends might say. The librarians’ work requires not only knowledge of parliamentary procedure but also what is going on in the world, as of yesterday. And very good luck automating that.

Once the librarian box is in a public place, there’s more work to do, improving the feeds we receive, the models used to describe that information, the information management principles which will change as the models change, the tooling our crack team of librarians use and the friendliness of the data outputs. There is, as Young Robert often puts it, no shortage of work here.

This is all to say, we are not really in the business of making an “open data platform” but rather a “linked open data platform”. The emphasis being firmly on the links. It’s possibly a little late in the day to retitle the project, but please rest assured - any presentations we prepare will definitely have the word linked in them. We are, after all, hypertext-type people.

In the meantime - and in the absence of our linked open data platform - we continue to do what we can, where we can. One example being …

Toward a knowledge graph for the Librar(y/ies)

In more optimistic data integration news, team Susannah and Silver continue to make magnificent progress on our Subject Specialist Finder™ / Library Knowledge Base™. A quick glance at that particular Trello board reveals a whole heap of cards - covering all angles from data model to ETL to tweaks to the website itself - that have now moved to the ‘weeknotes and happy’ column. Lovely stuff.

The Library Knowledge Base™ is - in our opinion - a particularly fine example of the sort of service you can start to build when you’ve put some proper thought and effort into data models, information management and identifiers. And a decent rebuttal to the idea that you can save money on the thinking step if you can only find the correct supplier with the correct flavour of magic box. Not that we’ve grown cynical in our old age. Nor indeed “tainted by experience”, as one wag once put it.

Unfortunately, the resulting website is one of the few things we work on that involves a smattering of personal data. In this case it’s only the contact details of our crack team of researchers, but we doubt they’d want their telephone numbers scattered all over the web. So we can’t really point you at the results of all that effort. Once more, our dear reader must trust us.

Let’s turn back to the more open stuff.

Waddingtonification and Korrisification of the browsable procedure space

Librarian Jayne and her computational helpmates, Young Robert and Michael, continue to churn through the feedback gathered from recent testing sessions with Messrs. Korris and Waddington. Those changes have now gotten so granular that we won’t bother you with them here. Happily, in recent weeks, Shedcode James kindly offered to teach Young Robert and Michael all about branches and pull requests and they’ve taken to it like ducks to water. Who said old dogs can’t learn new tricks? So, in the unlikely event there’s anyone out there interested in the minutiae of template changes, GitHub remains your friend.

As a side effect of those changes, again with much help from Mr Waddington, we’ve also published a new and hopefully improved version of our delegation model. Feedback, as ever, more than welcome.

Still with our Procedure Browseable Space™, Jayne and Michael took a page-by-page tour of what they’d managed to build so far. It turned out that, having built one heck of a lot of page types, those pages had inevitably ended up with an inconsistency or two. That has now been rectified. Jayne and Michael have also compiled a long list of page types lacking equivalent data views. Should you glance at their Trello board, you’ll find a ‘data views to do’ column, with cards covering both CSV downloads and RSS subscriptions. For now - in an attempt to avoid unnecessary duplication of work - those cards sit on the far side of our proposed move to the design system. Which is, in turn, currently blocked because …

Painting in pixels

The latest application to leave the corporate colours paint shop is our beloved Egg Timer™. Until last week, it had been resplendent in an ersatz version of the parliamentary design system, hand-rolled by Robert and Michael. It now makes use of the design system proper courtesy of the nifty little Ruby Gem supplied by Shedcode James.

In truth, the latest respray was not without pain. Much like our Procedure Browseable Space™, our beloved Egg Timer™ has a fair number of page types, all of which needed to be worked through. And all of which ended up looking ever so slightly worse as a result. “It’s very large. Like Duplo,” declared Librarian Anya. And she’s not wrong. Happily, Product Owner Jayne was content enough to let the results go live, though not entirely without concerns. At some point soon, we hope to sit back down with Design System Mary and try to work out how we can improve on matters. Until that happens, further pixel progress remains on hold.

Psephologising wildly

Meanwhile, over in psephologyland, it’s mostly been bad news. A security vulnerability was found in our hosted Datasette instance. Well, not in our hosted Datasette instance in particular, but in all hosted Datasette instances. For that reason, we’ve taken it offline until we can find some kind of solution. Happily, Shedcode James got to the bottom of matters and submitted a patch, but we’re still waiting on the project maintainers to merge that into the codebase. In the meantime, our non-hosted Datasette instance remains available, should you wish to query all things election result related.

Bots to Bluesky (and beyond)

In the outing before last, we announced the launch of a new Bluesky bot, that one covering the publication of research by the Parliamentary Office of Science and Technology. This week, we’re delighted to announce that account has been joined by its twin sister over on Mastodon. Thanks, as ever, to James for providing a welcoming home for our Mastodon efforts.

Should you be a user of Bluesky or Mastodon - or even a subscriber to good old RSS - a full list of our bot accounts is available from GitHub. Fill your boots.

Cool URIs do not change (part deux)

Last time out, we reported that the session label change from ‘2024-25’ to ‘2024-26’ caused a fire in the data integration area. A fire that our Jianhan and team are still attempting to put out. Normally, computers care little about label changes, but that stops being true when labels are baked into identifiers. Which, unfortunately, is exactly what has happened here. In a very real Morecambe and Wise sense, we now find ourselves with all the right written answers, just not necessarily to all the right written questions. Best of luck with that one, Jianhan.

Having confirmed with the Legislation Offices that the session part of bill titles also changes when the session label changes, we have now updated taxonomic labels accordingly. Luckily, our taxonomy does not conflate labels with identifiers, so that change proceeded without incident.
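For anyone wondering why one relabel causes a fire and the other proceeds without incident, here is a minimal sketch in Python. It is not how Parliament’s systems or our taxonomy actually mint identifiers; the base URI, the `Session` class and both helper functions are hypothetical, there purely to show that a label change breaks identifiers that embed the label while leaving opaque identifiers untouched.

```python
from dataclasses import dataclass

# Hypothetical base URI, for illustration only.
BASE = "https://example.org/id/session/"


@dataclass
class Session:
    identifier: str  # opaque: minted once, never changes
    label: str       # human-readable: free to change


def label_based_uri(session: Session) -> str:
    # Anti-pattern: the label is baked into the identifier.
    return BASE + session.label


def opaque_uri(session: Session) -> str:
    # Preferred: the identifier is independent of the label.
    return BASE + session.identifier


session = Session(identifier="a1b2c3", label="2024-25")

# Anything published before the relabel (answers, links, bookmarks) holds these URIs.
published_label_uri = label_based_uri(session)
published_opaque_uri = opaque_uri(session)

# The session label changes...
session.label = "2024-26"

# ...so the label-based URI no longer matches what was published,
# while the opaque URI is unaffected. Only the label itself has changed.
print(label_based_uri(session) == published_label_uri)  # False
print(opaque_uri(session) == published_opaque_uri)      # True
```

In linked data terms, the label would typically hang off the stable URI via a property such as `skos:prefLabel` or `rdfs:label`, so relabelling is a change to one value rather than to every identifier that points at the thing.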

Taxonomic gardening

Being placed in charge of a taxonomy is a little like being placed in charge of a garden. As Ms Chatto would have it, much of gardening is about placing the right plant in the right place. Though in the case of taxonomies, it’s more about placing concepts in the right place. And with planting comes weeding. Now, our crack team of librarians run a tight ship. Just because a concept was once planted does not mean it still adds to the overall scene. For that reason, they run regular queries to check which concepts are used and which are not, with those deemed no longer fit for purpose being consigned to the conceptual compost heap.

In the days when the majority of the material flowed directly from our triplestore to our Solr instance, such queries could be run directly over Solr. But the advent of new search saw some material no longer propagating - as it were - to Solr. Meaning a new approach was needed. Librarian Anya put together a SPARQL query, which has since been tweaked by Librarian Ned and we’re back on top of our concept beds. Smashing.
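We’ve not reproduced Anya and Ned’s actual query here. As a rough sketch of the logic such a query typically follows (in SPARQL, an `OPTIONAL` pattern combined with `COUNT` and `GROUP BY`), the Python below walks a toy set of triples, counts how many items are indexed with each concept and reports those with a count of zero. The `ex:` identifiers are made up, and using `dcterms:subject` as the indexing predicate is an assumption; our own model may well use something else.

```python
# Triples as (subject, predicate, object) tuples. The ex: identifiers are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
# Assumption: items are linked to concepts with dcterms:subject.
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

triples = {
    ("ex:concept/1", RDF_TYPE, SKOS_CONCEPT),
    ("ex:concept/2", RDF_TYPE, SKOS_CONCEPT),
    ("ex:concept/3", RDF_TYPE, SKOS_CONCEPT),
    ("ex:item/a", DCT_SUBJECT, "ex:concept/1"),
    ("ex:item/b", DCT_SUBJECT, "ex:concept/1"),
    ("ex:item/c", DCT_SUBJECT, "ex:concept/3"),
}


def concept_usage(triples, indexing_predicate=DCT_SUBJECT):
    """Count items indexed with each concept, keeping unused concepts at zero.

    Starting every concept at zero mirrors the OPTIONAL in a SPARQL query:
    concepts with no matching items still appear in the results.
    """
    concepts = {s for s, p, o in triples if p == RDF_TYPE and o == SKOS_CONCEPT}
    counts = {c: 0 for c in concepts}
    for s, p, o in triples:
        if p == indexing_predicate and o in counts:
            counts[o] += 1
    return counts


def unused_concepts(triples):
    """Return concepts that no item is indexed with, sorted for stable output."""
    return sorted(c for c, n in concept_usage(triples).items() if n == 0)


print(unused_concepts(triples))  # ['ex:concept/2']
```

A zero count only flags a candidate; whether a concept actually goes on the compost heap remains a librarian’s call.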

Managing Members

Over in Member management corner, team:Phil have been equally busy coping with a government reshuffle. Or two. That work is now considered complete, the ever-present manual being updated accordingly.

Wandering stars

A wee while back, Young Robert was contacted out of the blue by Visual-Meaning Steve. Steve had been bobbing around the interweb and happened to come across our data models. Which caused him to ask, what on earth happened to RDF linked data in the UK public sector? A question we could not readily help with. It appears to be just us and The National Archives now, the rest of the public sector being more excited by the text than the hypertext.

Since then we’ve spoken on a couple of occasions, with Week 37 marking our first actual, in-person-rather-than-pixels chat. Our guests - Steve, Cabinet Office Kelcey, and Jenny and Pritam from The National Archives - travelled from Cambridge, Manchester and Kew, battling tube strikes and downpours, turning up promptly at the appointed time. Things looked to be going to plan, until it dawned on all assembled that we’d stumbled into a strike day for parliamentary security. And therefore could not enter our own buildings with visitors. We stood in the rain for a bit and looked at our shoes.

Undeterred, CO Kelcey suggested we make use of his Cavalier credentials and find some unoccupied space in Treasury Central. A happy ending looked to be in store until Treasury security clocked that one of our party had the wrong colour of badge. If you’ve ever worked in the UK public sector, you’ll have some appreciation of the importance of badge colours. Some more standing in the rain followed.

And this is how we ended up holding a meeting about RDF at a corner table in St James’s Park cafe. If you were out for a fun day of duck feeding and stumbled into a group of people having an animated conversation about httpRange-14, we can only apologise. Next time, we’ll stick to the Two Chairmen. Which is where we did indeed end up. Almost inevitably.