ontologies

2024 - Week 39

Librarian of the Week

As our dear reader will well know, back in 2021 our crack team of librarians took over most of the management of House of Commons data in our Members’ Names Information System. Most of their workload is dedicated to keeping the show on the road, as old members depart, new members arrive, old departments flicker out of existence, new departments are born, boundary changes wipe out whole swathes of constituencies and so on and so forth. That said, time is also found to impose a little more order on what has gone before. Now, MNIS is a complicated system, sitting at the centre of a myriad of other systems, both procedural and corporate, its database having upwards of 80 tables. We say upwards here because, quite frankly, we’ve run out of fingers. Anyway, I think we can agree that’s a hell of a lot of tables.

One of these very many tables goes by the title of ‘end reasons’ and is used to describe both why a member left a House and why a member of the Commons parted company with their constituency. When we inherited that table, it was all something of a mess, members being declared to have left for many and varied reasons, from ‘standing down’ to ‘defeated’ to ‘general election’. None of which made much sense. A member of the Commons ceases to be a member at dissolution, so recording them as losing their membership at a general election was an ontological abnormality. For that reason, a project was kicked off - or initiated as Young Robert might say - to tidy that table and sprinkle over some good, old-fashioned common sense.

Librarian Emily took the bit between her teeth and set about planning the spring clean. Election expert Neil got roped in. MNIS being a bicameral system, Ms McAskill and Mr Korris were also consulted to ensure we did not tread on red carpet toes. We took our usual approach of not just questioning why, but also questioning the why of the why. So, for each end reason uncovered, where is the reasoning set out? Thanks to Emily’s diligent efforts we now have a complete set of end reasons citing sections of legislation where appropriate, accompanied - possibly for Michael’s benefit - by a picture. Absolutely excellent stuff and well deserving of Emily’s second Librarian of the Week trophy.

When we first put pen to paper on our result item specification, we took the decision to only apply labels where we believed them to be absolutely necessary. Developer Jon raised a quizzical eyebrow but acquiesced to our requests. Upon seeing the results, we quickly came to the conclusion that Jon had in fact been correct. Specifications have now been updated and almost every attribute now comes with a label. And much neater it looks too.

Labels in place, Librarian Jayne has started her second quality assurance pass. The first pass went attribute by attribute, so didn’t really zoom out to look at whole pages. The second pass goes content type by content type, seven such types already finding their way to the happy pile. Hypermedia being the engine of application state and whatnot, Jayne has also tested Jon’s implementation of our preservation of state specification and the ability to share said state. Neither was found wanting. Excellent work from both Jayne and Jon.
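Preserving and sharing state in hypermedia terms usually means keeping the whole search state in the URL's query string, so that sharing the URL shares the state. The sketch below illustrates that idea only; it is not Jon's implementation, and the parameter names (`q`, `session`, `type`) and the example base URL are assumptions.

```ruby
require "uri"

# Build a shareable search URL: the query string carries the entire search state.
# Parameter names are hypothetical, chosen for illustration only.
def search_url(base, query:, filters: {})
  params = [["q", query]]
  # Sort filter names so the same state always produces the same URL.
  filters.sort_by { |name, _| name.to_s }.each do |name, values|
    Array(values).each { |value| params << [name.to_s, value] }
  end
  uri = URI(base)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Recover the search state from a URL, collecting repeated parameters into arrays.
def search_state(url)
  state = Hash.new { |hash, key| hash[key] = [] }
  URI.decode_www_form(URI(url).query.to_s).each { |name, value| state[name] << value }
  state
end

url = search_url("https://example.com/search",
                 query: "remedial order",
                 filters: { type: ["Statutory instrument"], session: "2023-24" })
search_state(url)["type"] # => ["Statutory instrument"]
```

Sorting the filters is a small design choice: two people arriving at the same state end up with the same URL, which keeps shared links and caches tidy.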

On the subject of excellent presentational work, Jon has also deployed Nokogiri to ensure that any markup in the truncated content appearing as part of result items is now sanitised, meaning odd styling no longer bleeds out to pollute the rest of the page. Markup in content on object pages that wasn’t being styled now is. Much better.
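Jon's fix uses Nokogiri. As a dependency-free illustration of why the order of operations matters, the sketch below swaps in a crude regular-expression tag stripper from plain Ruby: removing markup before truncating means no half-closed tag can survive into a result item. The `snippet` helper and its 200-character default are hypothetical, and a regex is no substitute for a proper HTML parser on untrusted input.

```ruby
# Hypothetical helper: turn stored HTML into a plain-text result-item snippet.
# Tags are removed *before* truncating, so a cut can never leave an unclosed
# element behind to bleed styling into the rest of the page.
def snippet(html, limit: 200)
  text = html.gsub(/<[^>]*>/, " ").gsub(/\s+/, " ").strip
  return text if text.length <= limit

  cut = text[0, limit]
  last_space = cut.rindex(" ")
  cut = cut[0, last_space] if last_space # avoid chopping a word in half
  "#{cut}…"
end

snippet("<p>Hello <b>world</b></p>")          # => "Hello world"
snippet("<b>one two three four</b>", limit: 9) # => "one two…"
```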

Still on the frontend of the frontend, our two column search result layout didn’t work so well on smaller screens. Jon has also fixed this, breaking out his Flexbox skills to pop the search facets atop the result listing and wrap them all neatly in a details element. Mobile telephone users everywhere applaud. He’s also applied a fresh coat of paint to the newly JavaScript-free search toggle widgets, making everything much clearer.

Work on search facets continues, this week seeing the release of fully functioning date and session filters. Our decision to remove the publisher facet has now been reversed, after we came to the realisation that relying only on the House facet would have accidentally removed the ability to filter for all three of our libraries. Given we work for one of them, that would have been less than ideal. A small bug with disappearing filters in facets has also been fixed. So that’s good.
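Date filters in Solr are typically expressed as range filter queries passed in `fq` parameters. A minimal sketch, assuming a hypothetical `date` field and a hypothetical `session` field holding session labels; only the `field:[from TO to]` range syntax, the `*` open bound and the `fq` parameter itself are standard Solr.

```ruby
require "date"
require "uri"

# Build a Solr range filter for a date span; a missing bound becomes Solr's "*".
# The field name is hypothetical.
def date_filter(from: nil, to: nil, field: "date")
  lower = from ? "#{from.iso8601}T00:00:00Z" : "*"
  upper = to ? "#{to.iso8601}T23:59:59Z" : "*"
  "#{field}:[#{lower} TO #{upper}]"
end

# Build an exact-match filter on a (hypothetical) session field.
def session_filter(session, field: "session")
  %(#{field}:"#{session}")
end

params = [["q", "remedial order"],
          ["fq", date_filter(from: Date.new(2024, 1, 1))],
          ["fq", session_filter("2023-24")]]
query_string = URI.encode_www_form(params)
```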

It has not all been unhindered progress in the facet factory. Our current implementation features five member-focussed facets: all the members, the tabling ones, the asking ones, the leading ones and the answering ones. To Librarian Anya’s eyes this felt like two too many. Following consultations with domain experts, Anya came to the conclusion that the askers, the tablers and the leaders could all, in theory, be squashed down into one bucket. Following further consultations with domain experts - and a gentle nudge from Librarian Ayesha - she also came to the conclusion that we’d call this new facet ‘Primary members’. Which has the additional benefit of us no longer having to sit through Michael’s attempts to deploy the correct spelling of Principal. The new plan fell upon stony ground when Developer Jon attempted to follow the new specification, only to find that wrestling the correct numbers out of Solr was harder than expected. An emergency meeting has been pencilled in for Wednesday. Anya and Michael - despite both being on ‘holiday’ - will be in attendance. That, my friends, is what we call dedication.
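Whatever the Solr wrangling turns out to involve, a merged ‘Primary members’ facet has one property worth pinning down: a member who both tabled and asked on the same item should count once for that item, not twice. A plain-Ruby sketch of that counting rule over hypothetical result documents; the field names are illustrative, not our Solr schema, and this is not how the facet is actually computed.

```ruby
# Hypothetical member fields that would feed a merged "Primary members" facet.
PRIMARY_MEMBER_FIELDS = %i[tabling_members asking_members leading_members].freeze

# Count each member at most once per result, however many roles they played in it.
def primary_member_counts(docs)
  counts = Hash.new(0)
  docs.each do |doc|
    PRIMARY_MEMBER_FIELDS.flat_map { |field| doc.fetch(field, []) }.uniq.each do |member|
      counts[member] += 1
    end
  end
  counts
end

docs = [
  { tabling_members: ["Member A"], asking_members: ["Member A"] },
  { leading_members: ["Member A", "Member B"] },
  { asking_members: ["Member B"] }
]
primary_member_counts(docs)["Member A"] # => 2
```

Simply adding up the three separate facets would give Member A a count of 3 here, because the first item would be counted once as tabled and once as asked.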

In an attempt to get ahead of Jon for the next phase of development - advanced search and its accompanying aliases - Librarian Anya once more donned her thinking face and set about documenting the current search aliases, which ones we’d like to keep and which ones are destined for the bin. Ably assisted by Librarians Claire, Jayne and Phil, we now have a long list of XML alias configuration looking for a home somewhere. Over to you, Jon.

Taxonomic updates

In support of our long-term goals to make both our Solr and SES APIs publicly available, we’re delighted to announce that the latter is now a personal-data-free zone. It had contained contact details for Library research staff, but we couldn’t for the life of us work out why. It turned out they had been used on a handful of now-deprecated intranet pages and nowhere else. Meaning we’ve now been able to remove all references in both data and manual, another of Chesterton’s fences vaulted in the process.

In actual data update news, we can confirm that the ‘Proposal for a draft remedial order’ term has now been updated to ‘Draft remedial order (proposal)’, bringing it in line with our usual naming conventions. The scope note for ‘Grand Committee proceedings (HC)’ has also been updated and now makes mention of regional grand committees. And the scope notes for fatal and non-fatal prayers have both been updated to remove their Commons-centricity, such things also being perfectly possible in the upper House. In a move toward the meta, Librarian Ned has updated our manual, so even scope notes have scope notes now. Blimey.

At this point, you’re probably thinking, that sounds like an excellent thesaurus managed by excellent and dedicated librarians. If only there were some means for other people to make use of it. Read on, dear reader, read on.

Taxonomic liberation

Efforts to liberate our thesaurus from its software constraints continue, with a couple of projects currently in progress. Or ‘in flight’ as Young Robert would no doubt say. Both rely on piping provided by Mirage, so let’s talk about that first.

Mirage is a software component provided by Data Language that looks for changes to the thesaurus - additions, edits, deletes - slices and dices the results, and allows downstream systems to subscribe to any combination by means of message queues. The name Mirage was invented by Young Robert and started life - as so many names do - as an acronym. Trouble is, none of us can remember what that acronym was. So plain old Mirage it is.
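Mirage itself is Data Language's software and we won't pretend to know its internals. The general pattern described, downstream systems subscribing to any combination of change types, can be illustrated with a toy in-process router. `ChangeEvent`, `ChangeRouter` and the term class names are hypothetical, and Ruby's built-in `Queue` stands in for real message queues.

```ruby
# Hypothetical change notification: what happened, to which term, of which class.
ChangeEvent = Struct.new(:action, :term_id, :term_class, keyword_init: true)

# Toy router: each subscription is a pair of optional filters plus its own queue.
class ChangeRouter
  def initialize
    @subscriptions = []
  end

  # nil for a filter means "any"; otherwise pass the values you care about.
  def subscribe(actions: nil, term_classes: nil)
    queue = Queue.new
    @subscriptions << [actions, term_classes, queue]
    queue
  end

  # Deliver an event to every subscription whose filters it satisfies.
  def publish(event)
    @subscriptions.each do |actions, term_classes, queue|
      next if actions && !actions.include?(event.action)
      next if term_classes && !term_classes.include?(event.term_class)

      queue << event
    end
  end
end

router = ChangeRouter.new
member_changes = router.subscribe(term_classes: ["Member"])
all_deletions = router.subscribe(actions: [:delete])
router.publish(ChangeEvent.new(action: :add, term_id: "t1", term_class: "Member"))
router.publish(ChangeEvent.new(action: :delete, term_id: "t2", term_class: "Department"))
member_changes.size # => 1
all_deletions.size  # => 1
```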

One project downstream of Mirage is a replacement for the Odds and Sods Information System, which our crack team of librarians use to create records for things we don’t have feeds for. OaSIS was built a good decade ago as part of the initial search and indexing project, and is, quite frankly, on its last pair of legs. We’ve already scribbled a target model, the next stage being populating it. Now that Mirage can slice the thesaurus by both taxonomic structure and term attributes, we’re in a position to populate a member lookup with a list of only members, a department lookup with a list of only departments and so on and so forth. Testing awaits.
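The two kinds of slice mentioned, by taxonomic structure and by term attributes, can be sketched over an in-memory list of terms. Everything here is hypothetical: the `Term` struct, its single `broader` link (real thesauri can be polyhierarchical) and the class labels.

```ruby
require "set"

# Hypothetical thesaurus term with one broader link and a class attribute.
Term = Struct.new(:id, :label, :broader, :term_class, keyword_init: true)

# Taxonomic slice: every term beneath a given root, following narrower links.
def descendants(terms, root_id)
  children = terms.group_by(&:broader)
  seen = Set.new # guards against cycles in messy data
  stack = [root_id]
  result = []
  until stack.empty?
    (children[stack.pop] || []).each do |child|
      next unless seen.add?(child.id)

      result << child
      stack << child.id
    end
  end
  result
end

# Attribute slice: every term carrying a given class.
def by_class(terms, term_class)
  terms.select { |term| term.term_class == term_class }
end

terms = [
  Term.new(id: "gov", label: "Government", broader: nil, term_class: nil),
  Term.new(id: "depts", label: "Departments", broader: "gov", term_class: nil),
  Term.new(id: "d1", label: "Cabinet Office", broader: "depts", term_class: "Department"),
  Term.new(id: "d2", label: "HM Treasury", broader: "depts", term_class: "Department"),
  Term.new(id: "m1", label: "A Member", broader: nil, term_class: "Member")
]
department_lookup = by_class(terms, "Department").map(&:label).sort
```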

The second project downstream of Mirage is the Subject Specialist Directory, an application allowing members and their staff to find subject matter experts amongst the House of Commons Library research team. In support of which, Thursday of week 37 saw Librarians Anya and Susannah take computational dogsbodies Young Robert and Michael for a quick trot along the Brighton seafront for an appointment with Silver. And all very productive it was too. Architecture diagrams were sketched and wireframes scribbled before the almost inevitable trip to the pub with Data Scientist Louie. Not believing that work and pubs should be kept apart, we carried on talking, Louie having the good sense to inform us that at least one of our plans would work in neither theory nor practice. The blame for that lying squarely at someone else’s door. We’ve also managed to get our paws on the CSV files that sit behind the print edition of the Subject Specialist Directory, another blocker unblocked.

I am a procedural cartographer - to the tune of the Palace Brothers

Those keeping a keen eye on our logical and arithmetic procedure map for treaties laid under the Constitutional Reform and Governance Act - so, basically, Librarian Jayne - may have noticed a slight change of late. We had said that written statements outlining intentions to accede could only happen once. You’d think we’d have learned by now, and learn we did when the Comprehensive and Progressive Agreement for Trans-Pacific Partnership saw not one, but two such statements. One predating the laying of the treaty, a second following the conclusion of parliamentary procedure. Constraints in both map and data have now been loosened, the treaty timeline bearing testament.
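The kind of constraint that had to be loosened can be pictured as a maximum-occurrence rule per step, checked against the steps actualised in a work package. The step names, the rule table and the `occurrence_violations` helper below are hypothetical illustrations, not our procedure data model.

```ruby
# Hypothetical rule table: step name => maximum permitted occurrences.
# nil means unbounded, which is where the accession statement now sits.
MAX_OCCURRENCES = {
  "Treaty laid" => 1,
  "Written statement of intention to accede" => nil
}.freeze

# Return the names of steps that occur more often than their rule allows.
def occurrence_violations(actualised_steps, rules = MAX_OCCURRENCES)
  actualised_steps.tally.filter_map do |step, count|
    limit = rules[step]
    step if limit && count > limit
  end
end

steps = ["Written statement of intention to accede", "Treaty laid",
         "Written statement of intention to accede"]
occurrence_violations(steps) # => []
```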

Our Jianhan stepped in to lend Jayne a helping hand when not one, not two, but three of our pipes suffered a temporary blockage. The first blockage occurred when attempting to feed statutory instruments and treaties from Solr to the procedure editor database. The second blockage was in the bit of code that translates our procedure maps to DOT files. And the third blockage came to light when not all of our procedure steps turned up in the triplestore with their types intact. All three blockages have now been cleared. Thanks Jianhan.
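For readers unfamiliar with DOT, the Graphviz graph description language, translating a procedure map essentially means writing each step out as a node and each route as a directed edge. A minimal sketch, assuming hypothetical `Step` and `Route` structs and step labels rather than our real procedure model, which also carries logic steps and much else besides.

```ruby
# Hypothetical, stripped-down procedure model.
Step = Struct.new(:id, :name)
Route = Struct.new(:from_id, :to_id)

# Quote a string as a DOT identifier, escaping backslashes and double quotes.
def dot_quote(text)
  '"' + text.to_s.gsub(/["\\]/) { |char| "\\#{char}" } + '"'
end

# Render steps as labelled nodes and routes as directed edges.
def to_dot(steps, routes, graph_name: "procedure")
  lines = ["digraph #{dot_quote(graph_name)} {"]
  steps.each { |step| lines << "  #{dot_quote(step.id)} [label=#{dot_quote(step.name)}];" }
  routes.each { |route| lines << "  #{dot_quote(route.from_id)} -> #{dot_quote(route.to_id)};" }
  lines << "}"
  lines.join("\n")
end

steps = [Step.new("s1", "Treaty laid"), Step.new("s2", "CRaG period expires")]
routes = [Route.new("s1", "s2")]
dot = to_dot(steps, routes)
```

The resulting text can be handed to Graphviz (for example `dot -Tsvg`) to draw the map.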

Librarian Jayne has also been kept busy with machinery of government changes, though, this time out, thankfully limited in scope. The Ministry of Housing, Communities and Local Government coming back into being, whilst the Department for Levelling Up, Housing and Communities has now had its end date applied. All of which should become apparent should you choose to use the ‘laying body’ select list on our statutory instrument website.

On the subject of the SI website, whilst it is in itself a lovely thing, it comes with some drawbacks. The first drawback being its URL. It means we’re confined to only ever tracking SIs, despite other, non-SI, papers being subject to the exact same procedures we’ve already mapped. Whilst our crack team of librarians have complete control over the backend of things, the website itself is in the hands of colleagues over in the Parliamentary Computational Section. All of which makes responding to user feedback - requests for RSS passim being just one example - and our innate desire to prototype new features really rather hard. It also means that lots of the work that goes on under the surface - for example, our ability to parse a work package and derive steps that now may or must happen - never makes it to the visible tip of the iceberg.

Back in week 36, we held an impromptu work planning exercise, one of the things dropping out of that being a desire for some public expression of the full procedure model and maps. Organ Grinder Jayne and her computational lap-monkeys, Young Robert and Michael, have now made the very smallest of starts on just that. So far, Jayne has evaluated the SQL Michael wrote for the procedure parsing code and declared herself perfectly capable of doing the same, this time in SPARQL. Our Three Musketeers have also sat down to draw up a URL structure for their proposed procedure browsing application. After all, when one is designing a website, where else would one start? As of about 20 minutes ago, Jayne and Michael have also bumped heads together and written some Ruby code to grab CSVs from our SPARQL endpoint and start to translate the results into Ruby objects. Not much, you might think, but not nothing.
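We're not reproducing Jayne and Michael's code here, but the general shape is easy to sketch with nothing beyond Ruby's standard library: request SPARQL results as CSV, a format the SPARQL 1.1 results specification defines, and map each row to a plain Ruby object. The endpoint URL, the `step` and `name` result variables and the `ProcedureStep` struct are all hypothetical.

```ruby
require "csv"
require "net/http"
require "uri"

# Hypothetical target object for each result row.
ProcedureStep = Struct.new(:uri, :name, keyword_init: true)

# Run a SELECT query via the SPARQL protocol, asking for CSV results.
def fetch_sparql_csv(endpoint, query)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(query: query)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "text/csv"
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.value # raises an HTTP error unless the response was 2xx
  response.body.dup.force_encoding(Encoding::UTF_8)
end

# Map CSV rows to objects; CSV headers are the query's variable names, without "?".
def parse_steps(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    ProcedureStep.new(uri: row["step"], name: row["name"])
  end
end

# Usage, against a hypothetical endpoint:
# csv = fetch_sparql_csv("https://example.com/sparql", "SELECT ?step ?name WHERE { ... }")
# steps = parse_steps(csv)
```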

Psephologising profusely

Whilst the rest of the country may have lost interest in elections - at least from a UK perspective - we have not. Knowing that our time will come again, we’ve made a number of small improvements to both our election results website and our Datasette offering.

First off, we had been storing Electoral Commission identifiers directly in our political parties table. This proved to be yet another ontological mistake and one we came to regret. Political parties often having more than one registration across both time and space. Luckily, back in general election preparation days, Librarian Anna had taken eyeballs and a sharp pencil to the Electoral Commission website - an API not being available - to make a mapping spreadsheet between MNIS party identifiers and Electoral Commission registrations. We’ve now imported said spreadsheet, meaning our party pages now list registrations in both Great Britain and Northern Ireland where applicable. To make other people’s lives easier and save them time hand-scraping websites, we’ve also made all the registrations available on their own page. Both offerings come with CSV downloads, because of course they do.
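The ontological fix is to stop treating a registration as an attribute of a party and model it as a one-to-many relationship instead. A sketch of importing a mapping spreadsheet of that kind, assuming hypothetical column names (`mnis_party_id`, `ec_registration_id`, `register`) and made-up identifiers; Anna's actual spreadsheet layout isn't described here.

```ruby
require "csv"

# Hypothetical registration record.
Registration = Struct.new(:ec_id, :register, keyword_init: true)

# Group registrations under their MNIS party, allowing any number per party.
def registrations_by_party(csv_text)
  index = Hash.new { |hash, party_id| hash[party_id] = [] }
  CSV.parse(csv_text, headers: true).each do |row|
    index[row["mnis_party_id"]] << Registration.new(ec_id: row["ec_registration_id"],
                                                     register: row["register"])
  end
  index
end

mapping = <<~ROWS
  mnis_party_id,ec_registration_id,register
  P1,EC-A,Great Britain
  P1,EC-B,Northern Ireland
  P2,EC-C,Great Britain
ROWS
registrations_by_party(mapping)["P1"].map(&:register) # => ["Great Britain", "Northern Ireland"]
```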

Data changes have not been confined to party registrations. We’ve also split apart the Green Party Northern Ireland from the England and Wales Green Party, their Scottish equivalents having been split out pre-election. Shared Ground have parted company with the Young People’s Party, though the two do share a common registration. A quick bit of mathematical magic from Data Scientist Louie means Shared Ground have also gained vote change figures based on notional results for the YPP. It all sounds very complicated, but we don’t ask. On much the same subject, Anna’s registration data has been repurposed to show ‘related parties’, being parties sharing a common registration. Which means we’re finally able to link from The Brexit Party to Reform UK. And vice versa.
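Deriving ‘related parties’ from registration data amounts to grouping parties by registration and linking every party in a group to every other. A minimal sketch over hypothetical (party, registration) pairs; the registration identifiers are made up.

```ruby
require "set"

# pairs: [[party, registration_id], ...]
# Returns party => Set of other parties sharing at least one registration.
def related_parties(pairs)
  related = Hash.new { |hash, party| hash[party] = Set.new }
  pairs.group_by { |_party, registration| registration }.each_value do |group|
    parties = group.map(&:first).uniq
    parties.each { |party| related[party].merge(parties - [party]) }
  end
  related
end

pairs = [["The Brexit Party", "EC-X"], ["Reform UK", "EC-X"],
         ["Shared Ground", "EC-Y"], ["Young People's Party", "EC-Y"]]
related = related_parties(pairs)
related["The Brexit Party"].to_a # => ["Reform UK"]
```

Because every party in a group is linked to every other, the relationship comes out symmetric, which is what gives the ‘and vice versa’ for free.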

Back with CSVs, computational REST warrior Michael has added a dollop more, these covering the full list of boundary sets, boundary sets over time in a country, boundary sets established by a given piece of legislation, legislation enabled by a given piece of legislation, legislation enabling a given piece of legislation and - perhaps more interestingly - candidate results for a given election. Blame it on Michael’s OCD if you will, but the end goal is every URL that can return a CSV will return a CSV. Only 19 more to go. Stay tuned.
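One way to make every URL that can return a CSV actually do so is to have each page's query produce rows once and render them either as HTML or as CSV, depending on the requested format. A sketch, assuming a hypothetical `.csv` suffix convention and hypothetical candidate-result columns with placeholder values; it is not a description of how the election results website is built.

```ruby
require "csv"

# Hypothetical: decide the response format from the request path.
def requested_format(path)
  File.extname(path) == ".csv" ? :csv : :html
end

# Render the same rows the HTML page uses as CSV, with a header row.
def rows_to_csv(rows, columns)
  CSV.generate do |csv|
    csv << columns
    rows.each { |row| csv << row.values_at(*columns) }
  end
end

columns = ["candidate", "party", "votes"]
rows = [{ "candidate" => "Candidate A", "party" => "Party X", "votes" => 100 },
        { "candidate" => "Candidate B", "party" => "Party Y", "votes" => 90 }]
body = rows_to_csv(rows, columns) if requested_format("/elections/1/candidacies.csv") == :csv
```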

Three months on from the general election, corrections finally appear to be drying up. That said, at some point in the past three weeks we did issue more corrections covering Congleton, Crewe and Nantwich, Hinckley and Bosworth, Macclesfield, and Tatton. If you’re attempting to follow along from home, Statistician Carl has compiled a list of data corrections, available from our 2024 general election research briefing page.

Finally, as part of our general purpose general election mop-up, our crack team of librarians contact every new member with a short set of biographical questions. Amongst those questions is one covering whether they’ve stood unsuccessfully in previous elections to Parliament. We’ve now taken that feedback and backfilled MNIS identifiers for previously unsuccessful candidates across the spreadsheets accompanying the research briefing, the election results website and the Datasette offering. Chris Webb being just one member gaining an additional dash of hypertext.
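The backfill is essentially a lookup: survey answers from new members identify their earlier candidacies, and any matching candidacy without an MNIS identifier gets one. A sketch, assuming hypothetical record shapes and matching on election, constituency and candidate name; real matching would need more care over name variants.

```ruby
# Hypothetical candidacy record; mnis_id is nil for unsuccessful candidates.
Candidacy = Struct.new(:election, :constituency, :name, :mnis_id, keyword_init: true)

# answers: one hash per earlier candidacy reported by a member in the survey.
def backfill_mnis_ids(candidacies, answers)
  lookup = answers.to_h do |answer|
    [[answer[:election], answer[:constituency], answer[:name]], answer[:mnis_id]]
  end
  candidacies.map do |candidacy|
    filled = candidacy.dup
    # Only fill gaps; never overwrite an identifier we already hold.
    filled.mnis_id ||= lookup[[candidacy.election, candidacy.constituency, candidacy.name]]
    filled
  end
end

candidacies = [
  Candidacy.new(election: "2019", constituency: "Somewhere", name: "A Candidate", mnis_id: nil),
  Candidacy.new(election: "2019", constituency: "Somewhere", name: "B Candidate", mnis_id: nil)
]
answers = [{ election: "2019", constituency: "Somewhere", name: "A Candidate", mnis_id: 1234 }]
backfill_mnis_ids(candidacies, answers).map(&:mnis_id) # => [1234, nil]
```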

Outreach / engagement

Week 37 was more hectic than most, not only seeing our aforementioned trip to Brighton, but also a trip across St James’s Park for a lunchtime appointment at the Institute for Government. This on the general theme of how government should use AI. Unsurprisingly perhaps, Michael was gearing up to pull what he likes to call his clever face and ask his question of the moment - if the government does go all in on AI, what would need to be in place for Parliament to scrutinise that? - but, as ever on such occasions, he bottled it. Still, it was an interesting event, ably compered by Gavin and featuring a guest appearance from Jeni. It was lovely to see Jeni. It is always lovely to see Jeni.

A wee while back, Anya was contacted by Cabinet Office Kelcey, who’d found himself accidentally browsing our ontologies and decided to get in touch. He’s hoping to work on a Linked Data-style project capturing positions in government, incumbencies in those positions and the organisations those positions belong to. Which is a strange coincidence, because it’s something we’ve been planning to work on too, our only gesture toward such matters being the quaintly named Organisation Accountable to Parliament class and a very, very high-level agency model. Which is why Wednesday of week 38 saw Anya, Michael and lead developer Tom meet in pixels with Kelcey to chat shared goals. An agreement to collaborate was signed and will be ratified at a forthcoming meeting with Researchers Graeme and Richard. Splendid stuff.
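For anyone wondering what separating positions, incumbencies and organisations buys you: a person holds a position for a period, and the position, not the person, belongs to an organisation. That makes ‘who held this post on this date?’ a simple query. A sketch with hypothetical structs and placeholder data; it is neither our ontology nor Kelcey's.

```ruby
require "date"

Organisation = Struct.new(:name, keyword_init: true)
Position = Struct.new(:title, :organisation, keyword_init: true)
# An incumbency links a person to a position for a period; end_date nil means ongoing.
Incumbency = Struct.new(:person, :position, :start_date, :end_date, keyword_init: true) do
  def covers?(date)
    start_date <= date && (end_date.nil? || date <= end_date)
  end
end

# Everyone holding the given position on the given date.
def holders_on(incumbencies, position, date)
  incumbencies.select { |inc| inc.position == position && inc.covers?(date) }.map(&:person)
end

dept = Organisation.new(name: "Department X")
post = Position.new(title: "Secretary of State", organisation: dept)
incumbencies = [
  Incumbency.new(person: "Person A", position: post,
                 start_date: Date.new(2020, 1, 1), end_date: Date.new(2022, 6, 30)),
  Incumbency.new(person: "Person B", position: post,
                 start_date: Date.new(2022, 7, 1), end_date: nil)
]
holders_on(incumbencies, post, Date.new(2023, 1, 1)) # => ["Person B"]
```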