ontologies

2024 - Week 18

It’s been a while. We hope you’re keeping well and enjoying the Spring sunshine, dear reader.

What’s caught fire this week?

A good question, if not one with an easy answer. Rest assured, something has definitely caught fire. Well, it wouldn’t be a working week without some flames to fan. Sit back, while we try to explain.

There has been a long-term problem with the processing of oral parliamentary questions and their answers. There have, in fact, been several long-term problems. Such problems have proved rather hard to track because, unlike most of the things flowing into Parliamentary Search, there are two sources. Questions for oral answer are tabled in our quaintly-named Electronic Questions and Motions system. A system that replaced our old Carrier Pigeon Questions and Motions system, sometime after the invention of electricity. Oral answers, on the other hand, flow from Hansard. Obviously. Reconciling the two and joining the correct answers to the correct questions occasionally proves tricky.

In order to subject and procedurally index records in the triplestore, our crack team of librarians have a piece of software called - not surprisingly - the Indexing application. Records in the triplestore with a parent-child relationship are shown grouped on a single screen, meaning all contributions to a proceeding are shown grouped under that proceeding and all contributions that follow a lead oral question are shown grouped beneath that question. Where the parent is an answered lead oral question, each contribution has a bank of radio buttons, allowing our crack librarians to assign the appropriate contribution type, be that question, first supplementary question, other supplementary question, answer, intervention or, indeed, other. All lovely, labour-saving, structured data. In theory.

We’ve had particular problems with the 2019-21 session, the arrival of a global pandemic not making anything any easier. A first attempt to fix this was made by computational colleagues back in 2023, but only served to make matters worse. Calls to the aforementioned contributions listing in the Indexing application caused the blasted machines to cough, choke and crash for every record across the whole of 2020. Or at least every record sampled.

As part of his ticket tidying duties, our Jianhan picked up the baton and attempted to resolve the old, consequent calls which can be roughly summarised as, ‘please make error message go away’. First off, he’s reprocessed 130,000 - my word! - contributions from source files to triplestore. Second off, he’s tweaked the validation code in the Indexing application to no longer throw a wobble when faced with a contribution with no contribution text. Which means our Indexing application no longer falls to its knees as it attempts to come to terms with contribution-less contributions.

Quite clearly, this does not count as job done. How many contribution-less contributions has the poor machine swallowed? One might well ask. And why? For now, we can at least answer the first question. A quick SPARQL query from Jianhan reveals a grand total of 235,000 contributions with no contribution text. Which fits the mathematical definition of a ‘hell of a lot’.

Jianhan’s work continues, ably assisted by Librarians Ned and Steve. There’s some hope that at least some of the 235,000 contribution-less contributions are contributions we wouldn’t expect to have contribution text. That said, both Ned and Steve share a strong reckon that at least some of what’s now missing was present back in the early months of 2023. Which is less reassuring. We can only suppose that an earlier attempt at a fix inadvertently removed contribution text. All of which goes to show the dangers inherent in trying to fix things. Will we ever learn? Best of luck Jianhan. Best of luck Librarian Ned. Best of luck Librarian Steve.

To our deep dismay, contribution-less contributions were not the only fire being fought this week. As librarians gulped down coffee, tamped out cigarettes and clocked on for the day, they found themselves face-to-face with a taxonomy application that had just stopped working. Rendering both tools and output impotent. Fortunately - all other options being far worse - it turned out that someone - well, everyone really - had forgotten to renew the software licence. We cannot pretend this is the first time this has happened. Nor can we realistically claim it will be the last. But we do hold out some hope now we have list-loving delivery manager Lydia aboard ship.

In happier news, the licence is now renewed. In even happier news, Developer Jon took the opportunity to more elegantly handle errors thrown by the non-existence of our taxonomy service. Next time someone forgets to renew the licence, one more thing should not break. Or at least it should break more elegantly. So that counts as a small win.

In other new, old search news, things continue to go swimmingly. Developer Jon has been ploughing through search result specifications. Most of the result item snippets now appear to be complete - at least to our eyes - and Jon has moved on to exploring the implementation of faceted navigation. To that end, he’s been researching how best to build a JavaScript interface to our backend Solr service and confirming with Jianhan how the current application manages to build the content type hierarchy navigation. As a result of which, new, old search is really beginning to look like a search application. Lovely stuff.

Stand down / panic over

If you’ve been following along for the last few weeks, you’ll know we went into panic mode when work to handle non-Ministerial written corrections - which we had assumed to be mostly complete - turned out to not, in fact, be complete. Not nearly complete. Our Jianhan once more pulled out all the stops - winning his second ever Librarian of the Week Trophy on the way - and both Ministerial and non-Ministerial corrections appear to be flowing through the pipes without interruption.

Librarians Claire, Jayne and Ned have checked, double checked and triple checked our internal search application, our external search application and Jon’s new, old search, everything turning up as expected. The only thing that didn’t quite work was the assignment of a reference number somewhere in the propagation pipes, which continued to come through as MC - ministerial correction - when it should have been the more generic WC - written correction. Actually. Initially, this was fixed by raising daily calls to the Parliamentary Computational Section requesting manual intervention. Our Jianhan has since tinkered with the piping to apply a temporary fix until the more permanent fix happens in our Hansard application. Which should save a telephone call or two.

Taxonomic liberation

Friend of the family Silver and his Data Language colleagues continue to chip away at liberating our taxonomy from its software constraints, allowing for both interoperability and reuse. Progress this week includes a complete re-load of all concepts from the ‘Concept’ scheme - subjects, if you will - and end-to-end testing of example changes. The first change made being the addition of everyone’s favourite family-friendly dog, the Bully XL. This being a narrower concept than dog and related in some unspecified way to dogs of a dangerous nature.

The contentious issue of personal data in the taxonomy service has moved one step closer to being solved. For reasons lost in the mists, our internal taxonomy contains not only the names of research briefing authors, but also their work telephone numbers. And not just of authors. For other reasons, also lost in the mists, the work contact details of other Library staff - some now departed from the building - are also in there. Names of authors, we always understood. Without them, finding briefings by author would be impossible. But why the other people? And why the telephone numbers?

The presence of this data leads to additional complications that we’d really rather not have to deal with. It means, for instance, we need two versions of the taxonomy API: a standard one for internal use and a redacted one for external users. Silver and colleagues being just one example of the latter. In turn, the need for a redacted taxonomy necessitates the existence of some software to perform the redaction. As we’re sure our dear reader will appreciate, the less software there is in the world, the fewer things are likely to catch fire. The perfect line count for code being zero.

For a wee while we were given to understand that authentication to the research briefing authoring application had some built-in dependency on the taxonomy. Which felt somewhat ungainly, but not unlikely. Librarian Phil has carried out a preliminary investigation and, most unfortunately, this does appear to be the case. Contact details, on the other hand, are only used to display on research briefings pages on the intranet. Phil is currently chatting - or reaching out, as Young Robert might say - to the Commons Library communications folks to see if the display of contact details is still considered necessary. And is planning to repeat the exercise with colleagues in the Lords Library and POST. If all goes well, we hope to strip out all of the contact details and remove any non-authoring staff who have either left Parliament or no longer require access to the briefings application. At which point, we may finally converge on a single taxonomy API. And everyone will be happy.

People, places, parties

Not a week goes by when our attentions aren’t firmly focussed on the upcoming general election. It can sometimes feel like we’ve spent most of our lives wrangling data in preparation. This week, we took another stride forward when the 650 ‘new’ constituencies that will be contested at the next general election were added to MNIS. Librarians Anna and Emily have added the ‘new’ constituency MNIS identifiers to our mapping spreadsheet, so that - once all general election candidates are confirmed - Data Scientist Louie will have a handy triangulation point on the journey from Democracy Club to parliamentary systems. Splendid.

Anna and Emily have also been chasing the heels of Computational Section colleagues, meaning we can confirm that the software’s insistence that House of Commons Members leave because of a general election rather than dissolution has now been put firmly to bed. Leaving Librarian Emily free to apply everything she’s learned about why a Member might leave the House of Commons to our actual database, without the software trying to second guess her.

On the subject of computers trying to second guess their elders and betters, there is a small piece of welcome news. The code that runs when the ‘dissolve Parliament’ button gets pressed does appear to apply the date of dissolution to the Members’ representations and their party affiliations, and not the date of the general election as we’d feared. So that’s good.

In further dissolution-related news, Librarian Anna - in association with Computational Section colleagues - has reset start and end dates for constituencies to align with dissolutions rather than with general elections, as they previously had. An easy mistake to make, but not if one reads the legislation.

Finally, Librarian Emily was sent on a mission to explore UK geographies as captured in MNIS and UK geographies as geographic experts Carl and Neil would like to see them captured in MNIS. The latter turning out to be all pretty similar to our geographic area model. Thankfully. Top work Emily. UK geographic areas being something akin to a poorly constructed Russian doll, we do feel for you.

Psephologising profusely

A few, fairly minor tweaks to our psephology website, not all of which are quite live yet. Firstly, in order to sidestep denial of service attacks on our Heroku hosted backend, access to those URLs is now forbidden to any IP address that doesn’t belong to Cloudflare. If, for any reason, you’d bookmarked psephology-b3b91d24dfdc.herokuapp.com, you’ll now find what you’re looking for at electionresults.parliament.uk, all courtesy of Cloudflare’s reverse proxy. And all considerably faster as a result. Young Robert and Michael would like to thank Shedcode James for pointing them in the general direction of rack-attack. Cheers James.
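For readers wondering what 'forbidden to any IP address that doesn’t belong to Cloudflare' amounts to, here is a minimal sketch of the idea in Python. It is not our actual configuration, which lives in rack-attack on the Ruby side. The network ranges below are reserved documentation ranges standing in for Cloudflare’s published list, and the names ALLOWED_NETWORKS, is_allowed and status_for are invented for illustration.

```python
from ipaddress import ip_address, ip_network

# Placeholder ranges (reserved for documentation), standing in for the
# proxy's published address ranges. These are NOT Cloudflare's real ranges.
ALLOWED_NETWORKS = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    address = ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def status_for(client_ip: str) -> int:
    """Hypothetical helper: 200 for the proxy, 403 Forbidden for anyone else."""
    return 200 if is_allowed(client_ip) else 403
```

A request arriving from inside one of the listed ranges gets through; a request made directly to the backend from anywhere else is refused, so the reverse proxy becomes the only way in.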

Secondly, we now - finally - have a test website. We won’t point to it here for fear of Google gobbling it all up. Using the Cloudflare cache on the live site definitely brings advantages. But it doesn’t really lend itself to rapid development. Pushing a new feature live and having to wait 24 hours for the cache to clear makes the feedback cycle from valued colleagues clumsy and cumbersome. The test site fixes this.

In search of further feedback - and showing off slightly, if we’re honest - Michael posted a link to our shiny new website on the Democracy Club Slack channel. Feedback didn’t exactly pour in but Jonathan F did get in touch with a number of pertinent points. As a result of which, we now have considerably nicer headings on our Parliament period pages, Member listings split by A to Z by family name and a whole slew of new cards on Trello. We’ve also made a small start on adding CSV downloads. So far covering Acts of Parliament, Orders in Council, Members, Parliament periods and boundary sets in effect during a Parliament period. More to follow. Much more to follow.

Whilst Michael took off on a well deserved vacation, Young Robert has been attending to much needed pixel-polishing. All kinds of changes have been pushed to live, not least of which are:

All progress, as we’re sure our dear reader will appreciate.

Egg timing - slight return

In the course of conducting some research, Librarian Claire noticed that our beloved egg timer was listing some prorogations ending before they began. This was because we took the start date of the prorogation as the day following the last date of the preceding session and took the end date of the prorogation as the day preceding the first date of the following session. All well and good, except it turns out that some prorogations lasted less than a day. I mean, how on earth were we supposed to know that? Quite ridiculous.

Taking great care that any changes wouldn’t affect scrutiny period calculations, Librarian Jayne and her computational helpmate Michael have now solved that problem by making the start date of a prorogation the end date of the preceding session. Which means some dates are now in a session and in a prorogation. All less than ideal, but, until Parliament gets better at reporting times and not just dates, about the best we can do.
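The before and after can be sketched in a few lines of Python. This is an illustration of the two rules as described above, not the egg timer’s actual code. The function names and the example dates are invented, and we assume the end date rule was left unchanged.

```python
from datetime import date, timedelta
from typing import Tuple

ONE_DAY = timedelta(days=1)


def prorogation_dates_old(session_end: date, next_session_start: date) -> Tuple[date, date]:
    """Old rule: start the day after the session ends, end the day before the next begins."""
    return session_end + ONE_DAY, next_session_start - ONE_DAY


def prorogation_dates_new(session_end: date, next_session_start: date) -> Tuple[date, date]:
    """New rule: start on the session's end date. End date rule assumed unchanged."""
    return session_end, next_session_start - ONE_DAY


# Invented example: a session ending one day and the next starting the day after,
# which is to say a prorogation lasting less than a day.
session_end = date(2001, 1, 10)
next_session_start = date(2001, 1, 11)

old_start, old_end = prorogation_dates_old(session_end, next_session_start)
new_start, new_end = prorogation_dates_new(session_end, next_session_start)
```

Under the old rule the invented prorogation starts on 11 January and ends on 10 January, which is the ending-before-beginning problem Librarian Claire spotted. Under the new rule it starts and ends on 10 January, a date that now belongs to both the session and the prorogation.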

What’s a procedure? (Eat y’self fitter)

Not only are our procedural maps the first - and indeed only - example of machine-parseable parliamentary procedure in the world - at least that we know of - they - like their creators - always cite sources. Gaze at any of our lovingly drawn maps and you’ll spot a dash of diamonds and a smattering of circles. The diamonds point to sections of legislation where procedural rules are set out; the circles do the same job, but for standing orders. This posed a particular problem for our proposed negative statutory instrument map. A particular problem because the PNSI procedure has been set out in three Acts. It first popped up in Schedule 7 of the European Union (Withdrawal) Act 2018, again in Schedule 5 of the European Union (Future Relationship) Act 2020 and latterly in Schedule 5 of the Retained EU Law (Revocation and Reform) Act 2023. Who knows where it might pop up next?

At this point, we started to question what a procedure is. Are we really dealing with three procedures that happen to be grouped under a common heading? Or is it one procedure that just happens to be invoked three times? Maybe legislation drafters need to embrace transclusion? Anyway, we found ourselves quite stuck, not knowing if we needed three maps or one map with three lots of citations. A meeting with Mr Korris put us back on the right track, when he asked, “but do you not know which Act an instrument was laid under? Could you not time-bound the citations in the same way you time-bound the routes?” He was of course correct on both counts. An extremely long-winded way to confirm that our PNSI map is now appropriately decorated with three lots of Act citations and that ‘problem’ is considered solved. Thanks Mr Korris.
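For the curious, time-bounding a citation might look something like the Python sketch below. It is an illustration only and not how our procedure model is actually stored. The Citation class, the citations_for function, the labels and every date are invented placeholders, and none of them are the real dates attached to the three Acts.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Citation:
    """A citation on a map, bounded by the period for which it applies."""
    label: str
    start: date
    end: Optional[date] = None  # None means the citation has no end date yet

    def covers(self, day: date) -> bool:
        """True if the given day falls within the citation's period (inclusive)."""
        return self.start <= day and (self.end is None or day <= self.end)


# Invented placeholder citations with invented dates. The periods are allowed
# to overlap, because more than one citation may apply on the same day.
PLACEHOLDER_CITATIONS = [
    Citation("first Act (placeholder)", date(2001, 1, 1), date(2003, 12, 31)),
    Citation("second Act (placeholder)", date(2003, 1, 1), date(2005, 12, 31)),
    Citation("third Act (placeholder)", date(2006, 1, 1)),
]


def citations_for(laid_on: date, citations: List[Citation] = PLACEHOLDER_CITATIONS) -> List[str]:
    """Return the labels of every citation in force on the date an instrument was laid."""
    return [c.label for c in citations if c.covers(laid_on)]
```

One map, three lots of citations, and the laying date of the instrument decides which of them are shown.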

A Rush and a push

With Librarian Anna - perhaps temporarily - relieved of general election data wrangling duties and Shedcode James - perhaps temporarily - relieved of standing order application duties, they’ve finally found the time to get stuck back into tidying our Rush database. First up, they’ve tidied and normalised a bunch of strings that were used to describe a constituency’s relationship to its containing country, which means we now have a listing of UK constituent countries. They’ve also tidied and normalised nature of service information into a new, tidy table.

Prompted by Librarian Phil, who was prompted by Librarian Claire, Librarian Anna has added end dates for a number of Members who have recently gone on to other things. She’s also added a number of missing Members after cross-tabulating with Wikidata and even found the time to add the Speaker’s dad. Lovely stuff.

Many happy returns, Young Robert

It would be remiss of us to close these notes without mentioning that, last week, Young Robert celebrated yet another birthday. None of us are quite sure how old he is, but he doesn’t look a day over 50. Happy birthday Robert. Here’s to the big six-oh.