ontologies

2023 - Week 45

Sprint retrospective

We made lots of things slightly better.

Sprint goals

We intend to make things slightly better still.

As denizens of the Bazaar, we have no use for grand visions of idealised end states. We may occasionally reveal an interest in precision - some might even say pedantry - but we are never less than pragmatic. Many acts of minor maintenance eat strategy for breakfast. Polemic ends.

Biccie bonus - a belated thank you

Way back in week 38, we reported on another stunning success: the roll-out on our statutory instruments website of step collections, procedure ordering and the identification of defunct procedures and laying bodies. So pleased was JO Jane by this progress that she kindly donated a tin of fancy biscuits to everyone’s favourite team of crack librarians and computational “experts”. And by donated, we mean delivered by hand to our desks in our garret in a far-flung corner of the parliamentary estate. We ate them all, wrote another edition of weeknotes and quite forgot to thank her. What can we say? When one gets a gift, one writes a thank you note. We’ve let ourselves down, we’ve let our dear mothers down, we fear we’ve even let our reader down.

Thank you JO Jane. Please forgive us. They were delicious biscuits. So crumbly. And so buttery. The tin mistaken for a sewing kit on just one occasion.

Librarians of the Week

Not one, not two, but three librarians of the week this week. The proud winners of our much coveted Librarian of the Week trophy ¹ being Anna, Emily and our Jianhan. Read on if you’d like to find out why.

New, old search - backend

Last time we filed a RAG report, we were delighted to announce that Jianhan’s efforts to upgrade our ancient version of Solr to something more moderne had met with unexpected success. We also reported that Jon’s frontend code was plugged into Jianhan’s upgraded Solr, meaning we could finally - finally! - bin the static search mock application that Michael had cobbled together back in May.

An important step and yet the rest of our plumbing left much to be desired, with no data flowing from the old triplestore to new Solr. Not that we complain, preferring to view the endless computational tribulations as a secret source of strength. Tuesday of week 44 saw a trip to Victoria Street Towers for a mostly in person meeting of some of the finest computational brains Parliament could assemble. Various options were sketched out, covering how we might get data from an ‘internal’ Azure subscription to an external Azure subscription. And, lo, a solution was found. At this point, we’d like to pause and thank Rob for helping to clear a path through the weeds.

Wasting no time, our Jianhan ran back to his desk and began assembling assorted bits of computational Stickle Bricks into actual piping. By the end of the week, we had data flowing from procedural business systems to the triplestore and from the triplestore to the new Solr and from new Solr to Jon’s new search application. Here, for example, is an Early Day Motion tabled on the 8th November which flowed seamlessly through said pipes. Absolutely cracking stuff, leading Michael to make giddy remarks about sending ‘tracer bullets through the system’ in what we can only assume was an attempt to sound forceful. Leaving that aside, this work is well deserving of Jianhan’s second Librarian of the Week award, as we’re sure our dear reader will agree.

Jianhan has also cloned the old search code and popped it atop new Solr. With nothing else changed, Solr queries that had been taking upwards of five seconds now take less than half a second. Given the number of queries our crack librarians run every day, that would represent a marked improvement to their working lives. Not to mention the lives of our poor users. Someone once told us that 20,000 unique queries are made each month on our internal service. Which, if our arithmetic is to be trusted, adds up to around 33 hours of waiting for a search result. This would now take around 1.5 hours. Take the week off, everybody! At some point in the not too distant future, we need to give serious consideration to running old old search atop new Solr, whilst new old search continues to develop in parallel.
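For anyone wishing to check our sums, a back-of-envelope version follows. The per-query timings are not measurements: they are simply what the quoted totals imply, roughly six seconds per query under old Solr and roughly a quarter of a second under new Solr, which sit comfortably with “upwards of five seconds” and “less than half a second”.

```python
# Back-of-envelope estimate of monthly time spent waiting for search results.
# The per-query timings below are inferred from the quoted totals, not measured.

QUERIES_PER_MONTH = 20_000  # the figure someone once told us


def hours_waiting(seconds_per_query: float, queries: int = QUERIES_PER_MONTH) -> float:
    """Total waiting time in hours for a month's worth of queries."""
    return queries * seconds_per_query / 3600


old_solr = hours_waiting(5.94)  # roughly six seconds per query
new_solr = hours_waiting(0.27)  # roughly a quarter of a second per query

print(f"old Solr: {old_solr:.1f} hours, new Solr: {new_solr:.1f} hours")
```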

Before we even think about that, though, there is work to be done: the migration from old Solr to new Solr necessitates a decent bit of tyre kicking. To that end, Young Robert and Michael plan to spend some time next week setting up a new RSpec repository. Once that’s done, they’ll once more have the pleasure of working with Librarian Ned to populate it.

In the meantime, our Jianhan is beavering away to redact any personal information from new Solr. When we say personal here, most of it is really not all that personal. But it does contain the parliamentary handles of our crack librarians, captured as part of their workflow. And besides, this particular audit feature has long been broken, instead operating as a kind of random name generator. Whilst the audit is necessary in the triplestore, it serves no useful function in Solr. So we’re getting rid, because data disposal is often more important than acquisition.
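One way to make sure such fields never reach Solr is to drop them from each document before it is indexed. A minimal sketch follows; the field names are invented for illustration, and we are assuming the fields are filtered out on the way into the index rather than deleted from it afterwards.

```python
from typing import Any

# Hypothetical names for the audit fields that carry librarians' handles.
AUDIT_FIELDS = frozenset({"created_by", "last_modified_by"})


def redact(document: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a Solr-bound document without the audit fields."""
    return {field: value for field, value in document.items() if field not in AUDIT_FIELDS}


example = {
    "id": "example-1",
    "title": "An example paper",
    "last_modified_by": "librarian-handle",
}
print(redact(example))  # {'id': 'example-1', 'title': 'An example paper'}
```

The original document is left untouched, so the triplestore side, where the audit is still needed, keeps its copy.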

New, old search - frontend

Unfortunately, since we last spoke, data analyst Raafay has had to take some time off work. We’re all hoping you get well soon, Raafay. Michael in particular. He hates being the only functioning northerner in these parts.

Luckily, we bumped into data scientist Louie in a Westminster pub - wretched hives of scum and villainy, definitely not places to be recommended - and he offered to step into the breach. Which he did with aplomb. The first pass of our data dictionary - Solr attribute population counts per content type - is now complete and Louie has moved on to stage two: plotting population distributions. So how many research briefings have zero authors, how many have one, how many have two. And etcetera.
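For the curious, both stages boil down to simple counting. The sketch below runs over a handful of invented records held as Python dictionaries; the record shape and attribute names are assumptions, and real counts would come from Solr rather than from a list in memory.

```python
from collections import Counter

# Invented sample records; real ones would come from Solr.
records = [
    {"type": "research briefing", "authors": ["Author A"]},
    {"type": "research briefing", "authors": []},
    {"type": "research briefing", "authors": ["Author A", "Author B"]},
    {"type": "research briefing", "authors": ["Author C"]},
    {"type": "early day motion", "title": "An example motion"},
]


def population_count(records, content_type, attribute):
    """Stage one: count records of one content type in which the attribute has a value."""
    return sum(1 for r in records if r["type"] == content_type and r.get(attribute))


def population_distribution(records, content_type, attribute):
    """Stage two: map number-of-values to number-of-records for a multi-valued attribute."""
    return Counter(
        len(r.get(attribute) or []) for r in records if r["type"] == content_type
    )


print(population_count(records, "research briefing", "authors"))  # 3
print(sorted(population_distribution(records, "research briefing", "authors").items()))
# [(0, 1), (1, 2), (2, 1)]
```

Read the last line as: one briefing with no authors, two with one author, one with two.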

Unfortunately, having a whole team of people attempting to simultaneously filter a Google spreadsheet proved untenable, so we’ve been forced to move the data dictionary to SharePoint. Less than ideal from a working in the open perspective, but a damn sight better than falling over each other’s filters. Sadly, it does mean that - for once - we cannot furnish you with hypertext. Rest assured, our data dictionary is a thing of beauty.

The outputs of the work of Raafay and Louie flow in four directions. Most prosaically, they help our crack team of librarians inform designer Graeme and developer Jon which attributes can be usefully displayed on our search pages. They also help the librarians to tidy bits and bobs of our data, raise calls with team:Ian to fix some of the more gnarly errors, uncover new and exciting issues with their existing applications, revise and improve their information management practice and communicate data gaps with procedural offices. What our reader might like to call business change.

Perhaps most glamorously, they’ve also, in a small but beautiful way, helped inform how Parliament deals with government. In the course of her data explorations, Librarian Jayne discovered that a fair number of laid papers arrive into computational systems with a scrutiny day count of zero. Conversations with the Journal Offices revealed that the fault did not lie there, but rather with the lack of detail in the government’s laying letters. We are told that the Journal Offices have advised the government that, henceforth, their laying letters should come complete with the number of days the instrument is subject to scrutiny. And that this advice will also find its way into the next revision of the Guide to Laying Papers. Absolutely splendid stuff. In the meantime, please be reassured that over 600 records, dating back to 2017, have been updated to capture their correct scrutiny days. And that our crack team of librarians now have a routine task to ensure that scrutiny days are added correctly. Because of course they do.
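The routine check itself can be very simple. A sketch, assuming a hypothetical LaidPaper record and assuming that every paper being checked ought to carry a scrutiny period; some papers are not subject to scrutiny at all, so a real check would need to filter those out first.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LaidPaper:
    title: str
    laid_on: date
    scrutiny_days: Optional[int]  # None or 0 where no figure was captured


def missing_scrutiny_days(papers):
    """Return papers whose scrutiny day count is zero or absent, oldest first."""
    flagged = [p for p in papers if not p.scrutiny_days]
    return sorted(flagged, key=lambda p: p.laid_on)


papers = [
    LaidPaper("Example paper A", date(2017, 3, 1), 0),
    LaidPaper("Example paper B", date(2023, 11, 6), 40),
    LaidPaper("Example paper C", date(2019, 5, 20), None),
]
print([p.title for p in missing_scrutiny_days(papers)])
# ['Example paper A', 'Example paper C']
```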

Not all of the faults in our data can be laid at the feet of the government. At least some of our problems stem from the Search and Indexing data model, which is murky at best. There are occasions on which it conflates the way papers arrive in Parliament - laying, reporting, depositing, presenting - with the paper series they form part of and with the nature of the document. Some days our poor librarians feel like they’re crouched on a riverbed with only a rusty sieve for company, finding puddles of sludge where there should be nuggets of gold. Still, we are some way off new models and we have to work with what we’ve got.

In order to help alleviate matters, Librarians Anya and Jayne, together with computational whizz-kid Michael, spent a pleasant couple of hours on Wednesday sketching out pseudocode for how our reported papers data model needs to get munged into some better shape. And a further hour writing more pseudocode to help Jon construct descriptive sentences for such papers.

On the subject of Jon, he continues to make absolutely magnificent progress combining Graeme’s designs with Anya, Jayne and Ned’s data dictionary and turning the whole thing into actual working code. In addition to churning out new views for new types of object, this week Jon has also been wrangling the Rails asset pipeline - never a pleasant experience - which means we can finally host the prototype at https://api.parliament.uk/search-prototype. Much nicer than https://search-prototype.herokuapp.com/search-prototype, as we’re sure you’ll agree. Jon, we salute you.

People, places, parties

Also now hosted on api.parliament.uk are our dabblings into general election data. There is nothing that Young Robert can’t achieve with Azure API Management and his trusty computational spanner.

This Tuesday saw Librarian Susannah and computational hound-dog Michael meet in pixels with statistician Carl. A number of election data questions were both posed and answered and we now have a much clearer picture of what needs building and what needs tidying. Before the next general election, we’re hoping to cover off data for four general elections: 2010, 2015, 2017 and 2019. Which all seems doable, crack librarian time permitting.

Not much has changed in the way of code this week. Michael spent a pleasant train journey adding flags for lost deposits to election result pages. By the time he’d pulled into Waterloo, he’d also updated the URL patterns. They had been using common identifiers - geographic codes for constituencies, Electoral Commission IDs for parties and so on. Which, at the time, Michael had thought quite clever. Unfortunately, the further back you go in time, the less likely you are to find such identifiers. Which means most of the application is now running with primary keys in the URLs. It’s not the first time Michael has made an attempt at cleverness, only to trip over his own shoelaces. And we’re sure it won’t be the last. Abstract cleverness of the mind only serving to separate the thinker from reality. As we say in these parts.
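For readers less steeped in electoral lore: a candidate at a UK general election pays a deposit, which is forfeited if they poll less than 5% of the valid votes cast in their constituency. A minimal sketch of the flag, assuming a hypothetical Candidacy record holding each candidate’s vote count for a single constituency result:

```python
from dataclasses import dataclass


@dataclass
class Candidacy:
    candidate: str
    votes: int


def lost_deposits(result):
    """Names of candidates polling under 5% of the valid votes in one constituency."""
    valid_votes = sum(c.votes for c in result)
    # votes / valid_votes < 1/20, rearranged to stay in integer arithmetic.
    return [c.candidate for c in result if c.votes * 20 < valid_votes]


result = [
    Candidacy("Candidate A", 25_000),
    Candidacy("Candidate B", 15_000),
    Candidacy("Candidate C", 3_000),
    Candidacy("Candidate D", 1_000),
]
print(lost_deposits(result))  # ['Candidate D']
```

Multiplying through by 20 keeps the comparison in whole numbers, so no floating point rounding can nudge a candidate across the line.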

Meanwhile, Librarian Ned has brought both his deep understanding of historical boundary changes and his love of a good bit of research to the table. Which means we now have Acts of Parliament pertinent to boundary changes back to 1944 and Orders in Council back to 1945. This is not quite true. Ned’s fine research actually goes back to 1918, but, if we imported that, we’d need to deal with modelling Ireland. And that needs a good deal more thought.

Elsewhere in the world of top quality librarianship, Anna and Emily kindly volunteered to lend a helping hand. And what a hand it was. The website is based on data published as spreadsheets alongside Commons Library general election briefings. In normal circumstances, when one attempts to take a spreadsheet and normalise the data, all kinds of errors quickly become apparent and one finds oneself facing six different (mis)spellings of Conservative Party. In this case, there’s not a single spelling mistake in sight.

It’s not all good news though. Whilst the individual spreadsheets are neat and tidy, there are distinct differences in layout between general elections. Which makes importing them into a database trickier than it should be. On top of that, whilst the spreadsheets are well-supplied with geographic codes for places, they lack identifiers for people and parties. Not anymore. Anna and Emily have taken the 2019 spreadsheet, tidied up the layout and added Electoral Commission IDs for parties and MNIS IDs for Members both current and former. Michael had assumed this would be a painstaking task of combing through spreadsheets by hand, looking up identifier values and pasting them into fields. But we’re dealing with modern librarians here, not copy and paste monkeys. Instead Anna and Emily queried the MNIS API and used fancy Excel formulas to mix-and-match the identifiers. What Michael had anticipated would be a good couple of weeks’ work turned out to take a matter of hours. A turnaround time worthy of a Librarian of the Week trophy in anybody’s money. Absolutely amazing work. Thanks Anna and Emily.
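We don’t know precisely which formulas Anna and Emily used, but the underlying idea is a lookup join. The sketch below keys a hypothetical MNIS lookup on name and constituency together, since names alone are far from unique; all names and IDs are invented. Candidates who never sat in the House simply come back without an ID.

```python
# Hypothetical lookup, as might be built from an MNIS query:
# (name, constituency) -> MNIS ID. Names and IDs are invented.
mnis_ids = {
    ("Jane Example", "Exampleton"): 1001,
    ("John Sample", "Sampleford"): 1002,
}

rows = [
    {"candidate": "Jane Example", "constituency": "Exampleton"},
    {"candidate": "Pat Placeholder", "constituency": "Exampleton"},
    {"candidate": "John Sample", "constituency": "Sampleford"},
]


def add_mnis_ids(rows, lookup):
    """Return copies of the rows with an mnis_id column; None where no Member matches."""
    return [
        {**row, "mnis_id": lookup.get((row["candidate"], row["constituency"]))}
        for row in rows
    ]


for row in add_mnis_ids(rows, mnis_ids):
    print(row["candidate"], row["mnis_id"])
```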

Because the data is now far tidier, we’ve managed to delete 150 lines of code from our parsing script. A good thing. A very good thing. The less code there is in the world, the less can go wrong. Ideal line count being zero. Code removed, we re-ran the script against the new data and it loaded like a charm. Which means we now have result pages linking to Member pages and some stubbed-out party pages linking to Electoral Commission registration pages. It’s all a little barebones at the moment because we’re only running with 2019 data, but, as Anna and Emily progress with the tidies, the thing should fill out nicely.

Next steps for us are tidying the presentation of the spreadsheets to bring them into line with the Commons Library house style and passing them back to statistician Carl. At which point, they’ll replace the supporting documents on the briefing pages. A virtuous circle indeed.

In other election related news, Librarian Emily has taken her fine work on Commons end reasons - or why a Member might leave the House - and formulated a plan for tidying up the data in MNIS. It doesn’t look like it’s going to be quick work, but we’re more than sure it will be done well. Which is the main thing.

How’s poor Robert?

Funny you should ask that. Poor Robert has been gifted the opportunity to take the Open Data Platform Minimum Viable Product High Level Design - or the ODP MVP HLD as we say in these parts - and repurpose it as the Search MVP HLD. And when an opportunity like that presents itself, it would be churlish to say no. Poor Robert was last seen crouched behind his tiny monitor, tearing out what little remains of his hair and plaintively pleading, WHAT IS A POWER TO PIVOT? DO WE HAVE A POWER TO PIVOT? We don’t know, Robert. I’m afraid we can’t help here.

I am a procedural cartographer - to the tune of the Palace Brothers

In a small piece of map making news, back in October Librarian Jayne spotted a rare thing. In their scrutiny of the Data Protection (Fundamental Rights and Freedoms) (Amendment) Regulations 2023 proposed negative statutory instrument, the Secondary Legislation Scrutiny Committee took the unusual step of publishing a submission about the instrument. Because it’s not a thing we’ve ever seen before, there was no step to be actualised on our PNSI map. Luckily, because our maps are data, not code, they’re designed to adapt as Parliament adapts. Which means that step has now been added to the map and the scrutiny timeline is complete. Super.

¹ No taxpayer money has been expended on an actual trophy. It is purely notional.