ontologies

2025 - Week 15

Ever tried. Ever failed. No matter. Try again. Fail again. Fail better. And, just occasionally, succeed

If you tuned in last time, you’ll know that attempts to upgrade our ageing - some may suggest decrepit - triplestore did not go quite to plan. Despite our Jianhan fine-tuning every pipe in and every pipe out, the bit between the triplestore and our Solr instance exhibited what might be called ‘unexpected behaviours’.

Which is a damn shame, because the old thing is on its last legs, taking upwards of two minutes to save changes to a single record. Given a ‘typical’ sitting day sees this button pressed around 600 times, that’s a lot of thumb twiddling for our crack team of librarians to endure. Putting both patience and sanity at stake. A quick back of a fag packet calculation reveals that’s 20 hours a day spent waiting for things to save.

Coping strategies have been adopted, Librarian Steve admitting to opening a dozen browser tabs and pressing save on them all before he’s even had his morning coffee. But imagine pressing the ‘add to cart’ button on your favourite e-commerce solution and being expected to wait two minutes for that shopping adrenaline hit. It hardly ‘meets the expectations of the internet age’, as the saying goes.

Whilst the less sanguine may have judged the outcome of Jianhan’s endeavours on the basis of a successful upgrade, we pride ourselves on taking a more philosophical approach. Learning is also success, we tell ourselves. And learn Jianhan did. Head buried deep in code, he finally announced he was happy with his changes. Librarians tested and also declared themselves happy.

So when Thursday came around, Jianhan once again pressed whatever buttons it is he presses - Delivery Manager Lydia having orchestrated the paperwork, Librarian Anya having marshalled her troops - librarians and indexers tested and lo, the damned thing (mostly) worked. Wow. Wonders will never etc. A bug affecting the presentation of some data in our Search application remains. A programmer’s work is never done. We’ll take the win for now. As a small side benefit, we even managed to close a Trello board. Not something that happens often.

The real test will come on Tuesday 22nd, when the usual post-recess written question deluge is expected to hit our barricades. At which point, we hope we’ll no longer be sticking fingers in computational dykes. As it were. Top work our Jianhan, top work crack librarians, top work Lydia.

Farewell then, Developer Jon

Last time out, we also reported that Developer Jon had escaped from under his tottering tower of consolidation tickets and started work on search aliases and query expansion. A corner having been turned, new, old search was finally starting to feel like old, old search. In a good way - in that it’s no longer entirely reliant on what the user types, but instead first sends their input text to our taxonomy API to be turned into synonyms and tokens. Lovely stuff.
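For the curious, that expansion step can be sketched in a few lines of Ruby. The hard-coded synonym Hash below stands in for the taxonomy API - whose real interface we won’t attempt to guess at - and the output is a plain Solr-style query string:

```ruby
# A stand-in for the taxonomy API: map a token to its synonyms.
SYNONYMS = {
  "si"  => ["statutory instrument"],
  "pmq" => ["prime minister's questions"]
}.freeze

# Expand the user's input: any token with known synonyms becomes an
# OR group of quoted terms; everything else passes through untouched.
def expand_query(input)
  input.downcase.split.map do |token|
    terms = [token] + SYNONYMS.fetch(token, [])
    if terms.size > 1
      "(" + terms.map { |t| "\"#{t}\"" }.join(" OR ") + ")"
    else
      token
    end
  end.join(" ")
end
```

A sketch under stated assumptions, of course - the real thing also returns tokens from the taxonomy service rather than a frozen Hash.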

Unfortunately, it was at this point that Jon’s contract expired. Altogether less lovely. Despite the very best efforts of Delivery Manager Lydia, it would not appear that the loss of our only frontend developer is easily mitigated. For understandable reasons, if you’ve only been reading these notes to follow progress on new, old search, you may wish to tune out for a wee while. We remain hopeful that a developer will be found by some means from somewhere, but that may take some time. Until then, we have a bunch of work to do to tidy our Solr backend, our pixels, and our deployment patterns. So, when a developer finally does arrive, they’ll have more fertile ground to work upon. That should at least keep our Jianhan and Young Robert busy.

Taxonomic liberation (or, toward a single subject view of the Library)

With not much search work to be getting on with - or at least no one to be getting on with it - attentions have turned back to attempts to liberate our taxonomy with the long term goal of providing a Single Subject View of the Library™. In the short term, that means making a Subject Specialist Finder, which currently takes the form of a PDF and a booklet, cobbled together from bits and bobs of spreadsheets. Its putative replacement associates Library researchers with subjects by means of a specialism, importing Research Briefings indexed with the same subject taxonomy for good measure.

The whole thing got into a bit of a muddle when we discovered we were importing our topic taxonomy alongside our subjects taxonomy. Topics being a very different thing which we plan to deprecate in the not too distant. You’ll just have to trust us on this. Equally unfortunately, we were also importing topic taxonomy-based indexings for Research Briefings and associating our specialists with topic terms. The first of these problems has now been fixed, the second is on its way to being fixed and our taxonomy import requirements have been respecified.

A whole bunch of other bugs have also hit the happy pile, but, given the Single Subject View of the Library™ contains a small amount of personal information, it’s all locked up behind an authentication layer and we can’t point you at it. So we won’t bore you with details here. Rest assured, it’s all positive progress and possibly something of a goal line clearance.

If you must grab our stuff, at least try to be polite about it (polemic)

Our regular reader will be aware that we run a small stable of librarian-adjacent web applications. Or our ‘productivity suite’, as Young Robert might say. They’ll also be aware that Shedcode James has spent much of the last month upgrading them. Or ‘hardening them’, in Young Robert speak. In the course of that work, James has rolled out Rollbar for each and every application. If you’re unfamiliar with Rollbar, it provides monitoring of errors in applications without the need to parse Ruby logs. The latter being something we’d not wish on anyone.
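For the unfamiliar, wiring Rollbar into a Rails application is pleasingly small. A minimal sketch - the token name and the environment switch here are illustrative, not a copy of James’ actual configuration:

```ruby
# config/initializers/rollbar.rb
Rollbar.configure do |config|
  # Read the access token from the environment rather than committing it
  config.access_token = ENV["ROLLBAR_ACCESS_TOKEN"]
  # Only report from production; development noise stays local
  config.enabled = Rails.env.production?
  config.environment = Rails.env
end
```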

Now the primary purpose of rolling out Rollbar is to spot errors we’ve made. None of us being perfect, such errors do happen. For instance, it turned out that Michael had accidentally removed a partial from our election results application that was still very much in use, causing one of our boundary set aggregations to fail. Not good. But Rollbar alerted us and that slight snafu was cleared up in a matter of minutes. All good.

Less good is Rollbar alerting us to errors other people are making. In this case, arising from a bunch of chaps - or bros, as Young Robert is wont to refer to them - attempting to train their AI efforts by scraping every last thing they can scrape off the web and entirely forgetting their manners in their haste. It turns out our Procedure Browsable Space™ was getting absolutely hammered by the magic sand lads. And not only hammered for URLs that exist, but - in the case of the Amazon AI bot - also hammered for URLs that do not exist and never will exist. URLs such as:

https://api.parliament.uk/procedure-browser/work-packages/https:%2F%2Fid.parliament.uk%2Fa0aejkro

being called 4,600 times, quickly blowing through the 5,000 error limit on James’ Rollbar account. Lads, please. You’d think the old AIs would be well-trained enough to recognise that as a pretty unlikely URL. It would seem not. James has now added a catch-all route defaulting to a 404 which has calmed matters considerably, but the point stands. There was never a golden age of all crawlers crawling politely, but this is getting ridiculous.
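For anyone facing similar bot-shaped traffic, the catch-all pattern is a standard Rails one. A sketch - the controller and route names here are ours for illustration, not necessarily what James used:

```ruby
# config/routes.rb - keep this as the very last route, so anything
# unmatched falls through to it instead of raising a routing error
match "*unmatched", to: "errors#not_found", via: :all

# app/controllers/errors_controller.rb
class ErrorsController < ApplicationController
  def not_found
    head :not_found # an empty 404 is cheap to serve, even 4,600 times
  end
end
```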

That this news came through from James on the same day Jeremy pressed publish on his Denial post was an apt coincidence. Discussion of the potential harms of Large Language Models tends to focus on the output end of the spectrum, but the input end comes with its own challenges. When ‘hyper-aggressive LLM crawlers’ are implementing what amount to DDOS attacks on the free and open internet, we have problems. Expanding on the coincidence, that very same day, Young Robert popped into Slack to post a 47,189 line script that attempts to ban such traffic. That’s modern web development for you, he added.

I am a procedural cartographer - to the tune of the Palace Brothers

With the team still missing Librarian Jayne, the map making mantle has been assumed by Librarian Ayesha and her computational helpmate Michael. Kicking off with the simple stuff, this week saw yet another select committee added to the House of Commons half of our proposed negative statutory instrument map, this time in the shape of the Energy Security and Net Zero Committee. That the European Statutory Instruments Committee was set up under a temporary standing order - and that the adoption of its sifting responsibilities has now passed to the many and varied departmental committees - has caused one hell of a lot of work here. Not that we’re ones to complain. But does nobody ever think of the librarians?

Noun / verb confusion in the CRAGing area

Entering more choppy waters, we also saw our second ever extension of treaty period A by means of a ministerial statement. We can only assume that someone treaty-related once told us that extension by statement was possible and we jumped to the conclusion that that meant an oral or written statement being made to the House. At least that’s what our scope and link notes suggest. After all, when one hears the words ‘ministerial statement’ the verb that springs to mind is ‘making’, not ‘laying’.

Unfortunately, it appears we did not take the time to do what we usually do and read the blasted legislation, which has “[t]he Minister does that by laying before Parliament a statement”. The ‘laying’ of a ‘statement’? Whoever heard of such a thing? Papers are laid, statements are made, surely? Maybe the drafter didn’t bother to check. Or possibly we’re parodying our own pedantry.

That the only time we’ve seen this happen before, the ‘laid statement’ was accompanied by a written statement, and that, this time around, time constraints meant no written statement was made, only added to our confusion.

Anyway, legislation finally read, that step has been actualised for only the second time in its life, and has now found its way to the timeline for the One Hundred Year Partnership Agreement between the United Kingdom of Great Britain and Northern Ireland and Ukraine. Which puts that problem neatly to bed. Fine work Librarian Ayesha.

Choppier waters still

Ayesha and Michael were back on duty when a request came in from the Secondary Legislation Scrutiny Committee requesting figures for the number of statutory instruments they’ve considered, broken down by the number they’d raised concerns about and the number they’d noted as being of interest. And then the rest. This further complicated by the request being scoped for the calendar year beginning 1st April 2024.

Unable to call on Librarian Jayne’s SPARQLing excellence, they at least had our shiny, new Procedure Browsable Space™ and its rather nifty ‘business items actualising a step’ aggregations to rely on. Once they’d remembered to remember that the SLSC also considers PNSIs, they managed to grab all the data they needed and pop it in a spreadsheet. Only to be faced with numbers that didn’t quite add up. This at least gave Michael the opportunity to tell his one and only computer joke:

“There are only two problems in computer science,” he explained, before pausing to allow anticipation to build. “Naming things, cache invalidation, and off-by-one errors.” How we laughed.

Anyway, data assembled, Ayesha and Michael also realised they lacked the kind of spreadsheet skills that people who really know how to wield a spreadsheet possess. Luckily Librarian Emily stepped in and managed to weed her way through everything that appeared on one sheet that also appeared on another, whittling the whole thing down to one stray actualisation. Which has now been fixed. Data improved and an enquiry successfully answered. Quite the adventure.

In the course of that work, Michael also added CSV downloads for business items actualising a given step to our Procedure Browsable Space™. Which should make similar enquiries in the future easier still. We hope.
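Generating those downloads is straightforward with Ruby’s standard CSV library. A rough sketch, with an invented BusinessItem shape standing in for our actual model:

```ruby
require "csv"

# A stand-in for whatever a business item actually looks like
BusinessItem = Struct.new(:date, :house, :link)

# Render a list of business items actualising a step as CSV text
def business_items_csv(items)
  CSV.generate do |csv|
    csv << ["Date", "House", "Link"]
    items.each { |item| csv << [item.date, item.house, item.link] }
  end
end
```

In a Rails controller, the resulting string would be handed to `send_data` with a CSV content type. But that’s a detail for another week.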

Psephologising wildly

Our only real bit of interesting news, psephologically speaking, is the addition of by-election data for Parliament 57. This thanks to more excellent librarianship from Anna and Emily. It means we’re only two Parliaments away - and keeping tabs on whatever happens in this Parliament - from reaching general election / by-election parity. Lovely stuff.

All other efforts have been mainly concerned with aligning labels and identifiers across assorted systems. Which is essential plumbing but pretty boring work, so only recorded here for completeness.

Firstly, records for four parties in our election results database have been updated to match the names registered with the Electoral Commission. Now, if we ruled the school, it’s more than possible that we’d introduce legislation to make it mandatory for registered party names to include the word ‘party’. Either that or make it mandatory that they do not. The current mixed economy making it very difficult for our poor old computers to render any sentence involving a party name. But that is by-the-by.

Elsewhere, our librarians have been hard at work updating constituency names from the 2005/10 boundary sets in MNIS, election results and the maiden speech PFF, to match the form given in legislation. Weston-super-Mare now being Weston-Super-Mare being just one example. Truly a thing to celebrate.

Back in MNISland, Librarian Phil has applied ONS-issued geographic codes to all of the 1997 constituencies in England and Wales. Those for Northern Ireland and Scotland not being available. Off the back of which he’s also created a spreadsheet mapping constituency identifiers, names, geographic codes, start dates and end dates. Which Michael is in the process of taking and using to fix election results constituency names and add any missing MNIS identifiers. All of which leaves us with more interoperable datasets and in a better position to import the maiden speech PFF once it’s undergone similar treatment. Time well spent to save time elsewhere. We never promised this would be interesting.