ontologies

2026 - Week 13

New, old Parliamentary Search

From his perch high above the rooftops of South Yorkshire - like some Sheffield-based Batman, or, failing that, a Sheffield-based peregrine falcon with a GitHub account - Developer Jon has been hard at work, sweating the new, old Parliamentary Search pixels. And lovely they look too. So lovely that critics from across our librarian and computational sections couldn’t help but get involved. Your regular correspondent - distracted by a trip to Malta and the opportunity to eat a hutch-load of rabbit - isn’t entirely sure what happened. But there was a meeting. And a meeting with a difference. This one going by the title of ‘design crit’. Like we’re in art college or some such.

Librarian Anya was there. Librarian Phil was there. Computational ‘expert’ Robert was there. And Jon himself showed his face. Presumably thinking, I can’t leave these people alone to decide my future. A whole host of pixel-based tidying has now happened as a result.

Starting with the slightly more functional, our examples page has moved to a new home to keep it consistent with the other applications in our ‘stable’. And the all-important search term input box now has spellcheck enabled. That, alongside assorted text changes and the removal of a stray character that had somehow found its way into a page template, makes the whole thing much more presentable. Why those stray characters always find their way into page templates remains a mystery. Has anyone ever tested a website and not found a stray angle-bracket assaulting the user’s eyes? We suspect not.

Still, as Young Robert likes to say, making a web-based service is a lot like making a trifle. Leaving out the cream and the sprinkles disappoints many a stakeholder. Whereas the true trifle aficionado knows it’s all about the fruit and the custard. Let’s grab a spoon, dive in and see what lies beneath the hundreds and thousands.

Sweating the taxonomy part I - query expansion on Parliamentary Search

Pixels aside, Jon has been hard at work implementing query expansion. What, one may ask, is that? Well, suppose one were to type the words ‘avian’ and ‘influenza’ into the search box and poke the little magnifying glass icon. You might expect some cogs to whir and some gears to grind and then to be presented with a list of results, all of which contain some combination of the words ‘avian’ and ‘influenza’ in some field or other. All well, all good.

The problem is, there may be many combinations of words used to encapsulate the same concept. This is why god gave us thesauri. For example, a Member may rise to their feet in the Commons Chamber and make a contribution to a debate on avian influenza using the words ‘bird flu’. A perfectly reasonable phrase for the less scientifically-minded. What then should search do? Knowing our dear user is quite busy enough with their actual day job, it would be unfair to ask them to open two tabs - one featuring Parliamentary Search, a second featuring their thesaurus of choice - and gather and enter every possible expression of the thing they have in mind, and then combine the results. These people have a democracy to run.

Our dear reader may well be thinking, well, those clever folk at Google manage it. Dear reader, Google not only have the advantage of employing the finest PhDs from the world’s finest universities, they also have the luxury of having scraped the entirety of the public web - there are signals in the world of hypertext that are not present in the world of text. If your regular correspondents were to write a journal article on the subject of ‘avian influenza’, not once mentioning either birds or flu, one might not expect the article to be returned by a web search for the phrase ‘bird flu’. Such thinking fails to take into account how other people describe that page. If enough other people link to our journal article and the text in those links contains ‘bird flu’, the Google machines are quite clever enough to associate the two terms. At which point inference kicks in, the same ‘cognitive’ mapping being applied to other searches for ‘bird flu’. And, for that matter, to searches for ‘avian influenza’. This is how the world came to acquire Google bombs. Be careful out there.

Now, our crack team of librarians and computational experts are very clever, but are somewhat short of the finest PhDs from some of the finest universities in the world. Our Jianhan is the only PhD possessor in current company, although Young Robert and Michael can claim possession of a Woodcraft Folk badge or two. We’re also lacking the entirety of the public web in a data centre three times the size of the Palace of Westminster. Our budgets do not stretch to that.

Clearly we need a different solution. That solution - as is the case for many of our solutions - rests firmly upon the shoulders of our taxonomy service. Which we sometimes call our thesaurus service, because we’re perverse like that. The taxonomy service provides labels and identifiers for concepts against which parliamentary material is subject - and procedurally - indexed. It also provides alternative labels for concepts. So the concept with identifier 8483 has the label ‘Avian influenza’ and alternative labels ‘Avian flu’, ‘Bird flu’ and the rather horrible or wonderful, depending on your turn of mind, ‘Fowl plague’.

Getting back to the point. Developer Jon’s query expansion work means there’s an additional step between the user typing a phrase and pressing go and the bit where the query is sent to our Solr search service. In that step the phrase is tokenised, bits of the phrase that can be mapped to taxonomic concepts are mapped to taxonomic concepts, identifiers and alternative labels are returned, and the whole thing is sent off to Solr. So, should a user turn up, type bird flu, press go, and expect magic to happen, magic will now happen. The actual search that’s run being:

"Avian influenza" OR "Avian flu" OR "Bird flu" OR "Fowl plague" OR all_ses:8483

Should any of our Members ever utter the words ‘fowl plague’ in a debate, their contribution will be returned alongside contributions featuring the words ‘avian influenza’, ‘avian flu’ or ‘bird flu’ - together with contributions where none of those words were spoken, but where our crack team of librarians have determined that was the subject under discussion.

Should you wish to skip the expansion step, wrapping your search term - or a part of your search term - in square brackets will do just that, a search for ‘[Bird flu]’ being a search on that phrase. With some stemming thrown in, but that’s for another time.
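
For those who prefer their magic written down, the expansion step might be sketched along the following lines. This is a minimal sketch and not Jon's actual code. The in-memory CONCEPTS dictionary is a stand-in for the taxonomy service, the function names are invented for illustration, and the matching is a naive whole-phrase lookup rather than proper tokenisation. It also only handles square brackets wrapped around the whole input, not around part of it. Only concept 8483, its labels and the all_ses field are taken from the search described above.

```python
import re

# Hypothetical stand-in for the taxonomy service: identifier -> labels,
# preferred label first. Only concept 8483 is taken from the post.
CONCEPTS = {
    8483: ["Avian influenza", "Avian flu", "Bird flu", "Fowl plague"],
}

# Lower-cased label -> concept identifier, for case-insensitive lookup.
LABEL_INDEX = {
    label.lower(): concept_id
    for concept_id, labels in CONCEPTS.items()
    for label in labels
}


def expand_phrase(phrase: str) -> str:
    """Expand a phrase into labels plus identifier if it names a concept."""
    concept_id = LABEL_INDEX.get(phrase.strip().lower())
    if concept_id is None:
        # No matching concept: search on the phrase as typed.
        return f'"{phrase.strip()}"'
    labels = " OR ".join(f'"{label}"' for label in CONCEPTS[concept_id])
    return f"{labels} OR all_ses:{concept_id}"


def build_query(user_input: str) -> str:
    """Build the query string, skipping expansion for [bracketed] input."""
    bracketed = re.fullmatch(r"\[(.+)\]", user_input.strip())
    if bracketed:
        return f'"{bracketed.group(1)}"'
    return expand_phrase(user_input)


print(build_query("bird flu"))
# "Avian influenza" OR "Avian flu" OR "Bird flu" OR "Fowl plague" OR all_ses:8483
print(build_query("[Bird flu]"))
# "Bird flu"
```

Keeping the bracket check in front of the lookup means the opt-out never touches the taxonomy at all, which seems the least surprising behaviour for a user who has asked to be left alone.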

If you’d like to peep under the hood, pop along to the search beta, run a search and pop your fingers atop ‘Ctrl, Alt + D’ if you’re behind the wheel of a corporate Dell, or ‘Control, Option + D’ if you’re one of those moderne Apple Macintosh show offs. Should you have JavaScript enabled, you’ll see a panel pop up called “Librarians’ tools”. At the end of which, you’ll see what we actually asked of Solr. Which, thanks to Jon’s tender care, also now works even when the search returns no results. All really rather nice. We’re honestly quite pleased with it.

Sweating the taxonomy part II - a Single Subject View [of / for / from] the Library

If you’ve been following along from home, you’ll know the taxonomy also forms the backbone of our Single Subject View of the Library project. The first output of which is a Knowledge Base [for / from] the House of Commons Library. In this case, we’re not only indexing publications with taxonomic concepts, but also our crack team of researchers in the form of their subject specialisms. The interface we’ve built properly stretches the legs of the taxonomy, taking account of labels, synonyms and transitivity over the taxonomic structure. As of this week, every concept assigned as a specialism to a specialist now comes complete with its own scope note, all the better for ensuring we don’t misapply any in the future.
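
By way of illustration, ‘transitivity over the taxonomic structure’ can be read as follows: a search for specialists on a broad concept should also turn up specialists indexed with anything narrower. The sketch below assumes that reading. The concept hierarchy, the specialist names and the function names are all invented for the example, and none of it is the Knowledge Base's actual code.

```python
# Hypothetical fragment of a taxonomy: concept -> directly narrower concepts.
NARROWER = {
    "Animal health": ["Animal diseases"],
    "Animal diseases": ["Avian influenza"],
    "Avian influenza": [],
}

# Hypothetical specialisms: specialist -> concepts assigned to them.
SPECIALISMS = {
    "Researcher A": ["Avian influenza"],
    "Researcher B": ["Animal health"],
}


def concept_and_descendants(concept: str) -> set[str]:
    """Return the concept plus everything narrower than it, transitively."""
    found = set()
    to_visit = [concept]
    while to_visit:
        current = to_visit.pop()
        if current in found:
            continue  # Guards against cycles or repeated paths.
        found.add(current)
        to_visit.extend(NARROWER.get(current, []))
    return found


def specialists_for(concept: str) -> list[str]:
    """List specialists indexed with the concept or any narrower concept."""
    wanted = concept_and_descendants(concept)
    return sorted(
        name
        for name, concepts in SPECIALISMS.items()
        if wanted.intersection(concepts)
    )


print(specialists_for("Animal diseases"))  # ['Researcher A']
print(specialists_for("Animal health"))    # ['Researcher A', 'Researcher B']
```

Note that Researcher B, indexed only with the broadest concept, does not appear for the narrower one. Walking down the tree but not up is a design choice made for this sketch.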

Over the past few weeks, our Library Knowledge Base application has welcomed into the world its younger sibling: the Publications Explorer. This is another Data Graphs project aimed at exploring all the ways one might wish to describe and aggregate publications from our three research services: by publisher, by section, by owner, by author, by contributor, by collection, by collections of collections and by items on the order papers of both Houses. Our explorations are taking account of both internal and external audiences. As of this week, we’ve got our hands on a dump of the data from the current application, converted that to Postgres - thanks Shedcode James - written a whole bunch of SQL to export CSVs, and loaded those CSVs into Data Graphs. At the other end of the pipes, Shedcode James has been busy learning Cypher Query to take the Data Graphs API and churn out a website. Or a browsable space, as we prefer to say.

Would this offering be complete without the ability to browse by subject? No, it would not. Where should those subjects be sourced from, we hear our dear reader ask? Well, the taxonomy of course. To that end, Data Language’s Silver and Ant have taken our scribbled specification and are currently piping both subjects and indexings into the new project. Once James is back on the clock, we’ll be back to making pages. Lots more pages.

Finally, still with Data Graphs and still with the taxonomy, efforts to replace our aging OaSIS application with a more modern equivalent took a leap forward when we finally found ourselves on the receiving end of yet another database dump from the current system. With thanks to Joe for having the necessary tenacity and persistence. This dump has also been converted to Postgres by James, which means we’re now ready to explore what’s in there and how far it diverges from parliamentary reality. Again, it all rests on the taxonomy, some piping for which we already have in place.

One starts to get the feeling that, if we were ever to value-chain map all of this, somewhere down the bottom would be a GIF of team:thesaurus carrying the world on their backs like a taxonomic Atlas. Which we suppose makes Librarian Anya, Zeus. That doesn’t seem right.

On the subject of value-chain mapping …

Value chain mapping all of this

At our last quarterly planning meeting, Business Analyst Koye, Delivery Manager Lydia, SQL Neurotic Rachel, Developer Jon and computational ‘experts’ Young Robert and Michael promised to spend some time investigating whether it might be possible to map out value chains for our assorted services. Borrowing some much needed domain expertise from our crack team of librarians and grabbing a Wardley map stencil that someone had kindly made in Omnigraffle, we now have maps for Parliamentary Search, our procedural mapping explorations, all things election related and the good old Library Knowledge Base. It’s quite amazing the amount of fruit and custard one needs to make a decent trifle, when all the customer sees is some cream and some sprinkles.

Not only have our value chain mappers made some value chain pixels, they’ve also made a first draft of a Google sheet to capture the assorted nodes and indeed edges describing those chains. A spreadsheet that Data Scientist Louie has taken and plugged into Power BI. Something that looks much like a Wardley map being the result of those efforts. How well that responds once all the nodes and all the edges have been captured remains a matter of conjecture. Hopefully, when we’re done, Koye will end up with a fairly simple spreadsheet to maintain, tweaking to reflect progress, and management-appropriate visualisations will pop out of the other side. Hopefully.

I am a procedural cartographer - to the tune of the Palace Brothers

Sticking with the trifle metaphor, for our procedure tracking experiments our map making exploits clearly provide the fruit and the custard. Some progress has been made at the cream and sprinkles end of the pipes, our Procedure Browsable Space™ gaining a couple of new features. The first of these sees the inclusion of actualisation counts across our step listings. Or, how many times has this procedural step actually happened? If indeed it has happened. Such counts are now present on both step lists for legislatures and step lists for a step collection, providing at least a hint of precedent. Should more than a hint be required, hypertext is your answer. Click and all will be revealed.
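
An actualisation count is, at heart, a tally. A minimal sketch, assuming each actualisation can be boiled down to a pairing of a step with a business item: the step names, item identifiers and function name below are made up for illustration, and the real counts come out of our procedural data rather than a Python list.

```python
from collections import Counter

# Hypothetical actualisation records: (step name, business item identifier).
ACTUALISATIONS = [
    ("Laid before the House of Commons", "item-1"),
    ("Laid before the House of Commons", "item-2"),
    ("Motion to approve tabled", "item-2"),
]

# Hypothetical step list, including a step that has never happened.
STEPS = [
    "Laid before the House of Commons",
    "Motion to approve tabled",
    "Motion to approve withdrawn",
]


def actualisation_counts(steps, actualisations):
    """Count how many times each listed step has actually happened."""
    counts = Counter(step for step, _ in actualisations)
    # A Counter returns 0 for missing keys, so unactualised steps still
    # appear in the listing with a count of zero.
    return {step: counts[step] for step in steps}


for step, count in actualisation_counts(STEPS, ACTUALISATIONS).items():
    print(f"{step}: {count}")
```

Listing steps with a zero count alongside the rest is what lets the reader see that a step exists in procedure but has yet to happen.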

Elsewhere, a handful of work package listings have gained flags where either a committee has raised concerns and / or a Member has tabled a motion. Such flags now appear on work package listings for secondary legislation and treaties. Or they do when our procedural triplestore is feeling well in itself. This is entirely dependent on whether the triplestore gets tired or not. If the triplestore has had a late night, it tends to time out and the blasted SPARQL query returns nothing. Less than ideal.

In better news, our Jianhan has promised to look into upgrading our procedural triplestore - currently running two major versions behind what is considered current - in the not too distant. At which point, we’re hoping that the secondary legislation query and other similar queries work more predictably, without the triplestore needing to take an afternoon nap. Always, we live in hope. What we do in the meantime remains a matter for short debate.

Psephologising wildly

Temporarily turning our gaze away from elections past and toward elections future, Developer Sri, Developer Gabriel, Data Scientist Louie and computational ‘expert’ Michael consider their first pass at a data model for the replacement Candidates Database Tool kinda complete. Kinda, as The Candidates Database Tool is, at least in our opinion, a very odd name for a system that manages the whole general election process from candidate gathering to result verification. And is a system we’d also like to handle by-elections and notional general elections. The campaign to call it the Elections Tool starts here.

As part of that work, Michael has taken a spreadsheet mapping constituencies to election reporting authorities - as at the last general election - and imported it to a slightly simplified version of our psephology database. This has been sent across to colleagues in Software Engineering and is available here, should you wish to roll your own election management system.

And for our final election news item, imagine our delight when our election results website made an appearance on the revised and reworked data.gov.uk. Delight only slightly tempered by seeing it filed under ‘Government’. Still, it’s always nice to be included.

On orders being standing

And finally, and finally, and finally, our dear reader will be well aware that we’ve spent an inordinate amount of blood and treasure exploring new ways to publish old standing orders. This based on the fine work of the people at the Parlrules project. The data they kindly donated covered House of Commons public standing orders. And we’d rather like to add standing order coverage for the upper House. Not to mention those of the devolved legislatures. Never quite able to resist lending a helping hand, Shedcode James set to work screen-scraping and the results were not too shabby. James’ fine work has now been tidied by more fine work from Librarian Emily and the current set of House of Lords public standing orders is now published.

This breakthrough opens a new dilemma, as breakthroughs often do. The website has been built on the assumption that the full genealogy of every order and every fragment of every order has been captured. In this case it hasn’t.

Should you click on an order, such as this one covering any differences in form or style of writs, it is suggested that it made its first appearance on the 23rd October 2025, when, as any fule kno, it actually first left the printing press on the 27th March 1621. So that needs more work. But first, more thought. It usually being better to do the thinking before the working, except in cases where the working is the thinking, of course.