weeknotes.data-search

2018 Week 24

Working software

After three (maybe four?) months’ work, the Statutory Instrument tracker that’s been occupying so much of our time and brain space has finally gone live. From a Data and Search perspective it’s probably the most interesting thing we’ve worked on to date and lots of the team have been involved. Anya, Silver, Michael and Samu have chuntered long and hard over domain models. Chris and Samu have turned their scribblings into physical data models. Raphael and Samu have made visualisations of procedures in both two and three dimensions. Jayne has entered business data until her fingers bled. Wojciech has written orchestrations to drag Jayne’s work into the data platform. And Jenna, James and Samu have been writing SPARQL to pull out the data and make a website. Jenna, James and Raphael all helped Jayne with some data integrity issues. Mike, as ever, supported everyone and cleaned up any mess we left in our wake.

It’s still very much a first iteration. We need to get the legislation model right before we can properly list and decorate SIs. We still need to implement the laying model to list items by when they first hit Parliament. And we still need a working sitting day calendar to calculate end dates for prayers and approval motions.

But all in all, we’re pretty chuffed. There’s a massive iceberg of thinking and work propping up what is, for now, four page types. And a whole bunch of stuff that becomes possible in the future.

For those interested in a more visual approach to explaining parliamentary procedure, Samu has made a bookmarklet to flip between work package pages and their visual equivalents.

Showing and telling…

…was in the capable hands of Jianhan, who took us through the work he’s been doing to extract data entered by the Indexing and Data Management Section of the House of Commons Library into the Search and Indexing triple store. All for the purposes of populating the written question and answer model with decent data.

Community

Anya and Michael went down to Exeter to visit Professor Michael Rush. One of Michael’s main research areas is the social background of politicians and, as a side project, he’s compiled a massive database of 9,578 (and counting) Members of the House of Commons. Anya and Michael are working with Paul from the History of Parliament Trust to take the data and make it available (and editable) online. The last three intakes have not yet been captured in the database and only exist as a card index. Luckily the House of Commons Library have volunteered to backfill the data during the quieter periods of summer recess. Many cards were read and compared to the database schema. A plan was hatched to capture the information in some linked spreadsheets, to be imported into the database once the website code is written. Splendid.

Dan met with a group from the ONS ‘Integrated Data Division’ and talked to them about how we work. Like, agile and all that stuff. It was really good and they were very receptive. He also commended the work of many of their colleagues at the ONS. Jo Lloyd (ex Parliamentary Computational Section, now ONS) set it up as she’d always thought that data day was great. Thanks Jo.

One world, one web, one team

Jamie went to Tothill Street and sat with IDMS to try and get his head around their toolset. Martin and Steve took him on a whistle-stop tour of OASIS, Indexing and Parliamentary Search. Jamie reports that everything that Michael ever said about “stapling” now makes perfect sense. OASIS (the Odds And Sods Information Service (really)) was a completely new one to Jamie. It’s been allowing IDMS to create source reference data for years, where existing systems or processes have not made data available. He also learnt some immediately applicable things about the workflow of title creation for written questions.

Jamie also met with Oli from the House of Commons Library to discuss Oli’s new Data Science role and related matters. The data visualisation theme has been bubbling away for some time now but, apart from the odd conversation, there’s been little in the way of joined-up activity. Liz’s work on the Constituency Dashboard is one visible recent outcome. Oli talked about facilitating more of this work for his team’s bespoke web-based data visualisations.

We’d like Oli to start using the UK Parliament GitHub account to increase visibility (amongst PDS) of the work. We’ll also be able to plug this into our hosting infrastructure to take administrative tasks such as operations and billing off Oli’s plate.

Jamie and Oli agreed that a parliament.uk sub-domain will be the best approach for now. Couple this with a little bit of pym.js (hat tip to the visual.ons team) and we hope to bring even more life to the Commons Library blog.

Liz would like to thank Samu for showing her round GitHub. She’s now put a new version of the beta website crawl online. Samu explained how git tracks differences between versions of a file, and they’ve changed the format so that future diffs will be more useful. They’ve also made the file smaller. The crawl was last run in March; the latest run found 4,500 additional pages. Liz is looking into how we might report on additions and deletions to show some sort of growth over time. This needs a bit more thought. The context here is how people find all these pages. Samu pointed out that crawling might take a fair bit of time in the future, when we may well have 1.5 million pages of questions and answers. Webmaster tools might help here. Or the Wayback Machine. Both of which have APIs.
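As a rough illustration of what an additions-and-deletions report could look like, here’s a minimal Python sketch. It assumes (hypothetically) that each crawl is saved as plain text with one URL per line; the function names and sample paths are made up for the example and aren’t taken from the real crawl.

```python
def read_crawl(text: str) -> set[str]:
    """Parse a crawl dump with one URL per line, ignoring blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_crawls(previous: set[str], latest: set[str]) -> dict[str, list[str]]:
    """Summarise how the set of crawled pages changed between two runs."""
    return {
        "added": sorted(latest - previous),    # pages found only in the latest run
        "removed": sorted(previous - latest),  # pages that have disappeared since
    }


if __name__ == "__main__":
    # Hypothetical sample data, not real crawl output.
    march = read_crawl("/people/a\n/people/b\n/questions/1\n")
    june = read_crawl("/people/a\n/questions/1\n/questions/2\n/questions/3\n")
    changes = compare_crawls(march, june)
    print(len(changes["added"]), "added,", len(changes["removed"]), "removed")
    # prints: 2 added, 1 removed
```

Net growth between two runs is then the number added minus the number removed. Comparing sets rather than line-by-line diffs means the order of pages in each dump doesn’t matter.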

Alison spent some time chatting with David about space data and with Liz Marley from IDMS about indexing. On Wednesday, she attended PMQs and picked a pretty good day for it. No-one is quite sure yet how we’ll model mass walk-outs from the chamber. She also had an informal catch-up with ex-colleague Connie Hedeler from NICE, who is keen to do more knowledge sharing and may be a candidate for one of our data days.

Domain modelling

Anya and Michael spent time with some people from the House of Lords Library taking a third pass at their domain model for publishing things like Research Briefings. Nothing much changed from their original sessions with House of Commons Library and POST people. So that was reassuring.

Tuesday was the final (we think) domain modelling session for the SI tracker. Jane, Jack, Ben, Alison, Anya and Michael gathered in Tothill Street to talk through the proposed legislation model and enabling powers and statutory day counts. They got to something everyone seemed happy with. They hope John will be happy too. They’ve lost count of how many SI-related meetings they’ve had but they’ve been good and we’ll miss them. Thanks Jane and Jack for bearing with us. Thanks Ben for the Maltesers.

Data platform

Jayne from IDMS emailed Mike to ask some questions she and Joe Strawson had about the data platform and the SI web pages. Her description of how she enters data and how it ends up displayed on the website unearthed some new questions about the workflow, the data model and the use of the past tense in step naming. It’s a good example of end-to-end support and development working directly with users. Without it, the questions might never have been raised and would almost certainly have been lost between teams.

Our colleague Phil from the House of Commons Library came up with a great use case for our OData endpoints. He used them to update a basic Twitter bot that tells people which Government and shadow cabinet roles a Member has had. The bot replies with positions and start dates, in chronological order. It currently requires users to input a Member’s name exactly, including correct capitalisation, but he’s hoping to work on it and include an option for committee positions too. The bot’s code is on GitHub, in case anyone out there wants to fork it.

On search. And indeed indexing

We’re still missing some information about how data enters Search and Indexing for IDMS to index, so we can’t yet describe timescales accurately. Liz talked to Samu, Mike and Alex. She learned we can get slightly closer by using the log data for the queue of items waiting to be picked up by Indexing. Mike showed Liz the dead file queue (files that don’t make the cut), and Alex started work on a daily count of these.

IDMS flagged an issue with a number of duplicate written questions in Parliamentary Search. When a tabled question is answered it should be hidden from the search index, to be replaced by the answered version of the question. For some reason, both versions were showing up in search results. Alex wrote a script to reindex multiple records in the Indexing and Search triplestore, which seems to have fixed the problem.

Corporate data

Dan properly started his new ‘Service Owner - Interaction Management’ gig with some meetings and a programme board. Just like the olden days. He also wrote about his job, which he recognises might be vain. But aren’t we all.

Capability

Julie ran two workshops this week about career paths and talent management. Dan went along and thought they went well. He thinks the career path stuff, in particular, is looking great. What’s a career, others asked.

Employee of the week award…

…goes to Jayne Sunley in IDMS, who’s been manically actualising steps by adding business items to work packages. All for the purposes of tracking SIs. At least for now, the SI tracker is basically Jayne. Like the Wizard of Oz, but proxied by computers.

Strolls

No strolls reported but Anya and Michael took the opportunity offered by a trip to Exeter to visit the seaside. Ice cream was eaten and the second round of the air hockey tournament took place. This time, Michael whipped Anya’s ass.