After a short break, data day returned and turned 40. In its new format it only takes us half a day, but it’s still pretty much all about the data. Aside from the gossip. And the scheming. There was a decent variety of stuff, but we didn’t quite get through it all.
Some stuff we agreed:
Instead of implementing Google Tag Manager on Parliamentary Search we’ll work to understand the compliance requirement, and implement Application Insights in a lightweight way that doesn’t involve working with the legacy code.
In terms of tracking Statutory Instruments we think we have a fairly good idea on where to capture SI citations and statutory day counts. We still need an accurate calendar for sitting days in both Houses before we can calculate statutory days (previously praying days) and determine “deadline” dates for SIs.
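For the curious, the counting itself is simple once you have the calendar. Here’s a toy sketch in Python, assuming (purely for illustration) that a statutory day is any day on which the relevant House sits, and using a fake weekdays-only calendar in place of the accurate sitting-day calendar we still need:

```python
from datetime import date, timedelta

def statutory_deadline(laid_date, sitting_days, count=40):
    """Walk forward from the laying date, counting only sitting days,
    until `count` statutory days have elapsed. `sitting_days` is a set
    of dates on which the relevant House sat -- the accurate calendar
    we still need. The definition of a statutory day varies by
    procedure; this is just the counting mechanics."""
    day = laid_date
    counted = 0
    while counted < count:
        day += timedelta(days=1)
        if day in sitting_days:
            counted += 1
    return day

# Toy calendar: weekdays only, purely for illustration.
calendar = {date(2018, 6, 1) + timedelta(days=i) for i in range(120)
            if (date(2018, 6, 1) + timedelta(days=i)).weekday() < 5}

# An SI laid on Monday 4 June 2018, with a 10 day clock for brevity.
print(statutory_deadline(date(2018, 6, 4), calendar, count=10))
# -> 2018-06-18
```

The real calculation also has to cope with periods of dissolution, prorogation and adjournment, which is exactly why the accurate calendar matters.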
On the wider legislation front we still need help from external experts before Michael feels any degree of confidence. The names Richard Pope and Paul Downey were both mentioned. Michael thinks it might be a good point to get back in touch with David Beamish.
Performance analysts won’t be given direct access to Application Insights. Yet. But our Liz will be on hand to answer specific questions.
On GDPR compliance, the Members Names Information System (MNIS) interface should not be exposing dates of birth. When this change is made we need to review downstream systems.
Very little in the way of visions this week. The cheese must be wearing off.
We were joined at data day by Marc Adams from the NAO to chat about stats in general and plan some actual work we might do together. We’re now looking for a guest for the next data day in July. Hands up if that might be you.
Some immediate, practical stuff we agreed to do with Marc:
Michael to send over the straw man model for the constituency stats series. Which is done, with a meeting now being planned.
Marc to send over the Access database for the ONS history of constituency codes and splits and merges.
The NAO to include Parliament constituency identifiers in their data service.
On a similar subject, our Liz has been helping colleagues in the House of Commons Library set up systems to produce new reporting pages for constituency stats. New features for a constituency dashboard and topic-based stats browsing went live on Thursday. They’re looking really good and people have been positive about them so far. Which doesn’t stop Michael scowling at the URLs. And the “topics”. That aside, the data is ripe for ingesting into the data platform, to allow for display on the beta website constituency pages. And open up new ways to query parliamentary material in a way that hasn’t been anywhere near possible before. A meeting is planned.
Robert had a couple of meetings about developing search functions over websites. One with people from ACAS, one with people from the Legislative Assembly of Ontario. Properly international, if not continental.
Matthieu published his blog post about the trip he took with Sara to TICTeC. If you’re interested in the impact of civic technology, you should read it.
A data scientist called Izzy got in touch with Mike to share her MSc project. It pulls House of Commons divisions data from data.parliament.uk and analyses voting behaviour across Members.
Alison has been meeting lots of people and having chats with the Collaboration team about all things related to “visiting Parliament”.
Anya and Michael met with Lef Apostolakis from POST to get another view on the work they’ve been doing to model research briefings. They’d already done a couple of passes with assorted people from the House of Commons Library and still need to sit down with House of Lords Library people. The POST session didn’t change an awful lot, which is good news and suggests the model is mostly correct.
On Wednesday, there was yet another meeting on SIs and the tracking thereof. This one with Jane, Jack, Jen, Janya, Jichael and Jalison. Jenna and James being otherwise occupied. After several weeks they think they’ve finally cleared up their confusion around SI procedure clocks, the definition of statutory days and how to model statutory day counts. That said, there’s another meeting next week to definitely, finally, completely agree the last bit.
Anya and Michael went to Brighton to meet Silver. Because he lives there. Or near there. And because it’s the seaside. And because it has dodgems and air hockey. They sat on the beach and planned out a talk they’re due to give to NetIKX. Which was supposed to be an introduction to ontologies for librarians and knowledge management people, but has ended up as a plea to switch from learning ontologies to learning about domain modelling instead. So no one gets their money’s worth there.
They also spent a bit of time talking about the Modelling Parliament talk they’re due to give at the KanDDDinsky conference in Berlin. And some time talking about teaching House of Commons librarians a little more SPARQL. The day ended, as many days do, in the pub, hatching a plan for the first useful chunk of a legislation model. Which has been bothering Michael for some time.
In the best news for quite a while, procedure data is now live and happily turning into pages on the beta website. Although for now and I guess for reasons(?), those pages are still restricted to the parliamentary network.
After herculean data entry efforts by IDMS, some monumental procedure modelling work, an INTERIM DATA SOURCE built from scratch by Chris, visualisations in two and three dimensions built by Raphael, a novel editorial interface built by Wojciech, all ably supported by Mike… we got there.
In the first instance we’re only working with procedures for Statutory Instruments. In the longer term we think any parliamentary procedure could be captured in this way. This excites us, for we are a very niche brand of nerd.
Wojciech’s done some sterling work to describe our Query API using the Open API Specification aka Swagger. We’re now publishing details of assorted endpoints, query parameters and content negotiation options in a standard, machine readable format.
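As a taste of why machine readable matters: once the description exists, a client can walk it to discover what’s on offer without reading any docs. A minimal Python sketch, using an invented fragment in the Open API shape — the path, parameter and media types here are made up for illustration and are not our real endpoints:

```python
import json

# A tiny, hand-rolled fragment in the Open API (Swagger) shape.
# The path, parameter and media types are invented for illustration.
spec = json.loads("""
{
  "openapi": "3.0.0",
  "info": {"title": "Query API (illustrative)", "version": "0.1"},
  "paths": {
    "/constituencies": {
      "get": {
        "parameters": [
          {"name": "name", "in": "query", "schema": {"type": "string"}}
        ],
        "responses": {
          "200": {
            "content": {
              "application/json": {},
              "text/turtle": {},
              "application/xml": {}
            }
          }
        }
      }
    }
  }
}
""")

# Because the description is machine readable, a client can enumerate
# endpoints, query parameters and content negotiation options generically.
for path, ops in spec["paths"].items():
    for verb, op in ops.items():
        params = [p["name"] for p in op.get("parameters", [])]
        formats = sorted(op["responses"]["200"]["content"])
        print(verb.upper(), path, "params:", params, "formats:", formats)
```

This is also what lets generic tooling — interactive documentation, client code generators, validators — work against the API for free.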
Jianhan has been busy adding fixed query endpoints for questions asked by a Member and questions answered by a Member.
He also updated the OData endpoints to reflect the newly imported questions and answers data. You can now get the total number of questions, total number of answers, questions by a Member, answers by a Member, questions asked on a date, questions asked between two dates, and correcting answers expanded with corrected answers. There’s also a fixed query to return questions by search terms in headings.
Samu made major improvements to the default HTML rendering of data from our API. It now shows meaningful labels for resources, images for member photos, maps for constituency areas, and improved styling for a better table display. Matthieu helped en route with several useful suggestions.
Samu had a busy week. He also added analytics to capture redirects from hansard.millbanksystems.com (the thing that search engines still have indexed) to the new Historic Hansard. Liz has been looking at user IDs. There have been about 22,000 unique users per week over the last month.
The latest version of dotNetRDF was released this week. It’s an open-source software library we both contribute to and rely on. The new release contains a number of contributions by Samu:
A new read-only SKOS API, inspired by Matthieu’s work on transforming our controlled vocabulary to SKOS and building tools to replace and improve the software used by the Indexing and Data Management Section of the House of Commons Library.
A new GraphML triplestore writer that converts RDF data into an XML format used in visualisation software. It was inspired by Raphael’s work on visualising procedures for the SI tracker project.
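The idea behind that writer is straightforward: subjects and objects become nodes, predicates become labelled edges. Here’s a rough stdlib Python analogue, purely to show the shape of the conversion — the triples are made up, and the real writer lives in dotNetRDF and does rather more:

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def triples_to_graphml(triples):
    """Turn (subject, predicate, object) triples into GraphML XML:
    subjects and objects become nodes, predicates become labelled,
    directed edges. A bare-bones Python sketch of the idea behind
    the dotNetRDF GraphML writer."""
    ET.register_namespace("", GRAPHML_NS)
    ns = "{%s}" % GRAPHML_NS
    root = ET.Element(ns + "graphml")
    # Declare an edge attribute to hold the predicate as a label.
    ET.SubElement(root, ns + "key",
                  {"id": "label", "for": "edge",
                   "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, ns + "graph", {"edgedefault": "directed"})
    nodes = {}
    for s, p, o in triples:
        for term in (s, o):
            if term not in nodes:
                nodes[term] = "n%d" % len(nodes)
                ET.SubElement(graph, ns + "node", {"id": nodes[term]})
        edge = ET.SubElement(graph, ns + "edge",
                             {"source": nodes[s], "target": nodes[o]})
        data = ET.SubElement(edge, ns + "data", {"key": "label"})
        data.text = p
    return ET.tostring(root, encoding="unicode")

# Invented procedure-ish triples, for illustration only.
xml_out = triples_to_graphml([
    (":stepA", ":precedes", ":stepB"),
    (":stepB", ":precedes", ":stepC"),
])
```

The resulting XML can be opened directly in graph visualisation tools that read GraphML, which is what makes it handy for eyeballing procedure data.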
A new feature for the GraphViz writer (another RDF visualisation converter) that allows for the output of separate literal nodes. Also inspired by Raphael’s work on procedure visualisation.
A fix for date handling in the JSON-LD parser which was inspired by our work on serving developer friendly hierarchic JSON-LD and RDF/XML from the query API.
A bug report regarding incorrect HTML serialisation that Chris found whilst working on the procedure prototype. This resulted in the maintainers identifying and fixing a deeply hidden indexing optimisation bug in the library.
Based on the code we’ve contributed, our query API now supports GraphML output for all queries. Here’s Jianhan’s query giving the Parliamentary questions answered by Lucy Frazer MP in a format that can be visualised by software like Gephi.
Dan’s been working a fair bit with David, Lew, and Noel on all things data integration-y. They’ve been trying to improve the pipeline of work coming in and the quality of requests the pipes contain. They’ve also been trying to keep the endless email chains to a minimum and working out where we go next with our infrastructure.
David went off to the BizTalk360 Integrate 2018 conference and returned with an assortment of BizTalk stickers. Which at least makes a pleasant change from Users First and Being Bold.
Dan also got a new gig. As of Friday he is now both Head of Data and Search and Service Owner – Interaction Management. I have no idea what interaction management might be, but it sounds super. Well done Dan.
No strolls were reported this week. Though Anya, Silver and Michael did walk to the end of Brighton pier. Well, as far as the dodgems anyway. Anya and Michael played air hockey. Anya whipped Michael’s ass.
The Structural Properties of Online Social Networks and their Application Areas
Liz liked this paper on the Workflow of statistical data analysis [PDF]. She liked the distinction between creative (undocumented) and permanent (documented) parts of the analysis. “Anything that we give to other people (collaborators, journals, …) must come entirely from permanent”.