Dan has continued to have visions. Like Saul, shortly before he fell off his donkey. He’s done a fair bit of data strategy stuff, chatting to people from all over the place about things like reporting data for projects, digital forms, data about spaces and places, and the requisite Customer Relationship Management systems.
He also edged two inches forward on the long journey toward the “providing a register of Members” type thingy. Two inches forward, one back. As they say.
Tuesday was show and tell day. Sara and Matthieu recapped their trip to TICTeC. Dan reports that it was really good with lots of learning and reflection on the international state of civic tech. Dan says it’s the kind of stuff that gets him into work. As opposed to getting paid for instance. He’s put Matthieu in touch with Ed Saperia at Newspeak House in the hope we can run or participate in an event on this theme. And next year, hopefully, we might even present at TICTeC ourselves.
No time for community this week. Really much too busy for friendships.
On Tuesday, Robert, Silver and Michael took a trip to The National Archives to compare legislation models in the hope they could get to a minimum model to support the new statutory instrument tracker. Previous efforts had made John Sheridan sad enough to lose sleep. There’s a new sketch now, which they hope will give John a better night’s rest.
In the afternoon, they spent some time with Oli and Carl, both statistics researchers from the House of Commons Library. They went back over where they’ve got to on the model for publishing Research Briefings and documents in general. Oli and Carl seemed happy enough. Or at least pretended to be. On Wednesday Anya and Michael spent some time with Lucinda, from the Parliament and Constitution Centre in the Library, going over much the same thing. There was a good chat about anonymity and privacy, and the need for individual opt-ins before researchers are given a URI, an ORCID iD and modelled contributions. But, that aside, Lucinda also seemed happy. They still need to run the model by people in the House of Lords Library and POST.
The week, like many others, has been mainly dedicated to the tracking of SIs. The Indexing and Data Management Section in the House of Commons Library have started to enter test data for business items in assorted SI procedures. Many conversations have been had about the correct way to add dates to business items which might be slightly ‘off-stage’. Anya, Martin, Jack and Ben met to chat about business item links and what they used to call ‘deep linking’ into documents. Deep linking is a thing we’ve never been terribly good at. Ben had the casting vote and came down in favour of linking to HTML rather than PDFs. Top work Benjy.
James has been using the work package visualisations, made by Raphael and fine-tuned by Samu, to cross-check the data entered by IDMS. He’s been looking at whether business item data is correct, according to the relevant procedural model their work package is subject to. He’s also been checking that the work package flowcharts render correctly according to the data ingested from IDMS. He’s planning to report back to Anya shortly.
Jenna and James have both been focussed on surfacing all the data related to a work package and understanding the challenges around presenting information from a ‘this is what has happened’ perspective compared to a ‘this is what should have happened’ perspective. The consensus is that we can’t create a single comprehensive view covering both what has and what should have happened. So we have to pick one representation or the other. For now the primary focus is on what has happened (via the recording of Business Items), but they have every intention of also creating a ‘what should have happened’ view (according to the procedure data). Once they figure out a meaningful way of showing it.
The problem here is that we’re attempting to render a tree from a model of procedure that is most definitely a graph. Most of the business items have dates attached, but for some, like the Speaker’s consideration of EVEL certification, we have a date for the decision being published but not for the actual consideration. Which means we have to work out when the procedure says they should have happened. Which means we have to traverse the procedure graph to get some sense of what led to what. Which is hard. Luckily we have Samu, who sat down to write some SPARQL and 40 hours later came up with this absolute unit of a query.
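The gist of the traversal, sketched in Python rather than Samu’s actual SPARQL (all the step names and dates below are made up for illustration): for an undated business item, walk the procedure graph backwards and forwards to find the latest dated ancestor and the earliest dated descendant, which at least bounds when it must have happened.

```python
from collections import defaultdict, deque
from datetime import date

# Illustrative procedure graph: edges point from one step to the
# step it leads to. Names and dates are hypothetical.
edges = [
    ("laid", "evel_consideration"),
    ("evel_consideration", "certification_published"),
]
dates = {
    "laid": date(2018, 5, 1),
    "certification_published": date(2018, 5, 10),
    # "evel_consideration" happened, but we have no date for it
}

def bounds(step, edges, dates):
    """Return (earliest, latest) plausible dates for `step`, from the
    latest dated ancestor and earliest dated descendant in the graph."""
    preds, succs = defaultdict(set), defaultdict(set)
    for a, b in edges:
        succs[a].add(b)
        preds[b].add(a)

    def reach(start, neighbours):
        seen, todo = set(), deque([start])
        while todo:
            node = todo.popleft()
            for n in neighbours[node]:
                if n not in seen:
                    seen.add(n)
                    todo.append(n)
        return seen

    before = [dates[n] for n in reach(step, preds) if n in dates]
    after = [dates[n] for n in reach(step, succs) if n in dates]
    return (max(before) if before else None,
            min(after) if after else None)

print(bounds("evel_consideration", edges, dates))
# (datetime.date(2018, 5, 1), datetime.date(2018, 5, 10))
```

In SPARQL this kind of spanning is done with property paths over the procedure routes, which is presumably where the 40 hours went.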
In non-SI related news Jianhan has now ingested over 176k written questions and 145 answer corrections into the staging environment. He’s also fixed the synchronisation issue between tabled questions and their accompanying answers. The issue was that, when a question is originally tabled, it’s given a URI; when the question gets answered, the tabled one disappears and a new question with a new URI gets created. Jianhan has managed to link the two by their UIN and table date, so we can now update the tabled question with its answers.
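The linking step is, in essence, a join on the composite key of UIN plus table date. A minimal sketch of the idea (not Jianhan’s actual code; the field names and URIs are invented):

```python
# Hypothetical records: the originally tabled questions, and the
# replacement questions created when answers arrive.
tabled = [
    {"uri": "q/1", "uin": "HL1234", "date_tabled": "2018-05-01"},
    {"uri": "q/2", "uin": "HL1235", "date_tabled": "2018-05-01"},
]
answered = [
    {"uri": "q/9", "uin": "HL1234", "date_tabled": "2018-05-01",
     "answer": "Yes."},
]

def link_answers(tabled, answered):
    """Copy answers onto tabled questions, matching on (UIN, table date)
    so the original URI survives."""
    by_key = {(a["uin"], a["date_tabled"]): a for a in answered}
    for q in tabled:
        match = by_key.get((q["uin"], q["date_tabled"]))
        if match:
            q["answer"] = match["answer"]
    return tabled

linked = link_answers(tabled, answered)
print(linked[0]["answer"])  # Yes.
```

The payoff is that the tabled question’s URI never changes, so anything that linked to it before the answer arrived still works afterwards.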
Wojciech’s been busy creating improved data entry mechanisms for procedural data. Mike has helped out with testing and sharing with team:Anya, who are laboriously entering data in preparation for the launch of the thing formerly known as SI tracker. Applause team:Anya.
Samu and Mike dealt with an issue that involved an outage to parliamentary search. It was initially caused by a brief failure of the cloud infrastructure under one of our VMs, but fixing it took them on a journey through Parliamentary limits on registry changes, Windows patching, security vulnerabilities preventing communication between machines, and Java updates changing folder names because someone thought having the version number of the JRE in a file path was a great idea. Another win for the Parliamentary Computational Section.
Alex spent some time with Liz to find a better temporary process in Python for downloading many JSON files from a blob store, merging them and processing them for analytics work. Dear reader, I have no idea what this means either. Something to do with computers I suspect.
Lewis has continued his work on restructuring the existing reporting solution, as well as preparing more parts of the new stock system integration for testing.
Noel has continued with the data cleansing work for records on People Data.
Mat has now left the building, heading toward the more enlightened climes of Denmark. Or somewhere in the general direction of Scandiland. Farewell Mat. Alison joined as our new data architect to be met with the usual reception of an incorrect security pass and a bricked laptop. Welcome aboard Alison.
Dan took a couple of strolls with Robert. They ate their packed lunches in the gardens on the Embankment. Like tramps. They talked about search and the data strategy, and shared general reflections on how to work. Unlike tramps.
Our Liz pointed to validate, an R package intended to make checking your data easy, maintainable and reproducible. She says it’s less important that it’s an R package and she’s more interested in what it’s trying to achieve. validate allows you to test your data set(s) against predefined rules (either within or across data sets), import and export rule sets from structured or free-format files, investigate and visualise the results of a data validation step, perform basic rule maintenance tasks, and define and maintain data quality indicators separately from the data.
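Since Liz’s interest is the idea rather than the R specifics, here is the core of it sketched in Python (this is not the validate package’s API, just the pattern it embodies): rules are declared once, separately from the data, and the output is a report of which rules failed where.

```python
# Rules live apart from the data, named so failures are explicable.
# Rule names, fields and data are all invented for illustration.
rules = {
    "age_non_negative": lambda row: row["age"] >= 0,
    "height_plausible": lambda row: 0 < row["height_cm"] < 300,
}

data = [
    {"age": 34, "height_cm": 178},
    {"age": -1, "height_cm": 165},
]

def validate(data, rules):
    """Apply each named rule to each row; return failing row indices
    per rule."""
    failures = {name: [] for name in rules}
    for i, row in enumerate(data):
        for name, rule in rules.items():
            if not rule(row):
                failures[name].append(i)
    return failures

report = validate(data, rules)
print(report)  # {'age_non_negative': [1], 'height_plausible': []}
```

Keeping the rule set in its own file, versioned and reviewable, is what makes this maintainable and reproducible rather than a pile of ad hoc checks.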