weeknotes.data-search

Week 8: 15-09-2017

Community

Last night was our 4th Parliament, data and democracy meetup at Newspeak House. Thanks to everyone who came along and to our speakers. It was a good night with talks from:

Our Dan on where we’re at with Data and Search. And how bloody difficult everything is.
Anna Scott from the ODI on open data for democracy.
Rachel Coldicutt from doteveryone on building a better internet and better technology culture.
Our Samu on the new Parliament data platform.

There was beer and chatter. You should come along one day. And if you’d like to hear a talk or give a talk or just say hello, our mailing list is here.

Data day

All told, data day was a “punishing hell ride”. Dan had a bit of a reversion and went into project manager mode. We like to think this was well received by colleagues.

On the plus side things were agreed:

we’ll work with Ben to come up with a proposal for “Lords inflation” (don’t ask).
Anya and Michael will pick up the government departments, positions and incumbencies domain modelling in preparation for that being needed on member pages.
we need a new approach for delivering links to MPs’ websites. A ridiculously simple feature made almost impossible by the brittleness of downstream systems. New work should include social media links.
Aidan will arrange a meeting with the Ordnance Survey to discuss long-standing issues with region, constituency and postcode data.
we’ll work with the members website team to break down the stories for showing votes on member pages.
we’ll work with the committees website team to collaboratively design new data sources for information missing from existing systems.
we’ll prepare for committing to stable identifiers very, very soon.
we’ll investigate the impact that removing upper case letters and vowels (to avoid inadvertent swears) would have on the length of identifiers. Jianhan is about to do maths.
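
For a rough sense of the maths involved, here’s a minimal sketch. It assumes (and this is only an assumption for illustration, nothing has been decided) that identifiers are fixed-length strings, 8 characters long, drawn from a 62-character alphabet of digits plus upper and lower case letters. It then works out how long an identifier would need to be to cover at least the same number of values once upper case letters, and then vowels, are removed. It treats y as a consonant, which may be optimistic.

```python
import string

VOWELS = set("aeiou")  # assumption: y counts as a consonant


def min_length(alphabet_size: int, capacity: int) -> int:
    """Smallest length L such that alphabet_size ** L >= capacity."""
    if alphabet_size < 2 or capacity < 1:
        raise ValueError("need at least 2 symbols and a capacity of 1 or more")
    length = 0
    combinations = 1
    # Integer arithmetic throughout, so no floating point rounding surprises.
    while combinations < capacity:
        combinations *= alphabet_size
        length += 1
    return length


# Hypothetical starting point: 8 characters from digits plus both letter cases.
FULL = string.digits + string.ascii_letters
NO_UPPER = string.digits + string.ascii_lowercase
NO_UPPER_NO_VOWELS = "".join(c for c in NO_UPPER if c not in VOWELS)

BASELINE_LENGTH = 8
CAPACITY = len(FULL) ** BASELINE_LENGTH

if __name__ == "__main__":
    for label, alphabet in [
        ("digits + upper + lower", FULL),
        ("digits + lower", NO_UPPER),
        ("digits + lower, no vowels", NO_UPPER_NO_VOWELS),
    ]:
        print(label, len(alphabet), min_length(len(alphabet), CAPACITY))
```

Under those assumptions, dropping upper case takes the identifier from 8 characters to 10, and dropping vowels as well leaves it at 10. A different starting length or alphabet would give different numbers.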

Parliament in the area, we’re international, we’re continental

Dan presented to a delegation from the Lok Sabha (the lower house of India’s bicameral Parliament). Mainly around data and search but also technology in general, and publications, and archives, and broadcasting. One of the delegates said Dan had “…a scholarly look, like Karl Marx”. This pleased Dan enormously.

Anya and Michael met with Rose Rees Jones who’s doing a piece of work for the ODI around the challenges of adopting and proposing data standards. They grumbled a bit but it was very useful.

Domain modelling

Michael spent Monday diving into Oli’s election candidate database and wrapping a Rails app around it. He made some pictures of gender splits that he felt might be indicative but were somewhat drowned in caveats.

He had many conversations with assorted people about the problems we face with joint inquiries given the state of our current data model and the way it forces people to double and triple enter the same information. He’s hoping to get some internal user research happening around this.

He also went along with the website team who did a show and tell to House of Commons committee people. Again he banged on about the problems with existing data models and joint inquiries and joint evidence sessions. He came to the conclusion that 90% of his job is asking dumb questions about cardinality.
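
One example of a dumb question about cardinality: can an inquiry belong to more than one committee? Here’s a minimal sketch of a model where the answer is yes. It’s a guess for illustration, not a description of the actual systems or the eventual domain model, and all the class and field names are made up. The idea is that a joint inquiry is recorded once and linked to each committee, rather than entered once per committee.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Committee:
    name: str


@dataclass
class Inquiry:
    title: str
    # Many-to-many: one inquiry record, linked to every committee running it.
    committees: list[Committee] = field(default_factory=list)

    @property
    def is_joint(self) -> bool:
        return len(self.committees) > 1


def inquiries_for(committee: Committee, inquiries: list[Inquiry]) -> list[Inquiry]:
    """All inquiries a committee is involved in, joint or otherwise."""
    return [inquiry for inquiry in inquiries if committee in inquiry.committees]


if __name__ == "__main__":
    # Hypothetical committees and inquiry, purely for illustration.
    a = Committee("Committee A")
    b = Committee("Committee B")
    joint = Inquiry("Example joint inquiry", [a, b])
    print(joint.is_joint)
    print([i.title for i in inquiries_for(b, [joint])])
```

The same question applies to joint evidence sessions: one session, several committees.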

Data quality

Angela and Raphael met with Jacqui Cooksey from the House of Commons to go through the current state of select committee data in the Members Names Information Service. They discussed possible solutions to tidying and ingesting the data. They also discussed data ownership and update frequencies for committee data currently only found in the website CMS.

Samu, Ben and Michael met to discuss an approach to “Lords inflation”. This is both too difficult and too tedious to explain but there is a plan and it will establish a firmer footing for House of Lords member data.

Samu worked with Angela and Ed from the website team. He designed a database and data entry form for recording committee data. This will be further developed and maintained by Angela, and will be used to record data that’s not available in existing committee systems.

Data platform

Samu made substantial improvements to the API that serves the new website. SPARQL queries can now be extracted out of the application code and into separate files making it much easier for web developers to write, test and deploy the queries. Developers seemed pleased with this.
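
As a rough illustration of the general pattern (not the API’s actual code, which isn’t shown here), keeping queries in their own files might look something like the sketch below. The `load_query` helper, the `.sparql` file extension and the `$name` placeholder convention are all assumptions made for the example.

```python
from pathlib import Path
from string import Template


def load_query(query_dir: str, name: str, **params: str) -> str:
    """Read <query_dir>/<name>.sparql and fill in any $placeholders.

    Hypothetical helper: raises FileNotFoundError if the file is missing
    and KeyError if a placeholder has no matching parameter.
    """
    text = (Path(query_dir) / f"{name}.sparql").read_text(encoding="utf-8")
    # Plain text substitution only. Values from users would need validating
    # or escaping before they get anywhere near a real query.
    return Template(text).substitute(**params)
```

With a file called `member_by_id.sparql` (a made-up name) containing a `$member_uri` placeholder, the application code shrinks to `load_query("queries", "member_by_id", member_uri=...)`, and the query itself can be written and tested without touching the application. One wrinkle: SPARQL also allows `$` as a variable prefix, so query files used this way would need to stick to `?` for variables.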

Search

Feedback on search, which was at first a trickle, is now a flood. Thankfully expert colleagues are helping to manage, disseminate and analyse this information.

There was a first meeting on how to deal with the search feedback; early days but all seems to be headed in the right direction. Wider proposals on search, including the search-of-beta, are beginning to be discussed.

Robert met with a Commons Library colleague to hear about the data science work they’re starting and also discussed inference and text analysis. He also spoke with various people about how to add useful ‘hints’ to search results, in the form of labels.

Robert and Dia met Laurence to talk through the proposal for searching the beta website.

Machines that do learning

Dan met with Oli Hawkins to talk about the Commons Library establishing a data science function.

Raphael went over to GDS as part of his data science accelerator course. He worked on implementing feedback for a retraining pipeline and looked at multiclass decision forests (not a clue, mate).

Measuring things

Saffiyah spent the week looking at ways of using APIs and Webhooks to automatically grab data from our survey software.

She also continued to work on establishing metrics for the House of Commons Library Enquiries service.

Sara and Dia met to talk about search feedback. Sara made changes to the report for monitoring search API performance. She also looked at clustering techniques to define user types based on their journeys around the website.

Liz started a search feedback and measures log to help inform decisions about future testing and to record a history of changes and caveats which might influence measurements made and conclusions drawn. She started to look at the two feedback polls for search. The data requires cleaning and there’s work to do on how to categorise comments. As much as possible will be automated and summarised weekly whilst the polls are running. The aim is to use this information to understand possible areas for development.

Excellent customer service awards

Mike compiled a CSV of data back to 2000 to help a PhD student from the Federal University of Minas Gerais in Brazil with their thesis on gender equality representation in the House of Commons.

Capability

Julie presented to the senior management team her thoughts and potential options for taking forward the capability work.

Julie and Mike attended a DevOps Journey presentation by the British Medical Journal. It was followed by a really useful discussion about culture, challenges and our current situation across assorted development / technology teams.

Corporate data

Matias and David attended a people data workshop organised by Tori. There were people from a variety of teams across Parliament.

The general consensus was that:

There’s a need to redesign the People Data service.
Current processes and workflows have to be reviewed.
Business teams need to understand that the data they produce has an impact on other parts of the organisation.
Data on source systems has to be consistent.
A single identifier should be used wherever possible.
A new standard data model of a person should be created.

Strolls

Robert and Michael went for a stroll in St James’s Park. Talk turned once again to hypertext and their general belief that 95% of improving search comes from improving browse. They also chatted about how we haven’t quite properly explained what we’ve done with the new search and what implications that has and the various advantages and disadvantages. There is still a blog post or something missing here. Michael has promised to type words. When his hangover dissipates. If.

They also chatted about “digital transformation” (whatever that is) and came to the conclusion that it only usually works if it’s emergent from internal organisational change and rarely works by driving stakes into ground from the outside. This is a problem.

No other strolls were reported.