weeknotes.data-search

Week 18: 25-11-2017

Leave the Capitol

Yet another reminder that our 5th Parliament, data and democracy meetup is on Wednesday 6th December, this time at ODI Leeds.

Speakers lined up are our Dan, Cristina Leston-Bandeira (Professor of Politics at the University of Leeds), Tom Forth (Head of Data at ODI Leeds) and Edward Wood (Director of Research in the House of Commons Library).

More in the way of interesting talks, nice people and free beer. Come along. Bring a friend.

International, continental, Parliament in the area

The boss managed to cadge a last minute trip to Geneva to attend an IPU working meeting on the establishment of a centre for innovation in Parliament. He seems to be having a fine time and looks proper dandy.

Community

Alex produced data for the House of Commons Library subject indexing terms and their usage frequency for Anya and Michael. There is a huge number of subject IDs and uncontrolled terms in the Search and Indexing triplestore, so Wojciech helped to write a script that chunks the queries based on Parliamentary sessions so we didn’t accidentally break anything. Unfortunately, after consolidating all of the usage frequency counts, the data didn’t quite match up because some records in the triplestore don’t have a session attached to them. Oops. Liz Marley was able to produce a list of content types that do not have sessions. This was used to find and consolidate the missing data. Except ‘Deposited Papers’ and ‘European Scrutiny and Information’ because they’re huge datasets and there’s no nice way of chunking them. Yet. Work continues.

This is here because it’s a step forward in our attempts to link House of Commons subject headings to Wikidata. If you’re interested in that, you’re probably best off reading last week’s weeknotes.
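For the curious, the session-by-session chunking and the consolidation step might look roughly like the sketch below. It is not Wojciech's script. The predicate URIs, the `build_query` and `consolidate` names and the `run_query` callback are all made up for illustration, and the real triplestore vocabulary will differ. The one thing it does show is the gotcha from this week: records with no session need their own query, or the totals come up short.

```python
from collections import Counter
from typing import Callable, Iterable, Mapping, Optional

# Hypothetical predicates: the real Search and Indexing vocabulary is not shown here.
QUERY_TEMPLATE = """
SELECT ?term (COUNT(?record) AS ?uses)
WHERE {{
  ?record <http://example.org/subject> ?term .
  {session_clause}
}}
GROUP BY ?term
"""


def build_query(session: Optional[str]) -> str:
    """Build a usage-count query for one session, or for session-less records."""
    if session is None:
        # Catch the records that have no session attached at all.
        clause = "FILTER NOT EXISTS { ?record <http://example.org/session> ?s }"
    else:
        clause = f'?record <http://example.org/session> "{session}" .'
    return QUERY_TEMPLATE.format(session_clause=clause)


def consolidate(
    sessions: Iterable[str],
    run_query: Callable[[str], Mapping[str, int]],
) -> Counter:
    """Run one small query per session, then add the per-chunk counts together."""
    totals: Counter = Counter()
    for session in [*sessions, None]:  # None is the "no session" chunk
        totals.update(run_query(build_query(session)))
    return totals
```

Here `run_query` is whatever sends a query string to the endpoint and returns a mapping of term to count. Keeping it as a parameter means the chunking logic can be tried out against a fake before it goes anywhere near the real triplestore.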

Liz Thomas, Robert, Samu and Michael had a call with Chris Pennock and Justin Moore from Google about standardisation of data models around elections and election results. And other associated stuff. There were no concrete outcomes but the intent was good. Michael notes we need to chat to NIST about the work they’re doing here.

Anya, Silver and Michael went over to see Ganesh (ex of this parish) at GDS to talk about domain models and URLs for making gov.uk more navigable. Michael introduced the GDS gov.uk team to the GDS registers team and felt quite smug about it. It was a really, really good day ending with full whiteboards and beer. Not very Parliament but it’s all part of the same jigsaw and there’s really no such thing as a website. We need to make these things join like Voltron.

Anya and Michael met with Nick Halliday, Rob McCall and Mark Matten from the NAO to chat about data models and challenges. Again it was good. Things are better when we get out more. They also talked about registers and convening some public sector types to chat about duplication in the lists they manage.

Data day

Wednesday was data day. Ellie Craven and Arnau Siches from GDS came along to chat about registers that Parliament might need and registers that Parliament might produce. We agreed to start simply with a register of Houses because that’s only two things and isn’t subject to too much churn. We also talked about registers of Members, committees and the House of Commons Library subject headings.

People from the website team came along and some stuff was agreed:

Corpcomms

You’ll really need to check Yammer for this.

One world, one web, one team

Aidan spent more time with the Indexing and Data Management Section in the House of Commons Library. He had some meetings to discuss next steps for the Data Toolkit work, and priorities for support, including looking at automation of some of the regular tasks that the team perform.

He also spent time with Mike and the Business Systems team so he can start to understand the infrastructure and data flows that relate to Indexing, Search and subject index management systems.

Michael helped with a service assessment for one of the new web services. More curates; more eggs.

Domain modelling

Angela and Michael continued with the autopsy of Red Book (a database used by the House of Commons for managing committee data). More work is still needed.

Data platform

Chris and Samu explored ontology alternatives to “ClassType” classes and are considering reintroducing restriction classes to the ontology to avoid combinatoric explosions. Trust me, nobody wants combinatoric explosions.

Jianhan continued to work on the OData service which now covers OData expand, count, and navigation property functionality. He worked with Samu on using OData test suites to test that the service conforms to the standard.
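If OData is new to you, the three features mentioned above show up in request URLs roughly as in this sketch. The `$expand` and `$count` query options and the navigation property path are standard OData v4. The service root, the `Person` entity set and the `Memberships` navigation property are placeholders, not the real service.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode

# Placeholder root: the real service address is not given in these notes.
SERVICE_ROOT = "https://example.org/odata"


def entity_set_url(
    entity_set: str,
    expand: Optional[Sequence[str]] = None,
    count: bool = False,
) -> str:
    """Build a URL for an entity set with optional $expand and $count options."""
    options = {}
    if expand:
        # $expand pulls related entities into the same response.
        options["$expand"] = ",".join(expand)
    if count:
        # $count=true asks the service to include the total number of matches.
        options["$count"] = "true"
    query = urlencode(options, safe="$,", quote_via=quote)
    url = f"{SERVICE_ROOT}/{entity_set}"
    return f"{url}?{query}" if query else url


def navigation_url(entity_set: str, key: str, nav_property: str) -> str:
    """Build a URL that follows a navigation property from a single entity."""
    return f"{SERVICE_ROOT}/{entity_set}('{quote(key)}')/{nav_property}"
```

So `entity_set_url("Person", expand=["Memberships"], count=True)` would ask for people with their memberships inlined plus a total count, and `navigation_url("Person", "abc123", "Memberships")` would ask for just one person's memberships.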

Ben has almost completed work on Lords’ Inflation (still really don’t ask). Expect floating baronesses over Westminster shortly. The next step is to get a seal of approval from David Beamish. Chris has been working on extensions within the platform but needs the final go ahead from Ben before proceeding with work to import the data.

Raphael worked on running the Berlin SPARQL Benchmark (BSBM), which measures the performance of storage systems that expose SPARQL endpoints. He’s testing it against endpoints set up on our services, building on Matthieu’s performance testing. He’d like to find out the key variables (specifically different inference rulesets) that affect the performance of the triplestore setups that are available to us. He’s using non-Parliamentary data from a recognised benchmark.
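BSBM reports its headline result as query mixes per hour (QMpH). The sketch below shows the shape of that comparison across setups. It is not Raphael's harness and not the BSBM test driver. The function names are invented, and each `run_mix` callable stands in for "run one full query mix against one endpoint configuration".

```python
import time
from typing import Callable, Dict


def query_mixes_per_hour(
    run_mix: Callable[[], None],
    mixes: int,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time `mixes` runs of a query mix and scale the result to an hourly rate."""
    start = clock()
    for _ in range(mixes):
        run_mix()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive to compute a rate")
    return mixes * 3600.0 / elapsed


def compare_setups(
    setups: Dict[str, Callable[[], None]],
    mixes: int,
) -> Dict[str, float]:
    """Measure QMpH for each named setup, e.g. one per inference ruleset."""
    return {
        name: query_mixes_per_hour(run_mix, mixes)
        for name, run_mix in setups.items()
    }
```

The `clock` parameter is only there so the arithmetic can be checked with a fake timer. Real runs would also want warm-up mixes and repeated measurements before anyone draws conclusions about rulesets.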

Dia worked on the first draft of the search strategy, vision and the goals. The plan is to align them to Parliament’s Digital Strategy.

Michael wrote a post about our general approach to search. And browse. S’ok. If reading this is your thing you should read that. Definitely.

Measuring things

Sara has been analysing application insights data, particularly the photo API requests. Samu and Matthieu made changes to the tracking code on the website that enable sharing analytics across subdomains. That matters for photos, which are served from api.parliament.uk. Once the tracking code is changed, we will be able to correlate page views with photo requests. We can then track whether an API call was made from the beta website or by external users.

Did the machines learn owt?

Did they heck.

Did anyone say blockchain?

Nope. Unlike the machines, we’re not thick.

Strolls

No strolls were reported. Expect productivity hits in a month. Or so.

What’s Samu watching?

Samu watched The Future of Programming, a talk by Bob Martin, a signatory of the Agile Manifesto. Uncle Bob talks about the past, present and future of programming and programmers, about the gender divide in the industry, about what “agile” was really supposed to mean (in the sixties and the nineties) before it was hijacked by “project managers”, about NASA programmers working in six-week iterations when automating the trip to the moon, and about how programming needs to mature if it’s to claim the universal leadership role that’s its birthright. \o/

Also from Samu: A Mother’s Confession: A SONG WITH FOOTNOTES by Amanda Fucking Palmer

Things that caught our eye