weeknotes.data-search

2018 Week 20

More visions

Not many people had visions this week, but Dan did have a couple of catch-ups with Robert where they talked about all things data, search and strategy. Great stuff, says Dan.

Liberating the library

Following up on last week, the Wikidata community has now manually matched 3,672 of our thesaurus subject headings to Wikidata things. Which is ~8% of the total. Not bad going for a week.

Showing and telling

Sara and Liz showed and told. They gave an overview of their analysis of parliamentary material that has been subject indexed with the controlled vocabulary by the Indexing and Data Management Section of the House of Commons Library. This data hasn't been easily available in the past and could be used as feedback for indexers to improve the quality and consistency of their work. The showing and the telling were well received (although a telly would be an improvement on the projector) and questions were asked. They're planning to spend more time digging into this area, looking at the usage and relatedness of subject headings, which will hopefully drive some of the website work around 'topical' concepts and concept co-occurrence.

Community

We’ve been pretty big on community this week. On Wednesday, Anya, Silver, Robert, Michael and Chris (from the Parliamentary Archives) went along to The National Archives for a joint session on records and data with TNA and Wellcome Trust people. It was a good session with lots of shared areas of concern and a couple of fairly concrete things we want to work on.

The first is more Parliament / TNA focussed, with a commitment to collaborate on a UK-wide legislation model that won't give John Sheridan sleepless nights.

The second is a meeting to discuss reference data: where organisations have lists of things, where those lists are duplicated and who we think the appropriate custodian should be. Lists of government departments, positions and incumbents being a perennial bugbear here. Michael is trying to organise something and invites have been extended to the ONS and NAO. If you think you might have lists of public sector things that you shouldn’t really be maintaining, or indeed should be, do get in touch. It’s definitely all more town planning than skyscrapers but then the useful work usually is.

In other register-related news, Dan's been doing a little more digging (plotting?) on how we might publish a register of Members of both Houses. Stay tuned.

On Thursday, Anya, Robert and Michael met some people from the Electoral Commission to talk about the House of Commons Register of Members' Financial Interests. The Electoral Commission take "data" from Parliament but have to enrich what we provide to make it meaningful. More structured data would cut down on rekeying and reduce their workload. So we're keen to work with them and understand their use cases. They've also put us in touch with the Independent Parliamentary Standards Authority to see if they have similar needs for better structured parliamentary data.

In even more community news, Dan had a good meeting with Dr Ruth Dixon on Monday. Ruth is doing a year-long research project with the House of Lords Library, and was looking at how to get XML versions of bills out of data.parliament.uk. Which is currently pretty painful. Can we make it better? We would like to think so.

Domain modelling

In last week's episode, I typed, "They did tidy up some of the legislation model, though Michael still thinks it looks wonky / broke." John Sheridan happened to be reading and a conversation ensued on Twitter. Michael received a message from John saying that the proposed legislation model was giving him nightmares and making him lose sleep. Which came with an offer to collaborate. Which we've taken him up on. We now have copies of some of the TNA models, which Silver is currently drawing up. Hopefully next week we'll make a start on something sane that doesn't upset John.

On Tuesday, Silver and Michael met with Oli and Philip Brien from House of Commons Library research land to start on a model for research briefings and similar documents. For a one-hour session with two people, it felt like they got through a lot and managed to draw up a very basic picture. They still need to do more work with both libraries and dig a little deeper into POST publications. But it's a start.

Just about every week I type something about procedure flow charts with the optimistic claim that "they think they're almost there now". We do often think this, but we're just as often disappointed. IDMS spent last week adding procedure data according to the flowcharts. This week they've been busy adding workpackage data to actualise the procedural steps. Samu's been looking at the workpackage visualisations Raphael made and has found a number of routes that should be precluded but that we'd failed to spot. Things like: having decided to agree to a motion, there's currently nothing in the procedure data to preclude Parliament from then disagreeing to it. Which feels obvious. But not when your face has been pressed up against the glass for weeks. So we don't think we're nearly there now. We probably think we'll never be there.

On a more positive note there was a good meeting between the IDMS folks who are mincing the pork, the Data and Search folks who are making the sausages and the web folks who are packaging them. So we feel we have a better grip on the ends and the middle. Though, as Robert might say, it’s all just balloon squeezing. But with visualisations.

Data platform

Liz and Mathieu have been hatching a plan to generate report-style data from VocBench (our preferred option for future management of the IDMS controlled vocabulary). Ideally, they'd like to export the audit history and use it for reporting on when new concepts are added, amended, etc. All this information is currently compiled into Word documents by IDMS, so anything would be an improvement. They've been in touch with the lead developer for VocBench and this is on his roadmap. In the meantime, he's given Mathieu some tips on writing SPARQL that should help here.

On searching and indeed indexing

Liz and Alex had a chat with Joe Foster about the processes involved in slurping material out of business systems and into the search and indexing triple store. They're interested in what information is 'available to index' and when. Unfortunately, data about when material enters the triple store isn't captured, so it isn't currently possible to accurately assess the data transfer. Ideally they'd like to be able to describe what it looks like now (delays in receiving information, typical timescales for indexing, etc.) so there's a baseline to compare against as and when we make changes to the data and processes involved. Work continues.

Corporate data

Dan held the first of two workshops with Lew, Mat, Noel and David to get on top of things before Mat leaves. They looked at the priorities for data integration for the next year. The main things to come out of it for Dan were:

Strolls

Again, we saw very little in the way of strolling this week. Though Anya, Silver, Ben and Michael did stroll to Soho to meet Ganesh (ex of this parish). There was more talk of SIs and registers. They called in at the Toucan and tried to buy a round of Poitín. But the landlord couldn’t sell them any. Because it is now illegal. Or so they were told. Presumably under an SI or some such. Someone should really build a tracker.

Things that caught our eye