weeknotes.data-search

2018 Week 33

Is Samu morality?

Our Samu is currently off on his holidays. In his absence we’re continuing to work under the assumption that his morality remains ontologically complete.

Community

Anya and Michael went over to Newspeak House to chat to John Bryden who’s a new fellow in those parts. They sat on the terrace munching bagels and feeling quite good about life. John’s looking at graphs of political connections and lines of influence, which gave everyone a chance to chat about All Party Parliamentary Groups and Michael another chance to chunter on about complex user communities and information ripples. And resurrect one of his old chestnuts about using plagiarism detection for the epidemiology of information. Michael’s put John in touch with both Tony and Andrew. Hopefully we’ll chat again soon.

Anya promised John a parliament’s worth of House of Commons Hansard contributions and any accompanying subject indexing data. Alex has started work on getting an export of all proceedings from the 55th parliament from Solr and writing a PowerShell script to iterate through linked child contributions to return records that are indexed differently to their parent proceedings. Nice work Alex.
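Alex’s actual script is PowerShell; for the notes, here’s the gist in Python instead. The Solr core URL and the field names (parent_id, subjects and friends) are guesses rather than the real schema:

```python
import requests

SOLR = "http://solr.example/solr/hansard/select"  # hypothetical core URL

def fetch(params):
    """Run a query against Solr's JSON API and return the matching docs."""
    params = {"wt": "json", "rows": 1000, **params}
    return requests.get(SOLR, params=params).json()["response"]["docs"]

# All proceedings from the 55th parliament (field names are invented).
proceedings = fetch({"q": "type:proceeding AND parliament:55"})

for proceeding in proceedings:
    # Walk the linked child contributions and keep any whose subject
    # indexing differs from the parent proceeding's.
    for child in fetch({"q": f"parent_id:{proceeding['id']}"}):
        if set(child.get("subjects", [])) != set(proceeding.get("subjects", [])):
            print(child["id"], child.get("subjects"))
```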

Anya and Michael also spent some time looking at the card index files Michael Rush sent up covering the last three intakes of House of Commons Members. They’ve added some test data to the spreadsheet they’ll be working with for now, which went quite well. Except it seems that the handwriting of professors is really no better than that of doctors. We expect to be scanning and emailing many cards shortly.

Anya and Michael had a phone chat with Andy and Dave about the stats series model and how compatible it is with the RDF data cube model. Michael had spent some time with the data cube documents but his brain had failed to really map the spec to the examples. They now think they’re slightly clearer on data cube but are still wondering if they need to make more changes to their model. Or just scrap it and use data cube. Michael’s keen to get some time with ODI types to help with this and is chatting to Leigh as he types.
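For anyone else whose brain slides off the data cube spec, the core shape is quite small: a dataset, some dimension properties pinning down what’s being counted and when, and a measure carrying the value, with one qb:Observation per cell. A minimal sketch with rdflib; the example.org namespace and the Parliament-flavoured dimensions are made up:

```python
from rdflib import Graph, Literal, Namespace, RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")  # RDF Data Cube vocabulary
EX = Namespace("https://example.org/statistics/")    # hypothetical namespace

g = Graph()
g.bind("qb", QB)

# One cell of a statistical table: dimensions say what's being counted
# and when; the measure carries the number.
obs = EX["observation/1"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX["dataset/written-questions"]))
g.add((obs, EX.refPeriod, Literal("2018-W33")))             # dimension (made up)
g.add((obs, EX.answeringBody, EX["body/home-office"]))      # dimension (made up)
g.add((obs, EX.count, Literal(42, datatype=XSD.integer)))   # measure (made up)

print(g.serialize(format="turtle"))
```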

Way back in February, Michael ran an event at Newspeak House to chat about analytics, tracking and privacy in the public sector. It went well. Or Michael thought it did. And there was interest from attendees in collaborating on a document. Michael has been meaning to get on with this ever since but some combination of statutory instruments and tracking services took his attention. So Ade dropped by Tothill Street on Thursday and kindly kicked his arse. They started on an outline of a document which they’ll share more widely when they’re happy. Thanks Ade.

Showing and telling

Matt did a brief reprise of his talk on the new website architecture which seemed to go down quite well. Michael watched over a crashy appear.in link and had questions about model expression in components. A meeting has been arranged.

Domain modelling

A fair bit of progress this week but all a little less visible than would be ideal. The service we used to transform Turtle files to HTML fell over and hasn’t yet been resurrected. So our Turtle files and HTML are running a little out of sync. We’ve made previous efforts to run a local mirror of LODE but our lack of Java skills let us down. In the meantime, Robert has set out to write his own Turtle to HTML converter. Which would be handy.
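Should Robert want a head start, the guts of such a converter are pleasingly small with rdflib. A minimal sketch, nowhere near as thorough as LODE, with a made-up input filename:

```python
import html
from rdflib import Graph, RDF, RDFS, OWL

g = Graph().parse("ontology.ttl", format="turtle")  # any of our Turtle files

def rows(kind):
    """Yield (term, label, comment) for every term of the given type."""
    for term in sorted(g.subjects(RDF.type, kind)):
        yield (term,
               g.value(term, RDFS.label, default=""),
               g.value(term, RDFS.comment, default=""))

parts = ["<html><body>"]
for heading, kind in [("Classes", OWL.Class),
                      ("Object properties", OWL.ObjectProperty)]:
    parts.append(f"<h2>{heading}</h2><dl>")
    for term, label, comment in rows(kind):
        parts.append(f"<dt>{html.escape(str(label))} &lt;{term}&gt;</dt>"
                     f"<dd>{html.escape(str(comment))}</dd>")
    parts.append("</dl>")
parts.append("</body></html>")

print("\n".join(parts))
```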

Anya, Jayne and Michael met Jack from the House of Commons Journal Office to run through the impact of withdrawal of statutory instruments across the laying and procedure model. Given withdrawal seems to preclude every other procedural step, they think they should just keep it part of the laying model and steer clear of adding withdrawal steps to the procedure flowcharts. But they need to check that with Samu.

There was also some chat around approval motions for affirmative SIs falling at the end of sessions and new motions needing to be tabled. Which brought up talk about a session clock and how much additional procedure flowcharting would be necessary. There’s a general feeling that the current procedure model doesn’t allow enough in the way of conditional logic for steps that repeat as a procedure loops back through cycles. A head-bending chat with Jayne, Silver and James is planned.

On the subject of head-bending: librarians Anya, Jayne and Claire, Michaels Mike and Michael, some highlighter pens, and much paper all got together to check the procedure data matches the current state of the flowcharts. They had hoped to cover all five procedures but limped along to the end of one. An exhausted Mike crept from the room and decided that writing a screen’s worth of SQL would save everyone’s sanity here. More checking is planned but Mike’s work will hopefully make fewer brains break.
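Mike’s actual SQL lives on a screen somewhere in Tothill Street; the shape of the check, though, is just a set comparison. A toy Python version with invented step names, swapped in here instead of the SQL:

```python
# The flowchart and the procedure data are both, at heart, lists of routes:
# (from_step, to_step) pairs. Checking one against the other is a set diff.
flowchart = {("laid", "motion tabled"), ("motion tabled", "motion approved")}
data = {("laid", "motion tabled"), ("motion tabled", "motion withdrawn")}

print("in flowchart but not data:", flowchart - data)
print("in data but not flowchart:", data - flowchart)
```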

Anya, Robert and Michael had all of Thursday booked in for a meeting with the slightly optimistic title of ‘Comments blitz’. They’d intended to run through all the models so far and check the comments and improve them where possible. They managed to cover off one model. But at least they think the comments improved.

Anya and Michael drew up a new version of the legislation model with SI series and series membership. Mostly off the back of Jack answering a stream of emails. And Catherine Tabone from The National Archives explaining SI series and membership constraints. Thanks Catherine.

Alison met with Nick Jones from the Business Systems team who showed her the application they’ve built for managing All Party Parliamentary Groups. For now it’s used for registering APPGs, any benefits they’ve received and the membership of their governing committees. It doesn’t maintain a full list of members, nor is it used for booking meetings. The information in the application is no more than appears on the APPG website, although it can be used to run reports. External organisations linked to the groups are entered as free text. Alison’s sketched a quick domain model of what we have for APPGs, excluding meetings and incumbency, which are already covered by other models.

Data platform

This is a late one that just missed the deadline for last week. Samu spent a day with Oli exploring our chosen cloud offering from a Python developer’s perspective. They started with a ‘Hello world’ and got as far as sending emails through a Logic App via Parliament’s Exchange server to a list of recipients stored in a managed Postgres database by a little Flask app in a Docker image continuously deployed to an App Service. They stopped short of automating builds from Git by VSTS because they both wanted to make it home to bathe their kids. And they feared our reader might run short on breath.
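For the curious, the moving parts of their day fit in a small sketch. This is not Samu and Oli’s actual code: the table name, the environment variables and the Logic App’s HTTP trigger are all assumptions. A little Flask app that reads recipients from Postgres and hands them to a Logic App to do the emailing:

```python
import os

import psycopg2
import requests
from flask import Flask, jsonify

app = Flask(__name__)

# Connection string for the managed Postgres database and the HTTP trigger
# URL of the Logic App that talks to Exchange. Both names are assumptions.
DATABASE_URL = os.environ["DATABASE_URL"]
LOGIC_APP_URL = os.environ["LOGIC_APP_URL"]

@app.route("/send", methods=["POST"])
def send():
    # Fetch the recipient list from Postgres...
    with psycopg2.connect(DATABASE_URL) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT email FROM recipients")
            emails = [row[0] for row in cur.fetchall()]

    # ...and hand it to the Logic App, which sends the mail through
    # Parliament's Exchange server.
    requests.post(LOGIC_APP_URL, json={"to": emails, "subject": "Hello world"})
    return jsonify(sent=len(emails))

if __name__ == "__main__":
    app.run()
```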

They now have longer-term plans for improving the Current Awareness newsletter, moving Research Briefings to the new website and improving them by adding facilities for publishing research data. They also talked about linking Oli’s work on crowdsourced place names to the new data service. All in all they had a very good day. Which is nice.

Matthieu set about creating Dockerfiles to help demonstrate our issues with the latest version of Vocbench. He also created a new set of load tests for the Augustus / Thorney infrastructure and got briefly (and it must be said incorrectly) blamed for killing search (see later). He spent some time chatting through the load testing with Matt, Allan and Christine and the results (about 10 pages per second on one node) gave Matt enough confidence to take his code to staging. Matthieu and Allan talked about graphs, ontologies and Brixton. In the absence of Dan, Matthieu also picked up the running of the data strategy session. And, perhaps unfortunately, was the sole attendee. Top marks for effort though.
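For a flavour of what ‘about 10 pages per second’ means in practice, here’s a Python miniature of a load test. Matthieu’s real tests used proper tooling; the staging URL, request count and worker count here are invented:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://staging.example.parliament.uk/"  # hypothetical target

def hit(_):
    """Fetch one page and report whether it came back OK."""
    return requests.get(URL, timeout=10).status_code == 200

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(hit, range(200)))
elapsed = time.monotonic() - start

print(f"{sum(results) / elapsed:.1f} ok pages per second")
```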

Use APIs, they said. What could go wrong, they asked?

Wojciech interrupted his well-deserved holiday to forward an alert from the search service. A quick check revealed no results were being returned for any search query. Which was less than ideal. Jianhan and Jamie sat scratching their heads and troubleshooting the issue. For a few minutes they worried that Matthieu’s load testing had been a little too aggressive and had caused something in the data platform to keel over and die. But as they peeled back the 500 errors from our API Management layer they uncovered 400 errors in the underlying service. More specifically, 410. The API had gone.

Jamie spent some time checking Twitter and Jianhan checked the online service and it soon became clear the issue was with Azure. A support ticket was raised and within an hour the service was up and running again. In the meantime, other people experiencing the same issue had migrated to a more recent version of the service. Another job for our to-do list.

Jamie and Jianhan would like to apologise to Matthieu for ever doubting him.

On search. And indeed indexing

Our Liz reports we now have more visibility of searches across Hansard, MPs, Constituency Finder and online petitions. And also across some things like the deposited papers filtered list. Though there’s still some debate about whether we even call that search, given there are no search terms. That said, Michael has fewer doubts. In his head at least, it’s just a badly designed browse. According to Liz’s calculations, these assorted forms are used more than the default website search, though she suspects her fuzzy query still needs work.

Hints went live on website search a few weeks ago. Sara et al decided they’d use time spent on a search results page before clicking a link as a measure for monitoring change. They’ve grouped sessions based on behaviour: the device type used and result number clicked. Which is similar to what they did when we ran the A/B testing for the display of URLs on search result pages. Sara’s using power analysis to estimate the sample size required to detect a change with a given degree of confidence, i.e. how many days of data do we need to collect to confidently say that, if there’s been a change, that change is significant? As it turns out, for desktop users clicking the first result we have enough data points after 4 days, whereas for mobile users we’d need 67 days’ worth of data. Desktop users generate the majority of traffic, which is why the mobile numbers take so much longer to stack up.
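A sketch of the sums, using the power analysis helpers in statsmodels. The effect size and the sessions-per-day figures are illustrative stand-ins, not Sara’s real numbers:

```python
from statsmodels.stats.power import TTestIndPower

# How many sessions per group do we need to detect a small shift in
# time-on-page with 80% power at the usual 5% significance level?
analysis = TTestIndPower()
sessions_needed = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)

# Convert sessions into days of data collection for each device type.
# The daily session counts below are made up for illustration.
for device, sessions_per_day in [("desktop", 2000), ("mobile", 120)]:
    days = sessions_needed / sessions_per_day
    print(f"{device}: {sessions_needed:.0f} sessions, about {days:.1f} days of data")
```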

Measuring things

Our Liz also checks in to say lots of people are talking about measurement. Which obviously gets her seal of approval. Liz thinks we need more consistent messaging and a few practical things to help move everyone in the same direction. But she’s not sure what they are yet. She’s planning to speak with the performance managers about the new measurement model (which appears to not yet have a URL) but probably not until October.

Corporate data

Lew’s been monitoring our integration messaging system after a few performance configuration changes. He’s also been looking at how best to utilise some new servers the team have had donated, continued his involvement in the ongoing stock system project, and spoken to people about how we improve our KPI reporting.

Noel continued his work to clear the tickets logged whilst BizTalk burned during the last couple of weeks. He’s used the tickets to test that the system is in the same state it was in prior to the fire. Or at least the ones that weren’t too singed to read. All the tests have proved satisfactory, faults are being addressed and no new reported or observable problems have come to his attention. Work continues around improving the system and avoiding any recurrence. Fire extinguishers are on order.

Strolls

Another week with very poor stroll performance analytics. Though Anya and Michael did wander through Spitalfields on their way to Newspeak. And stopped off to try on daft glasses.

Leavers and joiners

Our Liz reports that James, our shiny new data analyst, will start with us on the 17th September. Delightful news.

Did anybody get a free sandwich?

Liz got a free sandwich for pointing out Pret had messed up the labelling scheme. Well done Liz.

Anya acquired a free chocolate hazelnut croissant. This time because Michael messed up the queuing system. Again from Pret. They’ll be bankrupt at this rate.

Things that caught our eye