weeknotes.data-search

2018 Week 36

Data day

Wednesday afternoon was supposed to be data day. But Dan had childcare issues so it got cancelled. Well done, children.

The delivery is the strategy

Dan’s gone all in on data strategy stuff this week. He’s currently preparing for a pitch to the Information Authority in a couple of weeks’ time. He’s also been doing some Service Owning for the Interaction Management programme. Mostly in the form of meetings. And generally trying to pull his head out of holiday mode.

Community

Tuesday lunchtime saw Anya and Michael take a short stroll across St James’s Park to the Institute for Government. After partaking of a light lunch, they took their seats for the star studded launch of the Parliamentary Monitor 2018 report. The event was excellent, as is the report. Your favourite computational section gets a nod or two and there’s a constructive description of the ongoing difficulties people face when attempting to work with Parliament’s data.

Late on Thursday, John Sheridan and Matt Bell, both of TNA, dropped by library land to chat with Anya, Jack and Michael. There followed an hour of John’s rather brilliant mind explaining some of the intricacies and patterns of legislation. With Michael attempting to capture a little of John’s brain on a whiteboard. The meeting had been occasioned by a link to the legislation model in last week’s weeknotes which John had duly followed and hit upon some areas of concern. Twitter chats commenced and John offered his help in explaining powers and duties and how they interwingle.

Michael’s now drawn a new picture of the legislation model which we think captures something of what we learned from John. We also think it brings us closer to fixing some of the concerns raised by David Beamish last week. And we hope it is more pleasing to our Samu. Because Samu’s happiness is important to us. Thanks John. Thanks Matt. Thanks David.

Anya, librarian Liz and Michael caught up on where we’re at with entering the Rush data. So far about a fifth of the card index files have been entered into the spreadsheet. With remarkably few questions arising. There are some things on the cards that don’t fit into the current shape of the database. They decided to make additional sheets even if the data couldn’t be slurped into the database yet. Emails were exchanged with Paul and James. Progress was made. Go librarians.

Alex finished getting a big chunk of data out of Solr and sent it over to the lovely people of Newspeak House. More details back in week 33. He also sent some other stuff, like Matthieu’s SKOS vocabulary export and a topic terms visualisation that Raphael made some time ago. We think they’re happy with it.

One world, one web, one team

Team data analysis showed and told how they’re approaching their work on search data. In particular, splitting apart variations that can be explained and variations that trace back to random processes. Unfortunately the computational section was unable to provide either a working Skype camera or a slide projector, so no one got to see Matt’s pretty pictures. Sad face, Matt. Feeling for you.

Pictures - pretty or otherwise - apart, they looked at grouping search data to better investigate the behaviours and processes behind the observations. The general approach is to bucket by device type and by weekday versus weekend as these often show significant differences. They also looked at a way to assess change in measures. For measures that can be summarised as success or failure events, they’re using the standard deviation of a binomial distribution to compare current observations to past observations. This, we’re told, is a general method and doesn’t require a baseline to assess a change. Liz says she enjoyed the discussion that followed about data driven baselines and KPIs. Each to their own I guess.
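For the curious, here is a rough sketch of what a binomial comparison of this kind might look like. It is not team data analysis’s actual code. The function name and the figures are made up for illustration, and it assumes the past success rate can be treated as the true rate, which ignores any uncertainty in the past observations.

```python
import math


def binomial_z_score(past_successes, past_trials, current_successes, current_trials):
    """Return how many binomial standard deviations the current success
    count sits from the count the past success rate would predict."""
    if past_trials <= 0 or current_trials <= 0:
        raise ValueError("trial counts must be positive")

    # Assumption: the past rate stands in for the underlying success probability.
    p = past_successes / past_trials
    expected = current_trials * p

    # Standard deviation of a binomial count with n trials and probability p.
    std_dev = math.sqrt(current_trials * p * (1 - p))

    difference = current_successes - expected
    if std_dev == 0:
        # A past rate of exactly 0 or 1 leaves no room for random variation.
        return 0.0 if difference == 0 else math.copysign(math.inf, difference)
    return difference / std_dev


# Hypothetical bucket: weekday searches on mobile devices.
# Past: 400 successful searches out of 1000. Current: 52 out of 100.
z = binomial_z_score(400, 1000, 52, 100)
print(round(z, 2))
```

A score within a couple of standard deviations of zero is the sort of variation a random process would produce on its own. Anything much larger is worth a look. Each bucket (device type, weekday or weekend) would get its own comparison.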

Our Liz met with Jenny and Tom, the communications and change managers for Skype and Office 365, to share some thoughts around creating KPIs. Jenny and Tom are running a workshop to identify measures to demonstrate the benefits of their work, creating common measurements where possible. They talked about including measures that can relate their actions to outputs, to reflect impact on the measures. Liz covered how team data analysis could help to assess measures and KPIs. They’re planning to continue working with Jenny and Tom.

Liz spent some time with Julie helping her think about a workshop she’s running with Jonathan Seller, Tori Baker and Peter Lamb. They’re hoping to gather metrics that will help with the “People, Culture and Capability” work. Which roughly translates to employee satisfaction, recruitment and retention stats, skills gaps and the like. Julie says Liz has helped them in their thinking about data and KPIs. Julie also says thank you.

Matt and Sara have been analysing data from service desk calls. They began by creating a cooccurrence matrix, looking at how often pairs of words occur together in call descriptions. They’re interested in seeing if there’s any correlation between these word pairings and the teams the calls are assigned to. They also broke this down by individual words, looking at where calls ended up if they contained the word ‘Spire’ or ‘iPhone’ or some such.
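A minimal sketch of the idea, assuming each call is a pair of free text description and assigned team. The function names, the tokenising rule and the sample calls are all invented for illustration. Matt and Sara’s real approach may well differ.

```python
import re
from collections import Counter
from itertools import combinations


def words_in(description):
    """Lowercase a description and return its distinct words."""
    return set(re.findall(r"[a-z0-9']+", description.lower()))


def cooccurrence_counts(descriptions):
    """Count how many descriptions each pair of words appears in together."""
    counts = Counter()
    for description in descriptions:
        # Sorting gives every pair one canonical order, e.g. ('fails', 'login').
        counts.update(combinations(sorted(words_in(description)), 2))
    return counts


def teams_for_word(calls, word):
    """Count the teams assigned to calls whose description contains the word."""
    return Counter(team for description, team in calls
                   if word.lower() in words_in(description))


# Hypothetical service desk calls: (description, assigned team).
calls = [
    ("iPhone email not syncing", "Mobile"),
    ("Spire login fails", "Applications"),
    ("iPhone login fails", "Mobile"),
]

pairs = cooccurrence_counts(description for description, _ in calls)
print(pairs[("fails", "login")])
print(teams_for_word(calls, "iPhone"))
```

The pair counts are the cells of the co-occurrence matrix, stored sparsely. Only pairs that actually turn up get an entry.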

Domain modelling

After several weeks spent playing possum, the LODE service underwent a Lazarus like resurrection. So our HTML files are now back in line with turtle files. Which is pleasing.

In the meantime, Samu has managed to deploy our own local version of LODE, so we don’t get stuck the next time it goes down. More details below. In a different corner, Robert has been chipping away at his own turtle to HTML translator. Results so far look good. The LODE service is handy but it doesn’t pick up all of the data in the turtle. So once the HTML is generated there’s some manual cutting and pasting and bodging and fudging. Which our ham fists occasionally get wrong. Given we have more control over Robert’s translator, we can easily tune it to pick up new vocabularies and new data properties. Which means the HTML expression remains closer to the model and less stuff can go wrong.

To that end Michael and Robert have been tinkering with ontologies, adding image links, authors etc. Which was all stuff we’ve been ignoring for months because LODE didn’t pick it up. As a side benefit, the Robert powered parser choked on our use of rdf:isDefinedBy. Which, as any fule kno, should have been rdfs:isDefinedBy. An error we’d copied and pasted 700 times and which hadn’t been spotted by LODE. So that’s also now fixed.
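This kind of slip can be caught with a crude text scan. The sketch below is not Robert’s parser. It assumes the rdf: prefix is bound to the standard RDF namespace, it checks against a partial list of RDF terms typed from memory, and it does not parse Turtle properly, so strings and comments can trigger false alarms. The function name and sample ontology are hypothetical.

```python
import re

# Partial list of local names defined in the RDF namespace (an assumption,
# not an exhaustive copy of the vocabulary).
RDF_TERMS = {
    "type", "Property", "Statement", "subject", "predicate", "object",
    "Bag", "Seq", "Alt", "value", "List", "nil", "first", "rest",
    "XMLLiteral", "langString", "HTML",
}

PREFIXED_NAME = re.compile(r"\brdf:([A-Za-z_][A-Za-z0-9_]*)")


def unknown_rdf_terms(turtle_text):
    """Return (line number, local name) for each rdf: name not in RDF_TERMS."""
    found = []
    for line_number, line in enumerate(turtle_text.splitlines(), start=1):
        for match in PREFIXED_NAME.finditer(line):
            local_name = match.group(1)
            # rdf:_1, rdf:_2 and so on are container membership properties.
            if re.fullmatch(r"_\d+", local_name):
                continue
            if local_name not in RDF_TERMS:
                found.append((line_number, local_name))
    return found


sample = """\
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Thing rdf:type rdfs:Class ;
    rdf:isDefinedBy <https://example.org/ontology> .
"""

print(unknown_rdf_terms(sample))
```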

Following on from last week’s meeting with David Beamish, Anya, Robert and Michael gave the procedure model a minor tweak. Rather than hanging off the work package, the subjectTo predicate is now hanging off the work packageable thing, which, at least for now, is either a statutory instrument or a proposed negative statutory instrument. Which all feels neater and truer to the domain language of Parliament. Although this might also go as we flesh out the legislation model and powers and duties. John remains in charge here.

Last week’s weeknotes had a slurry of words about the problems the procedure model has with self-preclusion and steps that can be actualised many times in series but not in parallel. I won’t repeat them here. Off the back of this Anya and Michael tweaked both the flowcharts and the procedure data to remove some self-preclusions and some withdrawal to decision preclusions. Fascinating stuff.

In a meeting with Jack they also discovered that there can only be one non-fatal motion for the negative procedure in the House of Commons. Which presumably Jack had already explained and they’d drawn wrong. This is now fixed in both flowcharts and data.

Anya, Robert and Michael continued with their ‘comment blitz’ in a bid to improve all the comments across the models. They probably rewrote three before getting distracted by Daily Express comments. But they were a good three. As was the Daily Express comment.

September spawned a monster…

…in the shape of statutory instrument withdrawal in the draft affirmative procedure. We had hoped to avoid modelling withdrawal in the procedure data, on the assumption that withdrawal kills everything and withdrawal is always possible. But it turns out withdrawal can only happen up to the moment of a decision in each House. So we had to capture withdrawal precluding some things, withdrawal causing some things and decision points precluding withdrawals. Dear reader, we are sorry. You’d have to be Erskine May with a cheap Dell to find any of this interesting.

For all these reasons, Anya and Michael had yet another meeting with Jane and Jack to draw even more lines on the draft affirmative flowchart. This has now been tidied. To the extent that this horror could ever be described as tidy. It’s now awaiting sign off from Jane and Jack before we make the data changes. I mean, look at it. Absolute state of that.

Data platform

In happy news, our Rebecca has returned to the fold. And has been busy writing an assortment of SPARQL queries for the purposes of tracking statutory instruments. Way back in week 7, Samu added checks on our public repository so that developers collaborating on SPARQL queries could get feedback on whether their changes had worked. Or crashed. Or burned. The return of a super-productive Rebecca prompted Samu to add pull request cloning from GitHub to VSTS. Which makes it much easier to review and merge changes to the Query API. Which means web developers can work faster.

Returning to the topic of the recently resurrected LODE service: for the past few months, we’ve been using LODE to generate human readable pages for our domain models. It’s a piece of open source software created by Silvio Peroni, a senior assistant professor at the University of Bologna. Silvio maintains a freely available instance of the tool online. Unfortunately it’s proved a little unreliable and is fairly frequently unavailable. Twitter calls have gone out and Silvio has kindly kicked it back to life. But it’s not good that we’ve been relying on an unsupported tool for something so crucial.

So Samu’s now deployed our own local instance which runs on an Azure App Service. It’s deployed by a VSTS build script from the LODE public source control repository on GitHub. And it’s running on a free plan so doesn’t cost us anything. The whole thing took some time to get working because LODE isn’t built in a way that facilitates cloud deployment. It relies on either the ontology source being RDFXML (which ours aren’t), or on the OWLAPI to convert it from other formats to RDFXML. For some reason, Samu’s version can’t seem to find the OWLAPI library, so the second option is not an option. This is either because it’s wrong (works on the developer’s setup but fails elsewhere) or because Samu’s no good with Java.

In truth, none of us are any good with Java. Which is a pain considering more and more of our services are written in Java. Samu thinks we’ve done a good job and gives credit to Wojciech on GraphDB, Matthieu on VocBench, and Chris on WebVowl. But he also thinks we’d benefit from some expertise here. Which doesn’t mean we need permanent Java capability. But some serious Java mentoring would do us good.

In the absence of OWLAPI, Samu resorted to EasyRDF (thanks @njh), another free online tool that converts between RDF serialisation formats. Which is either another hack or a true embracing of loosely coupled web architecture. Depending on your particular sense of aesthetics.

The only actual modification Samu made to the LODE source is the addition of a custom HTML homepage hosted on Gist, which the automated deployment injects into the installation. This homepage retrieves a list of ontology folders from the domain model repository on GitHub and generates pre-parametrized download links.

Corporate data

Judging by the email your correspondent received, Noel was on his tod this week. Poor Noel. He’s been looking through more tickets on the incident management system. He’s also created a stored procedure to fix an issue in BizTalk.

Dan met with Liz to begin development of a few progress and performance measures for the integration engineers. We’re hoping that Matt, Sara, David, Lew and Noel will work together to develop something useful.

Strolls

No strolls reported, but we’re told that Dan and Robert were planning one this very afternoon. No more details are available at this point.

Things that caught our eye

Samu read fake news in the context of digital media by Alan Rusbridger.
He also read Architect Modern Web Applications with ASP.NET Core and Azure. Samu says it’s the state of the art in cloud based Web development. All coming from Microsoft. Who, by coincidence, supply the technology we use in the Data Service. It all contributes to the long overdue work on upgrading our applications (Query, Search, Photo and OData) to the latest version of the web development framework.
Liz flags up a survey on exploring key performance indicators.
Dan suspects Google’s attempt to make it easier to discover datasets is quite a big deal.
Michael wrote a post about users and their endless neediness with help from Anya, Robert and Silver. Unfortunately he found it trickier to publish than to write because he’s buggered up his computer and can’t access his website. Thanks again @njh.