weeknotes.data-search

2018 Week 34

Did Samu have one of those insight things?

Yup. That happened. On a late summer evening, at around 8:47 Samu was gathered with Anya and Michael outside the Red Lion when he announced he’d had one of those insight things. An epiphany if you will. Unfortunately, Michael needed the toilet so he missed the moment. And Anya’s mind has gone blank. Reconstructing from Slack it appears the insight was along the lines of the desirability of doing new things. Like the procedure model. Or Standing Orders perhaps. Because as soon as you try to do anything with existing things, lots of random people with lots of random opinions turn up and start shoving spokes in your wheels. And it does tend to be much harder for people to interfere if they’re not even capable of imagining the thing in the first place.

As the conversation continued, Anya chipped in with, “it’s about what users might want to need”, which blew Samu’s mind. In fairness, she’d been down the juicer a while.

Elsewhere on Slack other people are also claiming to have insights. We’re bubbling over. We really have no shortage of insights. A glut you might say. At this rate they’ll be renaming us the Parliamentary Insight Section. Seriously.

Get yer yo-yo’s out

After spending some time away from the office, Dan blessed us with his presence again this week and even did some service owning for the interaction management programme. In some kind of blast from the first dot com boom past, Dan’s taken to carrying a yo-yo round the office. He bumped into Michael, who positively amazed everyone with his yo-yo tricks. The last time Michael had used a yo-yo was in Blackpool back in 1989. Flares were worn. A hooded top might have been sported. This yo-yo was not so good. The string was too tight which made it hard to spin the thing. Which is what you need from a yo-yo. Michael recommended greasing the axle. His remedy for most things.

Community

A post from our friends at the Government Computational Section captured many people’s attention this week. With half of all search queries predicted to be spoken by 2020 (yeah, right), they’ve been looking at what they might need to do to support this. Which rang some familiar bells across town at the Parliamentary Computational Section. This quote in particular caught Jamie’s attention:

This analysis helped us realise we could increase the number of answers we provide in voice assistants simply by making it easier for search and knowledge engines to use gov.uk as a data source.

In fairness, we think we could have told them this. As our friend Tom might say, the web is generative. Build things properly and all sorts of things become possible.

One world, one web, one team

Anya and Michael caught up with Jamie to chat about some of the concerns they have with written questions and the assorted and convoluted data flows that have grown up around them. They’d hoped to give Jamie some work but he turned the tables and gave them some work instead. So now they have to make a list. Thanks Jamie.

Anya and Michael caught up with Emma McIntosh and Kate Anderson. Both from the Petitions Committee. Michael’s had a long-term interest in looking at co-occurrence patterns for people signing petitions. And Emma and Kate have a long-term interest in getting more value from petitions and petition data. In fairness they didn’t get too far, but a database schema has been requested from our friends at GCS so we might have a better idea of what’s possible soon.

On Thursday morning Michael met with librarians Anya, Liz, Jayne, Jason and Steve to look at Michael Rush’s index cards and the spreadsheets Anya and Michael have made to decant them into. Michael Rush is a professor of politics at Exeter University whose main research area has been the social background of Members of the House of Commons. Anya and Michael, with help from Paul, James and assorted librarians, plan to put this on the web soon. Ish. Reading glasses have been placed on order. Work continues.

Rush data planning was interrupted for Anya and Michael by a trip to see Stephen McGinness in the House of Commons Journal Office. They had questions about committee reports being laid into the Houses and a more general query about whether there’s an abstract model for “things being donked into Parliament”. To cover the commonalities between laying, tabling, presenting and depositing. It seemed to be decided there was. Rhiannon Hollis, resident clerk of the Justice Committee, was also in the room and suggested ‘made available to the House’ as a higher level abstraction. Which is probably better than donked. This is why clerks are so important.

Michael went to chat to Matt and Usman about model expression in the new website application stack. Samu wandered over and joined in. In the end they decided no real benefit would derive from expressing models at HTML level. But that at some point we should turn our attention back to schema.org mappings and inline JSON-LD. Which feels like a very Chris shaped job.

That chat was followed by a meeting with Anya, Jianhan, Joe and Michael which Samu also joined. He’s a sucker for a good meeting. The general architecture of applications in and around questions and answers was discussed and drawn. Which is a truly tottering tower of technology. Joe’s since been off excavating what look suspiciously like enterprise architecture diagrams of the whole mess. Thanks Joe.

Alison met with Bridget and Andy from the Restoration and Renewal team to discuss approaches to modelling and managing space data. As in rooms. Not planets. They’re all planning to meet with Stacey, Caroline and Suzanne from the workspace management project next week to categorise some of the data issues already identified.

Our Liz and Joshua had a quick chat about best practice for presenting information in tables: right alignment, equal column widths and making font size for titles and annotations different to the contents of the table. Though it’s possible there might be a limit to what can be done with data picked up from other places.

Showing and telling

Nowt got shown and nowt got told this week. At least as far as your correspondent is aware.

Domain modelling

Still not much in the way of visible progress this week. The LODE instance we use to turn our turtle into HTML is still down so what’s in the model list is unchanged. Nevertheless, we’ve been paddling like frantic ducks under the water.

Anya, Alison, James, Victor and Michael all met to talk about modelling around committees, committee events and calls for things in general and evidence in particular. Michael also caught up with Victor to look through some of the designs so far. Which seemed to go quite well. More work still required to stitch the models and the interfaces together.

Wednesday afternoon saw Anya, Jayne, Alex and Michael get together for a mammoth session checking the made affirmative Statutory Instrument flowchart against the data entered in the procedure editor. Only a couple of discrepancies were found. Which is pretty good considering. Alex claims he saw Michael sleep on a couple of occasions. Given Michael was on the floor and Alex was sat at a table, a good defence lawyer could probably pick this allegation apart. Suffice to say, Michael insists that while he may have snoozed, he absolutely did not pass out. At any point.

Alison had another modelling session with House of Lords Ed McCarthy and House of Commons James Bowman. They looked at government responses to committee reports and the list published by the House of Lords of reports with overdue government responses. They also discussed footnotes referencing evidence sessions, and other references included in reports. These references include all contributors to published written and oral evidence, whether or not they were actually referenced in the report itself. Reeling from references to references, James has set off to find a clerk for Alison to speak to about committee correspondence, divisions and minority reports. The committee report domain model has been updated accordingly.

Data platform

Rebecca, Allan and Christine visited Samu’s corner to discuss the abstract procedure model, procedure instance data, modelling and architecture. They’ve just finished migrating search pages to the new component architecture (Augustus and Thorney), and have now started creating pages for secondary legislation. Samu would like to say that their patience in listening to his architectural musings was greatly appreciated. Michael nods as he types this. Matt also dropped by to share some recent experiences of good customer engagement during a feature release from an online banking startup.

In light of last week’s search API hiccups, Samu spent some time upgrading the Search Service to use the latest and greatest version of the Bing Search API. Version 7 if you’re interested. Since he last touched this code, the vendor released an SDK for this service, so our application now calls methods instead of issuing HTTP requests. Which Samu says is a good thing. Climbing the ladder of abstraction, says Samu. Who does enjoy a good abstraction ladder. Though we sometimes worry he might get dizzy and fall off.

In the course of his fixes, Samu was also able to remove some dodgy code from the search application. Previous versions had assumed we might want lists of search results longer than the 50 supported by the external provider, so Raphael had added some logic to make several calls and concatenate the shorter lists of results into one longer one. Samu squinted at this and asked Matt Reed (who’s been working a lot recently on analysing usage of the Search Service) to see whether anyone actually cared for longer lists. Matt looked at the telemetry data we log in Application Insights and found that the number of search requests with page sizes greater than the default 10 over the last three months is negligible. So Samu rolled up his sleeves and took an executive product feature change release decision to refactor this code into the void.
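
For the curious, the kind of logic that went into the void looks roughly like the sketch below. This is not Raphael's actual code, which your correspondent has not seen. The names `fetch_page`, `fetch_long_list` and `fake_provider` are made up for illustration. The only thing taken from the above is the limit of 50 results per call.

```python
from typing import Callable, List

MAX_PAGE_SIZE = 50  # per-call limit of the external provider, as described above


def fetch_long_list(fetch_page: Callable[[int, int], List[str]], count: int) -> List[str]:
    """Build a list of up to `count` results from several smaller calls.

    `fetch_page(offset, size)` is a hypothetical stand-in for one call to the
    provider. It returns at most `size` results starting at `offset`.
    """
    results: List[str] = []
    offset = 0
    while len(results) < count:
        size = min(MAX_PAGE_SIZE, count - len(results))
        page = fetch_page(offset, size)
        if not page:
            break  # the provider has run out of results
        results.extend(page)
        offset += len(page)
    return results


def fake_provider(offset: int, size: int) -> List[str]:
    """Pretend provider holding 120 results and refusing pages over the limit."""
    data = [f"result-{i}" for i in range(120)]
    return data[offset:offset + min(size, MAX_PAGE_SIZE)]
```

Asking `fetch_long_list(fake_provider, 120)` for 120 results costs three calls to the provider. Which is a lot of calls for lists that, per the telemetry, hardly anyone asks for.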

Samu also consulted Matt when he wanted to choose the lowest pricing tier for the new external search API. Pricing depends on quotas of requests per second and requests per month. Matt wrote some more log analytics queries to show how these metrics evolved over the last three months. Top work this.
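
The tier decision boils down to two peaks and a price list. A minimal sketch of that reasoning follows, assuming request timestamps have been exported from the logs. Matt's real queries were log analytics queries, not Python, and the `Tier` fields and function names here are invented for illustration. Real quotas and prices come from the vendor's pricing page.

```python
from collections import Counter
from datetime import datetime
from typing import List, NamedTuple, Optional, Tuple


class Tier(NamedTuple):
    """A hypothetical pricing tier. All fields are illustrative."""
    name: str
    max_per_second: int
    max_per_month: int
    monthly_cost: float


def peak_usage(timestamps: List[datetime]) -> Tuple[int, int]:
    """Return (busiest second, busiest calendar month) as request counts."""
    per_second = Counter(ts.replace(microsecond=0) for ts in timestamps)
    per_month = Counter((ts.year, ts.month) for ts in timestamps)
    return max(per_second.values(), default=0), max(per_month.values(), default=0)


def cheapest_tier(tiers: List[Tier], peak_second: int, peak_month: int) -> Optional[Tier]:
    """Pick the cheapest tier whose quotas cover both observed peaks."""
    fitting = [
        t for t in tiers
        if t.max_per_second >= peak_second and t.max_per_month >= peak_month
    ]
    return min(fitting, key=lambda t: t.monthly_cost) if fitting else None
```

Sizing against observed peaks leaves no headroom for growth, so a sensible person might pad the numbers a little before choosing.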

On search. And indeed indexing

Our Liz wrote some words on the office post blog about the changes we’ve been making to website search. Nice one, our Liz.

Jamie and Liz met with Robert to discuss the aims and aspirations for search. So far they’ve grouped things that can be done into buckets: title, summary, url, hints, and dates. In terms of analysis they’ll be working on how to describe these things.

Liz met with Joe from the Content Team to review search hint descriptions. They also reviewed feedback from Anya and Robert. Joe’s taken a first stab at the live set, and had some suggestions that seemed sensible to Liz. Sentence case would bring things in line with the style guide for example. There are still some questions about a couple of them, and some further work to review hints that form part of a larger set. Based on the comments, Liz is putting together some principles for hint descriptions which she plans to share soon.

Matt’s also been taking a second look at search hints. Previous work was focussed on covering the greatest possible range of results with the fewest possible hints. It used a data set of 6,500 results clicked on by users. The new work takes a data set of just over 100,000 results and looks at the top ten results for all searches rather than just those that were clicked on.
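
Covering the most results with the fewest hints is a set cover problem. The sketch below uses the standard greedy approximation, which may or may not resemble what Matt actually did. The hint names and the shape of the data (each candidate hint mapped to the set of result ids it matches) are assumptions for illustration.

```python
from typing import Dict, List, Set


def greedy_cover(candidates: Dict[str, Set[str]], target: Set[str]) -> List[str]:
    """Pick hints one at a time, each time taking the hint that covers the
    most results not yet covered. Stops when nothing new can be covered."""
    uncovered = set(target)
    chosen: List[str] = []
    while uncovered:
        # sorted() makes ties resolve alphabetically, so runs are repeatable
        best = max(
            sorted(candidates),
            key=lambda name: len(candidates[name] & uncovered),
            default=None,
        )
        if best is None or not candidates[best] & uncovered:
            break
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen
```

Greedy selection is not guaranteed to find the smallest possible set, but it is simple and tends to get close.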

Matt’s also been looking at a new method for optimising the set. Whilst optimising for the smallest possible set of hints, the original method didn’t attempt to minimise crossover. Although we did get fortunate in that the selected rule set only produced more than one hint when one of the hints was a document type. Using a much larger set, it was almost inevitable we’d see multiple hints applied to individual results. To counter this, Matt has included a method which checks for overlap and removes rules for which more than half of the results are covered by other hints.
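
The overlap check can be sketched as follows. The "more than half" threshold comes from the description above. Everything else is an assumption: the data shape (hint name mapped to the set of result ids it applies to), the function name, and the order in which hints are examined. Here smaller hints are looked at first and compared only against hints still in the set.

```python
from typing import Dict, Set


def prune_overlapping(hints: Dict[str, Set[str]], threshold: float = 0.5) -> Dict[str, Set[str]]:
    """Drop hints where more than `threshold` of their results are already
    covered by the other hints that remain."""
    kept = dict(hints)
    # Assumption: examine the smallest hints first, ties broken by name
    for name in sorted(hints, key=lambda n: (len(hints[n]), n)):
        results = kept[name]
        others: Set[str] = set().union(
            *(r for other, r in kept.items() if other != name)
        )
        if results and len(results & others) / len(results) > threshold:
            del kept[name]
    return kept
```

The order matters. Once a hint is removed, its results no longer count as covered, which can save a neighbouring hint from the chop.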

Corporate data

No specific news on the corporate data front this week. Dan informs us that BizTalk continues to smoulder but we think the main danger has passed. Still, it is something like a bush fire. Just when you think it’s all out, BizTalk makes a farting noise and flames appear from hitherto unburnt areas. Like a clown car, but for data.

Dan tells us that everybody is working on short and medium term improvements. So that’s good.

Capability

Our Liz has been helping Trine out with the sift for the new performance analyst role. Interviews to take place next Tuesday.

Strolls

Anya, Robert and Michael took a stroll up to Covent Garden where they visited the Shake Shack for the purposes of cheesy chips and milkshakes. It was decided that the chips were a little too cheesy and the milkshakes a little too thick. Which sounds churlish but Anya and Michael were both feeling a little under the weather. And sucking was hard. 3/10. Other milkshake establishments are available.

They circled back round through Soho where they found another promising milkshake establishment and took Robert to the most ridiculously expensive hipster shop ever. Robert looked less than impressed. Though we think he may be secretly saving for a Supreme branded fire extinguisher. He’s a skater kid at heart.