What if data sets from sources far and wide could be made available and usable for anyone with the curiosity to combine them in novel ways and answer all kinds of previously unanswerable questions?
Governments seem to have jumped on this concept wholeheartedly, with open data schemes such as the Obama administration’s data.gov and our own version, data.gov.uk, and the creation of innovation-driving bodies such as the Open Data Institute in London, sponsored by the Department for Business, Innovation and Skills.
The idea of data transparency is at the fore of the political agenda – in 2013, G8 leaders signed the Open Data Charter, with the goal that all government data be published openly by default, and the European Commission has many portals and legislative measures in place to facilitate that. The Open Data Institute was conceived by World Wide Web inventor Tim Berners-Lee and runs under the mandate of ‘Knowledge for Everyone’.
At first glance, it’s all very utopian and democratic, despite some upset from privacy watchdogs over grey areas such as the UK government’s clumsy handling of anonymised NHS data.
Governments are exploring use cases from crime and health to transport and education, aimed at addressing societal challenges, fostering citizen participation and unlocking the potential of open data for the wider economy.
But while the public sector has moved quickly to make a show of freeing data that is public by law anyway, commercial organisations on the whole have been more hesitant to test the waters.
In fact, the embracing of open data in the private sector seems to be inversely proportional to the growing obsession with big data, as Jamie Turner, CTO of location-based data software company Postcode Anywhere, points out: ‘As organisations become aware of the great value in their data, they are also becoming less likely to share that asset openly. In an increasingly competitive world, there may be valuable, unique insights in their data – and sharing this with the world poses a potential competitive threat.’
The scepticism around whether a certain level of transparency can be profitable for companies is understandable because of the instinct for asset protection. However, explains Gartner analyst Frank Buytendijk, many businesses are discovering that releasing some of their own data can be a competitive differentiator in a market where more and more customers are starting to care about how data can improve their lives.
‘In utilities, you see the emergence of smart meters and information on how energy is consumed,’ says Buytendijk. ‘Telcos are offering data on recommendations based on calling behaviour so customers can figure out what the best plan is for them. Banks, especially if you do internet banking, really have embraced this idea with charts showing in detail how you’re spending your money. So the idea that a product or service comes with information on how it can be used is becoming very popular.’
Benchmarking is a fairly tried and tested example of opening a limited amount of data to customers, whereby companies publish comparisons of their various performance measurements against competitors. This can be a thorny area, but well worth the gains if it can be navigated successfully.
‘Performance measurement is always a management headache in B2B,’ says Thomas Oriol, CEO of sales pipeline management application SalesClic. ‘From an IT perspective, the most important consequence is to choose open platforms that facilitate the integration of external data sources.’
The key to doing this while protecting your precious assets, says Oriol, is to release data that is only valuable in aggregate form, so that it can’t be traced back to you. ‘For example, as a detergent manufacturer, I don’t care if your grocery store increased revenue by 17% last month, but I do care if the grocery market grew by 17% in the same period,’ he explains.
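Oriol's principle can be sketched in a few lines of code. The figures and store names below are invented for illustration, but the mechanism is the one he describes: per-store revenues stay private, and only the aggregate market figure is ever published.

```python
# Hypothetical sketch: release data only in aggregate form, so no
# individual contributor's figure can be traced back to them.

def market_growth(revenues_last: dict, revenues_this: dict) -> float:
    """Collapse per-store revenues into a single market growth percentage."""
    total_last = sum(revenues_last.values())
    total_this = sum(revenues_this.values())
    return (total_this - total_last) / total_last * 100

# Per-store figures (kept private, never published)
last_month = {"store_a": 100_000, "store_b": 50_000, "store_c": 80_000}
this_month = {"store_a": 120_000, "store_b": 55_000, "store_c": 94_100}

# Only this aggregate number is shared openly
print(f"Market grew {market_growth(last_month, this_month):.1f}%")
```

The detergent manufacturer in Oriol's example learns that the grocery market grew, but nothing about any single store's performance.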
If a company is open about its operational performance in a way that safeguards its data from competitors, that openness can pressure those competitors to respond.
‘If that becomes the norm, and you totally open all your service levels – and they don’t have the back office to be better at it – then you’ve created a competitive advantage where you force your competition to show fewer results,’ says Buytendijk.
‘Then you can narrow the scope a bit and restrict it throughout the value chain where you have a completely transparent forecast.’
Do the data mash
Some companies are beginning to grasp the business case for value chain integration, where better transparency can help them eliminate problems such as inaccurate forecasts of demand in a supply chain. Establishing strategic partnerships can help partners share the information needed to match supply and demand at each stage of the chain, and damp volatile forces such as the well-known ‘bullwhip effect’, which can end up hitting suppliers and manufacturers hard.
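A toy model shows why the bullwhip effect punishes opaque supply chains. In this deliberately simplified sketch (all figures invented), each tier in the chain cannot see real retail demand, so it pads its orders with a safety margin – and modest swings at the shop floor become large swings at the factory.

```python
# Toy illustration of the bullwhip effect: without shared demand data,
# each tier over-orders by a safety margin, amplifying swings upstream.

def upstream_orders(demand, tiers=3, safety=0.2):
    """Propagate retail demand up the chain, each tier padding by `safety`."""
    series = [demand]
    for _ in range(tiers):
        demand = [round(d * (1 + safety), 1) for d in demand]
        series.append(demand)
    return series

retail = [100, 110, 90, 120]            # modest swings at the shop floor
factory = upstream_orders(retail)[-1]   # what the manufacturer ends up seeing
print(factory)
```

With transparent, shared demand data, each tier could order against real figures instead of padded guesses, and the amplification disappears.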
And data integration can sometimes help companies strike gold – literally. When Canadian gold mining company Goldcorp was going through a slow patch, it did something quite remarkable for an industry where confidentiality and privacy have traditionally been top of the agenda.
It decided to publish geological data of the areas it had been mining, offering a prize of $575,000 to anyone who might use the data to find the next six million ounces of gold. In doing so, the company triggered its own gold rush, vastly increasing the mine’s production rate.
‘A team of biologists knew nothing of geology but said, “If we look at the data and there’s a pattern in there, we can apply statistical analysis from our field,”’ explains Buytendijk. ‘Goldcorp found places to mine that they would never have thought of with their own geologists. Although often the field is entirely different, it has the same dynamic. So open data can be the start of a conversation and a contribution.’
Making networks work
So perhaps there is something to the claim by the Centre for Economics and Business Research that unlocked value in data will be worth over £40 billion per year to UK private and public sector businesses over the next few years, if they can just loosen their grip on their own data.
But in order to gain the potential windfalls of open data schemes, the first step for companies is to put the practical foundations of security in place – and this begins with designing and building an appropriate network.
If the key to carrying out open data schemes securely is not giving too much away at once, a similar approach can be taken to the network infrastructure. On a shared hub, a single compromised machine can packet-sniff traffic destined for every other machine on that segment.
Divide and conquer
For Martin Baldock, managing director of digital forensics firm Stroz Friedberg’s London office, this means segmentation – the physical division of a network into separate parts. A network segment can contain just one machine or many, and each segment can have its own hub or switch.
‘This means that if someone were to break in at any point, they can only see or access a limited amount of the network,’ says Baldock. ‘It’s limiting from your access point where you can actually go, so if your access point was compromised then the interceptor can only see what you can see anyway.’
Rather than looking at external penetration testing, says Baldock, companies wishing to open up some of their data should think about tighter segregation inside their network, to limit where external parties can go.
‘Companies need to look inside, map out and think, “If I got in here, where else could I go?”’ says Baldock. ‘They must look at configurations from inside out as well as outside in.’
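Baldock's ‘if I got in here, where else could I go?’ audit amounts to a reachability check over the network's segments. A minimal sketch, with an invented topology: model which segments can route to which, then walk outward from a compromised entry point.

```python
from collections import deque

# Hypothetical segment topology: which segments can route to which.
# The externally exposed API can reach only the open data store.
links = {
    "public_api":      ["open_data_store"],
    "open_data_store": [],                      # isolated: no onward routes
    "office_lan":      ["finance_db", "open_data_store"],
    "finance_db":      [],
}

def reachable(entry, links):
    """Breadth-first walk: every segment an intruder at `entry` could reach."""
    seen, queue = {entry}, deque([entry])
    while queue:
        for nxt in links.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# A breach of the public API never touches the finance database
print(sorted(reachable("public_api", links)))
```

Running the same check from every externally exposed segment gives the inside-out view Baldock describes: each entry point should reach only what its legitimate users can see anyway.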
However, as in many other areas of IT, companies are increasingly choosing the cloud, since hosting this way can be a legitimate option for those looking for fast ways to share large – or even monolithic – data sets.
In 2012, the US government teamed up with Amazon Web Services to make gene research data from the ‘1,000 Genomes Project’ – an international effort to create the most detailed catalogue yet of human genetic variation – freely available to genetic researchers to run analysis on. Doing so via the cloud platform gave the government access to a super-high-speed network and space for over 200TB of genome data.
When it comes to security, there’s a case to be made for clouds being generally better secured than the networks of individual companies.
‘However, if something does go wrong in a public cloud, the impact of that is much higher, and everyone is affected and infected,’ argues Buytendijk. ‘Look at Heartbleed.’
An organisation’s own network may be less robust than the cloud, but it may not attract as much unwanted attention as Amazon Web Services or other cloud providers.
‘That’s a very rational and very defendable decision,’ says Buytendijk. ‘But then again, given you’re talking about committing to open data schemes, I would certainly go for the cloud option. I mean, why would you create something special when the whole idea is that it’s open?’
Even with the speed and capacity benefits that cloud has opened up, there are yet more roadblocks for companies in the form of interoperability and standardisation problems, as open data remains notoriously difficult to work with.
‘The problem is, the moment you start to share the data it loses a bit of context,’ says Buytendijk. ‘There are standards out there, particularly Resource Description Framework (RDF). It qualifies all the relationships in the data, so those standards are much more precise and functional, as well as having a high level of machine readability. It starts to interpret the data and analyse it in a readable way, and with Excel spreadsheet data it just can’t be done. But as yet this is not very well known or widely accepted.’
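The qualified relationships Buytendijk describes are RDF's subject–predicate–object triples. A minimal stdlib-only sketch of the idea (the vocabulary names here are illustrative, not a real RDF schema – production systems would use a proper library and standard vocabularies):

```python
# Minimal sketch of RDF's subject-predicate-object model: unlike a
# spreadsheet row, each fact carries its relationship explicitly.

triples = [
    ("store_42",   "locatedIn",     "Birmingham"),
    ("store_42",   "sector",        "grocery"),
    ("store_42",   "revenueGrowth", "17%"),
    ("Birmingham", "partOf",        "UK"),
]

def query(triples, s=None, p=None, o=None):
    """Match triples against an optional subject/predicate/object pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Everything asserted about store_42, with relationships intact
print(query(triples, s="store_42"))
```

Because every fact names its own relationship, a machine consuming the data retains the context that a bare spreadsheet column loses.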
While the technology for hosting, security and aggregation is all there, the reality is that most of the commercial world is a long way off realising the dream of money-raking collaborative data schemes such as the one that Goldcorp dreamt up.
Businesses may have to wait for the public sector and the few intrepid outliers to make the mistakes first.