The Data Economy

Technologist Mike Lynch takes a look at how the role of data has changed and why it may well be the new oil: both are valuable, of strategic national importance – and a security risk.

September 2016

I was recently asked by a very wise journalist whether I agreed with him that "data is the new oil". It is a very good question. After all, oil powers modern life, from cars to kettles to the digital device you might be reading this on. Data, meanwhile, powers much of our modern lifestyle: Amazon knows what books you like, Ocado knows your preferred brand of milk, Spotify recommends music guaranteed to please. Just as countries that sit on vast oil reserves are wealthy, so too are the companies, like those listed above, that gather a lot of data. Technology companies currently occupy the top three spots on the Fortune 500, and an oil company the fourth, making the case that data might in fact be more valuable than oil.

Data is undergoing incredible change. In the past, data was gathered for one task, such as a mailing list, which would require names and addresses to be entered into a database of rows and columns. The process was very formal and clean, but now we have a world of unstructured data: bits of prose, images and audio that do not fit into rows and columns in the same way. Because they are richer, we find unexpected insights among them. We are at the very beginning of a paradigm shift in the type and amount of data we can understand, because the data itself is evolving. Take the Internet of Things, for example, which will be in your home before you know it: devices everywhere will generate more data.

But it isn't only the data that morphs and grows; how we analyse it does too. With data fusion, we are starting to gather data from more independent sources and bring them together to get answers in a way that wasn't possible before. Sometimes, without intending to, we also find out things we couldn't before. New analytics algorithms can find insight among dirty, incomplete and sometimes inaccurate data and get us stunning results. In the era of the database, by contrast, huge effort and great expense went into data cleansing, which might not have provided any benefit.

Furthermore, in this new era of machine learning we are seeing intelligent systems starting to have an effect on all sorts of tasks, from legal and financial services to autonomous vehicles. These systems learn from data, which is central to their intelligence, and they need a very large amount of it. This brings us to a new question: data ownership.

When asked "who owns the data?", the knee-jerk response is "I do", but what if that data can do some societal good? Enter the concept of informed consent, which is a nice idea in the context of elective surgery, for example, but is quite difficult to give for your data. Take car insurance: you can get a cheaper policy if you put a tracking device in your car. What happens if the tracker works out that this type of vehicle goes up in flames? Well, if we haven't given informed consent for that use, we can't get the benefit of the data. So while informed consent is a good place to start, it isn't what we need in this new world. Perhaps, as in accounting, we need to be asking "why should you share data?" and find practical ways to make it work.

The other side of data ownership is derivative data: who owns the results? Again, many of us feel that "I do" if our own data was used. But if you are a student who uses a library, reads the books and then writes a book of your own, none of the authors you read is staking a claim to your work. The same should happen with data. But it isn't completely straightforward.

Take the NHS, which has vast troves of health data. From a research point of view, making use of that data is generally considered a good thing. But what if we let a foreign entity use this data for drug development? Well then, we need to find ways of describing this data as "strategic data" that comes with a price tag, be it selling the drug to the NHS at lower cost, or requiring the R&D to be done in the UK, etc. We must not be shy about realising the value of our data.

The last point to consider around data is security. There will be data breaches; in fact, 80% of the FTSE 100 have an infiltration on their network. Cybersecurity keeps executives and governments awake at night, and the modern thinking is already that breaches will happen and are a price paid for holding data. Likewise, anonymisation, while commendable, is going to fail: it will be broken by our sophisticated data analytics.

So, back to the original question: I agree that data and oil are starting to look a lot alike. They are both valuable, of strategic national importance – and a security risk.