Big Data. I believe that one phrase could get millions in venture capital funding. I don’t even have to put a product with it. Just say it. And make no mistake about it: the rest of the world thinks so too. Data is “the new oil”. At least, according to some pundits. It’s a great headline making analogy that describes how data is driving business and controlling it can lead to an empire. But, data isn’t really oil. It’s nuclear power.
Black Gold, Texas Tea
Crude oil is a popular resource. Prized for a variety of uses, it is traded and sold as a commodity and refined into plastics, gasoline, and other essential items of modern convenience. Oil creates empires and causes global commerce to hinge on every turn of the market. Living in a state that is a big oil producer, the exploration and refining of oil has a big impact.
However, when compared to Big Data, oil isn’t the right metaphor. Much like oil, data needs to be refined before use. But oil can be refined into many different distinct things. Data can only be turned into information. Oil burns up when consumed. Aside from some smoke and a small amount of residuals oil all but disappears after it expends the energy trapped within. Data doesn’t disappear after being turned into information.
In fact, the biggest issue that I have with the entire “Data as Oil” argument is that oil doesn’t stick around. We don’t see massive pools of oil on the side of the road from spills. We don’t hear about our massive issues with oil disposal or securing our spent oil to prevent theft. People that treat data like oil are only looking at the refined product as the final form. They tend to forget that the raw form of data sticks around after the transformation. Most people will tell you that’s a good thing because you can run analytics and machine learning against static datasets and continue to derive value from it. But doesn’t make me think of oil at all.
Welcome To The Nuclear Age
In fact, Big Data reminds me most of a nuclear power plant. Much like oil, the initial form of radioactive material isn’t very useful. It radiates and creates a small amount of heat but not enough to run the steam generators in a power plant. Instead, you must bombard the uranium 235 pellets with neutrons to start a fission reaction. Once you have a sustained controllable reaction the amount of generated heat rises and creates the resource you need to power the rest of your machinery.
Much like data, nuclear fission reactions don’t do much without the proper infrastructure to harness them. Even after you transform you data into information you need to parse, categorize, and analyze it. The byproduct of the transformation is the critical part of the whole process.
Much like nuclear fuel rods, data stays in place for years. It continues to produce the resource after being modified and transformed. It sits around in the hopes that it can be useful until the day that it no longer serves its purpose. Data that is past the useful shelf life goes into a data warehouse when it will eventually be forgotten. Spent nuclear fuel rods are also eventually removed and placed somewhere where they can’t affect other things. Maybe it’s buried deep underground. Or shot into space. Or placed under enough concrete that they will never be found again.
The danger in data and in nuclear power is not what happens when everything goes right. Instead, it’s what happens when everything goes wrong. With nuclear power, wrong is a chain reaction meltdown. Or wrong could be improper disposal of waste. It could be a disaster at the plant or even a theft of fissile nuclear material from the plant. The fuel rods themselves are simultaneously our source of power and the source of our potential disaster.
Likewise, data that is just sitting around and stored improperly can lead to huge disasters. We’re only four years removed from Target’s huge data breach. And how many more are waiting out there to happen? It seems the time between data leaks is shrinking as more and more bad actors are finding ways to steal, manipulate, and appropriate data for their own ends. And, much like nuclear fuel rods, the methods of protecting the data are few compared to other fuel sources.
Data isn’t something that can easily be hidden or compacted. It needs to be readable to be useful. It needs to be fast to be useful. All the things that make it easy to use also make it easy to exploit. And once we’re done with it, the way that it is stored in perpetuity only increase the likelihood of it being used improperly. Unless we’re willing to bury it under metaphorical concrete we’re in for a bad future if we forget how to handle spent data.
Data as Oil is a stupid metaphor. It’s meant to impress upon finance CEOs and Wall Street wonks how important it is for data to be taken seriously. Data as Oil is something a data scientist would say to get a job. By drawing a bad comparison, you make data seem like a commodity to be traded and used as collateral for empire building. It’s not. It’s a ticking bomb of disastrous proportions when not handled correctly. Rather than coming up with a pithy metaphor for cable news consumption and page views, let’s treat data with the respect it deserves and make sure we plan for how we’re going to deal with something that won’t burn up into smoke whenever it’s convenient.