The data security topic

In my view, data security is an abstract concept – just as abstract as money, religion and fascination, all devised by humans. And anything that is human-made can be human-destroyed.

When it comes to IT and data security, history has proven that pretty much anything can be cracked, taken, reshuffled, altered, refurbished, reheated and re-served. As long as there is a strong enough incentive, nothing is impossible.

In my experience as a data specialist, I have met plenty of security officers and heard plenty of stories, ranging from concerns about CPU memory addresses containing undocumented functions to outright denial of data access over the risk of terrorism – all of this against the backdrop of the ever more popular cloud computing and big data.

There have always been, and will always be, unknown features in hardware. They are not always accidental; sometimes they are intentional, as they generate profit for their makers on top of the profit generated by the products they are embedded in.

The profit from plain appliances is not endless (be it CPUs, SIM cards with monthly subscriptions or smart home appliances), and to maximize profit, the vendors of these products inevitably need to find new angles. Let’s forget about the CPU memory addresses for a minute and take the simpler example of telecom operators.

In the 90s, when mobile phones became mainstream, there was plenty of demand to keep telecom operators running on subscription fees. With plenty of new users demanding the services, this was a pretty good income for a while. But as time passed, the services were bound to get cheaper (due to market competition, newly emerging technologies, etc.), and from the vendor’s point of view that income was no longer nearly enough. At this point, there was a need for innovation, a need to open up new opportunities.

Fast forward to the mid-2000s and we arrive at the birth of big data, where the actual SIM cards and phone plans are cheaper, but the data they generate is sold at a premium, without the owner of the device necessarily being aware of it.

For example, if you have a phone on you and you cross the city, the telecom knows which route you took, and they can run endless analyses to find out what drives your choices and how those choices can be influenced. And if not the telecom, then someone else is certainly very interested in knowing this.
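
To make that concrete, here is a minimal, hypothetical sketch of how little it takes to turn raw telecom records into a route – the subscriber, timestamps and tower names below are invented for illustration:

```python
# Sort a subscriber's cell-tower pings by time and you have their path.
import pandas as pd

pings = pd.DataFrame({
    "subscriber": ["A", "A", "A", "A"],
    "timestamp": pd.to_datetime([
        "2018-10-26 08:01", "2018-10-26 08:14",
        "2018-10-26 08:27", "2018-10-26 08:40",
    ]),
    "tower": ["North-Station", "River-Bridge", "Old-Town", "Business-Park"],
})

# One groupby and the daily route (and, over weeks, the habit) is visible.
route = pings.sort_values("timestamp").groupby("subscriber")["tower"].agg(" -> ".join)
print(route)
```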

Welcome to the era of the human-behavior-as-a-product.

Facebook and others have hit it big with selling and reselling the activities and preferences of a mostly clueless general public.

On the plus side, telecom data is also used for crowd analytics that feed public transport optimization. This, in my view, is a great use case. As long as the bus arrives on time, right after I get off the train, I am happy not to wait in the rain.

So, about those undocumented CPU functions – what options are there? The answer differs depending on whether you are the end consumer or a producer/marketer of a product with an embedded CPU.

As a producer, you could build your own hardware and document it, which you will regret shortly after because you will want to make a buck on something more than the plain appliance. There is so much more to CPUs… CPUs are not just toasters!

And even if it weren’t for the CPU leaks, there is so much more happening on the data scene that what CPUs do behind the scenes is not really that important. You still need to persist data eventually, you still need to send a message or two, you still need to use networks and wireless devices (and by the way, wireless devices are perfectly capable of keystroke recognition!), you inevitably use ISPs, ISPs inevitably use satellites, and data circles the Universe several times before you get feedback on the message you sent. This list shows just how many possibilities for data break-ins there are.

Good luck with data security, then. The only way is to dig a well, hide a disconnected device in it and make sure you don’t communicate with anyone. In that case you are pretty safe – except from the well collapsing in on itself.

Here is a funny example: at company X, a team of data scientists led by a team of business stakeholders wanted to start a project for predictive maintenance of the company’s appliances, which were spread all over the country in thousands of different locations.

The business case was that inspecting and maintaining each appliance took a long time, some locations were hard to reach, and most of the inspections were not even necessary – so if failures could be predicted, there was a great case for saving money on inspections.

For this, bringing the geo data and the previous inspection protocols together was essential, and throwing some machine learning at the combined data would yield great cost savings for the business.
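
For illustration, here is a minimal sketch of the kind of pipeline the team had in mind – the file names, columns and features are hypothetical, and a real project would need far more careful feature engineering and validation:

```python
# Join appliance locations with past inspection outcomes, train a classifier,
# and rank appliances by predicted failure risk.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical inputs: one row per appliance, one row per past inspection.
appliances = pd.read_csv("appliances.csv")    # appliance_id, latitude, longitude, install_year
inspections = pd.read_csv("inspections.csv")  # appliance_id, days_since_last_service, failed

data = inspections.merge(appliances, on="appliance_id")
features = ["latitude", "longitude", "install_year", "days_since_last_service"]

X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["failed"], test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Rank appliances by predicted failure risk so field crews visit the risky ones first.
data["failure_risk"] = model.predict_proba(data[features])[:, 1]
print(data.sort_values("failure_risk", ascending=False).head())
```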

The InfoSec team, however, kept pulling the plug on the project for X consecutive years. The security concern was that a dataset like this would be very tasty for terrorist organizations, so it should never be worked with – or even thought about.

All it took to get the project greenlit was one data engineer shaking up the status quo by pointing out to InfoSec that the information needed could be scraped from public sources like the Google Earth API, Google Street View, etc.
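
To illustrate the data engineer’s point, here is a rough sketch of checking whether public imagery already exists for a given spot, using the Street View Static API’s metadata endpoint – the API key and the coordinates are placeholders, and the response fields shown are only the ones I would expect:

```python
# Query public Street View metadata for a location: if status is "OK",
# public imagery of that spot already exists - no secret dataset required.
import requests

API_KEY = "YOUR_API_KEY"       # placeholder
location = "47.4979,19.0402"   # placeholder coordinates

resp = requests.get(
    "https://maps.googleapis.com/maps/api/streetview/metadata",
    params={"location": location, "key": API_KEY},
    timeout=10,
)
meta = resp.json()
print(meta.get("status"), meta.get("date"))
```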

So far, we have just scratched the surface of the old-fashioned data security concerns.

Who cares about CPUs, appliance locations and public data if the latest battlefield is AI? There is a whole new, unexplored territory when it comes to securing DNNs and ML models from black-box attacks. By black-box attacks I mean that it is entirely possible to attack a DNN as an external user: probe it with enough pictures of cats and you can craft a cat that it confidently labels a dog.
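
Here is a toy, self-contained sketch of that idea: the attacker can only call predict() – no weights, no gradients – and still nudges a “cat” across the decision boundary by random hill-climbing within a small perturbation budget. The linear “model” and all the numbers are stand-ins for a real DNN, and real black-box attacks are far more query-efficient, but the principle is the same:

```python
# Query-based black-box attack sketch: only predict() is visible to the attacker.
import numpy as np

rng = np.random.default_rng(0)
hidden_weights = rng.normal(size=64)   # the model's parameters - invisible to the attacker

def predict(image: np.ndarray) -> float:
    """The black box: returns the probability that the image is a 'dog'."""
    return float(1.0 / (1.0 + np.exp(-image @ hidden_weights)))

cat = rng.normal(size=64)              # the original "cat" (a flattened feature vector)
if predict(cat) > 0.5:                 # demo setup: start from something the model calls 'cat'
    cat = -cat

budget = 0.5                           # cap the per-feature change so the input stays "a cat"
adversarial = cat.copy()

for query in range(10_000):
    step = 0.05 * rng.normal(size=64)
    candidate = cat + np.clip(adversarial + step - cat, -budget, budget)
    if predict(candidate) > predict(adversarial):   # keep only steps that move us toward 'dog'
        adversarial = candidate
    if predict(adversarial) > 0.99:
        break

print(f"dog-probability went from {predict(cat):.2f} to {predict(adversarial):.2f} "
      f"after {query + 1} queries")
```

The uncomfortable part is that, from the defender’s side, every one of those queries looks like just another prediction request.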

Good luck securing that! The essence of DNNs is to constantly learn and improve, and this is both their strength and their weakness.

And it is not about cats and dogs. It is about DNNs being used in media, in large-scale decision-making, in law enforcement and in all kinds of industrial application fields.

I guess the data security concerns need to evolve too.

On a final note, if your security officer is concerned about data encryption in the cloud, just tell them that the answer is quantum computing.

All the big cloud vendors are working on it, and in theory a large enough quantum computer could break even the most complex encryption key in hours.

If the security officer laughs, suggest a test: they encrypt their hard disk and leave the computer at your place. Ask them to trust that you won’t try anything.

And watch them sweat. :)

Written on October 26, 2018