Cyber Management Alliance, executive interview with Steve Abbott, CEO at DocAuthority. Steve answered our questions with regard to data protection, unstructured data, data identification and classification, DLP and much more. You can watch Steve answer our questions in the video below.
Why is data protection so difficult?
Data protection is very hard. I’ve been involved in four companies, three of which were in the encryption business, and it’s very hard to identify what should be encrypted; and even harder to understand who should be able to encrypt it. Why? It’s because, in general, people don’t understand the value of the data, and therefore they don’t know who should be able to access it, who should be able to share it with whom, and so on. So, it’s very difficult to manage what you don’t understand and to automate that procedure, particularly across enterprises and through the cloud. It’s very tough to do that and I don’t think, necessarily, that anybody has a perfect solution for this.
We’re trying to help to identify the value of the data, such that we could streamline this process; from trying to protect and control 100% of your data, down to the Crown Jewels that you really need to protect. You need to identify it first before you can protect it.
What’s wrong with DLP the way it’s done today?
We were the creators of the first DLP system back in 2001, and Ariel Peled, the founder of DocAuthority, was also co-founder of PortAuthority, which is where our name comes from; so, it goes back to 2001.
We built the DLP service to be able to identify information as it was leaving the network, and we utilised regular expressions or common words, keywords, in order to identify what is important and what isn’t. Unfortunately, that isn’t necessarily enough today. You need to identify the function of the data, the value of the data; and so, it’s not necessarily just a keyword or two that should be the flag, it should be understanding the business value to you; and in case it is received by, or given to, the wrong hands, you need to make sure that you know what is being distributed and to where, and put controls on it.
DLP today has lots of problems because of lots of false positives. The information being flagged isn’t necessarily important, and flagging it makes it unnecessarily difficult for users to collaborate. So, there’s an identification problem of what the data is; they’re looking for words, not the business function itself or the context, and I think that’s the missing element with DLP.
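To make the false-positive problem concrete, here is a minimal Python sketch of keyword and regular-expression matching. It is an illustration only, not the implementation of any DLP product; the rules, the sample documents and the name is_flagged are all assumptions made up for this example.

```python
import re

# Hypothetical rules of the kind a pattern-based filter might use.
RULES = [
    re.compile(r"\bconfidential\b", re.IGNORECASE),  # a common keyword
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # digits shaped like a US SSN
]

def is_flagged(text: str) -> bool:
    """Return True if any rule matches, whatever the document is actually for."""
    return any(rule.search(text) for rule in RULES)

# Two made-up documents: one harmless, one commercially sensitive.
lunch_memo = "Reminder: the canteen menu is not confidential, order code 123-45-6789."
board_plan = "Draft acquisition terms for Project Falcon, to be agreed next quarter."

print(is_flagged(lunch_memo))  # harmless memo is flagged (a false positive)
print(is_flagged(board_plan))  # sensitive plan is not flagged (a miss)
```

The rules see only words and number shapes, so the harmless memo is blocked while the valuable document passes, which is the gap between matching keywords and understanding business function described above.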
Why should organisations focus on unstructured data?
Unstructured data represents about 80% of the data across an enterprise, which I think would surprise a number of people. The budget dollars to manage and control data typically are being put into the central database technologies that people utilise, but unfortunately the spread of unstructured data within business files has been doubling about every twelve to eighteen months. So, there are heaps of information inside an enterprise, typically very old, where nobody understands what it is, what the purpose of that information is, or who can actually access it today, and so therefore it is very difficult to manage. So, people have been avoiding managing it because it hasn’t been achievable, until DocAuthority.
How does one value unstructured data in an organisation?
Today, most companies don’t know what the data is, where it is, who can see it, let alone its value; and so, the purpose of DocAuthority is to identify and group business functional data together and apply a business value in dollars, as a currency, for you to then manage based on its value. That’s been very difficult to date because most people haven’t agreed on the value of a particular record by its function. Now we have the ability to do that in a standard, systemised way, and we can make it much easier for CISOs to manage by value versus to manage by keyword, which has not been working.
Why is focusing on unstructured data important in privacy regulations?
So, with the recent GDPR coming into law in May 2018, it now is law for you to be able to identify what PII, or private information, you have on individuals, and you also have a time-bound requirement, about 30 days, to respond to consumers asking you to delete everything you have on them. So, there’s a category of the law that says there’s a ‘right to be forgotten’; so, individuals can request companies that have got their data to delete it. So, the companies have to find it first, and then they have to do what we call ‘defendable disposal’, which basically means you’ve understood what the data is, and you’ve understood that it can be deleted, and it’s not going to break another regulation, for instance, how long it has to be retained for in country X, which may be an opposing force to the privacy laws.
So, it’s very difficult for CISOs to make big decisions because they have 30 days to do it, and they may get it wrong. So, the biggest thing you have to do is understand what that function of the business data is in order to be very comfortable when you’re deleting it for the right purposes, not necessarily just because it’s PII.
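The check behind ‘defendable disposal’ can be sketched in a few lines of Python. This is a hedged illustration, not legal guidance and not DocAuthority’s method: the 30-day window is taken from the interview, while the function names, dates and retention periods are hypothetical.

```python
from datetime import date, timedelta

RESPONSE_WINDOW = timedelta(days=30)  # "about 30 days", as stated in the interview

def add_years(d: date, years: int) -> date:
    """Add whole years to a date, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def response_deadline(request_received: date) -> date:
    """Latest date by which the deletion request should be answered."""
    return request_received + RESPONSE_WINDOW

def can_delete(created: date, retention_years: int, today: date) -> bool:
    """True only once the (hypothetical) retention period for this document type has passed."""
    return today >= add_years(created, retention_years)

today = date(2018, 6, 1)
print(response_deadline(date(2018, 5, 25)))
print(can_delete(date(2015, 1, 10), 7, today))  # still inside a 7-year retention period
print(can_delete(date(2010, 1, 10), 7, today))  # retention period has elapsed
```

The retention period depends on what kind of document it is, which is why the business function has to be known before the delete decision can be defended.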
What is data management?
So, data management is the availability and quality of managing the data so that the right users are served the right information, at the right time; and right now, users have a very difficult task finding information in the enterprise, understanding whether that version is the correct and up-to-date version that they are utilizing, and also to be able to share it with the folks that should be able to access that information. I think those three points are very difficult to get right.
So, data management is a massive, massive problem and unstructured data is 80% of it, and I don’t believe that anybody has really tackled the problem of making sure the right user is accessing the right information, at the right time.
How do you convince the board to focus on data management, rather than data protection?
So, data management is focusing on making the data valuable for your business users. Data protection may not be the objective to make your users productive; it may be just to keep it away from everybody, which is a little bit of a dilemma for a CISO – what do you say can be accessed, and what do you say cannot? Unless you understand the business function of the data, you won’t get it right.
So, that’s what we’re focusing on here is data management, to make your business users more productive, not just necessarily to be able to answer audit questions.
Top things to do: data identification and classification
Historically, nobody’s really been able to identify data and its value. We’ve asked users to label documents for its importance, typically a classification level. That classification level should stay with the document throughout its lifecycle. Unfortunately, users don’t really care about the labelling so they don’t generally do it. So, we have a classification system that’s being deployed to the enterprise but nobody’s really enforcing it. It’s time to automate the process and to do it through the grouping of documents by its function versus asking each user to classify each document, each time. That hasn’t worked; I don’t believe it will work in the future. We need to utilise AI to be able to do it on behalf of the organisation, not necessarily just the users.
So, it’s very important to get the classification right because a document’s lifecycle is managed through its classification label: how it can be utilised, whether it should be encrypted, where it should be distributed, and so on. This comes down to classification and at this point, it’s not automated enough and it’s not systematic, and I think that’s been the missing element, particularly of some classification tools; some others have done fairly well but I think there’s a growing need for connecting discovery and classification as one process, not necessarily as two.
What is the impact of cloud services versus on-premise management of unstructured data?
So, right now it’s been very difficult for most enterprises to manage their unstructured data. Now it’s going to get even more difficult because the information is going to not just one cloud service, but a number of cloud services, and the cloud service providers are not necessarily the owner of the data; you are. So, you’re asking a third party to manage your most valuable assets through a service level agreement that may not necessarily be focused on the value of the data, but making the data almost a commodity in terms of a service, and how that is offered up to your users. You don’t really know where it’s going; you don’t really know how it’s got there; who can see it; and so on.
I think while the data is stored in a cloud service, it’s probably quite safe. However, it’s not designed just to be stored, it’s to be utilised, and that information therefore is going to be accessed from any number of devices, and any number of locations. So, what we’ve really got to do is focus on the value of the data to protect the information that is the highest value to the business in terms of commercial value; and not necessarily treat users differently but treat the data differently, depending on its importance and its commercial value.
Why should I use DocAuthority?
This problem of managing unstructured data has been around since we first had Microsoft Office documents as a standard; so, that’s been over twenty years and in that period of time, nobody really has solved the riddle of understanding what the value of the data is, and what the commercial context is of these documents.
Typically, keywords or regular expressions have been utilised to try to understand what’s inside the document, but that doesn’t necessarily read what the content means in terms of its business function; and that really is the difference between us and the rest: we don’t rely on regular expressions, which give a false sense of security, because with them you really don’t know what the data is, so you may be securing it correctly or not. Because we use focused AI, we really do get better and better over time based on feedback from the data; what is the data telling us?
So, we group the information into categories that can be managed centrally. So, you make one set of decisions based on the category; you don’t make independent decisions for every file, every time a user accesses the information.
So, we are very accurate; it’s highly automated; and it deploys centrally. These are all the things that customers asked for when we did our homework a few years ago.
How do you apply business value to a business document?
So, a lot of people have asked us at DocAuthority, how do you put value on the data itself? The first thing we do is scan your network and put the document types into independent groups by its functionality. That then allows us to put one set of controls on the information inside that group as a single, systematic approach to either classifying it as X, making sure it has to be retained for Y number of years, and also who can have access to it in terms of access control, and so on.
If you manage by groups, let’s say, for instance, you have 50,000 documents in a group; that is one decision for 50,000 documents, whereas in the past you had to make 50,000 decisions, one for each document. So, I think that’s the basic order of magnitude of the solution that we have; it’s breaking it down to a very manageable group-by-group function, not managing it by independent files.
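The one-decision-per-group idea can be sketched as a simple lookup in Python. This is an assumption-laden illustration rather than DocAuthority’s implementation: the group names, policy fields, retention periods and the file-to-group mapping (assumed here to come from a separate grouping step) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    classification: str
    retention_years: int
    allowed_roles: frozenset

# One decision per functional group (all values hypothetical).
group_policies = {
    "employment_contract": Policy("restricted", 7, frozenset({"hr"})),
    "marketing_brochure": Policy("public", 1, frozenset({"hr", "sales", "marketing"})),
}

# Assumed output of a grouping step: each file is mapped to its functional group.
file_groups = {f"contract_{i}.docx": "employment_contract" for i in range(50_000)}
file_groups["brochure.pdf"] = "marketing_brochure"

def policy_for(filename: str) -> Policy:
    """Look up the policy a file inherits from its group."""
    return group_policies[file_groups[filename]]

print(len(group_policies), "decisions cover", len(file_groups), "files")
print(policy_for("contract_42.docx").classification)
```

Changing the policy for a group changes it for every file in that group at once, which is what makes central management tractable compared with file-by-file decisions.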
How can DocAuthority be used in organisations?
There are three main use cases that people buy DocAuthority for; it is either defendable disposal, being able to delete what isn’t of use anymore, perhaps high risk, highly toxic, so we can identify what should not be available to the users which should be deleted; it’s also what should be migrated to the applicable cloud service, so the right type of functional data is going to the right type of functional cloud application; and also, what should be highlighted to be protected, for how long, for whom, and so on.
They are the three main use cases; it’s about making sure that you delete what can be deleted, and should be deleted; it’s about making sure you apply the right rules to what data should go to which cloud service; and also, what should be protected because certainly, 100% of your data is probably impossible to protect at all times. But the top Crown Jewel 5% or 6%, which is typically what we found on customer networks, is really important to manage. You have to manage it by its sensitivity, by its value, and also what would happen if it were to be lost or stolen, and how much it would cost to be reproduced. These are all the factors that we can help customers identify when we group the functional data together so you can manage it centrally.
How does one produce a business case for data management?
Imagine you are able to identify the data by its value, and then be able to manage it by its value. If you were able to also identify your data assets on your network and in the cloud, perhaps you’d be able to manage it better. But you also need to have more spend, the budget to be able to manage the data, and understanding the information’s value will enable that, such that if you had assets of $100+ million, you could easily justify spending $3-4 million on tools that will manage data and make it available to the right users, for the right reasons.
But if you don’t have an idea as to what value the data is, and how important it is to your business, it’s very difficult to justify spend on tools that are not necessarily going to help you manage it better.
So what we can do is identify the value in terms of its data assets, and we can identify where it is, who can see it, and whether it should be controlled or protected. We can do that automatically and I think that really helps CISOs and other IT managers to be able to justify funds to spend on unstructured data which fundamentally has been missing for some time. Now the tools are available to really understand the value to you so you can spend the right amount, for the right amount of management.
What’s your vision for DocAuthority?
So, today we’re very focused on helping customers to be data compliant with various regulations around the world; some are very difficult to implement.
Tomorrow, what we’re looking to do is to focus on availability of data and making sure that the right version, with the right business context and function, is served up to your users, somewhat predictably. We hope to be able to do that by first identifying the data, and then being able to apply the rules on that data centrally. But our goal here is to make your users more productive; not necessarily just being able to identify the data’s importance, but to make sure that the right role of the user is aligned to the right role of the data every time.
Today, users can’t access the information they need because, perhaps, controls have been put in place. Data is currently managed by its location, not by its value. We have to turn that upside-down to make sure that the right availability for the right functional data is applied to the right users; and being able to understand that data by its function is the first step to be able to make your users more productive, and that’s our focus long term.