Data classification tools help deal with ‘dark data’ growth

27th November 2015 By: Schalk Burger - Creamer Media Senior Deputy Editor

NICK CHRISTODOULOU As about 58% of data stored by organisations is dark, they must identify this dark data to expose risks and valuable information

About 58% of unstructured data stored by companies is dark data, which means that the value or regulatory importance of the data has not been determined. Consequently, most of the stored data adds costs, rather than increasing revenue or reducing regulatory risks, says digital information company Veritas Technologies South Africa country manager Nick Christodoulou.

Therefore, data classification tools are required to identify and analyse the data, establish where it originated and who created it, and then map this to other information flows or processes in the company. This information is used to determine how long the data will be relevant, while policies can be built to manage specific datasets for use in various business processes, store them for regulatory purposes or delete them.

Christodoulou proposes that organisations identify their dark data to expose risks and pick out valuable information. With the average cost of data storage being about R9-million a year per petabyte, companies must use this information to eliminate redundant, obsolete or trivial data proactively.

“The ‘Veritas Databerg Report 2015’ showed that 32% of unstructured data held by companies is redundant, obsolete or trivial, which adds risks and costs. As 58% of the data is dark, only 10% of data held by companies has been classified and is useful,” he says.
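To put these percentages in rand terms, the short Python sketch below splits a yearly storage bill across the three categories. The shares and the cost of about R9-million a year per petabyte come from the figures quoted above; the one-petabyte store size and all names in the code are assumptions made purely for illustration.

```python
# Illustrative arithmetic only. Shares and cost per petabyte are the figures
# quoted in the article; the 1 PB store size is an assumed example.
COST_PER_PB_RAND = 9_000_000  # approximate yearly storage cost per petabyte

SHARES_PERCENT = {
    "dark": 58,
    "redundant_obsolete_trivial": 32,
    "classified_useful": 10,
}


def yearly_cost_breakdown(petabytes: int) -> dict[str, int]:
    """Split the yearly storage bill (in rand) across the data categories."""
    total = COST_PER_PB_RAND * petabytes
    return {name: total * pct // 100 for name, pct in SHARES_PERCENT.items()}


breakdown = yearly_cost_breakdown(1)
for name, rand in breakdown.items():
    print(f"{name}: R{rand:,}")
```

On these assumptions, a company storing one petabyte would spend about R2.88-million a year on redundant, obsolete or trivial data and about R5.22-million on data it has not yet assessed.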

Unstructured data is being generated in increasing amounts, owing to the use of multiple devices by employees and customers, as well as the growing digital economy. For example, about one in three employees treats the company network as his or her own and often uploads pictures, documents and messages that hold no relevance for the company, Christodoulou explains.

“The chief officers in a company must define a workable information governance strategy for unstructured data. The maturity of information governance in a company influences the detail of the strategy, but an effective way to start is to set rules and policies for data created by various departments (including how long the information will be relevant), for files with specific naming conventions or for specific file types.”

Further, with such rules and policies in place, companies will find more business-critical data within the significant volumes of stored dark data.

“Once companies have established the policies and rules to govern information, there are many tools that can make their management and administration easier, while the improved visibility of the data improves the operation of the business and reduces costs,” says Christodoulou.

Being able to classify the data enables companies to automate many of the processes, such as archiving regulatory data, sending important information to the relevant department and deleting redundant, obsolete or trivial data.
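As a rough illustration of how such rules might drive automation, the Python sketch below matches files against policies keyed on department, naming convention or file type and returns an action for each. It does not represent any Veritas product; the rule set, retention periods, action names and file names are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    name: str
    department: str


@dataclass(frozen=True)
class Rule:
    action: str                        # what to do with a matching file
    retention_years: int               # how long the data stays relevant
    department: Optional[str] = None   # None means "any department"
    name_prefix: Optional[str] = None  # naming convention, if any
    extension: Optional[str] = None    # lower-case file type, if any

    def matches(self, record: FileRecord) -> bool:
        """A rule matches when every criterion it sets is satisfied."""
        if self.department is not None and record.department != self.department:
            return False
        if self.name_prefix is not None and not record.name.startswith(self.name_prefix):
            return False
        if self.extension is not None and PurePath(record.name).suffix.lower() != self.extension:
            return False
        return True


# Hypothetical policies: the values are assumptions, not regulatory guidance.
RULES = [
    Rule(action="archive", retention_years=7, department="finance", name_prefix="INV-"),
    Rule(action="route_to_legal", retention_years=5, name_prefix="CONTRACT-"),
    Rule(action="delete", retention_years=0, extension=".mp3"),
]


def classify(record: FileRecord) -> str:
    """Return the action of the first matching rule, else flag the file as unclassified."""
    for rule in RULES:
        if rule.matches(record):
            return rule.action
    return "unclassified"


print(classify(FileRecord("INV-2015-001.pdf", "finance")))
print(classify(FileRecord("holiday.MP3", "marketing")))
```

Files that match no rule are returned as unclassified, which in this sketch stands in for the dark data that still needs to be assessed.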

“Even if unstructured data management is not a problem yet, organisations must implement data governance for regulatory or legal purposes as more business and regulatory processes become digital. Best-practice information governance improves business operations and provides visibility of data to enable companies to manage their processes more effectively,” he concludes.