Gartner estimates that over 80% of enterprise data today is unstructured. That means the majority of information held by organisations is neither easily searchable nor held in secure sources. It’s a difficult area to manage, given that not all data that is created or collected can be structured easily. For example, consider that close to 270 billion emails are created daily. Many of these hold personally identifiable or sensitive information.
On top of that, unstructured data is growing year-on-year. Every PDF, video, Word document or even social media post created adds to the problem. Typical modern working practices within high-pressure environments are also to blame. The mindset of creating a data-driven culture becomes problematic. So does the approach towards data security.
Case in point: employees who need data when out of the office, perhaps while working from home or at a meeting, have a propensity to take data from a structured source and put it into an unstructured format. The data is then no longer protected or secured and cannot be easily accessed or managed.
What are the risks?
Owning masses of unstructured data, and not having a system that tracks when it is generated, makes organisations vulnerable to several risks.
- Visibility: Unstructured data is a challenge simply because organisations don’t know where much of it is held or what it contains. This runs contrary to creating the much sought-after single source of truth. Not knowing when it is generated, or when data is copied from structured sources to unstructured formats, makes it difficult to know how big the issue is and whether new measures need to be introduced.
- Access: Personal data protection is an increasingly hot topic, particularly with the introduction of GDPR. If an organisation isn’t able to find data in unstructured sources, it risks not being able to provide somebody with the information they have requested within the timeframe required for compliance. Also, organisations that share data with partners might pass on unstructured data to them and no longer have access to it themselves. A legal issue could arise with a customer or partner that relates to terms agreed over email at the beginning of a relationship. With data held in unstructured sources, it will be difficult to search for and retrieve the information required as evidence.
- Security: Cybercriminals are finding it increasingly difficult to access corporate networks and data due to increasingly effective security solutions. For that reason, they’re looking for weaker routes, and unstructured data is a prime target. This is due to it not being protected by the same security measures as structured data. A prime example is when sensitive corporate information listing Salesforce’s acquisition targets was leaked from the email account of one of the software company’s board members, Colin Powell.
- Financial: Organisations that collect data without managing it in structured sources risk their digital estate expanding year-on-year. That means needing to pay for more space and technology to store the data, rather than solving the core problem. If an organisation extracts data into structured sources, it can keep only the data that it requires and be efficient in its use of storage capacity.
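To make the Access risk above more concrete, here is a minimal Python sketch of the kind of search an organisation might need to run when somebody requests their personal data. It assumes the unstructured documents have already been extracted to plain text and loaded into a dictionary; the function name `find_documents_mentioning` and the simplified `EMAIL_PATTERN` are hypothetical and chosen purely for illustration, not taken from any particular product.

```python
import re

# Assumption: a deliberately simple email matcher for illustration,
# not a complete validator for every legal address format.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_documents_mentioning(documents: dict[str, str], email: str) -> list[str]:
    """Return the sorted names of documents whose text contains the email address.

    `documents` maps a document name to its already-extracted plain text.
    Matching is case-insensitive.
    """
    target = email.lower()
    matches = []
    for name, text in documents.items():
        # Collect every email-like string in the text, normalised to lower case.
        found = {candidate.lower() for candidate in EMAIL_PATTERN.findall(text)}
        if target in found:
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    sample = {
        "contract.txt": "Terms agreed with Jane.Doe@example.com on Monday.",
        "notes.txt": "No personal data here.",
    }
    print(find_documents_mentioning(sample, "jane.doe@example.com"))
```

In practice the hard part is the step this sketch skips: knowing where the emails, PDFs and recordings are held in the first place, and turning them into searchable text.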
Minimising the risks
There are a number of steps organisations can take to improve their approach to unstructured data and mitigate risks.
- Improving visibility: An organisation should establish if its current IT infrastructure provides constant visibility to track when unstructured data is generated. If the IT team is notified when this happens, it can establish what the data is and whether it needs to be structured. Equally, if an alert is set when an employee copies structured data into an unstructured source, IT can investigate and take action.
- Scenario planning: Organisations often need to retrieve data quickly, so they should do trial runs to establish whether key information is easily accessible in structured formats. For example, they could test whether email conversations can be retrieved if they are needed as legal evidence. If conversations like this aren’t preserved in structured formats, the IT team could introduce technology to do this.
- Restricting employees’ access to data: Do employees only have access to the data that they need to perform their job? If not, the IT team should put systems, policies and processes in place to make sure that data which doesn’t concern them is restricted. This removes the risk of them putting that data in unstructured sources.
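As a rough illustration of the visibility step above, the following Python sketch lists files with typically unstructured formats that have appeared or changed under a folder since the last check. It is a minimal sketch under stated assumptions: the extension list in `UNSTRUCTURED_EXTENSIONS` and the function name `find_new_unstructured_files` are hypothetical, and a real deployment would more likely rely on dedicated monitoring or data loss prevention tooling than on a periodic scan.

```python
from pathlib import Path

# Assumption: a hypothetical set of extensions treated as "unstructured" here.
UNSTRUCTURED_EXTENSIONS = {".pdf", ".docx", ".msg", ".eml", ".mp4", ".txt"}


def find_new_unstructured_files(root: str, since: float) -> list[Path]:
    """List files under `root` with an unstructured extension modified after `since`.

    `since` is a Unix timestamp, for example the time of the previous scan.
    """
    results = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Compare extensions case-insensitively so "REPORT.PDF" is also caught.
        if path.suffix.lower() not in UNSTRUCTURED_EXTENSIONS:
            continue
        if path.stat().st_mtime > since:
            results.append(path)
    return sorted(results)
```

The IT team could run a check like this on shared drives on a schedule and review the results, deciding for each new file whether its contents need to be moved into a structured source.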
Realising unstructured data’s benefits
Despite its risks, nobody should fear unstructured data or view it purely as a liability. It can often be used to drive success for organisations. Take call recordings, which can be used as a training tool for new employees or to understand customer sentiment better. Customer calls can give insight into satisfaction rates, driving targeted service improvements.
Overcoming fear of the unknown
The biggest challenge with unstructured data is simply not knowing how much of it exists, where it lies or what it concerns. Organisations should identify the gaps in data management that their current IT systems leave, so that they can mitigate the risks unstructured data poses. Tools are available that make it easier to structure data efficiently and search for it when it’s needed. These should be considered by any organisation with a significant amount of data to manage.
Written by Kevin Widdop, commercial digital transformation consultant at Crown Records Management