A reality check on how to get big data out of the lab and into production

‘While there are many big data success stories, big data still has not really broken out of the data scientist backroom and into the enterprise data centre’



Enterprise interest in big data is big. A recent Gartner survey found that 75% of enterprises are either investing in or planning to invest in big data initiatives over the next two years.

These enterprises believe that big data can deliver insights they can use to streamline their business processes, target the right customers with the right advertising and protect themselves from security breaches.

Yet, while there are many big data success stories – in human genome research, the restaurant industry and in the healthcare sector – big data still has not really broken out of the data scientist backroom and into the enterprise data centre with mainstream big data applications.

Part of the reason for this is that big data technology continues to mature. Yet even as big data technology “grows up”, it is still hard to build a business case that justifies investment in the massive amount of expensive legacy storage infrastructure, whether storage area network (SAN) or network attached storage (NAS), needed to support mainstream big data applications.

Turning to non-traditional storage infrastructure causes its own problems, because this infrastructure lacks the data management, governance and protection mechanisms needed for mainstream business applications.


These two challenges, one economic and one technical – high traditional infrastructure costs and non-traditional infrastructure’s weak data management capabilities – are preventing big data from realising its full potential.

If big data is to truly become big in the enterprise, enterprises will need to use a data-centric, distributed architecture for their big data applications. Only by using this type of architecture will they be able to secure the economics, agility, performance and reliability they need to support mainstream big data applications.

When enterprises first approached big data, they tried to use their existing legacy infrastructure architectures to support it. What they soon found was that big data had big storage requirements, and that using the expensive SAN, NAS and other technologies of legacy storage architectures for these applications would be cost-prohibitive.

The data scientists in the backroom therefore turned to non-traditional storage infrastructure architectures – usually architectures that use direct attached storage (DAS) on commodity hardware with some simple “storage code” written into the big data database or data service.

This non-traditional architecture is much more economical than legacy infrastructure architectures because all that enterprises need to purchase is the server – and for smaller, non-mainstream big data applications, this approach can work.

But when enterprises then try to move big data from the backroom to mainstream production environments, the data management drawbacks associated with this architecture rear their ugly heads.

For example, DAS storage infrastructure architecture lacks the data management, governance and protection mechanisms needed to make it secure, available and reliable. As they move big data from the backroom to production, enterprises soon find themselves dealing with a wide variety of data types and data services, with practically every application a “composite” of other applications.

DAS storage infrastructure architecture lacks the flexibility and control to deal with all these different data types and data services. Moreover, when enterprises’ employees and customers begin actually using the applications on an ongoing basis, enterprises soon find that they need management tools to ensure that data is backed up, secure and only accessed by the right people. And with a DAS storage infrastructure architecture, doing this is time-consuming and difficult, if not impossible.

As a result, enterprises soon discover that their big data applications are unable to reliably deliver the performance and availability needed for broad deployment to employees, partners and customers.

In addition, while DAS storage architecture might support smaller-scale projects, it does not have the data management capabilities needed for a production-level DevOps model.

DevOps teams need to be able to build, test, validate and release new code on a continuous basis, which means that they need to be able to use real-time production data in their test environments.

But a DAS storage infrastructure architecture cannot efficiently create snapshots of production data, nor can it easily clone this production data for test environments. Infrastructure costs climb as enterprises find that they need not only to create separate infrastructure for their test environments, but also to buy or build tools to continuously migrate their data.
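To make the snapshot and clone point concrete, here is a minimal sketch in Python of what an efficient clone looks like. It is purely illustrative: the `Volume` class and its methods are hypothetical names, not any vendor's API, and it assumes a block-level volume whose blocks are immutable. The clone copies only the map of block references, so a test environment can start from production data without duplicating the data itself.

```python
class Volume:
    """Toy block volume with copy-on-write clones (hypothetical, for illustration)."""

    def __init__(self, blocks=None):
        # Block map: block number -> immutable bytes object.
        self._blocks = dict(blocks or {})

    def write(self, block_no, data):
        # A write replaces the reference in this volume's map only.
        self._blocks[block_no] = bytes(data)

    def read(self, block_no):
        return self._blocks[block_no]

    def clone(self):
        # Copy the block map, not the data: both volumes reference the same
        # bytes objects until one of them overwrites a block.
        return Volume(self._blocks)


production = Volume()
production.write(0, b"customer records")
production.write(1, b"order history")

test_env = production.clone()          # no block contents are copied
test_env.write(1, b"scrubbed orders")  # diverges only in the clone
```

After the clone's write, production still holds its original block 1, and block 0 is shared between the two volumes. Without this kind of mechanism, the alternative is a full copy of the data onto separate infrastructure, which is the cost the paragraph above describes.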

Non-traditional architecture does not just lack the data management tools needed to deliver big data applications to the entire enterprise – it also lacks the tools needed by DevOps teams if new big data applications are going to be developed, updated or refined on an ongoing basis.

Given the failure of both traditional legacy storage infrastructure architecture and non-traditional DAS storage infrastructure architecture to support mainstream big data applications, is it possible for enterprises to truly realise the promise of big data?

It is. Data-centric, software-defined storage infrastructure (the same type of architecture used by AWS and other major public cloud service providers) offers enterprises a cost-effective way to deploy big data applications with the economics, agility, performance and reliability needed for mainstream applications.


A software-defined infrastructure enables enterprises to use commodity storage hardware and avoid the high costs of legacy storage infrastructure technologies. Moreover, by adding a virtual data layer between the data users (databases, applications and data services) and the data storage (flash, disk and other storage media), enterprises gain a unified, transparent data management system both for controlling performance and for provisioning, managing, replicating and sharing data.
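The idea of a virtual data layer can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any particular product: `Backend`, `VirtualDataLayer` and the per-service placement policies are hypothetical names invented for the example. Data users call `put` and `get` by service and key; the layer alone decides which storage media hold the data and how many copies exist.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Stand-in for one storage medium, for example flash or disk."""
    name: str
    objects: dict = field(default_factory=dict)


class VirtualDataLayer:
    """Sits between data users and storage media (hypothetical sketch)."""

    def __init__(self, backends, policies):
        self._backends = {b.name: b for b in backends}
        # Placement policy: service name -> ordered list of backend names.
        self._policies = policies

    def put(self, service, key, value):
        # Replicate the object to every backend the policy names.
        for name in self._policies[service]:
            self._backends[name].objects[(service, key)] = value

    def get(self, service, key):
        # Read from the first backend in policy order that has the object.
        for name in self._policies[service]:
            objects = self._backends[name].objects
            if (service, key) in objects:
                return objects[(service, key)]
        raise KeyError((service, key))


flash, disk = Backend("flash"), Backend("disk")
layer = VirtualDataLayer(
    [flash, disk],
    {"analytics": ["flash", "disk"], "archive": ["disk"]},
)
layer.put("analytics", "daily-report", b"...")
layer.put("archive", "2015-logs", b"...")
```

Because placement lives in the policy table rather than in each application, changing where a service's data is stored, or how many replicas it has, would not require changes to the applications that use it.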

This architecture also enables enterprises to easily add new databases and other data services as these big data applications grow. A data-centric, software-defined architecture provides enterprises with an agile, simple and cost-effective way to deploy mainstream big data applications to thousands, if not millions, of users without sacrificing security, control or flexibility.

If the 75% of enterprises working on big data initiatives continue to try to use traditional legacy storage infrastructure or non-traditional DAS storage infrastructure architectures for mainstream big data applications, they will soon find themselves trying to fit a square peg into a round hole, unable to deploy mainstream big data applications that are economical, secure and scalable.

However, if they instead employ a data-centric, software-defined storage architecture for their mainstream big data applications, they can move these applications out of the backroom and into the data centre, and enable big data to truly deliver on its promise.


Sourced from Mark Lewis, CEO, Formation Data Systems
