Beneath big data: building the unbreakable
While CIOs work to identify the business value that will make big data a worthy investment, they also face the daunting task of building suitable, and unbreakable, infrastructure to support it
Businesses hear plenty about the various methods of extracting value from data, but much less understood is the best infrastructure to house and manage big data for the next ten years.
Big data, as a technology, is a simple premise to understand: transforming data, of which there is lots, into valuable information. However, while many organisations are struggling to really master the value part, even more are behind in deploying next-generation infrastructure systems to support it.
The pace of innovation in the industry doesn’t help, either. It seems that every week vendors are pitching the latest game-changing technology to generate insight from proliferating data sets. But before leaping into those solutions, CIOs must take the time to consider, from an IT perspective, how to store and manage those data sets and make them useful and secure.
Meanwhile, a large and growing number of factors are hampering the ability of legacy systems to cope with big data.
Whether it’s the flexibility to handle and stream both structured and unstructured data in real time, or scalability to add resources on demand to cope with transactions, being a data-savvy organisation requires a data-savvy infrastructure.
A good example was the so-called Black Friday and Cyber Monday spikes of retail demand, when many well-known brands simply couldn’t cope with the increased traffic.
‘On such demands, web servers were crashing and websites were down,’ says Dr Mohammed Haji, solutions architect at Couchbase. ‘This caused a lot of customer dissatisfaction, with customers moving to websites that were able to cope with the demand.
‘Big companies saw a massive wake-up call to embrace big data and the new technologies that they so reluctantly did not embrace. It shook the UK retail industries to the core with major bad publicity.’
End of an era
The relational database management system, along with data warehousing, has reigned supreme for a long time as the ubiquitous platform for enterprise data, but the big data trend has sought to challenge that as NoSQL systems are able to handle far greater quantities of data.
Hadoop, with its distributed file system and its MapReduce parallel processing framework, has made massive inroads as CIOs recognise the scope for processing large data sets. This anti-SQL army is growing stronger and consequently backing the rise of cloud-based, purpose-built NoSQL databases.
To stay ahead of the game, storage development teams are also adopting the new buzz term ‘lambda architecture’ (LA), which caters for high-speed, low-latency and robustly fault-tolerant systems.
‘This has a high-throughput caching layer to process and allow reactive and predictive processing and a back-end MapReduce data fabric like Hadoop,’ says Haji. ‘With a high volume and a colossal amount of data, the industry is seeing a massive surge in the adoption of open-source products, especially NoSQL products, and the move away from blue-chip dinosaurs due to their inability to be reactive and agile.’
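The two layers Haji describes can be illustrated with a minimal Python sketch. It is a toy under stated assumptions rather than a description of any vendor's product: the class `LambdaStore` and its method names are hypothetical, an in-memory counter stands in for the caching layer, and a full recount of an append-only log stands in for a MapReduce-style batch job.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaStore:
    """Toy lambda architecture (hypothetical names throughout)."""

    # Append-only record of every event ever received.
    master_log: list = field(default_factory=list)
    # Batch view: rebuilt from the whole log, stands in for a MapReduce job.
    batch_view: Counter = field(default_factory=Counter)
    # Speed view: incremental counts for events seen since the last batch run.
    speed_view: Counter = field(default_factory=Counter)

    def ingest(self, key):
        """Record an event and update the low-latency view immediately."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from scratch, then reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        """Answer a query by merging the batch and speed views."""
        return self.batch_view[key] + self.speed_view[key]


store = LambdaStore()
for product in ["shoes", "shoes", "hats"]:
    store.ingest(product)
store.run_batch()        # the slow, complete recomputation
store.ingest("shoes")    # arrives after the batch run
print(store.query("shoes"))
```

The point of the split is that queries stay up to date between batch runs, while the batch layer can always rebuild a correct view from the raw log if the fast layer fails or drifts.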
While overhauling storage infrastructure for the big data age does require significant investment, many of the technologies involved these days are both open source and cloud compatible, which makes it much simpler to set up trial systems.
Companies can get up and running much more quickly and easily today, says DataStax chief customer officer Matt Pfeil, who has seen public companies go from the learning phase to production in as little as three months. ‘As a result, they’ve experienced huge cost savings with the combination of commodity hardware and open source,’ he says. ‘Even more importantly, the business advantages are gigantic – bolstered by better products and experiences.’
CIOs may well know and understand the business advantages of these technologies, but the tricky task comes with measuring the return on a hefty capex investment. ROI in this instance essentially comes from two things: saving money in the long run and increasing the scope for extra revenue with the opportunities that big data technologies create.
Using big data for personalisation programmes in e-commerce and retail is an obvious way to increase spend, but these approaches can be applied in other sectors too.
This can be seen in practices such as predicting how much time to allocate to each sales lead and, more generally, in making each transaction more insight driven.
‘Using data can help customers see more value, which then leads to more profitability or greater sales,’ says Pfeil, who also suggests looking at where traditional relational technologies are being used within the organisation.
‘Once you start going down the route of NoSQL for new projects, it’s possible to see other areas within the organisation where moving over to NoSQL can help to reduce costs. Open-source platforms can offer much greater returns compared with proprietary systems.’
Preparing for big
But with innovations like NoSQL, in-memory and Hadoop being positioned as necessary technologies for big data, how can CIOs know what is the right fit for their particular environment?
When asked to find the technologies needed to work with big data, CIOs are being tasked with enabling their businesses to move faster without infrastructure downtime.
This includes accelerating application performance and response times and providing high availability to the business – all while often being budget constrained.
‘When deciding on the best solution for their particular environment, CIOs need to ensure that they are using a solution that provides the best value while remaining non-disruptive to the day-to-day running of the organisation,’ says Paul Harrison, storage director at Dell UK.
They also need to be very careful when considering the characteristics of their organisations’ data and the demands placed upon it, as well as how long data needs to be kept for risk and compliance purposes.
Security, after all, is both a big data problem and opportunity. A lot of the data that organisations throw away to reduce storage costs may contain insights that could help identify a cyber attack.
The log files that show network behaviour, firewall records and proxy server data can be correlated to show suspicious internal behaviour that might be an insider threat.
‘If this data has been discarded then it can’t be used to find attacks,’ says Matt Davies, EMEA marketing manager at Splunk. ‘Always think of the security angle when deciding how much data you are going to store, and hence the kind of storage you need to deploy.’ Harrison adds, ‘A lack of readily available information and support for security is preventing companies from deploying these technologies for fear of a security breach that may affect this data.’
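The kind of correlation across log sources described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`ts`, `src_ip`, `action`, `url`), the function name `correlate` and the 60-second window are all hypothetical, and real firewall and proxy logs would first need parsing into this shape.

```python
from collections import defaultdict


def correlate(firewall_events, proxy_events, window_seconds=60):
    """Return internal IPs that were denied by the firewall and also made a
    proxy request within `window_seconds` of that denial.

    Events are dicts with hypothetical fields: `ts` (epoch seconds),
    `src_ip`, plus `action` (firewall) or `url` (proxy).
    """
    # Index proxy requests by source IP so each firewall event is a cheap lookup.
    proxy_by_ip = defaultdict(list)
    for event in proxy_events:
        proxy_by_ip[event["src_ip"]].append(event)

    suspicious = {}
    for fw in firewall_events:
        if fw["action"] != "deny":
            continue
        for px in proxy_by_ip.get(fw["src_ip"], []):
            if abs(px["ts"] - fw["ts"]) <= window_seconds:
                suspicious.setdefault(fw["src_ip"], []).append((fw["ts"], px["url"]))
    return suspicious


firewall = [
    {"ts": 100, "src_ip": "10.0.0.5", "action": "deny"},
    {"ts": 500, "src_ip": "10.0.0.9", "action": "allow"},
]
proxy = [
    {"ts": 130, "src_ip": "10.0.0.5", "url": "http://example.com/upload"},
    {"ts": 510, "src_ip": "10.0.0.9", "url": "http://example.com/"},
]
print(correlate(firewall, proxy))
```

None of this is possible if either log has already been discarded, which is why retention decisions and storage sizing need to be made together.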
Another factor that organisations often overlook is the importance of considering the value they hope to get from the data, rather than just focusing on the technology.
It’s all well and good being able to store big data, but the complexity of deploying those systems frequently sees organisations forget to acquire the skills to actually use the data once they’re ready for it.
It’s also important for organisations to keep an open mind about cloud, weighing it against the on-premise options and considering which approach best suits their particular circumstances.
‘Organisations sometimes fail to be imaginative and ambitious,’ says Davies. ‘What can they do if they combine their customer, product and e-commerce data? What can be found out? How can this change the way a company operates?
‘They shouldn’t be afraid of this, but approach it with some creativity, imagination and ambition.’
Other enterprises make the potentially fatal mistake of allowing the IT department to drive the technology change entirely, with no input from the business, while some take on too many challenging objectives at once.
In many cases, it is possible to develop an initial view or hypothesis of what can be achieved with data and then review it with the business before committing to more significant investment.
‘Another error to avoid is to assume that having new data-driven insights will automatically result in business performance improvements,’ says Andy Shepherd, principal architect at Fujitsu UK and Ireland. ‘In reality, the envisaged transformation will only occur if the business is committed to applying the insights and using them to make changes to the way it operates.’ This can be a challenge if the project does not enjoy sufficient stakeholder buy-in or sponsorship from board level to drive through the transformational change.
That executive support is vital in budgeting for the initial costs of deploying new storage infrastructure, which are a significant barrier to ingraining an understanding of the value of data within corporate culture. It is down to the CIO to champion that culture from the top down; otherwise the company will be subjected to more years of infrastructure unsuitable for the big data age.
‘Fundamentally, that storage will not scale or scale in a cost-effective manner, may not be sufficiently resilient to accommodate very large data sets, and cannot be adequately monitored and managed to reliably underpin the overlying big data infrastructure and applications,’ adds Shepherd.
What the experts say
'Organisations that don’t look into evolving their storage for big data will run into a wall. The wall will be CPU, storage or network infrastructure. When taking older architecture and then expanding this beyond its meaningful scope, the system starts to break down. All systems have a scope to which they can scale; going beyond it begins to show weak points in design and architecture.'
- Matt Starr, CTO, Spectra Logic
'The key challenge with big data is that the cost of manipulating and conducting complex analysis on the volume of data collected can be prohibitive. It can also be cripplingly slow - if you can only ask a question once a week, you might be missing out on business-critical insight in the intervening days. If you're stuck with spinning hard disks, be prepared to spend significantly to attempt to make it move faster.'
- Alex McMullan, Field CTO, Pure Storage
'Opt for progress over perfection. Getting something in place is the most important. Design data management policies in good faith and implement them consistently across the business. What you decide will touch everyone in the business so it is vital to get all employees on board.'
- Phil Greenwood, commercial director, Iron Mountain
'Legacy infrastructure was designed in scale-up architecture, and has many shortcomings in terms of capacity, concurrency and data mining. As such, scale-out architecture is the answer to the data growth challenge. As well as this, legacy infrastructure is not designed for variety data and data intensive compute.'
- Yanhua Xiao, big data solution CMO, Huawei
'Organisations that are able to merge IT infrastructures with big data infrastructures tend to be most successful. This leverages the same people, infrastructure and skill sets across multiple use cases. But it also requires IT to embrace a different type of technology than they may have used in the past. It also requires sharing of control and budgets, which is not always trivial in a large organisation.'
- Molly Rector, CMO, DDN
'The biggest challenge is trying to predict the future; knowing what your data will look like in three or five years’ time so you can put in an infrastructure that is flexible enough to be able to grow and accommodate changes. The amount of data that all businesses are generating themselves and storing on their clients is growing all the time. Being able to know what data you want to keep, what you want accessible now and what you can file away is going to become increasingly difficult.'
- David Barker, technical director, 4D-DC
'While data quality is becoming an increasing concern for big data, statistics show that many organisations have yet to implement any form of strategy. Whereas some believe that the sheer volume of data available through the Hadoop architecture makes data flaws less impactful on the whole, when viewed statistically data quality is in fact more critical than ever.'
- Ed Wrazen, VP product management, Trillium Software
'The issue is not only data growth. Users have always had a lot of data, much of which they might not have accessed. Now they are being forced to mine it for competitive insights. In terms of data growth, data is now of a granularity that was not dreamt of five years ago. IoT devices are a good example. Legacy infrastructure is struggling because the systems were not set up for dynamic, near real-time or real-time analytics and visualisation.'
- Simon Garland, chief strategist, Kx Systems