How to build a cloud data warehouse for the first time

Building a cloud data warehouse may sound daunting the first time around, but it can be a viable option for businesses that want company data organised into distinct categories, and the cloud generally brings agility and scalability.

A recent study by Denodo found that 56% of organisations deploy data warehouse technology in the cloud, with frequently cited benefits including efficient workload management and the avoidance of vendor lock-in.

However, it can be difficult to know where to start when venturing into this kind of technology for the first time, and mistakes can be costly in terms of time and money. So what should companies do to minimise risk and reap the rewards?

Skill up the workforce

The first port of call is making sure that your workforce’s skill sets are prepared for the transition.

A major aspect of data warehouse technology in the cloud is its capacity to handle big data. As beneficial as this can be for monitoring customer behaviour, its potential cannot be reached if employees aren’t skilled enough to leverage it properly.

“Establishing data warehousing on a global scale, with low latency and massive computing power, is no longer out of the standard business’ reach,” said Thomas LaRock, head geek at SolarWinds. “What once cost millions of dollars to implement can be done for a few hundred dollars and some PowerShell scripts.

“Cloud providers like Microsoft Azure and AWS can easily be leveraged to allocate hardware resources for our data analytics needs. Dealing with big data, however, requires serious upskilling; there’s no doubt about that.

“But these new skills will broaden the horizons of individual IT pros. Having a sound understanding of data handling, combined with traditional network engineering, will ultimately boost the career of IT pros and so should be viewed as a necessary investment.”

Establish sufficient data governance

LaRock continued by stressing the importance of making sure that company data is managed in a way that doesn’t produce useless duplicate data or siloed data.

“The most common pitfall when implementing data warehousing is curating, collecting, and aggregating multiple copies of the same data,” he said. “Businesses typically have a lot of data silos which, if they form a part of a data warehouse, create redundancy.

“If you’re going to start implementing data warehouses, you must consider establishing a proper data governance strategy.

“With such a strategy in place, silos will be identified ahead of the data warehouse being implemented.”
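As a rough illustration of how redundant copies might be spotted before any data is loaded, the sketch below fingerprints records on a normalised key and reports those that appear in more than one silo. It is a minimal example rather than a governance tool: the silo names (`crm`, `billing`), the `email` key field and both function names are hypothetical, chosen only for the demonstration.

```python
import hashlib
from collections import defaultdict


def record_fingerprint(record, key_fields):
    """Hash the normalised key fields so trivially different copies match."""
    normalised = "|".join(
        str(record.get(field, "")).strip().lower() for field in key_fields
    )
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_cross_silo_duplicates(silos, key_fields):
    """Return groups of (silo_name, record) pairs found in more than one silo."""
    seen = defaultdict(list)
    for silo_name, records in silos.items():
        for record in records:
            seen[record_fingerprint(record, key_fields)].append((silo_name, record))
    # Keep only fingerprints that occur in at least two different silos.
    return [
        group
        for group in seen.values()
        if len({silo_name for silo_name, _ in group}) > 1
    ]


# Hypothetical silos: the same customer is held by two departments.
silos = {
    "crm": [
        {"email": "Ana@example.com ", "name": "Ana"},
        {"email": "bo@example.com", "name": "Bo"},
    ],
    "billing": [
        {"email": "ana@example.com", "name": "Ana P."},
    ],
}

duplicates = find_cross_silo_duplicates(silos, key_fields=["email"])
for group in duplicates:
    print([silo_name for silo_name, _ in group])
```

In practice the choice of key fields is itself a governance decision, since it defines what counts as "the same" record across departments.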

Start small

When establishing a cloud data warehouse for the first time, it may be best to limit the cost of any mistakes by keeping ambitions modest to start off with.

“It’s going to be a process of starting small, get some experience and value in a small project, and learn from that,” said Craig Stewart, CTO of SnapLogic. “Get that experience from that first project, and then you can incrementally gain additional value.

“The great thing about the cloud is that you can grow the elasticity that you can get from the likes of RedShift and Azure Synapse, which do give you the ability to do that.

“Starting small means that in the event of the project you’re attempting failing, you can learn from it and then move onto the next step without having incurred a huge cost in terms of either the financial resources for spinning stuff up, or even the human cost of making these things work.

“Use a tool that’s a no code-type tool, with a self-service approach. That combination means you can look to get value quickly and learn what you do isn’t giving you value, then you can move on quickly as well without having had a very costly failure, and the failure is, in itself, a learning process to get to the value.”

Planning the new architecture is key

As well as starting small, it’s vital that companies practise patience by carefully planning out their cloud data warehouse architecture.

Rob Mellor, VP and GM EMEA at WhereScape, said: “Be aware of some of the myths you’ll hear along your research journey. You can’t just throw all your data in the cloud and start analysing it with no design or architecture needed.

“An analytics environment is planned and architected so that all users can understand and use it.

“Neither can you just forklift all your data warehouse into the cloud with no need to redesign it. Your old data warehouse will have grown some barnacles along the way that will need cleaning up.

“But this is a good time to blow the dust off, remove inefficient processes, wasted space from unused assets such as old reports, visualisations and analyses no longer used. This is also a perfect opportunity to automate many processes to make them far more efficient.”

Make use of existing models

Moving a data warehouse to the cloud for the first time may not require a complete ‘out with the old, in with the new’ approach, and there could be lessons to learn from existing architectures that need improvement.

Among other benefits, this can help companies address the aforementioned need for sufficient data governance.

Helena Schwenk, market intelligence manager at Exasol, said: “Migration should be viewed as an opportunity to rationalise and revise existing on-premises data warehouses.

“Identify what data assets and sources can be revised, augmented or added, and pursue an incremental migration strategy towards achieving a cohesive cloud data warehouse platform that includes proper governance and oversight.”

Ensure possible evolution

Schwenk went on to stress the importance of knowing how to use kinds of data beyond those already used frequently within the company.

Big data in the cloud, especially the public cloud, could benefit from incorporating data from outside sources.

“Investigate how public cloud can support new data workloads or business use cases,” she said. “For example, consider supporting advanced analytics and data science within your cloud data warehouse by using its scale and elasticity to make more data available and accessible for analysis.

“Newer cloud native data sources, such as social media data and data from sensors, can be hugely beneficial in providing a deeper, more insightful understanding of business.”

Consider serverless tech

Justyn Goodenough, international area VP at Unravel Data, suggested thinking about using serverless technologies.

“Serverless relational databases are a common choice for business intelligence applications and for publishing data for other systems to consume,” he said. “They provide scale, performance and, most of all, SQL-based access to the prepared data.

“Vendor examples include AWS Redshift, Google BigQuery, and Azure SQL Data Warehouse. These work great for moderately-sized and relatively simple data structures.

“For higher performance and complex relational data models, massively parallel processing (MPP) databases store large volumes of data in-memory and can be blazing fast, but often at a steep price.”
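To give a sense of what SQL-based access to prepared data looks like for a business intelligence report, the sketch below runs a simple aggregation query. It uses Python's built-in `sqlite3` module purely as a local stand-in, not any of the cloud services named above; the `sales` table, its columns and the `revenue_by_region` function are hypothetical.

```python
import sqlite3


def revenue_by_region(connection):
    """Aggregate prepared sales data by region, largest total first."""
    query = """
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
        ORDER BY total DESC
    """
    return connection.execute(query).fetchall()


# An in-memory database stands in for the warehouse in this sketch.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("APAC", 80.0), ("EMEA", 30.0)],
)

report = revenue_by_region(connection)
print(report)
```

A query of this shape would look broadly similar against a cloud warehouse, although the connection setup and the SQL dialect differ from vendor to vendor.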

Research and seek expertise

Lastly, two tips worth considering when adopting cloud data warehouse technology for the first time apply equally to any new venture in business, or even in life.

“It is important to understand exactly what you are looking for as different platforms have different benefits for types of data, analysis and processing,” said John Lyons, GM, cloud and hosting at Zen Internet. “For example, a company might find that a multi-cloud service is a better fit – don’t assume that because your business has one cloud service from a particular provider that they will also be the best provider for your other cloud needs as well.

“Finally, as well as doing their own research, companies should also engage with experts that have the frameworks and experience in this area. This will help to minimise any risk or challenges in adopting a cloud data warehouse and ensure the company is in the best place to take advantage of the benefits it can bring.”

Aaron Hurst

Aaron Hurst is Information Age's senior reporter, providing news and features around the hottest trends across the tech industry.