We’re a few months on now from the announcement of Intel’s $740 million investment in Cloudera. How is this integration progressing?
Better and faster than we had expected. I was very excited putting the partnership together. As it unfolds, and we start collaborating more, this is going to be a game changer – not just for Cloudera, but I think the industry. Intel is working on a chip that’s going to ship five years from now. They’re sharing those designs with us and we’re collaborating with them on how we can write and take advantage of instructions in the chip to actually make them perform better for analytic workloads. So if a customer is going to build a scale-out grid, and they were planning to have a thousand nodes driving it, the work with Intel might say they can do it with 600, which is a significant cost saving in the long run. That’s huge. That’s a five-year roadmap.
But in the near term, we’re doing things around security that have immediate value. With security, it’s all about protecting all the sensitive data that’s landing in Hadoop. The best way to protect it from hackers is to encrypt it. Once it’s encrypted, it’s secure. Encrypting in software is a very expensive thing to do. With Intel, we’re encrypting in the chip, and in software you just have the key management system that allows you to encrypt and decrypt really fast. So we believe our customers will now put more data into Hadoop because they can easily encrypt it, and therefore it’s naturally more secure.
You’ve worked a lot with MongoDB over the years, but you recently announced a fully fledged strategic partnership. How is that different to how you’ve been collaborating before?
Before, we were parallel companies going after big data with slightly different value propositions. Our customers had been working with both of us. Now they have matured such projects enough that they want to integrate them, so it’s much more of an end-to-end approach. So what we say with MongoDB is let’s do tighter engineering and integrate our offerings better so our customers get a faster time-to-value from a joint solution.
Hadoop is gradually maturing in the market. Where is adoption up to in the UK?
A lot of innovation first happens in the States. Then you look at Europe and the UK as traditionally 12 to 24 months behind. In this cycle, it’s happening a lot faster. I think the adoption here in the UK is catching up, if not getting almost on par, with what we’re seeing some of the more advanced folks do in the US. Put aside the web properties in Silicon Valley which are a few years ahead, if you look at banks, insurance companies and retailers, I’m seeing adoption here catching up very fast. I think where we’re seeing it happen the fastest is in global corporations because they’re competing with US-based multinationals – these are global applications and it’s a race to be first.
With it happening so fast, do you anticipate a stage where Hadoop becomes commoditised?
It’s going to touch every enterprise in the world, but I don’t think it will become commoditised. ERP systems, which have been around now for 30 years and touch every enterprise, are not commoditised. ERP systems are still fundamental to how we run businesses. We believe that Hadoop – and more importantly, enterprise analytic capabilities on top of Hadoop – are going to be the most strategic investment that enterprises are making in the next ten years. The fact that every enterprise is doing it doesn’t mean it’s commoditised – that actually means that’s where a lot of investment and resources are going to go. So we think it will be broadly adopted.
What is driving adoption of Hadoop, and why do people call it a free puppy?
There are two ways to get Hadoop. You can go to the Apache Software Foundation and just download open source projects and assemble it. Or you can go to a commercial distribution, a company that will assemble it and make it easier for you to consume. It’s estimated that about 80% of the people in the world using Hadoop have gone to a commercial distribution. Of those, Cloudera is about 80%. So we have probably close to 10,000 enterprises using our distribution for Hadoop, that free puppy. Our business model is: puppies aren’t free. You’ve got to care for them, maintain them, take them to the vet etc. So our business model is for those people who want to keep the puppy – they use our software, subscription, support and services to scale out.
Is that worrying for the CIO because they’re forced into a reactive approach, or is it indicative of how the role of IT is changing?
It’s indicative of how the role of IT is changing. Understanding Hadoop, big data and analytics is now on the agenda of every CIO – not because it’s bubbling up and they’re reluctant, but because they realise that to compete in the modern world you have to be information-driven. You have to know more about your customers, you have to know more about your competitors, and you have to know about the factors that impact your business.
Most CIOs have moved from having to automate the processes of the business to having to know more about the business. It’s top of the mind for them. The fact that Hadoop is bubbling up from guys experimenting in the IT department is actually an advantage because when we talk to CIOs now we tell them who in their business has already been working in this stuff. We tell them they already have the software in place so if they understand the technical stuff they understand how it impacts their business. I’ve been in enterprise software for 30 years and this is the first time in my career where I have had endless access to CIOs, CFOs and even CEOs around technology. That’s because of the strategic impact to the business.
Is it also a wake-up call to CIOs that their role is changing because developers and lines-of-business are driving such innovation, rather than them?
I think it’s a wake-up call that if they don’t get ahead of this trend of lines-of-business driving IT initiatives, issues like compliance, security and governance of data, which fall back on IT, start to become more difficult. So what IT needs to do is get ahead of that curve and put a foundation in place that addresses those things – offer a service to lines-of-business rather than trying to claw stuff back. We’re seeing that trend now.
That’s why you don’t hear us talking about Hadoop anymore as much as you hear us talking about an Enterprise Data Hub. The Enterprise Data Hub is a reference architecture for a CIO of how to architect Hadoop into their data landscape. So if CIOs want to get ahead of that curve, here’s how they architect it in. That reference architecture shows how data fits with existing technologies, how to get data in and out, and all the partners that are integrating to it. That is to help offer IT as a service to their business users, versus getting behind the curve.
Hadoop has progressed a lot but some businesses are still reporting reliability issues. How are these issues being addressed?
If you’re going to have an Enterprise Data Hub as a core part of your platform – you’re building your core business applications on it – it’s got to be secure, governable and reliable. So we’re spending most of our time on those enterprise-grade capabilities, and then we’re leveraging our partners to extend their data in the applications and all the tools on top of the platform. Reliability is something that is consistently getting better. If you look at the first generation of data management tools – data warehousing, analytic engines, BI tools etc. – they evolved over a 25-to-30-year period. In the last five years, we’ve caught up with about 20 years of that. Every year this community and this architecture is recreating what we learned the first time around, and catching up on around five years from the first time around. It’s moving so fast.
We’ve seen Cloudera dominate the Hadoop distribution market, but now that you speak of bringing all of an enterprise’s big data solutions together in an Enterprise Data Hub, where does that place you in the market?
We compete with Hadoop distribution vendors, but with the Enterprise Data Hub we also compete with traditional players like IBM or EMC Pivotal. But we have a very simple differentiation from those guys. IBM will come in and say here’s Hadoop and here’s this stack of technology on top. They offer the end-to-end solution – not only do they have Hadoop, but they have an ETL tool, a data warehouse, a discovery tool, and a BI tool. We say, here’s the Enterprise Data Hub as a platform, and here are 200 ISVs (independent software vendors) the customer can pick from to build their own stack for best of breed. We’re delivering a platform that gives ultimate choice. It’s open-source at the core and it’s open architected.