IA: You’ve previously served in senior sales roles at Oracle, where you directed a $1 billion sales team. What made you pack that in to found Hortonworks back in 2011?
I grew up in Oracle and the software world, then the dotcom crash showed there was going to be a very different way the enterprise wanted to consume software, and around 2004-05 I saw the movement towards open source.
I joined the company heading JBoss, an open source middleware product competing against IBM WebSphere and others. We figured out completely different models on how to build software and distribute that software. We really grew the company fast, grew the community fast and realised that this was going to be the future of software.
Ultimately, Red Hat bought the company, but I was involved with some other open source companies at the time and companies doing a lot of investing in the open source space.
I saw an opportunity with SpringSource, which had deep roots in London, and joined as chief operating officer. The company really grew phenomenally well, and this was very instrumental in building a huge app ecosystem of partners. It was then acquired by VMware. Open source really changed the way software was built and consumed.
Around 2009-10, we took a step back and said we really want to examine what’s going to be the biggest IT shift in the enterprise, and where we can build the biggest company possible around the biggest shift that’s happening in IT.
We saw that shift clearly happening around big data. All these new paradigm data sources coming into enterprise: mobile data, weblog data, click stream data, sensor data, web-based data.
But enterprise just had no place to put it that was economically powerful, or architecturally would work with their existing platform. They were running out of space, and we saw the opportunity of Hadoop to solve that problem.
We knew, with Hadoop being an open source platform, if we built a big, successful company around it we would need the core committers that were driving the technology. And we knew the tech had originated at Yahoo! and the core committers were all still at Yahoo!, so we approached them to help us build the next generation of enterprise data platform based on Hadoop.
It took us a while, but we finally figured out a new structure that worked for both us and Yahoo!, and our founding team was launched.
What have been your key milestones over the past few years?
We went public in the summer of 2014, which was obviously a key milestone for us. We were the first open source company since Red Hat to go public, and by doing so we really established that Hadoop is a data platform that is here to stay and is a very viable entity that we had tremendous confidence in. So it showed that the ecosystem was very big and worthy of having a public company in it.
Another tremendous milestone was that we were the fastest software company ever to reach $100 million in revenue.
Our acquisition of Onyara was also a huge milestone for us in that it has expanded our footprint significantly, from just being a data-at-rest platform with Hadoop to being a multi-product, multi-platform company.
With the acquisition of the NiFi product, we actually moved all the way to the edge where data originates, so our platform is the one that moves data from the point of origination all the way through its life cycle, giving customers the ability to interact with it through every step associated with that movement process until it comes to rest.
So many of our customers have really started to see the returns on that investment, along with hard-dollar ROI and use cases.
You recently announced a joint collaboration with Hewlett Packard Enterprise to optimise enterprise Spark performance. What will that entail, and why did you go to HP?
At a macro level, HP is a terrific partner with us. On the enterprise side we have a number of reference architectures; that’s how our products work together in a deeply integrated and highly supported way. We get a lot of support from them, and it’s been very successful as a general partnership.
Spark is one of several examples where we have become extremely focused on making the technology much more enterprise ready and better from a performance standpoint. HP is doing work to rebuild some of the core engines and make them more stable, driving architecturally better performance in the way those engines work.
We partner with HP to properly adopt Spark into all the work we are doing across the board, and make those engines we rebuilt part of our core offering, which is then made enterprise ready and deeply integrated into our core HDP, or Hadoop, platform. So it’s all packaged and productised, ready for massive workloads to go through and for customers to expect a very stable, predictable environment.
Spark is going to continue to be used over the next few years, without question or doubt. It has a terrific purpose in what it does for bringing great processing power and analytics workloads, and there is going to continue to be a need for fast analytics processing.
So we are going to make sure that Spark evolves into an enterprise-ready engine, and we are going to continue to stay involved in the community and do all that we can to make sure it’s a successful part of our enterprise platform.
What do you see as some of the biggest challenges for enterprises and their big data strategies over the next few years?
The biggest challenge for companies is that there is just so much data, and they’re faced with the opportunity to bring it under management and get insight from it. So they really have to take a step back and realise the ways in which data allows them to transform business models and the way they interact with customers and the supply chain, and make sure they have a really good view of the leverage points that they truly want to go after.
It’s also about being prescriptive in knowing which are the right data sets to bring under management and which do not create as much value – and then how to accurately architect the data to come under management and have the tools to most efficiently act on that data as it comes into a form they can quickly get value from.
The IoT is generating a phenomenal volume of data that has tremendous value associated with it. Data generated from those devices, sensors and actions can give enterprise tremendous visibility and insight.
When we take the data generated from IoT and apply it in a way that creates more timely, cleaner, more economical and better-value ways to interact then it becomes truly transformational.
But to get to that point they have to be able to collect it all as it’s happening, in real time, and have visibility into it while it’s still in motion. And they must have the platform and technology to make decisions pre-emptively or be able to predict something that’s going to happen so that they can either expand it, extend it or intervene and not have that outcome happen.
All of the data that gives them that visibility is here in enterprise today – the challenge is bringing it under management and having the ability to act on it. That is where our combined open source platforms come in, allowing people to do that economically, for about a tenth of the price they would pay in their old proprietary world.
What do you think makes Hortonworks stand out in the crowded Hadoop market?
We are the company that innovates the core architecture of Hadoop and its capability around the volume and type of data – we will always continue to lead that. For our Hadoop distribution, what we have done is really change the architecture in which companies manage data.
Other distributions can be very limited, because the architectural constraints of their platforms mean they can manage only fractured and siloed data sets. We have expanded the way they can manage data and Hadoop by giving enterprise the central data architecture, so they can access all data unencumbered and be able to drive not just batch applications but interactive and real-time applications.
And we have extended that with our Hortonworks DataFlow platform, so now they don’t have to be just a data-at-rest business; they now have the ability to go all the way from the point of origination and manage data all the way through the life cycle, interact with it at any point in time and manage that.
We have got the broadest ecosystem of apps, many of which our partners build, and with the Hortonworks platform they can transform their data architecture.