Companies are building new applications every day – whether to meet their own requirements or to serve their customers. Open source platforms are increasingly being used to support these applications, moving from initial development and experimentation into production.
For example, Apache Hadoop provides support for storage of huge volumes of data and companies are now looking at how to get more from their 'data lakes.' Meanwhile, new stacks of tools are being developed to help developers build their applications faster.
One example here is the SMACK stack, which includes the following components:
Spark – delivers near real-time and batch analytics on large volumes of data
Mesos – provides the 'operating system' for the data centre, or in this case for all the components within the application
Akka – a toolkit and runtime for building highly concurrent, distributed and resilient message-driven applications
Cassandra – a distributed database management system that can cope with huge volumes of data
Kafka – a message broker for managing and sending data.
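The flow these components implement – events arriving on a broker, processed in near real time, and persisted to a store – can be sketched in plain Python. This is only an illustration of the pattern: a queue stands in for Kafka, a function for the Spark job, and a dict for the Cassandra table; none of the real APIs are used.

```python
from queue import Queue

# Stand-ins for the real components: a queue plays the Kafka broker,
# a function plays the Spark processing job, and a dict plays the
# Cassandra table.
broker = Queue()          # Kafka: buffers incoming events
store = {}                # Cassandra: durable key-value storage

def process(event):       # Spark/Akka: transform each event
    key, value = event
    return key, value * 2

# Producers publish raw events onto the broker...
for event in [("a", 1), ("b", 2)]:
    broker.put(event)

# ...and a consumer drains the broker, processes each event,
# and persists the result.
while not broker.empty():
    key, value = process(broker.get())
    store[key] = value

print(store)  # {'a': 2, 'b': 4}
```

The point of the pattern is that each stage can be scaled independently – more broker partitions, more processing workers, more database nodes – which is what the real stack provides.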
Brought together, these elements provide the necessary building blocks for running applications that can handle hundreds of thousands or even millions of transactions per second. This approach – based on open source projects – allows companies to scale up their applications by adding nodes to clusters, rather than having to migrate to newer and bigger appliances or server hardware. These technologies were born of massive scaling requirements and were developed by the likes of Google, Amazon, LinkedIn and Facebook to run their own operations.
In combination, these open source elements help companies run at scale while meeting customer expectations for service. However, there are a couple of elements that developers and CIOs both have to consider as these applications are moved into production. The first element is support.
Making open source and big data work in production
Most parts of the SMACK stack are backed by projects within the Apache Software Foundation, with the exception of Akka, which is available under the Apache License and supported by Typesafe. For CIOs, while the stack itself may interoperate well and people with the skills to support the individual parts are available, running the combination in production can still represent a challenge.
The communities around these open source projects tend to be active and growing, and commercial support is available for the toolsets involved. In 2016, I see more of the stack elements being supported by single companies, making it easier for CIOs to understand and get behind.
It’s easy to underestimate how important that 'single throat to choke' can be in running production IT systems, and open source continues to develop that approach in response to customer demand.
Alongside this, there are more options available for how data can be created and stored. There are many new database options for IT teams to consider – Gartner's 2015 Magic Quadrant for Operational Database Management Systems listed 30 vendors, each supporting their own products, both open source and proprietary.
This variety offers IT teams a huge amount of choice and the potential to go down 'best of breed' routes for their data; however, it can also lead to problems when it comes to support.
Next year, there should be greater consolidation in the market as vendors buy each other or start to support multiple database platforms under one roof. This should make it simpler for companies to run their critical applications on open source database platforms in the future.
Securing the future for open source applications
The second element here is security of data. Alongside the ability to run and support production volumes of data from a customer experience perspective, the security of that data is mission critical.
This has developed significantly over the past year as more companies make the transition to running big data applications in production. As these companies come to rely on those applications for revenue, the teams involved care more about the security of the data they are putting into the system.
Both community and commercial open source projects are responding to this increasing demand for security. Steps like user and role-based authentication and management of object permissions can control the security of data stored within the database layer so that only those developers and team members allowed to view the data can access it.
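As a sketch of what this looks like in practice for Cassandra: with password authentication and authorization enabled in cassandra.yaml, roles and object permissions are managed in CQL. The `sales` keyspace and the role names below are hypothetical, chosen only for illustration.

```sql
-- Create a login role for an analytics team member.
CREATE ROLE analyst WITH PASSWORD = 'choose-a-strong-password' AND LOGIN = true;

-- Allow the role to read, but not modify, data in one keyspace.
GRANT SELECT ON KEYSPACE sales TO analyst;

-- A separate application role that can also write.
CREATE ROLE app_writer WITH PASSWORD = 'another-strong-password' AND LOGIN = true;
GRANT SELECT ON KEYSPACE sales TO app_writer;
GRANT MODIFY ON KEYSPACE sales TO app_writer;
```

Permissions can be granted at the keyspace or table level, so access can be scoped down to exactly the data each team member needs.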
Many of these open source platforms can work in fully distributed environments spread across cloud and on-premises clusters. For the NoSQL database Cassandra, the links between these clusters can be encrypted using SSL so that all data remains protected in transit as well.
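Node-to-node encryption is configured in cassandra.yaml. The excerpt below is a minimal sketch; the keystore and truststore paths and passwords are placeholders that would be replaced with an organisation's own certificates.

```yaml
# cassandra.yaml (excerpt) – node-to-node encryption settings.
# Paths and passwords below are placeholders.
server_encryption_options:
    internode_encryption: all        # encrypt all node-to-node traffic
    keystore: /etc/cassandra/conf/keystore.jks
    keystore_password: changeit
    truststore: /etc/cassandra/conf/truststore.jks
    truststore_password: changeit
    require_client_auth: true        # nodes must present certificates
```

Setting `internode_encryption` to `dc` or `rack` instead of `all` restricts encryption to traffic crossing data centre or rack boundaries, which can be useful when only the inter-cluster links traverse untrusted networks.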
Alongside this, authentication of the nodes within Cassandra clusters to each other can be managed using Kerberos, LDAP or Active Directory when communication takes place over a non-secure network.
For companies bringing together multiple open source tools that pass data between each other, the security of the connections between those elements should also be considered. Requiring credentials to access each component, for example, ensures that authentication takes place between them.
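One simple pattern for this is to keep component credentials out of source code entirely and supply them through the environment. The sketch below assumes hypothetical `SMACK_<COMPONENT>_USER` and `SMACK_<COMPONENT>_PASS` variable names; any secrets manager could be substituted.

```python
import os

def component_credentials(component: str) -> dict:
    """Fetch credentials for a named component from environment
    variables, so secrets never live in source control.
    The SMACK_<COMPONENT>_USER/PASS naming is an assumption
    for illustration, not a convention of any of these tools."""
    prefix = f"SMACK_{component.upper()}"
    user = os.environ.get(f"{prefix}_USER")
    password = os.environ.get(f"{prefix}_PASS")
    if not user or not password:
        raise RuntimeError(f"missing credentials for {component}")
    return {"username": user, "password": password}

# Example: set the variables, then fetch them for a Kafka client.
os.environ["SMACK_KAFKA_USER"] = "app"
os.environ["SMACK_KAFKA_PASS"] = "s3cret"
print(component_credentials("kafka"))  # {'username': 'app', 'password': 's3cret'}
```

Failing fast when a credential is absent means a misconfigured component refuses to start rather than silently connecting unauthenticated.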
Alongside this, CIOs can work with their vendors to go through security requirements and ensure that their implementations are compliant with any relevant legislation as well as protected against outside attack.
Looking forward, many companies are implementing open source elements within their core business applications. This has gone beyond the web server infrastructure and into how business data is created, analysed and used to provide customer services. Supporting this move into production will be important for the future success of these applications.
Source: Patrick McFadin, Chief Evangelist for Apache Cassandra, DataStax