Kubernetes monitoring: the best tools

With Kubernetes now central to containerised software development, monitoring and managing it efficiently is critical. Antony Savvas considers some of the best tools for the job.


Patrick Smith, field CTO for EMEA at Pure Storage, describes the requirements: “Many applications are relatively simple, lightweight and stateless. For these, a relatively vanilla Kubernetes implementation may be sufficient.

“But as organisations look at using Kubernetes for more business-critical production applications, the surrounding ecosystem becomes critical in delivering performance, monitoring and observability, resilience, and the ability to host persistent data. These functions go beyond the scope of the core Kubernetes platform. Organisations are then faced with the decision of whether to build, adopt or buy these capabilities.”

Craig Robinson, research vice president for security services at analyst house IDC, adds: “As organisations rapidly adopt Kubernetes to scale their DevOps, a lack of in-house skills will undoubtedly challenge teams. Security operations teams need coverage of every app, endpoint, network, and more, and resource-constrained teams can’t become experts on every new vector overnight.

“Adding a customisable, integrated approach to securing Kubernetes that allows security operations teams to get up and running quickly is becoming a must-have capability for modern organisations.”

Monitoring – DKP

D2iQ created the D2iQ Kubernetes Platform (DKP) to reduce Kubernetes complexity by providing a production-ready Kubernetes platform that can be easily deployed and managed.

The company does this by curating and integrating some of the best Kubernetes services and by providing centralised multi-cluster, multi-cloud fleet management across all environments, including all the major cloud providers.

Many companies deploy Kubernetes through an as-a-service offering from Amazon, Google or Microsoft because the barrier to entry is lower. The downside is that those services must be supplemented to reach production readiness, management is limited across multi-cloud and multi-cluster environments (including air-gapped ones), and users are restricted to the apps and tools each provider offers.

D2iQ says it simplifies Kubernetes deployment and management through platform automation, including top-to-bottom declarative programming through Cluster API (CAPI) and GitOps workflows enabled by an integrated FluxCD.
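Declarative tools and GitOps controllers such as FluxCD work by comparing a desired state (for example, manifests stored in Git) with what is actually running, then acting on the difference. The toy function below illustrates that reconcile idea with plain dictionaries. It is a minimal sketch, not Cluster API or Flux code, and the resource names are hypothetical.

```python
def plan(desired, actual):
    """Return the actions needed to converge `actual` onto `desired`.

    Both arguments map a resource name to its spec (any comparable value).
    """
    # Declared but not running: create it.
    actions = [("create", name) for name in sorted(desired.keys() - actual.keys())]
    # Running but drifted from the declared spec: update it.
    actions += [
        ("update", name)
        for name in sorted(desired.keys() & actual.keys())
        if desired[name] != actual[name]
    ]
    # Running but no longer declared: prune it.
    actions += [("delete", name) for name in sorted(actual.keys() - desired.keys())]
    return actions


desired = {"web": {"replicas": 3}, "worker": {"replicas": 2}}
actual = {"web": {"replicas": 1}, "cron": {"replicas": 1}}
print(plan(desired, actual))
# [('create', 'worker'), ('update', 'web'), ('delete', 'cron')]
```

Running `plan` again once the actions have been applied returns an empty list, which is what makes this style of management repeatable and safe to automate.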

DKP is a platform designed for every key Kubernetes management requirement, including reliability, scalability, security, openness, declarative programming, centralised fleet management, ease of use, ease of deployment, observability, full automation and cost effectiveness, says D2iQ.

Key customers include the likes of Royal Caribbean Lines, GE Healthcare and BMW.

Data protection – Velero

Sathya Sankaran, founder and general manager of CloudCasa by Catalogic Software, says of data protection: “With the growing deployments of enterprise applications on Kubernetes, there is a need for data protection and recovery. One such Day 2 management tool is Velero, a popular open-source project for the backup and restore of Kubernetes clusters and their persistent data.

“Velero has been pulled or downloaded from Docker Hub over 50 million times. It is one of the most active projects in the Kubernetes ecosystem and it provides snapshots, backups and restores of Kubernetes data.”

Sankaran says Velero is a useful tool for backing up and recovering single clusters but has limitations as a scalable enterprise solution. As organisations’ clusters multiply and they adopt hybrid and multi-cloud environments, running backups on each cluster independently becomes difficult, and backup scripts have to be written and executed to avoid tedious manual effort. Velero also lacks a GUI, so users have no way of managing multiple clusters from one screen.
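The per-cluster scripting Sankaran describes might look something like the minimal sketch below. It assumes the `velero` CLI is installed and that each cluster is reachable through a kubeconfig context; the context names are hypothetical. By default it only prints the commands rather than running them.

```python
import datetime
import shlex
import subprocess

# Hypothetical kubeconfig context names, one per cluster to protect.
CONTEXTS = ["prod-eu", "prod-us", "staging"]


def build_backup_commands(contexts, now):
    """Return one `velero backup create` command per cluster context."""
    stamp = now.strftime("%Y%m%d-%H%M%S")  # keeps backup names unique and DNS-safe
    return [
        ["velero", "backup", "create", f"{ctx}-{stamp}", "--kubecontext", ctx]
        for ctx in contexts
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failed backup


if __name__ == "__main__":
    run_all(build_backup_commands(CONTEXTS, datetime.datetime.now()))
```

Velero's own `velero schedule create` handles recurring backups within one cluster; a loop like this is only about fanning out across clusters, and tracking which of them succeeded is left to whoever maintains the script.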

This is leading Kubernetes management vendors to build on top of Velero and to give users a central interface to manage or monitor multi-cluster backup and recovery operations.

Monitoring – VictoriaMetrics

Aliaksandr Valialkin is the co-founder and CTO of VictoriaMetrics, a monitoring platform whose users include Grammarly, Roblox and Adidas.

“Monitoring Kubernetes and Kubernetes applications is the most popular use case for most monitoring solutions like ours. We understand the various monitoring issues Kubernetes causes,” says Valialkin.

One of the most overlooked issues is that, by default, Kubernetes exposes a huge number of metrics, and that number grows over time. “If you look at Kubernetes version 1.24.0, for instance, every node exports between 2,000 and 3,000 series and that’s without counting application metrics,” says Valialkin.

The number of series grows with the number of Kubernetes nodes and running containers, quickly reaching millions. “Considering only 25 per cent of Kubernetes metrics are ever used, it’s a huge drain on resources and budget to continue gathering potentially useless metrics,” adds Valialkin.
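A back-of-the-envelope calculation shows how quickly this adds up. The sketch below applies the figures Valialkin quotes (2,000 to 3,000 series per node, 25 per cent used) to a hypothetical 1,000-node fleet, counting node-level series only.

```python
def series_estimate(nodes, series_per_node, used_percent=25):
    """Return (total, used, unused) series counts for node-level metrics only."""
    total = nodes * series_per_node
    used = total * used_percent // 100  # integer arithmetic avoids float rounding
    return total, used, total - used


# Hypothetical 1,000-node fleet at the low and high ends of the quoted range.
for per_node in (2_000, 3_000):
    total, used, unused = series_estimate(1_000, per_node)
    print(f"{per_node:,}/node: {total:,} series, ~{used:,} used, ~{unused:,} unused")
```

At that size the fleet exposes 2 to 3 million node-level series before any application metrics, of which roughly 1.5 to 2.25 million would never be queried.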

Another key issue is that there is no established standard for metrics, so members of the community and individual companies have invented their own. These competing standards result in even more metrics being stored across applications.

Established monitoring tools such as Prometheus are working to overcome the complexities Kubernetes introduces, such as high active time series churn (new series appearing as short-lived pods come and go) and huge volumes of metrics for each layer and service. But adaptation can only go so far when scalability is the problem.

VictoriaMetrics, by contrast, is designed to address high cardinality and performance issues. It is optimised to use less RAM and disk space for high-cardinality series and includes optimisations such as a per-day inverted index to cope with time series churn. According to the company, this means businesses can monitor Kubernetes using fewer resources.
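To illustrate why partitioning an index by day helps with churn: when a pod is replaced, its series get new label values, so a single global index keeps accumulating entries for series that no longer exist. The toy class below indexes each series under the day on which it was seen, so a query for one day only touches that day's entries. It is a simplified sketch of the general idea with hypothetical labels, not VictoriaMetrics' actual storage format.

```python
import datetime
from collections import defaultdict


class PerDayIndex:
    """Toy inverted index partitioned by day: (day, "label=value") -> series IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, day: datetime.date, series_id: int, labels: dict) -> None:
        # Index the series only under the day on which it was seen.
        for name, value in labels.items():
            self._postings[(day, f"{name}={value}")].add(series_id)

    def lookup(self, day: datetime.date, **matchers: str) -> set:
        # Intersect the posting sets for this day only; series that existed
        # solely on other days are never touched.
        postings = [
            self._postings.get((day, f"{key}={value}"), set())
            for key, value in matchers.items()
        ]
        return set.intersection(*postings) if postings else set()


idx = PerDayIndex()
d1, d2 = datetime.date(2024, 1, 1), datetime.date(2024, 1, 2)
idx.add(d1, 1, {"job": "api", "pod": "api-7f9c"})
idx.add(d2, 2, {"job": "api", "pod": "api-4b2d"})  # pod replaced: new series
print(idx.lookup(d2, job="api"))  # {2}
```

A real store also has to handle series that live across several days, for example by indexing a series under every day on which it receives samples.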

“But it does not solve the issue of how many metrics Kubernetes creates – this requires a reduction in complexity and the number of exposed time series, which many in our community are still trying to solve,” Valialkin says.

Managed detection and response – Expel MDR for Kubernetes

Expel has just launched managed detection and response (MDR) for Kubernetes. The product is designed to enable customers to secure their business across their Kubernetes environment and adopt new technologies at scale without being hindered by security concerns.

“Organisations are adopting Kubernetes as a way to help their developers move fast and scale. This is similar to the historical drive to cloud infrastructure and, just like that drive, it comes with a new set of opportunities and a new set of security challenges,” says Matt Peters, chief product officer at Expel. “We developed MDR for Kubernetes to enable organisations to take advantage of the Kubernetes ecosystem while still protecting what matters to them in today’s constantly shifting threat landscape.”

The need for fast, agile and lightweight application development has become a core competitive requirement, but without incorporating security from the start, risks increase. MDR for Kubernetes provides insights across three core layers of Kubernetes applications.

First, to help organisations stay ahead of pervasive misconfigurations, MDR for Kubernetes identifies cluster misconfigurations and recommends configuration improvements based on the Center for Internet Security (CIS) Kubernetes Benchmark, allowing security teams to become proactively more resilient against threats.

Second, the offering provides visibility. It integrates with Amazon Elastic Kubernetes Service (EKS) and Google Kubernetes Engine (GKE), analyses Kubernetes audit logs, applies custom detection logic to alert on malicious or interesting activity, and provides step-by-step remediation recommendations.

Third, Expel takes a “bring-your-own-tech” approach: MDR for Kubernetes integrates with a portfolio of run-time container security vendors, so customers can maximise return on investment (ROI) from the tools they already use.
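The first two layers can be illustrated with small toy checks. The first function flags a few pod settings that CIS-style hardening guidance discourages (sharing host namespaces, privileged containers, privilege escalation not disabled). The second applies two hypothetical detection rules to Kubernetes audit events in the `audit.k8s.io/v1` JSON format, one event per line. Both are sketches under those assumptions, not Expel's detection logic or the full CIS benchmark.

```python
import json


def misconfigurations(pod):
    """Flag a few pod settings that CIS-style hardening guidance discourages."""
    spec = pod.get("spec", {})
    findings = [
        f"host namespace shared: {field}"
        for field in ("hostNetwork", "hostPID", "hostIPC")
        if spec.get(field)
    ]
    for container in spec.get("containers", []):
        sc = container.get("securityContext") or {}
        name = container.get("name", "<unnamed>")
        if sc.get("privileged"):
            findings.append(f"{name}: privileged container")
        # Unset means escalation is allowed, so only an explicit False passes.
        if sc.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: privilege escalation not disabled")
    return findings


def audit_alerts(event):
    """Apply two hypothetical detection rules to a single audit event."""
    user = (event.get("user") or {}).get("username", "")
    ref = event.get("objectRef") or {}
    alerts = []
    # Rule 1: someone opened a shell or ran a command inside a pod.
    if ref.get("resource") == "pods" and ref.get("subresource") == "exec":
        alerts.append(f"exec into {ref.get('namespace')}/{ref.get('name')} by {user}")
    # Rule 2: any request made by an unauthenticated user.
    if user == "system:anonymous":
        alerts.append(f"anonymous {event.get('verb')} on {ref.get('resource')}")
    return alerts


def scan_audit_log(lines):
    """Yield alerts for a JSON-lines audit log, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield from audit_alerts(json.loads(line))
```

Real detection logic would also need context such as who normally runs `exec`, which is where a managed service adds value over static rules.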

Cloud control – OpenCP

Civo has launched OpenCP, an open-source extension to kubectl, the widely used Kubernetes command-line tool for controlling clusters. OpenCP provides a framework for building additional functionality on top of kubectl so that it can control cloud services beyond Kubernetes itself.

A main challenge is that most Kubernetes clusters require a secondary set of tools to manage the underlying cloud services. From configuring firewalls to adding cloud storage, these tools are far from standardised and every cloud provider is different.

OpenCP is designed to standardise commands under one universal set of instructions that will work in any cloud environment.
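kubectl already has a simple extension mechanism: any executable on the `PATH` named `kubectl-<name>` can be invoked as `kubectl <name>`. The hypothetical plugin below uses that mechanism to show the idea of one generic command translated into each provider's own CLI. The plugin name and the mapping are assumptions for illustration (security groups, network security groups and firewall rules are only loose equivalents), and this is not OpenCP's actual command set. It prints the translated command instead of running it.

```python
#!/usr/bin/env python3
"""Hypothetical kubectl plugin: save as an executable named `kubectl-cloudfw`
on PATH and kubectl exposes it as `kubectl cloudfw <provider>`."""
import shlex
import sys

# Illustrative mapping from one generic verb ("list firewall rules") to each
# provider's own CLI. These are loose equivalents, not identical resources.
PROVIDER_COMMANDS = {
    "aws": ["aws", "ec2", "describe-security-groups"],
    "azure": ["az", "network", "nsg", "list"],
    "gcp": ["gcloud", "compute", "firewall-rules", "list"],
}


def translate(provider):
    """Return the provider-specific command, or None if unsupported."""
    return PROVIDER_COMMANDS.get(provider)


def main(argv):
    if len(argv) != 1 or translate(argv[0]) is None:
        choices = "|".join(sorted(PROVIDER_COMMANDS))
        print(f"usage: kubectl cloudfw {{{choices}}}", file=sys.stderr)
        return None
    cmd = translate(argv[0])
    print(shlex.join(cmd))  # dry run: print instead of executing
    return cmd


if __name__ == "__main__":
    main(sys.argv[1:])
```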

Dinesh Majrekar, CTO of Civo, says: “Kubectl is a much-loved tool, and there are already a number of extensions available for it. However, OpenCP is designed to supercharge the tool, giving it the ability to control far more than just Kubernetes through a standardised and well-documented interface.

“This should be music to the ears of developers. Every cloud has its own challenges, and learning proprietary commands and techniques is one of them. We want to help developers get past the admin and start doing what they love – developing great software.”

Automated discovery – Shipa

Mirantis has acquired Shipa to add automated application discovery, operations, security and observability to its Lens Kubernetes Platform.

Mirantis says Lens helps eliminate Kubernetes complexity, letting users manage, develop, debug, monitor and troubleshoot their workloads across multiple clusters in real time, on any certified Kubernetes distribution and any infrastructure.

Shipa’s technology brings application intelligence and awareness to Lens, making it simple for Kubernetes app owners to run, optimise, secure and support their apps anywhere. With minimal effort, users can see how their apps and microservices are deployed, along with a graphical view of network connections and maps of application dependencies.

Users can also create and share runbooks tailored to their needs, building on a library of certified templates for a variety of use cases and security requirements.

“Shipa’s technology puts ground-breaking application discovery, optimisation, security and management capabilities in the hands of Lens users,” says Adrian Ionel, co-founder and CEO of Mirantis. “It will help cloud native software teams move even faster, freeing them to code and innovate.”

Applications are managed independently of the underlying infrastructure, with connections to incident management tools and vulnerability scanners, as well as integrations with Terraform, Slack and GitHub Actions.

Mirantis acquired Docker Enterprise in 2019 and integrated the technology into its Kubernetes platform.

Cloud protection – TLS Protect for Kubernetes

Venafi has introduced TLS Protect for Kubernetes. As part of the Venafi Control Plane for machine identities, the new offering enables security and platform teams to easily and securely manage cloud native machine identities, such as TLS, mTLS and SPIFFE, across all of an enterprise’s multi-cloud and multi-cluster Kubernetes environments.

By delivering increased visibility, control and automation over machine identity management within more complex cloud native infrastructures, it helps enterprises improve application reliability and reduce development and operational costs.

“As organisations shift from traditional data centre environments to modern, highly distributed cloud native infrastructures like Kubernetes, the volume of certificates and machine identities explodes, leading to increased threat risks and an increased need for security controls,” says Shivajee Samdarshi, chief product officer at Venafi.

“Through the Venafi Control Plane, we’re modernising machine identity management and making managing machine identities in cloud native environments easier than ever. TLS Protect for Kubernetes gives security and platform teams the observability, consistency and control over machine identities to ensure a validated and auditable chain of trust exists for every workload deployed to a Kubernetes cluster, including consistent approaches to certificate configurations and security policies.”

Built on a fully supported version of the open-source cert-manager project, TLS Protect for Kubernetes provides in-cluster observability to identify and remediate security risks stemming from poorly configured certificates. It also offers security controls over certificate issuance, so that issuance complies with the security team’s policies for enforcing trust.

The product also includes a management interface that gives full visibility of publicly trusted certificates for ingress TLS, as well as private certificates used for inter-service mTLS in pod-to-pod and service mesh scenarios. By building a detailed view of the enterprise’s security posture across multiple clusters and cloud platforms, including certificates created manually by developers, it proactively identifies operational issues, helping platform teams maintain cluster integrity and prevent outages.
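A basic version of this kind of in-cluster check can be built from cert-manager's own status fields. The sketch below assumes you already have Certificate objects as parsed JSON (for example, the `items` from `kubectl get certificates -A -o json`) and flags any that are not Ready or that expire within a chosen window. The 30-day threshold and the sample objects are assumptions, and this is not how TLS Protect for Kubernetes works internally.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(value):
    """Parse timestamps like '2024-05-01T12:00:00Z' into aware datetimes."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def certificate_findings(items, now, warn_within=timedelta(days=30)):
    """Flag cert-manager Certificates that are not Ready or expire soon."""
    findings = []
    for cert in items:
        meta = cert.get("metadata", {})
        name = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
        status = cert.get("status", {})
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        if not ready:
            findings.append(f"{name}: not Ready")
        not_after = status.get("notAfter")
        if not_after and parse_rfc3339(not_after) - now < warn_within:
            findings.append(f"{name}: expires {not_after}")
    return findings


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
items = [
    {"metadata": {"namespace": "shop", "name": "web-tls"},
     "status": {"notAfter": "2024-01-10T00:00:00Z",
                "conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"namespace": "shop", "name": "api-mtls"}, "status": {}},
]
for finding in certificate_findings(items, now):
    print(finding)
```

Here `shop/web-tls` is reported because it expires nine days after `now`, and `shop/api-mtls` because it has no Ready condition.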


Antony Savvas

Antony has been a business technology journalist for 35 years, including following the convergence of computing and telecoms, the emergence of mobile and wireless data, and now new industry productivity...