Automate everything or get left behind

John Jainschigg, content strategy lead at Opsview, explains to Information Age why it’s vital to treat monitoring as a part of DevOps automation.

DevOps is about accelerating delivery of new products and services at scale, reliably and affordably. Doing this requires automation — using software to build, configure, deploy, scale, update, and manage other software.

We usually think of monitoring as happening alongside this process: its job is to alert operators when things go wrong, help analyse issues, and confirm compliance with service-level objectives (SLOs). But it’s better practice to treat monitoring as a vital part of ops automation. A modern, full-featured monitoring platform can be a powerful automation engine in its own right, and a critical enabler for larger automation initiatives in application and infrastructure lifecycle management and problem mitigation. In many cases, it can even enable autonomous operations like self-scaling and self-healing.

Here are some of the ways your monitoring system can help you get more done, eliminate human error, and meet (not just comply with) SLOs:

Streamlined, automated monitoring system deployment and lifecycle management. On-premises monitoring solutions can co-reside with monitored infrastructure, both in classic private clouds and data centres and in provider-hosted virtual private clouds (VPCs). This lets them comply with security, privacy, data governance and other regulations, and helps them overcome bandwidth and cost barriers that can limit the scalability of SaaS monitoring solutions. On-premises monitoring must itself be deployed, scaled, and updated, however, and this can be daunting for all but very simple, single-server configurations.

Forward-looking makers of this kind of monitoring platform are starting to exploit popular deployment automation frameworks like Ansible, Puppet, and Chef (the same ones DevOps teams use to automate infrastructure deployment and routine operations) to streamline monitoring-system deployment in scaled-out, highly available configurations. For operator convenience, they’re hiding deployment-tool complexity behind web UIs and simplified configurators, though the standard tooling remains accessible to DevOps folks who wish to dovetail monitoring-system or metrics-collector deployment with infrastructure roll-outs, which is a best practice. Details of monitoring can then be defined and maintained as part of definitive “infrastructure as code” repositories.
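
As a rough illustration of keeping monitoring details in an “infrastructure as code” repository, the sketch below compares check definitions declared in a repository with those a monitoring system currently reports, and works out what to create, update, or delete. The check names, fields, and the `plan_changes` helper are all hypothetical and do not reflect any particular product.

```python
def plan_changes(desired: dict, current: dict) -> dict:
    """Compare declared checks with the checks currently configured."""
    return {
        # Declared in the repository but not yet configured.
        "create": sorted(desired.keys() - current.keys()),
        # Configured but no longer declared in the repository.
        "delete": sorted(current.keys() - desired.keys()),
        # Present on both sides but with different settings.
        "update": sorted(
            name for name in desired.keys() & current.keys()
            if desired[name] != current[name]
        ),
    }


# Hypothetical check definitions, as they might be loaded from a repository.
desired = {
    "web-01/http": {"interval_s": 60, "warn_ms": 500},
    "web-01/disk": {"interval_s": 300, "warn_pct": 80},
}
# Hypothetical state reported by the monitoring system.
current = {
    "web-01/http": {"interval_s": 120, "warn_ms": 500},
    "old-db/ping": {"interval_s": 60},
}

plan = plan_changes(desired, current)
print(plan)
```

Treating the repository as the source of truth means the same review and rollback habits used for infrastructure changes can apply to monitoring changes.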

Automated agent deployment and monitored-object registration via API. Standard deployment tools like Ansible can also be used to inject, configure, and update monitoring components (endpoint agents, required libraries, etc.) on hosts. The same tools can extract facts from deployment manifests or directly from hosts at deploy time, then use monitoring-system APIs to rapidly configure monitoring for host infrastructure and applications, as well as “unmonitor” hosts at end of life. Routinely putting systems under monitoring as soon as they’re deployed enables rapid detection of issues in staging or production, and can be used to trigger rollbacks if required, which is an important best practice for continuous delivery.
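
A minimal sketch of that registration step might look like the following. It assumes a monitoring system with a REST-style API. The `/api/hosts` path, the payload fields, and the `send` transport are hypothetical stand-ins, and a fake transport is used so the example runs without a network.

```python
import json
from typing import Callable

# Hypothetical endpoint path; a real monitoring API will differ.
HOSTS_ENDPOINT = "/api/hosts"


def build_host_payload(facts: dict) -> dict:
    """Turn deploy-time facts into a registration payload (illustrative schema)."""
    return {
        "name": facts["hostname"],
        "address": facts["ip"],
        # Derive check templates from the roles the deployment tool assigned.
        "templates": sorted(f"{role}-checks" for role in facts.get("roles", [])),
    }


def register_host(facts: dict, send: Callable[[str, str, str], int]) -> bool:
    """Register a host; send(method, path, body) returns an HTTP status code."""
    body = json.dumps(build_host_payload(facts))
    return send("POST", HOSTS_ENDPOINT, body) in (200, 201)


def unregister_host(hostname: str, send: Callable[[str, str, str], int]) -> bool:
    """Remove a host from monitoring at end of life."""
    return send("DELETE", f"{HOSTS_ENDPOINT}/{hostname}", "") in (200, 204)


# Stand-in transport that records calls instead of touching the network.
calls = []


def fake_send(method: str, path: str, body: str) -> int:
    calls.append((method, path, body))
    return 201 if method == "POST" else 204


facts = {"hostname": "web-01", "ip": "10.0.0.5", "roles": ["nginx", "linux"]}
registered = register_host(facts, fake_send)
removed = unregister_host("web-01", fake_send)
print(registered, removed)
```

In a real pipeline, the deployment tool would supply the facts and an HTTP client would replace `fake_send`.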

CMDB ingestion. Some monitoring platforms can ingest data from operations management tools and configuration management databases (CMDBs), such as those offered by ServiceNow and similar vendors. This lets operators quickly and confidently configure monitoring for existing infrastructure, applications, and full business services — avoiding laborious and error-prone manual compilation of system facts.
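
To make the idea concrete, here is a small sketch of turning CMDB records into monitoring entries grouped by business service. The record fields (`name`, `ip_address`, `business_service`, `status`) are assumptions chosen for illustration, not the schema of ServiceNow or any other CMDB.

```python
from collections import defaultdict


def ingest_cmdb(records: list) -> dict:
    """Group operational configuration items by the business service they support."""
    services = defaultdict(list)
    for item in records:
        # Skip retired or otherwise non-operational items.
        if item.get("status") != "operational":
            continue
        service = item.get("business_service", "unassigned")
        services[service].append(
            {"name": item["name"], "address": item["ip_address"]}
        )
    return dict(services)


# Hypothetical CMDB export.
records = [
    {"name": "web-01", "ip_address": "10.0.0.5",
     "business_service": "checkout", "status": "operational"},
    {"name": "db-01", "ip_address": "10.0.0.9",
     "business_service": "checkout", "status": "operational"},
    {"name": "old-app", "ip_address": "10.0.0.77",
     "business_service": "legacy", "status": "retired"},
    {"name": "cache-01", "ip_address": "10.0.0.12", "status": "operational"},
]

monitoring_config = ingest_cmdb(records)
print(monitoring_config)
```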

Discovery and auto-monitoring. Sophisticated monitoring solutions use an increasing range of methods, including direct access to hosts via SSH and indirect access via configuration repositories like Active Directory and services like Windows Discovery, to extract facts from existing infrastructure and speed up monitoring configuration by operators. Leading-edge products are now moving towards automating the process completely: creating comprehensive maps of infrastructure, apps, and complete business services, and monitoring these things without the need for any manual intervention or direction.

Alert processing, notification, escalation, integration. Alerting is, of course, a powerful form of automation. It entails decision-making, which may be simple (e.g., some metric has surpassed a given threshold) or significantly more complex (e.g., several metrics, from separate systems, have entered states predictive of a particular kind of known failure for a critical business service). It involves sophisticated assignment and escalation based on issue, team rotas, time/date and other variables. It demands outbound integration with communications methods such as email, with multi-mode notification platforms such as PagerDuty, or, more ambitiously, with issue-management (e.g., JIRA), operations workflow management (e.g., ServiceNow), collaboration (e.g., Slack) and other solutions. All this automation power works together to get the right alert to the right person at the right time while avoiding over-alerting and fatigue, which smooths operations and helps teams avoid downtime and meet SLO commitments.
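
The decision-making and routing described above can be sketched in a few lines. Everything here is illustrative: the metric names, thresholds, rota structure, and destination strings are assumptions, and a real platform would deliver through its own integrations rather than return strings.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    service: str
    severity: str
    reason: str


def evaluate(metrics: dict) -> list:
    """Apply one simple threshold rule and one composite rule to a metrics snapshot."""
    alerts = []
    # Simple rule: a single metric has passed a threshold.
    if metrics.get("disk_used_pct", 0) > 90:
        alerts.append(Alert("storage", "warning", "disk above 90%"))
    # Composite rule: metrics from separate systems together suggest a known failure.
    if (metrics.get("db_replication_lag_s", 0) > 30
            and metrics.get("api_error_rate", 0) > 0.05):
        alerts.append(
            Alert("checkout", "critical", "replication lag with rising API errors")
        )
    return alerts


def route(alert: Alert, hour: int, rota: dict) -> str:
    """Pick a destination from severity, time of day, and the on-call rota."""
    shift = "day" if 8 <= hour < 20 else "night"
    if alert.severity == "critical":
        # Critical issues always page whoever is on call for the shift.
        return f"page:{rota[shift]}"
    # Lower severities avoid waking anyone: chat by day, a ticket by night.
    return "chat:#ops" if shift == "day" else "ticket:queue"


rota = {"day": "alice", "night": "bob"}
snapshot = {"disk_used_pct": 95, "db_replication_lag_s": 45, "api_error_rate": 0.08}
for alert in evaluate(snapshot):
    print(alert.reason, "->", route(alert, hour=3, rota=rota))
```

Routing warnings to a ticket queue at night is one simple way to limit over-alerting while still recording the issue.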

Proactive issue mitigation. Finally, sophisticated monitoring solutions now provide the ability to execute scripts on hosts, or trigger centralised automation (e.g., Ansible) to perform tasks based on monitored conditions: from rebooting a failed server to scaling up an infrastructure cluster. Over the next decade, developments in machine learning will gradually improve the ability of monitoring systems to deduce the abstract structure and function of business services, monitor them automatically, predict their failure modes, repair them and optimise their performance, either autonomously or by optimal allocation of operator resources to tasks.
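
One way to picture condition-triggered mitigation is a lookup from monitored conditions to remediation actions, with a cap on automatic attempts before a human is involved. The condition names, the action functions, and the attempt limit below are hypothetical. The actions only return descriptions here, where a real system would run a script or trigger an automation job.

```python
def restart_service(host: str) -> str:
    # Placeholder for running a restart script on the host.
    return f"restart service on {host}"


def scale_out(host: str) -> str:
    # Placeholder for triggering centralised automation to add capacity.
    return f"scale out cluster containing {host}"


# Hypothetical mapping from monitored condition to remediation.
REMEDIATIONS = {"service_down": restart_service, "cpu_saturated": scale_out}


def mitigate(event: dict, attempts: dict, max_attempts: int = 2) -> str:
    """Run the mapped remediation, or escalate when none exists or retries run out."""
    key = (event["host"], event["condition"])
    action = REMEDIATIONS.get(event["condition"])
    if action is None or attempts.get(key, 0) >= max_attempts:
        return f"escalate to operator: {event['condition']} on {event['host']}"
    attempts[key] = attempts.get(key, 0) + 1
    return action(event["host"])


attempts = {}
event = {"host": "web-01", "condition": "service_down"}
results = [mitigate(event, attempts) for _ in range(3)]
print(results)
```

Capping automatic attempts keeps a remediation that is not working from looping indefinitely and hands the problem to an operator instead.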

By John Jainschigg, content strategy lead at Opsview
