Digital transformation has both simplified and complicated the management of modern IT operations. Teams are empowered by their options for infrastructure, point tools and platforms, and simultaneously crushed under the weight of those options. Managing hybrid environments requires one set of tools, and cloud-native workloads (like serverless) require another. Plus, all these options produce data, which ought to lead to actionable insights but instead continues to bog infrastructure teams down as they sift through it for what matters.
This has set the stage for the rise of artificial intelligence for IT operations, or AIOps. AIOps, with its inference models that mine data looking for insights, is meant to ingest this infrastructure data and produce those actionable insights. At least that’s the promise. And since it entered the scene as a formal discipline with Gartner in 2016, IT teams have been trying to figure out how to employ it to make their lives easier. But how? What are the actual use cases?
How is AIOps really used?
An upcoming study from OpsRamp surveyed IT professionals who are actually using AIOps in their organisations and uncovered a few real-world examples of how artificial intelligence and machine learning are being employed to deliver quantifiable value. They include:
Anomaly/threat detection: AIOps is a valuable addition to a strong security management posture. Heuristics and algorithms can mine traffic data for botnets, scripts or other threats that can take out a network. Especially when these threats are complex, multi-vector and layered, machine learning can expose patterns that can undermine business service availability.
Event correlation: Infrastructure teams are faced with floods of alerts, yet only a handful really matter. AIOps can mine these alerts, use inference models to group them together, and identify the upstream root-cause issues at the core of the problem. It transforms an overloaded inbox of alert emails into one or two notifications that really matter.
Intelligent alerting and escalation: After root-cause alerts and issues are identified, IT ops teams are using artificial intelligence to automatically notify subject matter experts or teams of incidents for faster remediation. Artificial intelligence can act like a routing system, immediately setting the remediation workflow in motion before a human being ever gets involved.
Incident auto-remediation: AIOps is also being used as an end-to-end bridge between ITSM and IT operations management tools. Traditionally, ITSM teams sift through infrastructure data to identify and remediate issues at the root cause. AIOps extracts root cause inferences from infrastructure alerts and sends them to an ITSM team or tool through API integration pathways.
Capacity optimisation: This refers to the use of statistical analysis or AI-based analytics to optimise application availability and workloads across infrastructure, and can also include predictive capacity planning. These analytics can proactively monitor raw utilisation, bandwidth, CPU, memory and more, and help increase overall application uptime.
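To make the event correlation use case above more concrete, here is a minimal Python sketch. It is not how OpsRamp or any particular AIOps product works: it swaps the inference models described above for two simple rules, grouping alerts that arrive close together in time and then naming the most upstream alerting resource in each group as the probable root cause. The `Alert` type, the `DEPENDS_ON` topology map, the 60-second window and the sample alerts are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since some reference point (assumed unit)
    resource: str     # name of the resource that raised the alert
    message: str


# Hypothetical topology: each resource maps to the resource it depends on.
DEPENDS_ON = {
    "web-01": "db-01",
    "web-02": "db-01",
    "db-01": "switch-a",
}


def upstream_chain(resource: str, depends_on: dict[str, str]) -> list[str]:
    """Return the resource followed by everything upstream of it."""
    chain = [resource]
    seen = {resource}
    while chain[-1] in depends_on:
        parent = depends_on[chain[-1]]
        if parent in seen:  # guard against cycles in the topology map
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def probable_root(group: list[Alert], depends_on: dict[str, str]) -> str:
    """Pick the alerting resource with the fewest alerting resources upstream."""
    alerting = {a.resource for a in group}

    def alerting_ancestors(resource: str) -> int:
        return sum(1 for r in upstream_chain(resource, depends_on)[1:] if r in alerting)

    # sorted() makes the choice deterministic when several resources tie.
    return min(sorted(alerting), key=alerting_ancestors)


def correlate(alerts: list[Alert], depends_on: dict[str, str], window: float = 60.0):
    """Group alerts separated by at most `window` seconds; return (root, group) pairs."""
    groups: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return [(probable_root(g, depends_on), g) for g in groups]


# Made-up sample data: four related alerts, then an unrelated one much later.
SAMPLE_ALERTS = [
    Alert(0.0, "web-01", "HTTP 500 rate high"),
    Alert(5.0, "db-01", "connection timeouts"),
    Alert(10.0, "web-02", "HTTP 500 rate high"),
    Alert(12.0, "switch-a", "port down"),
    Alert(500.0, "cache-01", "memory pressure"),
]

if __name__ == "__main__":
    for root, group in correlate(SAMPLE_ALERTS, DEPENDS_ON):
        print(f"{len(group)} alert(s), probable root cause: {root}")
```

With the sample data, the first four alerts collapse into a single group attributed to `switch-a`, and the later `cache-01` alert stands alone. A real system would learn groupings from data rather than rely on a fixed window and a hand-written dependency map, but the shape of the output is the same: many alerts in, one or two notifications out.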
What’s ahead for AIOps?
As complexity mounts, options increase and pressure builds for IT teams to deliver business services with minimal downtime, AIOps is emerging as both a cutting-edge discipline and a next-level advantage for faster, more efficient infrastructure monitoring and management. Companies and IT leaders are starting to recognise the value. Today it’s for alert management and data mining. Tomorrow? Automation through AI may well drive new pathways for innovation and relieve teams of routine tasks in fundamental ways. This journey has truly only just begun.
Written by Jiayi Hoffman, data science architect at OpsRamp