Incident Management (IM) is hardly a new component in enterprise IT, but in today’s world of increasing cloud-based applications, intelligent automation and unified approaches to communications, it is more significant than ever before.
If there was ever a time when organisations could reliably depend on an offshore or skeleton IM function, that era is fast disappearing. These types of incident management programmes simply cannot keep up with this dynamic and complex enterprise IT landscape – and a substandard IM function can have dire reputational and financial implications as IT infrastructure becomes ever more business-critical.
How has IM evolved to reach this point? And how can enterprises best position themselves in the midst of this evolution?
What is the function of IM?
First, it’s important to define what incident management actually is. ISO 20000 states that the objective of IM is ‘to restore agreed service to the business as soon as possible or to respond to service requests’.
This, however, sounds very similar to the function of any technical helpdesk. It’s important to understand that IM isn’t the technical repair service; it’s the strategic coordinator at the core of the entire incident response process.
Part of the role of IM, then, is to bring together all the appropriate technical and service management expertise when an incident occurs, in order to analyse and respond to the incident as quickly as possible.
Part of the role is a communicative one, making sure that all key stakeholders are aware of the issue and what its business impact is. And part of the role is one of overall responsibility, checking that normal operations have been restored comprehensively and that the business is truly back to normal.
As such, the IM team needs to take responsibility, maintain authority and be the driving force behind the resolution of any IT incident or problem. It’s a centralised and hugely significant problem management function.
Yet even that isn’t the entirety of what IM is responsible for. It is also important to distinguish between the above process, which is all about responding to immediate, business-critical issues and threats, and a longer-term, trend-focused role. There are two facets to problem management: a reactive facet, which responds to incidents as they occur, and a proactive facet, which drives long-term optimisation.
The proactive facet is about examining the kinds of incidents that arise more than once, and exploring incident prevention techniques and the ways in which services and systems can be optimised. Typically, this requires an event monitoring system to collate all the available information about IT problems and use it to combat recurring issues.
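As a rough illustration of that collation step, the sketch below counts how often the same kind of incident recurs. It is a minimal example under stated assumptions, not a description of any particular monitoring product: the `Incident` record, its `service` and `category` fields, the `recurring_issues` helper and the sample log are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # Hypothetical record shape; a real event monitoring system will differ.
    service: str
    category: str


def recurring_issues(incidents, min_count=2):
    """Return ((service, category), count) pairs seen at least min_count times,
    ordered from most to least frequent."""
    counts = Counter((i.service, i.category) for i in incidents)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Invented sample data, for illustration only.
incident_log = [
    Incident("email", "disk full"),
    Incident("vpn", "certificate expired"),
    Incident("email", "disk full"),
    Incident("crm", "timeout"),
    Incident("email", "disk full"),
    Incident("vpn", "certificate expired"),
]

for (service, category), n in recurring_issues(incident_log):
    print(f"{service}: {category} occurred {n} times")
```

A list like this is only a starting point: it shows where incidents repeat, and the IM team still has to decide which of them justify prevention work.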
How is IM evolving?
It’s clearly a dynamic and complex picture, yet the fundamental role of IM has not actually changed that much over the years. Restoration of normal operations has always been its core objective, and that has always been achieved through a mixture of immediate incident response and long-term optimisation. So what has changed?
To understand this, businesses need to look more closely at how enterprise IT has changed thus far, and how it continues to evolve. A number of key technologies are having transformative effects on how organisational IT operates, and these, in turn, have a knock-on impact on how IM operates.
First, think about the rise and rise of cloud computing, and the migration of ever-greater numbers of systems to public or private cloud environments. Once business-critical applications are migrated to the cloud, connectivity technologies such as LANs and WANs become business-critical too, so there’s a whole new set of priorities for the IM team to think about.
Cloud computing therefore not only extends the scope of IM out of the organisation’s premises to cloud environments, but also introduces a whole new set of business-critical technologies for the IM team to look after. If essential business applications are hosted in the cloud, then day-to-day operations depend on reliable connectivity to the cloud.
Second, there’s automation and artificial intelligence (AI), true technology buzzwords of the moment, and tools that are being applied in a vast array of industries. While moving over to machine-driven rather than human-driven functions can be a cost-saving measure, it also means that software or hardware incidents can have further-reaching impacts on organisations than ever before.
Individual human actions may be replaced by long chains of automated instructions, so if just one fails then a bottleneck of disruption can occur. Once again, this dramatically increases the scope of responsibility for IM, and the urgency of restoration of service.
Third, there’s the drive towards unified communications; that is, uniting all voice and data communications on single consolidated networks. This can dramatically streamline management and visibility, and enable greater innovation in terms of adding feature-rich applications – but it also links together previously disparate communication channels.
In turn, this means that issues with one channel can have sharp knock-on effects. Since communications are the lifeblood of a huge array of business services, a failure of one part of the unified communications infrastructure can rapidly become a business-critical issue, so it is vital for IM to have clear visibility and control over the entire network.
Today’s sophisticated security threat landscape is also worth considering. Security does not always fall under the IM remit – some organisations run an entirely separate security function – but when it does, IM teams need to be able to both investigate and mitigate the immediate risk and look into longer-term protection and prevention measures. With security threats ranging from simple phishing emails up to hugely sophisticated malware designed to surreptitiously exfiltrate data over long periods, this is no small task.
In short, not only has enterprise IT expanded hugely to impact on far more areas of business operation than it did previously, it has also become more tightly integrated, so that issues within one area are more likely to affect others. And this means that the importance, scale and scope of enterprise IM has never been greater.
Location, location, location
A more nuanced element of the IM evolution lies in where it is physically located. As with many aspects of enterprise IT, and indeed functions that lie outside of the IT department altogether, there has been a historical trend for organisations to base departments offshore, where running costs are lower.
However, as IT and consequently IM become more complex and critical to business operations, this trend is beginning to reverse. As the strategic and communicative hub of critical incident response, IM teams need to have outstanding communication skills, a strong understanding of the culture and priorities within the organisation in question, and great breadth and depth of technology knowledge: qualities that are far harder to pin down and guarantee when the IM function is being run from continents away. Offshore IM functions quickly become a false economy when they are simply unable to be the strategic lead and driving force they need to be.
A broad range of technological factors, then, are coming together to make incident management an increasingly important, and increasingly complicated, aspect of enterprise IT.
Sourced by Tom Bellew, Incident Management Lead at Systal