Avoiding the downtime blame game

No matter which sector an organisation finds itself in, one key aim of the IT team will always be the same: maintain uptime.

Over the years, technology has taken over more of what businesses need to survive. According to a study by analyst firm IHS Markit, the financial impact of downtime is $700 billion per year. Without access to business-critical applications, there is no business.

However, no matter how hard they try, IT teams cannot always keep systems up and running.

One of the most notable examples of downtime occurred in 2016 when United Airlines grounded all of its U.S. flights after an IT error.

Outages such as this inflict massive damage on a company’s finances and reputation. The longer they last, the worse things get.

To fix the problem, you first have to find it, which can often be difficult for IT teams.

Larger enterprises often operate with an array of disparate hardware and software that carry a patchwork of vendor-specific tooling to monitor and manage each system.

This collection of tools creates additional silos in the IT stack. It takes multiple IT teams to manage these tools, creating an environment where technicians step over each other to solve the same problem.

Conversely, a single technician might be assigned to unravel a number of disparate reports sent from several internal and external support teams.

It is at this moment that the finger-pointing ensues. Blame is assigned to different tools managed by different teams.

To end this confusion and the downtime it prolongs, IT teams must move to a centralised environment with a single set of tools and a single reporting process.

This will allow them to view information in one place so IT teams can sing from the same hymn sheet. This centralised approach provides several advantages.

Greater visibility

Having the full stack visible in a single pane of glass helps teams monitor the full IT environment from storage to compute to virtualisation and more.

This enhanced visibility makes it easier and faster to identify and resolve issues and avoid downtime.

Early warning signs

A single solution presents analytics and alerts from the entire IT environment, offering a better alternative to multiple tools that lack the context of the wider stack.

This allows teams to spot issues before they escalate.

Risk analysis

Calculate risks through IT layers with specific downtime scenarios, such as a datastore running out of storage and causing an App/VM outage.
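As a rough illustration of the datastore scenario, the sketch below projects how many days remain before a datastore fills up and lists the VMs that would be affected. It assumes a simple linear growth rate, and every name in it (Datastore, days_until_full, at_risk, the sample figures) is hypothetical rather than taken from any particular monitoring product.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Hypothetical record combining storage metrics with dependent VMs."""
    name: str
    capacity_gb: float
    used_gb: float
    growth_gb_per_day: float  # assumed constant (linear growth)
    vms: list = field(default_factory=list)


def days_until_full(ds):
    """Return projected days until the datastore is full, or None if not growing."""
    if ds.growth_gb_per_day <= 0:
        return None
    free_gb = max(ds.capacity_gb - ds.used_gb, 0.0)
    return free_gb / ds.growth_gb_per_day


def at_risk(datastores, horizon_days):
    """List (name, days_left, affected_vms) for datastores filling within the horizon."""
    findings = []
    for ds in datastores:
        days = days_until_full(ds)
        if days is not None and days <= horizon_days:
            findings.append((ds.name, days, list(ds.vms)))
    # Most urgent first
    return sorted(findings, key=lambda finding: finding[1])


# Illustrative data only
stores = [
    Datastore("ds-prod-01", 1000.0, 900.0, 20.0, ["erp-app", "erp-db"]),
    Datastore("ds-archive", 2000.0, 500.0, 1.0, ["backup-vm"]),
]
for name, days, vms in at_risk(stores, horizon_days=14):
    print(f"{name}: full in about {days:.1f} days, affects {', '.join(vms)}")
```

The point of the sketch is that the storage metric and the VM dependency sit in one record, which is only possible when data from both layers is available in the same place.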

The Internet of Things will continue to drive massive amounts of data that will make databases and IT infrastructures significantly more complex and harder to manage.

With this in mind, it is important for IT teams to centralise and consolidate IT management tools and teams to proactively avoid preventable issues.

Implementing a comprehensive solution and eliminating disparate systems and tools will enable the bird's-eye view that is crucial to ending the downtime blame game that occurs in so many organisations.

Sourced by Mike Kelly, chief technology officer, Blue Medora and general manager, SelectStar

Nick Ismail

Nick Ismail is a former editor for Information Age (from 2018 to 2022) before moving on to become Global Head of Brand Journalism at HCLTech. He has a particular interest in smart technologies, AI and...
