Every organisation needs a business continuity management strategy that is prepared for any possible incident. This means effectively stress-testing your infrastructure at every opportunity.
“Technology has had an increasingly powerful impact on the business landscape in recent years, and it is critical to ensure business continuity by managing quality and risk,” said Ivan Ericsson, head of quality assurance at Expleo.
“But whilst their value is indisputable, the sheer complexity of the different tech systems on the market today means there’s also a high chance for things to go wrong – both at implementation stage, and post deployment.”
With this in mind, let’s explore how to effectively stress-test your business continuity management for the long term.
Four key components
“The most effective business continuity plans have four components: business recovery, IT disaster recovery, supplier risk management, and emergency management,” said Griffin. “Implemented unilaterally, these contingency measures can not only prevent a crisis, but actively sow the seeds of recovery – ensuring resources are managed efficiently to support rebuilding efforts going forward.
“Testing your continuity plan should be a formal, strategic process. It should demonstrate to you, your staff, and your stakeholders and investors that you’re not only aware of the risks facing your business, but have legitimate means of coping should crises arise.”
Accepting the worst case scenario
He said: “Disaster comes in many forms, and when it is declared and the playbooks come out, time is of the essence. A proper resilience strategy needs to accept that the organisation could be hit and potentially lose all IT services. Accepting that a worst case scenario is possible helps design against that scary reality. While we often want to start with the technology, the starting point for BCM and DR should always be with the risk to the business.
“The Business Impact Analysis (BIA) helps define what strategies must be put in place to cope with many different scenarios. These can be a supplier that can’t deliver or a pandemic (or ransomware during a pandemic). How ready is the business to continue in a technically deprecated analogue world while IT gets the business bootstrapped? How long will a complete bootstrap take? What is the most critical thing the business does and what are the steps necessary to restore that function? Has the business and IT staff rehearsed? Are the playbooks up to date?
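The BIA questions above boil down to ranking business functions by how long the organisation can survive without them. As a minimal illustration (the function names, tolerances, and dependencies below are hypothetical, not from any real BIA), a simple register sorted by maximum tolerable downtime shows which recovery playbooks must come first:

```python
# Hypothetical BIA register: each business function with its maximum
# tolerable downtime (hours) and the IT services it depends on.
bia = [
    {"function": "order processing", "max_downtime_h": 4,  "depends_on": ["erp", "payments"]},
    {"function": "payroll",          "max_downtime_h": 72, "depends_on": ["hr_system"]},
    {"function": "customer support", "max_downtime_h": 8,  "depends_on": ["crm", "telephony"]},
]

# Sort by tolerance: the least tolerant functions drive recovery priority.
recovery_priority = sorted(bia, key=lambda row: row["max_downtime_h"])
for row in recovery_priority:
    print(f'{row["function"]}: restore within {row["max_downtime_h"]}h '
          f'(depends on {", ".join(row["depends_on"])})')
```

Even a sketch like this makes the rehearsal question concrete: the playbooks for the top rows are the ones worth testing most often.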
“Chronically undocumented and out-of-date application dependency chains are usually a larger risk to restoring operations than many IT teams expect. Your restore bench strength may be measured in terabytes per hour, but if you don’t know what to recover first, or how to do it, then delays are inevitable. This is why DR testing is so important. If an IT team has never had to recover Active Directory and re-establish synchronisation, go to the Microsoft website and look at the process now.
“It is not a trivial process. Aside from the raw horsepower of recovery, things like this are the best place to stress test to identify knowledge and technology gaps and close them.”
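Knowing "what to recover first" is ultimately a dependency-ordering problem: a service cannot come back before the services it depends on. A minimal sketch of that idea, using Python's standard-library topological sorter on a hypothetical dependency map (the service names are illustrative only):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services it
# depends on, which must therefore be restored before it.
dependencies = {
    "active_directory": [],
    "dns": ["active_directory"],
    "database": ["dns"],
    "erp": ["database", "active_directory"],
    "web_portal": ["erp", "dns"],
}

def restore_order(deps):
    """Return a valid recovery sequence: every service appears
    after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())

print(restore_order(dependencies))
```

If the dependency map is wrong or stale, the sorter happily produces a wrong order, which is exactly the documentation risk the quote describes: DR testing is what surfaces the missing edges.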
Distinguishing hot and cold data
It’s vital that data remains accessible even after disaster strikes, and according to Krishna Subramanian, COO of Komprise: “A backup regime needs to start by knowing the difference between hot and cold data. Cold data doesn’t need to be on the highest-performing, highest-priced storage or repeatedly backed up and replicated.
“Not only is storing cold data on primary storage needlessly expensive, it also means you’re backing up and repeatedly copying data that never changes. This is not only costly but lengthens backup windows, which ultimately affects the performance of your hot data. Now you have a storage, budget, and performance problem.
“By archiving all your cold data to more affordable secondary or object storage, some of your savings can go toward more expensive flash storage for optimal performance of your hot data. And with transparent archiving, your users and applications still access moved data from its original location, without interruption, whilst shrinking backup windows and cutting backup costs.”
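The hot/cold split Subramanian describes usually starts from something as simple as last-access time. A minimal sketch, assuming a 90-day cold threshold (a hypothetical cutoff; real tiering policies depend on your access patterns and storage costs):

```python
import time

# Illustrative threshold -- real cutoffs depend on your access patterns.
COLD_AFTER_DAYS = 90

def tier_for(last_access_epoch, now=None):
    """Classify data as 'hot' or 'cold' by its last-access time."""
    now = time.time() if now is None else now
    age_days = (now - last_access_epoch) / 86400
    return "cold" if age_days > COLD_AFTER_DAYS else "hot"

# Hypothetical inventory: path -> last-access timestamp (epoch seconds).
now = time.time()
inventory = {
    "/data/q3_report.xlsx": now - 5 * 86400,      # touched last week
    "/data/2019_archive.tar": now - 400 * 86400,  # untouched for over a year
}
plan = {path: tier_for(ts, now) for path, ts in inventory.items()}
print(plan)
```

Anything classified cold becomes a candidate for the cheaper secondary tier, which is what shrinks the backup window for the hot data that remains.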
Another way to stress-test business continuity management strategies is to put automation in place, using artificial intelligence (AI) and machine learning (ML) capabilities.
Ericsson explained: “You really need to be in a position to mitigate any potential risks both before a system is live and afterwards, so there are no nasty surprises. End-to-end testing of every platform, both independently and in terms of its integration with the wider network of systems, is therefore critical. However, this needs to be balanced against the need to deliver with speed and certainty – so strong automated testing should be seen as a standard component of your production systems.
“This will usually be provided by an independent quality assurance specialist. At Expleo we actually automate this process for clients to account for the complexity and speed of the technology and release cycles. Automated testing not only safeguards quality, but also adds value by providing immediate speed and efficiency gains.
“First, ML cuts through the testing workload and sieves the data at scale, surfacing the highest-priority test cases. Then, AI analyses this data in real-time, so we can respond to risks before they become issues. This is used as the basis for predictive analysis – so you can predict where risk is going to emerge and mitigate it in the most cost effective way.”
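Expleo does not publish the internals of its models, but the prioritisation step Ericsson describes can be approximated with a simple risk score: tests that fail often and cover high-impact features float to the top of the queue. This is a deliberately crude stand-in for an ML model, and all the test names and numbers are hypothetical:

```python
# Hypothetical test inventory: recent failure rate and the business
# impact (1-5) of the feature each test covers.
tests = [
    {"name": "login_flow",     "failure_rate": 0.10, "impact": 5},
    {"name": "report_export",  "failure_rate": 0.30, "impact": 2},
    {"name": "checkout",       "failure_rate": 0.05, "impact": 5},
    {"name": "theme_switcher", "failure_rate": 0.01, "impact": 1},
]

def risk_score(t):
    # A crude proxy for a learned model: likelier-to-fail tests on
    # higher-impact features score higher.
    return t["failure_rate"] * t["impact"]

prioritised = sorted(tests, key=risk_score, reverse=True)
print([t["name"] for t in prioritised])
```

A real ML-based system would learn the score from historical defect data rather than hard-coding it, but the output is the same shape: a ranked queue that tells you where to spend limited testing effort first.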
Planning, separating and replicating
An in-depth plan with objectives is needed to ensure that targets are met, and assets can be quickly recovered. This, according to Florian Malecki, international product marketing senior director at StorageCraft, involves separating and replicating data to protect it from cyber attacks.
“In the event of a cyber attack, to ensure the best possible chance of business continuity, organisations should work with their IT or managed service provider team to create a disaster recovery (DR) plan that will list the steps necessary to meet their recovery time objective (RTO) and recovery point objective (RPO),” said Malecki. “Any good DR plan should firstly identify high-priority servers that are hosting valuable data and applications, because they are the most critical for prioritising backups and recovery. It also must be able to restore backups quickly – regular backups are the foundation of any organisation’s DR plan.
“However, these backups have no value if they cannot be restored quickly and easily after a ransomware attack. A backup area network (BAN) can be used to keep backup data separate from production data. A dedicated backup and disaster recovery (BDR) solution should be on an isolated network, which can be locked down to ensure security is as tight as possible. A good DR solution will also replicate data to a remote location, a second site within the company, or a private/public cloud.”
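Meeting an RPO is something you can check continuously rather than discover during a crisis: if the time since the last good backup exceeds the RPO, a failure right now would lose more data than the business has agreed to tolerate. A minimal sketch of that check (the four-hour RPO and timestamps are illustrative assumptions):

```python
from datetime import datetime, timedelta, timezone

def rpo_breached(last_backup, rpo, now=None):
    """True if the time since the last good backup exceeds the RPO,
    i.e. a failure right now would lose more data than agreed."""
    now = now or datetime.now(timezone.utc)
    return now - last_backup > rpo

# Illustrative scenario: a four-hour RPO checked at a fixed point in time.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rpo = timedelta(hours=4)

within = rpo_breached(now - timedelta(hours=3), rpo, now)  # backup 3h old
overdue = rpo_breached(now - timedelta(hours=6), rpo, now)  # backup 6h old
print(within, overdue)
```

The same pattern extends to RTO: time the restore during a DR test and compare it to the objective, so the plan is validated by measurement rather than assumption.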
Focus on the people dimension
“The single biggest element in your business continuity is your people. Ensure that they are given as much thought and monitoring as your systems,” said Dodd.
“Do they adopt fail-safe decisions in a crisis? Do they prioritise your customers and compliance with laws and regulations to minimise the impact of any downtime? Are you gathering their reactions and their fears in those first moments of business discontinuity? That can tell you an enormous amount about the fault lines in your business operation.”