One of the greatest threats to a system admin or IT executive in charge of Internet assets is the possibility of an outage that could partially or severely disable normal business operations.
With more than 3,000 Internet outages happening every day, redirects and hijacks on the rise, and cloud providers, CDNs, data centres and ISPs all susceptible to slowdowns and latency, the question is not if your Internet infrastructure will experience outages and delays, but when.
The Internet is a complex, interconnected and difficult space to fully understand and control. Businesses need tools and planning to deal with the inherent challenges of the Web and help to improve their performance and ROI.
The following are six ways to limit the impact of slowdowns, hijacks and outages on your Internet infrastructure.
Proactive beats reactive
Apple’s App Store outage this past March cost the company about £22 million ($34 million) in lost sales and severely degraded the customer experience. During the 11-hour outage, customers couldn’t access the online store and even some physical stores were disrupted – hampering sales and damaging the brand.
Outages like this summer’s failures at United Airlines, The Wall St Journal and the New York Stock Exchange (NYSE), and the California cable cuts that affected AWS and Netflix, are bound to happen again.
Planning mitigation for these unavoidable Web performance issues is the only way to ensure that your network comes through such natural-disaster-like events unscathed.
What is your backup strategy and how easy is it to implement in the event of a change in Internet conditions? Start at the DNS layer and consider your connections across your cloud providers, CDNs and ISP hosts.
Know the risks of human error
While external threats to your infrastructure make all of the headlines, internal errors are very often the cause of network failures. If an IT department updates servers or makes edits to website code, even the smallest error can cause an entire site to crash.
A simple typing error can cause redirects as well as widespread network problems. Even taking Web properties offline for site maintenance can cause issues if not handled perfectly.
To create a safe operating environment and avoid downtime caused by human error, consider deploying multiple instances of your services and steering traffic to stable sites while other sites are being upgraded. This can be done using load balancing and a rolling upgrade strategy, and it can be especially effective for cloud-based hosting approaches.
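The rolling upgrade approach can be sketched roughly as follows. This is a minimal illustration and does not use any specific load balancer's API. The Instance class, the site names, the version strings and the placeholder health_check function are all assumptions made for the example. A real deployment would drain connections and probe the service over the network.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    version: str
    in_rotation: bool = True  # whether the load balancer sends traffic here


def health_check(instance: Instance, target_version: str) -> bool:
    # Placeholder check (assumption): a real one would probe the live service.
    return instance.version == target_version


def rolling_upgrade(pool: list[Instance], target_version: str) -> list[int]:
    """Upgrade one instance at a time.

    Returns the number of instances still serving traffic at each step,
    so the caller can confirm capacity never dropped below len(pool) - 1.
    """
    serving_counts = []
    for instance in pool:
        instance.in_rotation = False  # take this site out of rotation
        serving_counts.append(sum(i.in_rotation for i in pool))
        instance.version = target_version  # stand-in for the real upgrade
        if not health_check(instance, target_version):
            # Leave the failed instance out of rotation and stop the rollout.
            raise RuntimeError(f"{instance.name} failed its health check")
        instance.in_rotation = True  # healthy, so return it to rotation
    return serving_counts


pool = [Instance("site-a", "1.0"), Instance("site-b", "1.0"), Instance("site-c", "1.0")]
counts = rolling_upgrade(pool, "1.1")
```

Because only one site is out of rotation at any moment, a mistake during the upgrade affects a single instance while the others keep serving customers.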
Make wise Internet infrastructure investments
From the cloud, to data centres, to CDNs, there are a variety of Internet infrastructure vendors in the marketplace that can all play a role in a successful network plan. As cooperative as your cloud partners are, they do not show you how well your customers are connecting to the cloud instances or how well those instances are serving your key markets, nor do they offer a consolidated view across cloud vendors.
Make sure that you have the visibility and the tools to monitor, control and optimise your own Internet infrastructure so that you can make changes when a delay or outage occurs.
In addition, don’t make price the only factor in the decision process. Infrastructure may often be treated as a commodity, but a mix of providers can be coordinated so that redundancies exist to protect against crippling infrastructure issues.
Consider your availability and reachability. Are you available to your customers, and do customers in all geographic locales have the same experience?
The rule of three – or four
One of the keys to optimising Internet Performance is to have external contingency plans in place that can be enabled quickly should your monitoring of Web performance show that issues have arisen. Use multiple clouds, CDNs and data centres so that your infrastructure is engineered to withstand an outage.
It is also recommended that businesses with global end-users use an advanced DNS solution with geo-location so that they can control which cloud instance serves which customers. This capability gives your business more flexibility and value, allowing you to scale while presenting an always-on experience to customers.
The ability to access different pathways also comes in handy when there are outages or slow load times — whether due to a traffic routing problem or a malicious attack.
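As a rough sketch of how geo-aware steering with failover might work, the example below picks the first healthy endpoint from a per-region preference list. The region codes, the endpoint names and the pick_endpoint function are hypothetical. The example does not represent any particular DNS product, which would make this decision when answering each query.

```python
from typing import Optional

# Hypothetical preference order per region (all names are illustrative).
REGION_PREFERENCES = {
    "eu": ["cloud-eu", "cdn-global", "cloud-us"],
    "us": ["cloud-us", "cdn-global", "cloud-eu"],
}
# Used for regions with no specific entry.
DEFAULT_PREFERENCES = ["cdn-global", "cloud-us", "cloud-eu"]


def pick_endpoint(region: str, healthy: set) -> Optional[str]:
    """Return the most preferred healthy endpoint for a region, or None."""
    for endpoint in REGION_PREFERENCES.get(region, DEFAULT_PREFERENCES):
        if endpoint in healthy:
            return endpoint
    return None  # nothing is reachable


all_up = {"cloud-eu", "cloud-us", "cdn-global"}
normal_choice = pick_endpoint("eu", all_up)
failover_choice = pick_endpoint("eu", all_up - {"cloud-eu"})
```

With three or four independent providers in each list, losing one of them moves customers to the next pathway without taking the service offline.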
Monitor and control assets in one place
Accessing and controlling the assets in your IT infrastructure across technologies in a single place gives you the total picture of connection and performance issues. Separate views and varying measurements will lead to misunderstandings and missed opportunities to do the right thing.
If you’re not using a tool that gives you a comprehensive, end-to-end view into the speed and availability of your information across the Internet, your ability to rectify critical situations is compromised.
Real-time data, root cause analysis and total asset visibility are a must when slowdowns and downtime can have serious impacts on sales and brand reputation.
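To illustrate what a single consolidated view could look like, the sketch below merges probe results from several providers into one availability and latency summary per provider and region. The probe tuple layout, the provider names and the summarise function are assumptions made for the example, and the sample measurements are invented purely to show the calculation.

```python
from collections import defaultdict


def summarise(probes):
    """Combine probe results into {(provider, region): (availability, avg_latency_ms)}.

    Each probe is a tuple: (provider, region, ok, latency_ms).
    Latency is averaged over successful probes only.
    """
    totals = defaultdict(lambda: {"checks": 0, "ok": 0, "latency_sum": 0.0})
    for provider, region, ok, latency_ms in probes:
        entry = totals[(provider, region)]
        entry["checks"] += 1
        if ok:
            entry["ok"] += 1
            entry["latency_sum"] += latency_ms

    summary = {}
    for key, entry in totals.items():
        availability = entry["ok"] / entry["checks"]
        avg_latency = entry["latency_sum"] / entry["ok"] if entry["ok"] else None
        summary[key] = (availability, avg_latency)
    return summary


# Invented sample measurements, for illustration only.
probes = [
    ("cloud-a", "eu", True, 40.0),
    ("cloud-a", "eu", False, 0.0),
    ("cdn-b", "eu", True, 20.0),
    ("cdn-b", "eu", True, 30.0),
]
summary = summarise(probes)
```

Measuring every provider with the same method and in the same table avoids the misunderstandings that come from comparing separate vendor dashboards.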
Improve customer support
How many times have you called an IT helpline and received little to no clarification on the source, extent or duration of an outage that is affecting you as an IT exec or even as a consumer? Customer support around outages and Internet Performance issues is notoriously murky.
But what if your business could use slowdowns and outages as an opportunity to improve customer relations by offering clarity on the situation, educating customers about where the issue exists, and assuring them that the performance issue lies outside your network and that your team has the tools to address the problem quickly?
Having the power to identify issues and give your customer support team information that it can pass on to your valued customers means increased brand trust and more loyal customers.
It also shortens the mean time to innocence (the time it takes to establish that a problem lies outside your own network), giving you the knowledge to identify and fix problems when and where they arise.
Sourced from Paul Heywood, director for EMEA – Dyn