Managing internet performance is a big data challenge
You can't afford to leave internet performance unmanaged, and this management requires measurement: lots of it
Today’s Internet consumer expects every web page to be served up in no more than a second or two—no matter how complex the page is, how many pieces it has, or where on the Internet those pieces might be hosted. Disappoint them with a slow page load, and users will go elsewhere, abandoning transactions or moving on to the next search result.
With revenue and user loyalty at stake, end-to-end Internet latency has become a closely watched metric. Minimising latency has become critical, not only for the delivery of primary content, but for each of the many third-party services that may be embedded in increasingly complex web pages.
Fortunately, new tools allow enterprise IT teams not only to identify the paths critical traffic takes through the Internet, but also to quantify the primary sources of latency en route and monitor the evolution of those paths in real time. They can even suggest alternative strategies that get traffic delivered to customers faster and more reliably.
From there to here, from here to there
Today’s Internet is a dizzying mesh of over 48,000 autonomous systems connected by more than 350,000 bilateral relationships. This complexity creates challenges in mapping, measuring and monitoring the Internet as a service delivery platform. Once beyond your enterprise borders, Internet traffic inevitably takes the cheapest path, which is rarely the shortest, fastest or most reliable.
The Internet’s complexity continues to increase every year, and a single transaction can easily involve four or more autonomous systems in each direction, all of which must cooperate to deliver the packets to the end user’s desktop or mobile device. Moreover, the specific cities in which those intermediate providers agree to hand off their traffic can mean the difference between a snappy user experience and a sluggish one.
Pinpointing the specific causes of these Internet performance problems requires detailed measurement and path analysis tools. Existing APM (application performance management) and NPM (network performance management) solutions have traditionally focused on identifying slowdowns within the datacentre or the local server-side network infrastructure. That still leaves a huge analytical gap to be filled by tools that monitor the performance of the Internet’s service delivery paths.
Measure by measure
Because of the complexity of today’s Internet, service delivery can go wrong in a number of ways. Traffic that has to travel a long way will obviously take a long time. That makes the geolocation of key interconnection points an important consideration. Congestion effects can add tens of milliseconds of additional queueing delay, if intermediate providers attempt to save money by underprovisioning at critical points along the way.
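To make the distance point concrete, the sketch below estimates a lower bound on round-trip time between two interconnection points from their coordinates. It assumes signals travel through optical fibre at roughly 200,000 km/s (about two-thirds of the speed of light in a vacuum) and that the route follows the great circle, which real cable routes rarely do. The function names and the example coordinates are illustrative, not taken from any particular tool.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in optical fibre covers roughly 200,000 km per second.
FIBRE_SPEED_KM_PER_S = 200_000.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(lat1, lon1, lat2, lon2):
    """Lower bound on round-trip time in milliseconds, ignoring queueing,
    router processing and any detours the physical route makes."""
    distance_km = great_circle_km(lat1, lon1, lat2, lon2)
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


if __name__ == "__main__":
    # Illustrative coordinates for London and New York.
    rtt = min_rtt_ms(51.5074, -0.1278, 40.7128, -74.0060)
    print(f"Best-case London to New York round trip: {rtt:.1f} ms")
```

A measured round-trip time far above this floor suggests that the traffic is taking a geographic detour or meeting congestion somewhere along the way.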
Enterprises are increasingly turning to systematic end-to-end path measurement strategies to root out the sources of these Internet performance problems. Most will start simply, using the traceroute program included in every network engineer’s diagnostic toolkit to manually examine the routers along the path to each customer.
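As a rough illustration of that manual first step, the sketch below parses text in the style of Unix traceroute output and reports where the round-trip time rises most between consecutive responding hops. The output format varies between platforms and versions, so the parsing rules here are an assumption, and the sample trace uses made-up hostnames and documentation-range addresses.

```python
import re
from statistics import median

# Assumption: each hop line starts with a hop number and lists RTTs as "<n> ms".
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s+ms")

# Hypothetical sample output, not a real measurement.
SAMPLE = """\
traceroute to www.example.com (203.0.113.10), 30 hops max, 60 byte packets
 1  gw.example.net (192.0.2.1)  0.412 ms  0.388 ms  0.401 ms
 2  core1.example.net (192.0.2.17)  1.920 ms  2.105 ms  1.876 ms
 3  * * *
 4  peer.example.org (198.51.100.5)  78.340 ms  79.012 ms  78.655 ms
 5  www.example.com (203.0.113.10)  79.101 ms  79.430 ms  78.998 ms
"""


def parse_traceroute(text):
    """Return a list of (hop_number, median_rtt_ms), with None for silent hops."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip the header line and blank lines
        rtts = [float(value) for value in RTT_RE.findall(match.group(2))]
        hops.append((int(match.group(1)), median(rtts) if rtts else None))
    return hops


def largest_jump(hops):
    """Return (hop_number, increase_ms) for the biggest rise in median RTT
    between consecutive responding hops, or None if fewer than two respond."""
    responding = [(n, rtt) for n, rtt in hops if rtt is not None]
    best = None
    for (_, prev_rtt), (hop, rtt) in zip(responding, responding[1:]):
        delta = rtt - prev_rtt
        if best is None or delta > best[1]:
            best = (hop, delta)
    return best


if __name__ == "__main__":
    hop, delta = largest_jump(parse_traceroute(SAMPLE))
    print(f"Largest latency increase: {delta:.1f} ms arriving at hop {hop}")
```

Per-hop times are only a hint: routers may rate-limit their replies, and the return path can differ from the forward path, so a jump at one hop is a place to start looking, not a diagnosis.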
With the staggering number of paths that traffic can take and ever-changing network conditions, generating enough measurement probes to adequately map today’s complex Internet is not a simple task. To get a complete picture of potential performance problems, it’s important to identify the geographic paths taken by traffic from each service endpoint (including CDN and cloud providers), to each of an enterprise’s customers, over a long enough time that less common variant paths can be exposed and their latency factored in.
With an average of two intermediate service providers serving both ends of each transaction, the number of paths to be inspected grows quickly. And even then, a direct point-to-point measurement strategy may not reveal the existence of alternative faster paths that traffic could potentially take. A comprehensive Internet performance measurement strategy should combine the best of both worlds: direct end-to-end path measurement between servers and customers, overlaid with indirect global measurement of the alternative competing paths.
Finally, path analysis really becomes a big data challenge when the time dimension is incorporated. The best analytical tool suites today can provide a mix of historical and real-time statistics about Internet performance. They trace paths through each supporting service provider, all the way from the cloud to the end customer, and look back over weeks, if not months, of measurement history to distinguish normal paths from anomalous ones.
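One simple way to picture the historical side of this analysis is to group past measurements by the sequence of autonomous systems they crossed, then label rarely seen sequences as variants and compare their latency with the usual route. The sketch below does that under loud assumptions: the record layout, the 5% threshold and the function name are all hypothetical, and commercial tool suites will use far richer models.

```python
from collections import defaultdict
from statistics import median


def summarise_paths(samples, rare_share=0.05):
    """Group (as_path, rtt_ms) samples by path and label each path.

    A path seen in less than `rare_share` of all samples is labelled
    "variant"; every other path is labelled "normal". The threshold is an
    arbitrary illustrative choice.
    """
    by_path = defaultdict(list)
    for as_path, rtt_ms in samples:
        by_path[tuple(as_path)].append(rtt_ms)

    total = sum(len(rtts) for rtts in by_path.values())
    summary = []
    for as_path, rtts in by_path.items():
        share = len(rtts) / total
        summary.append({
            "path": as_path,
            "share": share,
            "median_rtt_ms": median(rtts),
            "label": "variant" if share < rare_share else "normal",
        })
    # Most frequently observed paths first.
    summary.sort(key=lambda entry: entry["share"], reverse=True)
    return summary


if __name__ == "__main__":
    # Hypothetical history using documentation-range AS numbers.
    usual = [((64496, 64497, 64498), 40.0 + i % 3) for i in range(39)]
    detour = [((64496, 64499, 64500, 64498), 120.0)]
    for entry in summarise_paths(usual + detour):
        print(entry["label"], entry["path"],
              f"{entry['share']:.1%}", f"{entry['median_rtt_ms']:.1f} ms")
```

The longer the measurement history, the more of these less common variant paths appear in the summary, which is why weeks or months of data matter.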
More than just a 'nice to have'
While end-to-end Internet performance can be a daunting measurement and management challenge, it is necessary to understand the end user’s experience and avoid their defection to faster competing services. Internet user experience relies on the performance of service providers beyond the enterprise firewall, and those providers are not always well-incentivised to provide fast, reliable interconnection whose paths align with the enterprise’s global customer footprint.
Simple trace-based measurement from the datacentre can be a starting point for investigation. Scaling up to cover all the paths traffic can take through today’s complex Internet requires a more methodical approach. By carefully structuring the investigation of latency problems, asking the right questions about connectivity and paths, and collecting enough measurements of the latencies along those paths, IT leaders can ensure their specific hosting choices are well-aligned to their customers’ needs.
Sourced from Jim Cowie, chief scientist, Dyn