If there is one thing most software engineers agree on, it is that bug-free software doesn’t exist.
From the most basic calculator apps through to the most complex large-scale multi-threaded databases, software will always contain bugs.
For years there has been a trade-off between the pressure to ship and code quality: commercial pressure generally means that managers of software development teams must weigh the quality of the code against the schedule for delivering features.
Should they spend extra time chasing the intermittent bug that appears only once in 300 runs, or should they hold to the commercial delivery schedule?
The Economist writes that some of the neatest software ever written – by NASA’s Software Assurance Technology Centre – carried 0.1 errors per 1,000 lines of source code.
But there are bugs. And what if the one bug that you, as an engineering manager or director, let slip through the net is a disaster waiting to happen?
You can never be entirely sure that a bug won’t turn into a disaster on the scale of the ones that forced BA to ground all of its flights in September 2016 and May 2017. Imagine that bug had been your responsibility to fix, and you overlooked it to get your software out on time.
Tricentis, a perpetual testing platform vendor, launched its Software Fail Watch report in January 2018 to highlight the problem. It analysed 606 failures and found that over 3.6 billion people had been affected by these software problems, which resulted in $1.7 trillion in lost revenue to software vendors.
These hidden disasters still skulk in the products of software companies, many of which are built on decades-old C and C++ code.
Database vendors, in particular, are often guilty of these software bug issues. So how are they able to better spot them in the testing phase?
During the testing phase, these bugs may affect the program only subtly, or not appear at all. In the real world, though, they can be severe for businesses – such as when Salesforce’s CEO had to apologise directly to US users after a file integrity issue made the database inaccessible for days.
Or when Amazon’s database couldn’t handle a slight disruption, which consequently caused outages throughout the Amazon network.
So what can QA and software development managers do to ensure rigorous testing and rigorous debugging is in place?
The revolution in testing is all about the fact that thousands of automatic tests can now be run simultaneously in an attempt to test code from many angles.
So it’s high time to consider your debugging strategy and take measures to diagnose serious software defects before they cause havoc on customer sites.
Recording and replaying program execution offers one solution to this billion dollar problem of debugging. It is particularly effective against intermittent test failures, which are by nature irreproducible – a common problem in software development.
So how does it work? The engineer takes a recording of the program’s execution, capturing an exact replica of a failing run.
A recording is essentially a 100% reliably reproducible test case, one that gives you total visibility into all the factors that led up to (and caused) the crash.
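The idea can be sketched in a few lines. This toy Python example (an illustration of the general record-and-replay principle, not the internals of any particular product) captures a program’s nondeterministic inputs during a live run, then feeds the same values back during replay so the program takes exactly the same path – turning an intermittent run into a reproducible one:

```python
import random

class Recorder:
    """Capture nondeterministic inputs during a live run so the run
    can be replayed deterministically later (illustrative sketch)."""
    def __init__(self):
        self.log = []

    def nondet(self, producer):
        value = producer()          # e.g. a random number, the clock, I/O
        self.log.append(value)      # remember it for replay
        return value

class Replayer:
    """Feed back the recorded values in order, so the program sees
    exactly the same inputs as the original (failing) run."""
    def __init__(self, log):
        self._values = iter(log)

    def nondet(self, producer):
        return next(self._values)   # ignore the live source; use the recording

def flaky_computation(env):
    # A toy program whose result depends on nondeterministic input.
    total = 0
    for _ in range(5):
        total += env.nondet(lambda: random.randint(0, 100))
    return total

rec = Recorder()
first = flaky_computation(rec)                  # the 'failing' run, recorded
replay = flaky_computation(Replayer(rec.log))   # deterministic replay
assert first == replay                          # identical result every time
```

Real record-and-replay debuggers apply this principle at the machine level, capturing every source of nondeterminism (system calls, thread scheduling, signals) so that the entire failing execution can be stepped through after the fact.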
The software failure can then be captured and fixed before it makes it into production, allowing software engineering teams to debug quickly and efficiently.
Recording and replaying program execution is a revolution in software development and testing and is set to become the new standard in debugging protocol.
By Dr Greg Law, CTO at Undo