For a niche software development company like Accius, three days’ worth of work represents a substantial investment of time and effort. So when its development server crashed, leaving that amount of code corrupted and apparently irrecoverable, the US company had no option but to find a way to get it back – and quickly.
The episode is notable because the development server was hosted on the Amazon Web Services (AWS) cloud computing platform. The recovery process that followed is therefore one that many organisations may have to go through themselves if they adopt public cloud services in the future.
“After the development server crashed, it wasn’t rebooting and it wasn’t restarting,” recalls Accius founder and president Douglas Moore. “One of our engineers forced the server to disconnect from the virtual storage volume so that we could create a new instance of the server. That’s what corrupted the files – he didn’t realise that a forced disconnect is the same as yanking a cable out of a storage area network (SAN).”
Because the fault occurred within a virtual machine, it was not Amazon’s responsibility to fix it. “They take care of the basic infrastructure; they don’t take care of what the operating system does to your files,” says Moore. He therefore turned to computer forensics provider Kroll Ontrack.
“This was the first time we had been asked to recover data from the Amazon cloud,” recalls David Logue, a recovery engineer at Kroll Ontrack, “so we spent some time upfront to understand how it really works.” What Kroll Ontrack found was that, thanks to the degree of control that AWS customers have over their hosted systems and to the fact that those systems are entirely virtualised, it could recover Accius’s data remotely.
The corrupted virtual storage volume was cloned and attached to a new instance of the development server. “If it had been a physical server, that would have meant buying new hardware,” says Moore. Once software had been installed that gave Kroll Ontrack remote access to the new instance, Logue downloaded his recovery tools and used them to bring back the lost data.
“The big advantage of the Amazon cloud is that it gives the customer full control over the virtual disk and the operating system; it’s as if the customer was sitting at their workstation,” explains Logue. “You don’t get that with all hosting providers, and we would not have been able to do this recovery remotely without it.”
Not being able to restore the data remotely is not simply an inconvenience, Logue adds. If the hosting provider stores data belonging to more than one customer on a single machine, it may require the permission of those other customers before recovery can take place. “When an organisation is evaluating a hosting provider, that would be one of the things I suggest they look into,” he says.
He adds, however, that when incidents are more serious than in the case of Accius, recovery may require the hosting provider’s involvement, whatever degree of access it grants its customers. “We’ve seen incidents in hosted environments where it’s not the file or the virtual volume that’s the problem but the whole underlying SAN. This makes recovery much more complex and the hosting company needs to be involved.”