Sandboxing vs. heuristic-based scanning: a malware detection 101

Most people who work in the anti-malware industry are familiar with signature-based detection, where if a file is determined to be malicious, a signature is written so anti-malware programs are able to detect that file or component in the future.

The threat landscape is challenging for signature-based detection: the number of threats keeps increasing, and each individual signature remains effective for a shorter and shorter time.

Because of these difficulties, complements to signature-based detection, such as heuristic-based scanning, sandboxing and/or multi-scanning (scanning for threats with multiple anti-malware engines) are needed to more effectively address modern risks.

In this post, we look at the pros and cons of both heuristic-based scanning, which is used alongside signature-based detection in multi-scanning solutions to increase detection rates, and sandboxing.

What is heuristic-based scanning?

As opposed to signature-based scanning, which looks to match signatures found in files against a database of known malware, heuristic scanning uses rules and/or algorithms to look for commands which may indicate malicious intent.

This allows some heuristic scanning methods to detect malware without needing a signature. It is why most antivirus programs combine signature-based and heuristic-based methods, in order to catch malware that might otherwise evade detection.
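The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine works: the hash set, the byte patterns, their weights and the threshold are all hypothetical values chosen for the example.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of files already known to be malicious.
KNOWN_BAD_HASHES = {hashlib.sha256(b"known malicious sample").hexdigest()}

# Hypothetical heuristic rules: (byte pattern, weight). Real engines use far richer rules.
HEURISTIC_RULES = [
    (b"CreateRemoteThread", 3),
    (b"VirtualAllocEx", 2),
    (b"cmd.exe /c", 2),
]
HEURISTIC_THRESHOLD = 4  # assumed cut-off for this sketch


def signature_match(data: bytes) -> bool:
    """Signature-based: exact match against the database of known malware."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_HASHES


def heuristic_score(data: bytes) -> int:
    """Heuristic: add up the weights of every suspicious pattern found in the file."""
    return sum(weight for pattern, weight in HEURISTIC_RULES if pattern in data)


def scan(data: bytes) -> str:
    """Run both methods in combination, signature first."""
    if signature_match(data):
        return "malicious (signature)"
    if heuristic_score(data) >= HEURISTIC_THRESHOLD:
        return "suspicious (heuristic)"
    return "clean"


print(scan(b"known malicious sample"))                    # malicious (signature)
print(scan(b"xx VirtualAllocEx yy CreateRemoteThread"))   # suspicious (heuristic)
print(scan(b"hello world"))                               # clean
```

The second sample has no signature in the database, yet it is still flagged because two rules fire and their combined weight reaches the threshold.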

The benefits

Heuristic scanning is usually much faster than sandboxing because it does not execute the file and then wait to record its behavior, with the exception of some emulation-based techniques.

Vendors can change the rules in their heuristic engines with their daily update packages based on new threat vectors without the details being known to malicious actors.

It does not give away details on how malware is flagged (unlike sandboxing), so malware authors will not be aware of what they need to change in order to evade detection.

And lastly, heuristic scanning is able to detect malware that can evade sandbox detection through blind spots targeted by malware authors.

Its limitations

When scanning a sample, the information found is generally limited to the threat name. In addition, because the engines are looking for specific pieces of code which indicate a malicious action, there are two possible limitations:

If the vendor has not built detection for a particular action, then the malware will evade detection.

If the malicious action is obfuscated successfully (e.g. within an encrypted file), it will evade detection.
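The second limitation can be illustrated with a toy example. The sketch below assumes a single hypothetical byte-pattern rule and uses a one-byte XOR as a stand-in for real encryption or packing; it only shows why a rule that inspects the file contents without running it stops matching once those contents are encoded.

```python
# Hypothetical static rule: flag files containing this byte pattern.
SUSPICIOUS_PATTERN = b"cmd.exe /c"


def static_rule_hits(data: bytes) -> bool:
    """Return True if the pattern is visible in the raw bytes of the file."""
    return SUSPICIOUS_PATTERN in data


def xor_bytes(data: bytes, key: int) -> bytes:
    """Toy stand-in for encryption: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


plain = b"run: cmd.exe /c whoami"
obfuscated = xor_bytes(plain, 0x5A)

print(static_rule_hits(plain))               # True
print(static_rule_hits(obfuscated))          # False
print(xor_bytes(obfuscated, 0x5A) == plain)  # True
```

The content is unchanged once decoded, but the rule never sees the decoded form unless the engine emulates or unpacks the file first.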

Some of the older methods of heuristic-based scanning have a higher propensity for reporting false positives because they are looking for a wide range of actions that could indicate a potentially malicious file.

However, newer methods of heuristic scanning such as generic detection produce false positives less frequently. Generic detection works by looking for features or behaviors that are commonly seen for known threats.

What is sandboxing?

Sandboxes consist of some sort of purpose-built environment, usually virtualised (in some cases physical), where the potentially malicious files are executed and their behavior is recorded.

The recorded behavior is then analyzed automatically through a weights system in the sandbox and/or manually by a malware analyst. The goal of this analysis is to determine whether the file is malicious and if it is, what exactly the file does.
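A weights system of this kind might look roughly like the following sketch. The behavior names, their weights and the threshold are assumptions made up for illustration; actual sandboxes define their own indicators and scoring.

```python
from dataclasses import dataclass

# Hypothetical weights for behaviors a sandbox might record during execution.
BEHAVIOR_WEIGHTS = {
    "writes_to_startup_folder": 30,
    "disables_security_service": 40,
    "contacts_unknown_host": 20,
    "reads_user_documents": 10,
}
MALICIOUS_THRESHOLD = 50  # assumed cut-off for this sketch


@dataclass
class Verdict:
    score: int
    malicious: bool
    matched: list


def score_behaviors(recorded: list) -> Verdict:
    """Sum the weights of each distinct recorded behavior that has a known weight."""
    # dict.fromkeys removes duplicates while keeping the order of first appearance.
    matched = [b for b in dict.fromkeys(recorded) if b in BEHAVIOR_WEIGHTS]
    score = sum(BEHAVIOR_WEIGHTS[b] for b in matched)
    return Verdict(score=score, malicious=score >= MALICIOUS_THRESHOLD, matched=matched)


verdict = score_behaviors(
    ["contacts_unknown_host", "writes_to_startup_folder", "opens_window"]
)
print(verdict.score, verdict.malicious)  # 50 True
```

Keeping the list of matched behaviors alongside the score is what lets an analyst see not only whether the file was flagged, but what it did.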

The benefits

Because sandboxing actually opens the file being analyzed, it is able to see in detail exactly what that file will do in that particular environment.

Instead of a binary yes/no and threat name, most sandboxes offer reporting with details on the behavior recorded. In addition to providing more information on how to classify the file, this method can be particularly useful in an incident response environment, where identifying exactly what the file was intended to do helps in understanding its effects.

Though it varies by product, many sandboxes offer the ability to create a highly customized environment. For example, the environment can replicate a particular user’s machine in order to analyze a piece of malware that is designed to only fully execute there.

Its limitations

Because the methodology of commercial sandboxes is visible and the products can be customized, malware creators can build specific behaviors to get around detection. This includes two key categories:

'Sandbox aware' malware which is able to tell it is being executed in a sandbox and will act differently in order to not be flagged as malicious. This may be as simple as not running on any virtual machine, or something more advanced looking for signs specific to a sandbox.

Blind spots will vary based on the product, but in some cases malware creators have created pathways to act maliciously in ways which cannot be detected by the sensors of a particular sandbox.

An environment is needed to execute the sample, along with enough time to collect full reports. Particularly when trying to accommodate stalled code execution, processing a given sample takes a large amount of both time and hardware resources, which results in relatively low throughput.
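A back-of-the-envelope calculation makes the throughput cost concrete. The figures below (10 virtual machines, 5 or 15 minutes per sample) are purely illustrative assumptions, and the formula ignores overhead such as resetting the environment between runs.

```python
def samples_per_hour(vm_count: int, minutes_per_sample: float) -> float:
    """Upper bound on throughput when each VM analyzes one sample at a time."""
    if vm_count < 0 or minutes_per_sample <= 0:
        raise ValueError("vm_count must be >= 0 and minutes_per_sample must be > 0")
    return vm_count * 60 / minutes_per_sample


print(samples_per_hour(10, 5))   # 120.0
print(samples_per_hour(10, 15))  # 40.0
```

Under these assumptions, tripling the observation window to wait out stalled code cuts throughput to a third unless the hardware is tripled as well.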

While the industry trend is towards automated sandboxes, many still only provide the raw data on behavior of the malware and it is necessary to either build a custom application to interpret the information, or have a malware analyst manually review the information.

Due to the overhead involved in running them, many sandboxes are optionally or completely cloud-based, which makes them unusable for sensitive files.

As detailed above, sandboxing does have its limitations. We recommend using sandboxing in combination with other methods, like multi-scanning, to increase malware detection rates.

Both heuristic-based scanning and sandboxing present unique strengths and weaknesses, and for different situations one scanning method may be more appropriate than the other. The best security comes from utilising both methods simultaneously in order to minimise the number of samples which may be able to evade detection.
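The effect of combining methods can be sketched as follows. Both detectors here are hypothetical stand-ins (a byte-pattern check for heuristic scanning, and a marker prefix that simulates behavior only visible on execution); the point is only that a sample has to evade every method, not just one.

```python
from typing import Callable, Sequence

Detector = Callable[[bytes], bool]


def heuristic_detector(sample: bytes) -> bool:
    # Stand-in for heuristic scanning: a pattern visible without running the file.
    return b"cmd.exe /c" in sample


def sandbox_detector(sample: bytes) -> bool:
    # Stand-in for a sandbox verdict: the prefix simulates behavior seen at runtime.
    return sample.startswith(b"BEHAVES_BADLY:")


def flagged_by_any(sample: bytes, detectors: Sequence[Detector]) -> bool:
    """A sample is flagged as soon as any one detector flags it."""
    return any(detector(sample) for detector in detectors)


def count_evaders(samples: Sequence[bytes], detectors: Sequence[Detector]) -> int:
    """Number of samples that no detector in the list flags."""
    return sum(1 for s in samples if not flagged_by_any(s, detectors))


samples = [
    b"BEHAVES_BADLY:encrypted-payload",  # only visible when executed
    b"cmd.exe /c whoami",                # only visible to the pattern check
]

print(count_evaders(samples, [heuristic_detector]))                    # 1
print(count_evaders(samples, [sandbox_detector]))                      # 1
print(count_evaders(samples, [heuristic_detector, sandbox_detector]))  # 0
```

Each method on its own misses one of the two toy samples, while the combination misses neither.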

Sourced from Curtis Cade, Sales Engineer at OPSWAT

Ben Rossi