The dangers of web scraping

Web scraping’s prevalence, sophistication and industry have expanded alongside the internet’s growth, according to a Distil Networks study.

Through analysis of top web scraping platforms and services, the report outlines how the democratisation of web scraping allows users to effortlessly steal sensitive information on the web.

Web scraping is a computer software technique for extracting information from websites, and often includes transforming unstructured website data into a database for analysis or repurposing content into the web scraper’s own website and business operations.
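As a rough illustration of that definition, the sketch below turns a fragment of page markup into structured records using only Python's standard library. The markup, the class names (`listing`, `title`, `price`) and the helper `extract_listings` are all invented for this example and do not come from the Distil Networks report; real scrapers typically fetch pages over the network and cope with far messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page markup, invented for illustration only.
SAMPLE_HTML = """
<ul>
  <li class="listing"><span class="title">Flat A</span><span class="price">1200</span></li>
  <li class="listing"><span class="title">Flat B</span><span class="price">950</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects one dict per <li class="listing"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []      # structured records extracted so far
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "listing":
            self.rows.append({})          # start a new record
        elif tag == "span" and cls in ("title", "price") and self.rows:
            self._field = cls             # the next text node belongs to this field

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None            # stop capturing text


def extract_listings(html):
    """Return the listings in `html` as a list of {title, price} dicts."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return [{"title": r["title"], "price": int(r["price"])} for r in parser.rows]
```

Calling `extract_listings(SAMPLE_HTML)` yields a list of dictionaries that could be loaded into a database or spreadsheet, which is the "unstructured to structured" step the definition describes.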

While much of this is not illegal, it sits in a grey area where legality and morality can be debated.

In most cases, bots, which make up 46% of web traffic, are implemented by individuals to perform web scraping at a much faster rate than humans alone.

Of the companies that engage in web scraping, 38% do so to obtain content; it is also used for research, contact scraping, price comparison, weather data monitoring, and website change detection.

The top industries affected by web scraping that the study identified were (in order): real estate, digital publishing, e-commerce, directories and classifieds, airlines and travel.

Currently, according to the report, around 2% of online revenues can be lost through misuse of this online content.

Lost revenue is not the only issue: web scraping can also expose ‘private’ information that is posted online, which could lead to significant fines in a world of stricter regulations.

Diverse actors, including nefarious competitors, internet upstarts, hedge funds, fraudsters, hackers, and spammers, leverage web scraping bots to effortlessly steal whatever pieces of content they are programmed to find. These bots often mimic regular user behavior, making them hard to detect and even harder to block.

“If your content can be viewed on the web, it can be scraped,” said Rami Essaid, CEO and co-founder of Distil Networks.

“Not only does web scraping pose a critical challenge to a website’s brand, it can threaten sales and conversions, lower SEO rankings, or undermine the integrity of content that took considerable time and resources to produce.”

“Understanding the pervasive nature of today’s web scraping economy not only raises awareness about this growing challenge, it also allows website owners to take action in the protection of their proprietary information.”

As we become more dependent on the internet in the Internet of Things era, the impact of content being stolen and re-used will increase.

At the same time, the cost of web scraping services has fallen dramatically: services can be had for as little as $3.33 an hour.

The average web scraper makes $58,000 annually, and when working for a large company specialising in web scraping this can reach $128,000 per year.

Web scraping is becoming increasingly desirable and easy to carry out, and the risks it poses to businesses and individuals are rising significantly.

Ben Rossi

Ben was Vitesse Media's editorial director, leading content creation and editorial strategy across all Vitesse products, including its market-leading B2B and consumer magazines, websites, research and...