The dangers of web scraping

Web scraping has grown in prevalence and sophistication, and into an industry of its own, alongside the internet, according to a Distil Networks study.

Through analysis of top web scraping platforms and services, the report outlines how the democratisation of web scraping allows users to effortlessly steal sensitive information on the web.

Web scraping is a computer software technique for extracting information from websites, and often includes transforming unstructured website data into a database for analysis or repurposing content into the web scraper’s own website and business operations.
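To make that definition concrete, here is a minimal sketch of the extract-and-structure step, using only Python's standard library. The page fragment, the class names (`listing`, `title`, `price`) and the table layout are assumptions made up for illustration; they do not come from the Distil Networks report, and a real scraper would fetch pages over HTTP rather than read an inline string.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a fetched web page.
SAMPLE_HTML = """
<ul>
  <li class="listing"><span class="title">Flat A</span><span class="price">250000</span></li>
  <li class="listing"><span class="title">House B</span><span class="price">410000</span></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects the title and price text found inside each listing item."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per listing seen so far
        self._field = None   # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "listing":
            self.rows.append({})
        elif tag == "span" and cls in ("title", "price") and self.rows:
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside a span we care about.
        if self._field:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_to_db(html):
    """Parse the HTML and load the extracted records into an in-memory database."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE listings (title TEXT, price INTEGER)")
    db.executemany(
        "INSERT INTO listings VALUES (?, ?)",
        [(r["title"], int(r["price"])) for r in parser.rows],
    )
    return db


db = scrape_to_db(SAMPLE_HTML)
rows = db.execute("SELECT title, price FROM listings ORDER BY price").fetchall()
```

Once the data is in a table, it can be queried, compared or republished like any other dataset, which is what makes scraped content so easy to repurpose.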

While much of this is not illegal, it sits in a grey area where legality and morality can be debated.

In most cases, individuals deploy bots, which make up 46% of web traffic, to scrape sites far faster than a human could.

Of the companies that engage in web scraping, 38% do so to obtain content; it is also used for research, contact scraping, price comparison, weather data monitoring, and website change detection.
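Website change detection, the last use listed, can be as simple as comparing fingerprints of a page taken at two points in time. The sketch below is one possible approach, not a method described in the report; the function names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text with whitespace normalised."""
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(old_text, new_text):
    """Report whether two snapshots of a page differ in content."""
    return fingerprint(old_text) != fingerprint(new_text)
```

Normalising whitespace first means a reformatted but otherwise identical page is not flagged, while any change to the visible text, such as a new price, is.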

The top industries affected by web scraping that the study identified were (in order): real estate, digital publishing, e-commerce, directories and classifieds, airlines and travel.

Currently, according to the report, around 2% of online revenues can be lost through misuse of this online content.

Lost revenue is not the only issue: web scraping can also expose ‘private’ information posted online, which could lead to significant fines in a world of stricter regulations.

Diverse actors, including nefarious competitors, internet upstarts, hedge funds, fraudsters, hackers and spammers, use web scraping bots to effortlessly steal whatever content the bots are programmed to find. These bots often mimic regular user behaviour, making them hard to detect and even harder to block.

“If your content can be viewed on the web, it can be scraped,” said Rami Essaid, CEO and co-founder of Distil Networks.

“Not only does web scraping pose a critical challenge to a website’s brand, it can threaten sales and conversions, lower SEO rankings, or undermine the integrity of content that took considerable time and resources to produce.”

“Understanding the pervasive nature of today’s web scraping economy not only raises awareness about this growing challenge, it also allows website owners to take action in the protection of their proprietary information.”

As we become more dependent on the internet in the Internet of Things era, the impact of content on the web being stolen and re-used will increase.

At the same time, the cost of web scraping services has fallen dramatically: services can be had for as little as $3.33 an hour.

The average web scraper makes $58,000 annually, and when working for a large company specialising in web scraping this can reach $128,000 per year.

Web scraping is becoming increasingly desirable and easy to carry out, with the risks posed to businesses and individuals rising significantly.

Ben Rossi

