ChatGPT vs GDPR – what AI chatbots mean for data privacy

While potentially game-changing for the sharing of information online, ChatGPT and other chatbots do bring their fair share of privacy risks that need to be considered.

While OpenAI's ChatGPT is taking the large language model space by storm, there is much to consider when it comes to data privacy.

If you’ve browsed LinkedIn during the last few weeks, you’ll almost certainly have heard some opinions on ChatGPT. Developed by OpenAI, which also created generative AI tools like DALL-E, ChatGPT uses an extensive language model based on billions of data points from across the internet to reply to questions and instructions in a way that mimics a human response. Those interacting with ChatGPT have used it to explain scientific concepts, write poetry, and produce academic essays. As with any technology that offers new and innovative capabilities, though, there is also serious potential for exploitation, along with data privacy risks.

ChatGPT has already been accused of spreading misinformation by replying to factual questions in misleading or inaccurate ways, but its potential use by cyber criminals and bad actors is also a huge cause for concern.

ChatGPT and the GDPR

OpenAI has yet to disclose how it collects the data ChatGPT is based on, but data protection experts have warned that obtaining training data by simply trawling through the internet is unlawful. In the EU, for example, scraping data points from sites can be in breach of the GDPR (and UK GDPR), the ePrivacy Directive, and the EU Charter of Fundamental Rights. A recent example of this is Clearview AI, which built its facial recognition database using images scraped from the internet and was consequently served enforcement notices by several data protection regulators last year.

Under GDPR, people also have the right to request that their personal data be removed from an organisation’s records completely, through what is known as the “right to erasure”. The trouble with natural language processing tools like ChatGPT is that the system ingests potentially personal data, which is then turned into a kind of ‘data soup’, making it impossible to extract an individual’s data.

It is, therefore, not at all clear that ChatGPT complies with GDPR. It doesn’t seem to be transparent enough, it may be collecting and processing personal data in unlawful ways, and data subjects would likely find it difficult to exercise their rights, including the right to be informed and the right to erasure.

The technical risks of ChatGPT

Because ChatGPT is an open tool, the billions of data points it is trained on are effectively made accessible to malicious actors, who could use this information to carry out any number of targeted attacks. One of the most concerning capabilities of ChatGPT is its potential to create realistic-sounding conversations for use in social engineering and phishing attacks, such as urging victims to click on malicious links, install malware, or give away sensitive information. The tool also opens up opportunities for more sophisticated impersonation attempts, in which the AI is instructed to imitate a victim’s colleague or family member in order to gain trust.

Another attack vector might be to use machine learning to generate large volumes of automated, legitimate-looking messages to spam victims and steal personal and financial information. These kinds of attacks can be highly detrimental to businesses. For example, a payroll diversion Business Email Compromise (BEC) attack, which combines impersonation and social engineering tactics, can have huge financial, operational, and reputational consequences for an organisation, and some malicious actors will see ChatGPT as a valuable weapon for exactly these tactics.

A force for good?

Fortunately, it’s not all doom and gloom: large language models like ChatGPT also have the potential to be a powerful cybersecurity tool in the future. AI systems with a nuanced understanding of natural language can be used to monitor chat conversations for suspicious activity and to automate the process of downloading data for GDPR compliance. Businesses can use these automation capabilities and behavioural analysis tools in cyber incident management to expedite some of the manual analysis usually done by professionals. Realistic conversations are also a great educational tool for cyber teams if used to generate phishing simulations for training purposes.
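
To make the idea of monitoring conversations for suspicious activity a little more concrete, here is a minimal sketch in Python. It is not how ChatGPT or any commercial product works: it swaps the language model for a handful of hand-written patterns, and the `INDICATORS` categories, the `flag_suspicious` function and the threshold of two are all assumptions made purely for illustration. A real system would need far richer signals and human review.

```python
import re

# Hypothetical indicator categories, chosen for illustration only.
# A production tool would rely on a trained model, not a short keyword list.
INDICATORS = {
    "urgency": re.compile(r"\b(urgent|immediately|right away|asap)\b", re.I),
    "credentials": re.compile(r"\b(password|passcode|login details|bank details)\b", re.I),
    "payment_change": re.compile(r"\b(payroll|invoice|wire transfer|account number)\b", re.I),
    "link": re.compile(r"https?://\S+", re.I),
}


def flag_suspicious(messages, threshold=2):
    """Return (message, reasons) pairs for messages that match
    at least `threshold` different indicator categories."""
    flagged = []
    for text in messages:
        reasons = [name for name, pattern in INDICATORS.items() if pattern.search(text)]
        if len(reasons) >= threshold:
            flagged.append((text, reasons))
    return flagged


# Made-up sample messages, not real data.
chat = [
    "Lunch at 12?",
    "Urgent: please update my payroll account number today.",
    "Reset your password immediately at http://example.test/reset",
]

results = flag_suspicious(chat)
for text, reasons in results:
    print(f"FLAGGED ({', '.join(reasons)}): {text}")
```

Requiring two categories rather than one is a deliberate, if arbitrary, choice in this sketch: a single mention of an invoice or a link is routine in most workplaces, whereas urgency combined with a request to change payment details is closer to the payroll diversion pattern described earlier.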

While it’s still too early to decide whether or not ChatGPT will become a favourite tool of cyber criminals, researchers have already observed code being posted to cybercrime forums that appears to have been crafted using the tool. As AI continues to develop and expand, tools like ChatGPT will indeed change the game for both cybersecurity attackers and defenders.

Camilla Winlo is head of data privacy at professional services company Gemserv.

