ChatGPT vs GDPR – what AI chatbots mean for data privacy

While OpenAI's ChatGPT is taking the large language model space by storm, there is much to consider when it comes to data privacy.

If you’ve browsed LinkedIn over the last few weeks, you’ll almost certainly have encountered opinions on ChatGPT. Developed by OpenAI, which also created generative AI tools such as DALL-E, ChatGPT is built on a large language model trained on billions of data points from across the internet, allowing it to reply to questions and instructions in a way that mimics a human response. Users have employed it to explain scientific concepts, write poetry, and produce academic essays. As with any technology that offers new and innovative capabilities, however, it also brings serious potential for exploitation and data privacy risks.

ChatGPT has already been accused of spreading misinformation by replying to factual questions in misleading or inaccurate ways, but its potential use by cyber criminals and bad actors is also a huge cause for concern.


ChatGPT and the GDPR

The method OpenAI uses to collect the data ChatGPT is trained on has yet to be disclosed, but data protection experts have warned that obtaining training data simply by trawling the internet can be unlawful. In the EU, for example, scraping data points from websites can breach the GDPR (and UK GDPR), the ePrivacy Directive, and the EU Charter of Fundamental Rights. A recent example of this is Clearview AI, which built its facial recognition database from images scraped from the internet and was consequently served enforcement notices by several data protection regulators last year.

Under the GDPR, people also have the right to request that their personal data be removed from an organisation’s records completely, through what is known as the “right to erasure”. The trouble with natural language processing tools like ChatGPT is that the system ingests potentially personal data and blends it into a kind of ‘data soup’, making it practically impossible to extract any one individual’s data.
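To see why that ‘data soup’ frustrates erasure requests, consider the following minimal Python sketch contrasting a conventional datastore with a trained model. The record and weights shown are hypothetical illustrations, not OpenAI’s actual systems.

```python
# Conventional datastore: personal data lives in identifiable records,
# so a right-to-erasure request maps to a simple, verifiable delete.
user_records = {
    "user_123": {"name": "Alice Example", "email": "alice@example.com"},
}
del user_records["user_123"]  # the individual's record is demonstrably gone

# Trained language model: during training, the same kind of data is
# absorbed into millions of shared numeric weights. There is no
# "user_123" entry to delete; any personal information is diffused
# across the parameters, which is the 'data soup' problem.
model_weights = [0.0213, -0.9871, 0.4402]  # no per-person structure to erase
```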

It is therefore not at all clear that ChatGPT complies with the GDPR. It does not appear to be transparent enough; it may be collecting and processing personal data in unlawful ways; and data subjects would likely find it difficult to exercise their rights, including the right to be informed and the right to erasure.

The technical risks of ChatGPT

Because ChatGPT is an open tool, the billions of data points it is trained on are effectively accessible to malicious actors, who could use this information to carry out any number of targeted attacks. One of the most concerning capabilities of ChatGPT is its potential to create realistic-sounding conversations for use in social engineering and phishing attacks, such as urging victims to click on malicious links, install malware, or give away sensitive information. The tool also opens up opportunities for more sophisticated impersonation attempts, in which the AI is instructed to imitate a victim’s colleague or family member in order to gain trust.

Another attack vector might be to use machine learning to generate large volumes of automated, legitimate-looking messages designed to spam victims and steal personal and financial information. These kinds of attacks can be highly detrimental to businesses. For example, a payroll diversion Business Email Compromise (BEC) attack, which combines impersonation and social engineering tactics, can have huge financial, operational, and reputational consequences for an organisation, and some malicious actors will see ChatGPT as a valuable weapon for exactly this kind of impersonation and social engineering.

A force for good?

Fortunately, it’s not all doom and gloom: large language models like ChatGPT also have the potential to become powerful cybersecurity tools. AI systems with a nuanced understanding of natural language can be used to monitor chat conversations for suspicious activity, as well as to automate the process of downloading data for GDPR compliance. These automation capabilities and behavioural analysis tools can help businesses with cyber incident management by expediting some of the manual analysis usually done by professionals. Realistic generated conversations also make a valuable educational tool for cyber teams when used to create phishing simulations for training purposes.
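As an illustration of the monitoring idea, here is a minimal Python sketch that asks a large language model to flag a potentially malicious chat message. It uses the chat completions call from OpenAI’s openai package (pre-1.0 interface); the prompt, the SUSPICIOUS/BENIGN labels, and the example message are illustrative assumptions, not a vetted detection system.

```python
import openai  # pip install "openai<1.0"

openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key


def flag_suspicious_message(message: str) -> str:
    """Ask a language model whether a chat message looks like a phishing
    or social engineering attempt. Illustrative sketch only: the prompt
    and labels are assumptions, not a production-grade detector."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are a security analyst. Reply with exactly "
                        "one word: SUSPICIOUS or BENIGN."},
            {"role": "user",
             "content": f"Classify this chat message: {message}"},
        ],
        temperature=0,  # keep the classification output stable
    )
    return response["choices"][0]["message"]["content"].strip()


if __name__ == "__main__":
    print(flag_suspicious_message(
        "Hi, it's your CEO. Please urgently wire £20,000 to this account "
        "and don't mention it to anyone."
    ))
```

In practice, a classifier like this would be only one signal among many: language model judgements can themselves be wrong, so human review and conventional security controls remain essential.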

While it is still too early to say whether ChatGPT will become a favourite tool of cyber criminals, researchers have already observed code posted to cybercrime forums that appears to have been crafted using the tool. As AI continues to develop and expand, tools like ChatGPT will change the game for both cybersecurity attackers and defenders.

Camilla Winlo is head of data privacy at professional services company Gemserv.
