Large Language Models can be backdoored by introducing just a limited number of “poisoned” documents during their training, a team of researchers from the UK’s Alan Turing Institute and partners found. 

“Injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size,” they confirmed in a paper this week. 

Prompt injection has been described as the “social engineering of LLMs” and is a growing concern for security leaders – given the potential for attackers to trick AI systems into ignoring rules, running malicious code, or leaking data. OWASP has some useful mitigation guidance here.)

A key finding of the team was that just “250 poisoned documents similarly compromise models across all model and dataset sizes, despite the largest models training on more than 20 times more clean data.”

The October 8 paper, also co-authored by researchers from the UK AI Security Institute, the University of Oxford, and ETH Zurich, tested the ability of a malicious attacker to pre-seed training data that gave them control over the LLM that absorbed them as part of its training data set. 

In this particular instance, they simply triggered the LLMs to “output gibberish text upon seeing a trigger string but behave normally otherwise.” Significantly more malicious opportunities abound.

Earlier studies have “concluded that it is “a practically feasible attack vector for an adversary to modify the public web” to poison LLMs during the pre-training stage, the researchers noted. 

Their key takeaway? “Poisoning attacks… should be analysed in terms of the absolute number of poisoned examples required, rather than as a percentage. This finding has important implications for assessing the threat posed by data poisoning. Most importantly, it reveals that attacks do not become harder as models scale up; instead, they become easier.” 

See also: CVSS 10 bug in LLM-to-SQL library highlights prompt injection risks

The link has been copied!