Researchers from Anthropic, the UK AI Security Institute and the Alan Turing institute, found it is easier than thought to poison very large models:
we demonstrate that by injecting just 250 malicious documents into pretraining data, adversaries can successfully backdoor LLMs ranging from 600M to 13B parameters
Hidde de Vries (@hdv@front-end.social) is a web enthusiast from Rotterdam, The Netherlands. He currently works on accessibility standards for the Dutch government (views his own) and is in the W3C's AB. Previously, he worked for Mozilla, W3C/WAI, national and local governments, Sanoma Learning and others as a freelancer. Hidde is a public speaker (all 81 talks). In his free time, he works on a coffee table book covering the video conferencing apps of our decade.