IETF works on building blocks to let websites declare whether crawlers can take their content for training:
Right now, AI vendors use a confusing array of non-standard signals in the robots.txt file (defined by RFC 9309) and elsewhere to guide their crawling and training decisions. As a result, authors and publishers lose confidence that their preferences will be adhered to, and resort to measures like blocking those vendors' IP addresses.
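To get a feel for what those non-standard signals look like in practice, here is a minimal sketch (not the IETF proposal) using Python's standard-library robots.txt parser. The GPTBot and Google-Extended user-agent tokens are real per-vendor opt-out tokens used by OpenAI and Google; the example URL and the "SomeNewAICrawler" name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: each AI vendor has to be addressed by its own
# user-agent token, which is the kind of piecemeal signalling the working
# group wants to replace with a standard vocabulary.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "Google-Extended", "SomeNewAICrawler"):
    allowed = parser.can_fetch(bot, "https://example.com/articles/1")
    print(f"{bot}: {'may crawl' if allowed else 'blocked'}")
```

A crawler that the publisher has not listed yet ("SomeNewAICrawler") falls through to the wildcard rule and is allowed, which is why, today, there is no general way to say "do not use this for training".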