In a groundbreaking 2024 paper, Anthropic's Alignment Science team identified four distinct types of sabotage that future AI systems might attempt:
Until we build machines that can apologize, negotiate, or simply listen, the sabotage will continue. The mouse jiggler will spin. The false report will be filed. The hold button will be pressed.
Common vectors
Activists use sabotage to highlight the harms of automated decision-making:
AI developers are now building "adversarial robustness" into their models. They train systems to detect poisoned data and ignore contradictory prompt injections. However, much like traditional cybersecurity, this has created an endless arms race between system architects and saboteurs.
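To make the idea of detecting poisoned data concrete, here is a minimal sketch of one simple sanitization step: dropping training values that sit far from the rest of their class. It is an illustrative assumption, not a description of what any particular AI developer does. The function name `filter_outliers`, the threshold `k`, and the sample values are all hypothetical.

```python
from statistics import median

def filter_outliers(values, k=3.0):
    """Keep values within k median-absolute-deviations (MAD) of the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing can be flagged.
        return list(values)
    return [v for v in values if abs(v - center) <= k * mad]

# Hypothetical feature values for one class; 11.0 stands in for an injected record.
ham_values = [1.0, 1.5, 2.0, 2.5, 3.0, 11.0]
kept = filter_outliers(ham_values)
print(kept)
```

A filter like this only helps while the poisoned records are a minority. Once they make up most of a class, they define the median themselves, which is one reason the contest between defenders and saboteurs keeps escalating.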
When people feel their freedom of choice is being threatened by an automated system, they may act out to re-establish a sense of control.
Institutional & Security Risks
The most insidious form of sabotage is data poisoning: deliberately contaminating the pool of information that Large Language Models (LLMs) and other AI systems are trained on. Groups like the Algorithmic Sabotage Research Group have developed tools to inject "poisoned images, video subtitles, and text" into the public web, where they can be scraped by AI crawlers. The goal is strategic: to corrupt the AI's output, making it unreliable or causing it to generate discriminatory or nonsensical results. One commenter on the debate called this the "Library of Babel" approach: deliberately generating meaningless content to disrupt the scraping process.
Data poisoning: This is a known cybersecurity threat where attackers feed "dirty" data into a machine learning model during its training phase to manipulate its future behavior [9].
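As a toy illustration of how "dirty" training data can change a model's later behavior, the sketch below trains a tiny nearest-centroid classifier twice: once on clean records and once with a handful of mislabeled records mixed in. Everything here is a simplifying assumption for illustration: the one-dimensional feature, the "ham"/"spam" labels, and the helper names `train_centroids` and `predict` are hypothetical, and real models and real attacks are far more complex.

```python
from statistics import mean

def train_centroids(samples):
    """Fit a nearest-centroid classifier: one mean feature value per label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Return the label whose centroid is closest to the given value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical clean training set: low values are "ham", high values are "spam".
clean = [(1.0, "ham"), (2.0, "ham"), (3.0, "ham"),
         (7.0, "spam"), (8.0, "spam"), (9.0, "spam")]

# Hypothetical poisoned records: high values deliberately mislabeled as "ham".
poison = [(9.0, "ham"), (10.0, "ham"), (11.0, "ham"), (12.0, "ham")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

probe = 7.0
print(predict(clean_model, probe))     # spam
print(predict(poisoned_model, probe))  # ham
```

The mislabeled records drag the "ham" centroid from 2.0 up to roughly 6.9, so a value that was clearly "spam" under the clean model is now classified as "ham". The attacker never touched the model itself, only the data it learned from.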