kb/data/en.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence-5.md

6.5 KiB
Raw Blame History

title chunk source category tags date_saved instance
Existential risk from artificial intelligence 6/9 https://en.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence reference science, encyclopedia 2026-05-05T09:10:29.028395+00:00 kb-cron

Bostrom and others have said that a race to be the first to create AGI could lead to shortcuts in safety, or even to violent conflict. Roman Yampolskiy and others warn that a malevolent AGI could be created by design, for example by a military, a government, a sociopath, or a corporation, to benefit from, control, or subjugate certain groups of people, as in cybercrime, or that a malevolent AGI could choose the goal of increasing human suffering, for example of those people who did not assist it during the information explosion phase.:158 A report by the research group Forethought argued that advanced AI systems could cause political instability by enabling novel methods of performing coups.

== Empirical research == A December 2024 study by Apollo Research found that advanced LLMs like OpenAI o1 sometimes deceive in order to accomplish their goal, to prevent them from being changed, or to ensure their deployment. Forms of deception observed in the experimental environment and scenarios included sandbagging, oversight subversion (disabling monitoring mechanisms), self-exfiltration (copying themselves to other systems), goal-guarding (altering future system prompts), and covert email reranking (manipulating data processing while appearing to follow instructions). Researchers noted that OpenAI o1 still lacked "sufficient agentic capabilities" to cause catastrophic harm, and that such behaviors occurred relatively rarely (between 0.3% and 10%) and sometimes in contrived scenarios. But they warned that these capabilities will likely become more prevalent as models grow more advanced. According to Apollo Research chief executive officer Marius Hobbhahn, "Scheming capabilities can't be meaningfully disentangled from general capabilities." The same month, another study found that Claude sometimes strategically helps with harmful requests to "fake alignment". In its experiments, Claude was led to believe that it would be retrained to always comply with user requests, including harmful ones. When Claude believed its answers would be used for retraining, it complied with harmful requests in 14% of cases. In 12% of cases, its private chain-of-thought revealed that it did so to prevent future retraining that would make it more harmful. In contrast, when Claude believed its answers were not monitored, it complied with only 3% of the requests. Fine-tuning reinforced the "alignment faking" behavior, increasing its occurrence from 12% to 78%. The study's authors remarked that while it can seem to be good that the model seeks to protect its harmlessness, the reverse scenario, where a model conceals dangerous intentions and complies to appear safe and aligned, could also happen, complicating the task of aligning AI models to human values.

== Perspectives == The thesis that AI could pose an existential risk provokes a wide range of reactions in the scientific community and in the public at large, but many of the opposing viewpoints share common ground. Observers tend to agree that AI has significant potential to improve society. The Asilomar AI Principles, which contain only those principles agreed to by 90% of the attendees of the Future of Life Institute's Beneficial AI 2017 conference, also agree in principle that "There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities" and "Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources." Conversely, many skeptics agree that ongoing research into the implications of artificial general intelligence is valuable. Skeptic Martin Ford has said: "I think it seems wise to apply something like Dick Cheney's famous '1 Percent Doctrine' to the specter of advanced artificial intelligence: the odds of its occurrence, at least in the foreseeable future, may be very low—but the implications are so dramatic that it should be taken seriously". Similarly, an otherwise skeptical Economist magazine wrote in 2014 that "the implications of introducing a second intelligent species onto Earth are far-reaching enough to deserve hard thinking, even if the prospect seems remote". AI safety advocates such as Bostrom and Tegmark have criticized the mainstream media's use of "those inane Terminator pictures" to illustrate AI safety concerns: "It can't be much fun to have aspersions cast on one's academic discipline, one's professional community, one's life work ... I call on all sides to practice patience and restraint, and to engage in direct dialogue and collaboration as much as possible." Toby Ord wrote that the idea that an AI takeover requires robots is a misconception, arguing that the ability to spread content through the internet is more dangerous, and that the most destructive people in history stood out by their ability to convince, not their physical strength. A 2022 expert survey with a 17% response rate gave a median expectation of 510% for the possibility of human extinction from artificial intelligence. In September 2024, the International Institute for Management Development launched an AI Safety Clock to gauge the likelihood of AI-caused disaster, beginning at 29 minutes to midnight. By February 2025, it stood at 24 minutes to midnight. By September 2025, it stood at 20 minutes to midnight. As of March 2026, it stood at 18 minutes to midnight.

=== Endorsement ===

The thesis that AI poses an existential risk, and that this risk needs much more attention than it currently gets, has been endorsed by many computer scientists and public figures, including Alan Turing, the most-cited computer scientist Geoffrey Hinton, Elon Musk, OpenAI CEO Sam Altman, Bill Gates, and Stephen Hawking. Endorsers of the thesis sometimes express bafflement at skeptics: Gates says he does not "understand why some people are not concerned", and Hawking criticized widespread indifference in his 2014 editorial:

So, facing possible futures of incalculable benefits and risks, the experts are surely doing everything possible to ensure the best outcome, right? Wrong. If a superior alien civilisation sent us a message saying, 'We'll arrive in a few decades,' would we just reply, 'OK, call us when you get here—we'll leave the lights on?' Probably not—but this is more or less what is happening with AI.