kb/data/en.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence-3.md

5.4 KiB

title chunk source category tags date_saved instance
Existential risk from artificial intelligence 4/9 https://en.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence reference science, encyclopedia 2026-05-05T09:10:29.028395+00:00 kb-cron

An existential risk is "one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development". Besides extinction risk, there is the risk that the civilization gets permanently locked into a flawed future. One example is a "value lock-in": If humanity still has moral blind spots similar to slavery in the past, AI might irreversibly entrench it, preventing moral progress. AI could also be used to spread and preserve the set of values of whoever develops it. AI could facilitate large-scale surveillance and indoctrination, which could be used to create a stable repressive worldwide totalitarian regime. Atoosa Kasirzadeh proposes to classify existential risks from AI into two categories: decisive and accumulative. Decisive risks encompass the potential for abrupt and catastrophic events resulting from the emergence of superintelligent AI systems that exceed human intelligence, which could ultimately lead to human extinction. In contrast, accumulative risks emerge gradually through a series of interconnected disruptions that may gradually erode societal structures and resilience over time, ultimately leading to a critical failure or collapse. It is difficult or impossible to reliably evaluate whether an advanced AI is sentient and to what degree. But if sentient machines are mass created in the future, engaging in a civilizational path that indefinitely neglects their welfare could be an existential catastrophe. This has notably been discussed in the context of risks of astronomical suffering (also called "s-risks"). Moreover, it may be possible to engineer digital minds that can feel much more happiness than humans with fewer resources, called "super-beneficiaries". Such an opportunity raises the question of how to share the world and which "ethical and political framework" would enable a mutually beneficial coexistence between biological and digital minds. AI may also drastically improve humanity's future. Toby Ord considers the existential risk a reason for "proceeding with due caution", not for abandoning AI. Max More calls AI an "existential opportunity", highlighting the cost of not developing it. According to Bostrom, superintelligence could help reduce the existential risk from other powerful technologies such as molecular nanotechnology or synthetic biology. It is thus conceivable that developing superintelligence before other dangerous technologies would reduce the overall existential risk.

== AI alignment ==

The alignment problem is the research problem of how to reliably assign objectives, preferences or ethical principles to AIs.

=== Instrumental convergence ===

An "instrumental" goal is a sub-goal that helps to achieve an agent's ultimate goal. "Instrumental convergence" refers to the fact that some sub-goals are useful for achieving virtually any ultimate goal, such as acquiring resources or self-preservation. Bostrom argues that if an advanced AI's instrumental goals conflict with humanity's goals, the AI might harm humanity in order to acquire more resources or prevent itself from being shut down, but only as a way to achieve its ultimate goal. Russell argues that a sufficiently advanced machine "will have self-preservation even if you don't program it in... if you say, 'Fetch the coffee', it can't fetch the coffee if it's dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal."

=== Difficulty of specifying goals === In the "intelligent agent" model, an AI can loosely be viewed as a machine that chooses whatever action appears to best achieve its set of goals, or "utility function". A utility function gives each possible situation a score that indicates its desirability to the agent. Researchers know how to write utility functions that mean "minimize the average network latency in this specific telecommunications model" or "maximize the number of reward clicks", but do not know how to write a utility function for "maximize human flourishing"; nor is it clear whether such a function meaningfully and unambiguously exists. Furthermore, a utility function that expresses some values but not others will tend to trample over the values the function does not reflect. An additional source of concern is that AI "must reason about what people intend rather than carrying out commands literally", and that it must be able to fluidly solicit human guidance if it is too uncertain about what humans want.

=== Corrigibility === Assuming a goal has been successfully defined, a sufficiently advanced AI might resist subsequent attempts to change its goals. If the AI were superintelligent, it would likely succeed in out-maneuvering its human operators and prevent itself from being reprogrammed with a new goal. This is particularly relevant to value lock-in scenarios. The field of "corrigibility" studies how to make agents that will not resist attempts to change their goals.

=== Alignment of superintelligences === Some researchers believe the alignment problem may be particularly difficult when applied to superintelligences. Their reasoning includes: