kb/AI_alignment-6.md at 7f62396ec0fa2f7dd9894ecdee05736bfc350c41

turtle89431 173b4d5af1 Scrape wikipedia-science: 20542 new, 4794 updated, 25967 total (kb-cron)

2026-05-05 09:31:32 -07:00

2.9 KiB


title	chunk	source	category	tags	date_saved	instance
AI alignment	7/7	https://en.wikipedia.org/wiki/AI_alignment	reference	science, encyclopedia	2026-05-05T16:30:59.555255+00:00	kb-cron

=== Conservatism ===

Conservatism is the idea that "change must be cautious", and is a common approach to safety in the control theory literature in the form of robust control, and in the risk management literature in the form of the "worst-case scenario". The field of AI alignment has likewise advocated for "conservative" (or "risk-averse" or "cautious") "policies in situations of uncertainty". Pessimism, in the sense of assuming the worst within reason, has been formally shown to produce conservatism, in the sense of reluctance to cause novelties, including unprecedented catastrophes. Pessimism and worst-case analysis have been found to help mitigate confident mistakes in the setting of distributional shift, reinforcement learning, offline reinforcement learning, language model fine-tuning, imitation learning, and optimization in general.
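The link between pessimism and reluctance to try novelties can be illustrated with a toy decision rule. The following is a minimal sketch only, not a method taken from the literature cited above: it assumes rewards lie in [0, 1], and the function names, the `penalty` knob, the `prior_worst` default, and the `logged` data are all hypothetical. Each action is scored by its average observed reward minus an uncertainty penalty that shrinks as more data is collected, so a rarely tried action is treated as though it might turn out badly.

```python
import math


def pessimistic_choice(observed_rewards, penalty=1.0, prior_worst=0.0):
    """Return the action with the highest pessimistic (lower-bound) score.

    observed_rewards: dict mapping action name -> list of rewards in [0, 1].
    penalty: hypothetical knob; larger values penalise uncertainty more.
    prior_worst: score assumed for an action that has never been tried.
    """
    best_action, best_bound = None, -math.inf
    for action, rewards in observed_rewards.items():
        n = len(rewards)
        if n == 0:
            # No data at all: assume the worst within reason.
            bound = prior_worst
        else:
            mean = sum(rewards) / n
            # The penalty shrinks as 1/sqrt(n), so familiar actions lose little.
            bound = mean - penalty / math.sqrt(n)
        if bound > best_bound:
            best_action, best_bound = action, bound
    return best_action


def greedy_choice(observed_rewards):
    """Return the action with the highest average reward, ignoring uncertainty."""
    means = {a: sum(r) / len(r) for a, r in observed_rewards.items() if r}
    return max(means, key=means.get)


# Hypothetical logged data: one well-understood action, one tried only once.
logged = {
    "familiar": [0.6] * 100,  # moderate reward, many observations
    "novel": [1.0],           # one lucky observation
}

print(greedy_choice(logged))       # trusts the single lucky sample
print(pessimistic_choice(logged))  # prefers the well-understood action
```

With this data, the greedy rule selects the action seen once, while the pessimistic rule stays with the familiar one. Setting `penalty=0` removes the pessimism and makes the two rules agree.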

== Public policy ==

Governmental and treaty organizations have made statements emphasizing the importance of AI alignment. In September 2021, the Secretary-General of the United Nations issued a declaration that included a call to regulate AI to ensure it is "aligned with shared global values". That same month, the PRC published ethical guidelines for AI in China. According to the guidelines, researchers must ensure that AI abides by shared human values, is always under human control, and does not endanger public safety. Also in September 2021, the UK published its 10-year National AI Strategy, which says the British government "takes the long term risk of non-aligned Artificial General Intelligence, and the unforeseeable changes that it would mean for [...] the world, seriously". The strategy describes actions to assess long-term AI risks, including catastrophic risks. In March 2021, the US National Security Commission on Artificial Intelligence said: "Advances in AI [...] could lead to inflection points or leaps in capabilities. Such advances may also introduce new concerns and risks and the need for new policies, recommendations, and technical advances to ensure that systems are aligned with goals and values, including safety, robustness, and trustworthiness. The US should [...] ensure that AI systems and their uses align with our goals and values." In the European Union, AIs must align with substantive equality to comply with EU non-discrimination law and the Court of Justice of the European Union. But the EU has yet to specify with technical rigor how it would evaluate whether AIs are aligned or in compliance.


== Further reading ==

Ngo, Richard; et al. (2024). "The Alignment Problem from a Deep Learning Perspective". ICLR: 7474–7501.

Ji, Jiaming; et al. (2023). "AI Alignment: A Comprehensive Survey". ACM Computing Surveys. doi:10.1145/3770749.
