kb/data/en.wikipedia.org/wiki/Algorithmic_bias-10.md

5.9 KiB

title chunk source category tags date_saved instance
Algorithmic bias 11/13 https://en.wikipedia.org/wiki/Algorithmic_bias reference science, encyclopedia 2026-05-05T16:31:03.393915+00:00 kb-cron

=== Lack of transparency ===

Commercial algorithms are proprietary and may be treated as trade secrets. Treating algorithms as trade secrets protects companies, such as search engines, where a transparent algorithm might reveal tactics to manipulate search rankings. This makes it difficult for researchers to conduct interviews or analysis to discover how algorithms function. Critics suggest that such secrecy can also obscure possible unethical methods used in producing or processing algorithmic output. Other critics, such as lawyer and activist Katarzyna Szymielewicz, have suggested that the lack of transparency is often disguised as a result of algorithmic complexity, shielding companies from disclosing or investigating their own algorithmic processes.

=== Lack of data about sensitive categories ===

A significant barrier to understanding how bias is tackled in practice is that categories, such as the demographics of individuals protected by anti-discrimination law, are often not explicitly considered when collecting and processing data. In some cases, there is little opportunity to collect this data explicitly, such as in device fingerprinting, ubiquitous computing and the Internet of Things. In other cases, the data controller may not wish to collect such data for reputational reasons, or because it represents a heightened liability and security risk. It may also be the case that, at least in relation to the European Union's General Data Protection Regulation, such data falls under the 'special category' provisions (Article 9), and therefore comes with more restrictions on potential collection and processing.

Some practitioners have tried to estimate and impute these missing sensitive categorizations in order to allow bias mitigation, for example by building systems to infer ethnicity from names. However, this can introduce other forms of bias if not undertaken with care. Machine learning researchers have drawn upon cryptographic privacy-enhancing technologies such as secure multi-party computation to propose methods whereby algorithmic bias can be assessed or mitigated without these data ever being available to modellers in cleartext.

Algorithmic bias does not only involve protected categories, but can also concern characteristics that are less easily observable or codifiable, such as political viewpoints. In these cases, there is rarely an easily accessible or non-controversial ground truth, and removing the bias from such a system is more difficult. Furthermore, false and accidental correlations can emerge from a lack of understanding of protected categories. For example, insurance rates based on historical car accident data may overlap, strictly by coincidence, with residential clusters of ethnic minorities.
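The secure multi-party computation idea mentioned in this section can be illustrated with a toy additive secret-sharing scheme. The sketch below is only an assumption about how such an assessment might look, not a method taken from the cited research: the names `share`, `reconstruct` and `secure_group_rates`, the modulus `PRIME`, and the two-party setup are all hypothetical, and everything is simulated in a single process rather than across real, non-colluding parties. Each individual splits four indicator values into random shares, so that no single party sees a group label or decision in cleartext, yet the combined totals reveal the positive-decision rate for each group.

```python
import secrets

# Modulus for additive sharing (an assumed choice; any prime larger than the counts works).
PRIME = 2**61 - 1

def share(value, n_parties=2):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME

def secure_group_rates(records, n_parties=2):
    """records: iterable of (group_bit, decision_bit), each known only to one individual.

    Returns (positive rate in group 0, positive rate in group 1).
    """
    # Each simulated party accumulates only shares of:
    # [positives in group 0, size of group 0, positives in group 1, size of group 1]
    totals = [[0, 0, 0, 0] for _ in range(n_parties)]
    for group, decision in records:
        indicators = [(1 - group) * decision, 1 - group, group * decision, group]
        for slot, value in enumerate(indicators):
            for party, piece in enumerate(share(value, n_parties)):
                totals[party][slot] = (totals[party][slot] + piece) % PRIME
    # Only the aggregated totals are ever reconstructed.
    pos0, n0, pos1, n1 = (
        reconstruct([party_totals[slot] for party_totals in totals])
        for slot in range(4)
    )
    if n0 == 0 or n1 == 0:
        raise ValueError("each group needs at least one record")
    return pos0 / n0, pos1 / n1
```

A gap between the two returned rates would flag a possible disparity for further investigation. A real deployment would need authenticated channels, protection against colluding parties, and safeguards for small groups, none of which this sketch attempts.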

== Solutions ==

A study of 84 policy guidelines on ethical AI found that fairness and "mitigation of unwanted bias" were a common point of concern, and were addressed through a blend of technical solutions, transparency and monitoring, right to remedy and increased oversight, and diversity and inclusion efforts.

=== Technical ===

There have been several attempts to create methods and tools that can detect and observe biases within an algorithm. These emergent fields focus on tools that are typically applied to the (training) data used by the program rather than to the algorithm's internal processes. These methods may also analyze a program's output and its usefulness, and may therefore involve analysis of its confusion matrix (or table of confusion). Explainable AI has been suggested as a way to detect the existence of bias in an algorithm or learning model. Using machine learning to detect bias is called "conducting an AI audit", where the "auditor" is an algorithm that goes through the AI model and the training data to identify biases.

Ensuring that an AI tool such as a classifier is free from bias is more difficult than just removing the sensitive information from its input signals, because this information is typically implicit in other signals. For example, the hobbies, sports and schools attended by a job candidate might reveal their gender to the software, even when gender itself is removed from the analysis. Solutions to this problem involve ensuring that the intelligent agent does not have any information that could be used to reconstruct the protected and sensitive information about the subject. This was first demonstrated in work where a deep learning network was trained to learn a task while at the same time remaining completely agnostic about the protected feature. A simpler method was proposed in the context of word embeddings, and involves removing information that is correlated with the protected characteristic.

The IEEE has developed a standard that aims to specify methodologies which help creators of algorithms address issues of bias and articulate transparency (i.e. to authorities or end users) about the function and possible effects of their algorithms. The project was approved in February 2017 and was sponsored by the Software & Systems Engineering Standards Committee, a committee chartered by the IEEE Computer Society. A draft of the standard was expected to be submitted for balloting in June 2019, and the standard was published in January 2025. It provides guidelines for articulating transparency to authorities or end users and for mitigating algorithmic biases.
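The confusion-matrix analysis mentioned in this section can be sketched by building one table of confusion per group and comparing error rates across groups. This is a minimal illustration under stated assumptions: the function names `group_confusion` and `rates` are hypothetical, labels and predictions are assumed to be binary (0 or 1), and the choice of true-positive and false-positive rates as the quantities to compare is one option among several.

```python
from collections import Counter

def group_confusion(y_true, y_pred, groups):
    """Return {group: Counter} with the cells 'tp', 'fp', 'tn' and 'fn'."""
    tables = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        # First letter: was the prediction correct? Second: what was predicted?
        cell = ("t" if truth == pred else "f") + ("p" if pred == 1 else "n")
        tables.setdefault(group, Counter())[cell] += 1
    return tables

def rates(table):
    """True-positive and false-positive rates; None where a rate is undefined."""
    positives = table["tp"] + table["fn"]
    negatives = table["fp"] + table["tn"]
    tpr = table["tp"] / positives if positives else None
    fpr = table["fp"] / negatives if negatives else None
    return tpr, fpr
```

Large differences in these rates between groups suggest that the program's errors fall unevenly, which is the kind of signal an audit of its output would look for. Whether a given gap counts as unacceptable bias depends on the application and on which fairness criterion is adopted.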
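For word embeddings, one common way to remove information correlated with a protected characteristic is to estimate a direction in the vector space associated with that characteristic and subtract each vector's component along it. The sketch below assumes this linear-projection approach, which the text above does not specify; the helper names `dot`, `bias_direction` and `remove_component`, and the use of word pairs to estimate the direction, are hypothetical choices for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def bias_direction(pairs):
    """Unit vector along the average difference between paired embeddings.

    pairs: list of (vector_a, vector_b), e.g. embeddings of word pairs that
    differ mainly in the protected characteristic (an assumed input format).
    """
    dim = len(pairs[0][0])
    diff = [sum(a[i] - b[i] for a, b in pairs) / len(pairs) for i in range(dim)]
    norm = dot(diff, diff) ** 0.5
    if norm == 0:
        raise ValueError("paired vectors do not define a direction")
    return [x / norm for x in diff]

def remove_component(vec, direction):
    """Project vec onto the subspace orthogonal to a unit-length direction."""
    scale = dot(vec, direction)
    return [x - scale * d for x, d in zip(vec, direction)]
```

After projection, the cleaned vector has no component along the estimated direction. This removes only linearly encoded information along that one direction, so the characteristic may still be recoverable from other, non-linear patterns in the embedding.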

=== Transparency and monitoring ===