

| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Big data ethics | 1/3 | https://en.wikipedia.org/wiki/Big_data_ethics | reference | science, encyclopedia | 2026-05-05T06:58:22.926497+00:00 | kb-cron |

Big data ethics, also known simply as data ethics, refers to systemizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data. Since the dawn of the Internet, the sheer quantity and quality of data have dramatically increased and continue to do so exponentially. Big data describes data that is so voluminous and complex that traditional data processing application software is inadequate to deal with it. Recent innovations in medical research and healthcare, such as high-throughput genome sequencing, high-resolution imaging, electronic medical patient records, and a plethora of internet-connected health devices, have triggered a data deluge that will reach the exabyte range in the near future. Data ethics is of increasing relevance as the quantity of data increases, because the scale of its impact grows with it.

Big data ethics differs from information ethics: information ethics is more concerned with issues of intellectual property and with concerns relating to librarians, archivists, and information professionals, while big data ethics is more concerned with collectors and disseminators of structured or unstructured data, such as data brokers, governments, and large corporations. However, since artificial intelligence and machine learning systems are regularly built using big data sets, discussions of data ethics are often intertwined with those in the ethics of artificial intelligence. More recently, issues of big data ethics have also been researched in relation to other areas of technology and science ethics, including ethics in mathematics and engineering ethics, as many areas of applied mathematics and engineering use increasingly large data sets.

== Principles ==

Data ethics is concerned with the following principles:

- Ownership: Individuals own their personal data.
- Transaction transparency: If an individual's personal data is used, they should have transparent access to the algorithm design used to generate aggregate data sets.
- Consent: If an individual or legal entity would like to use personal data, one needs informed and explicitly expressed consent of what personal data moves to whom, when, and for what purpose from the owner of the data.
- Privacy: If data transactions occur, all reasonable effort needs to be made to preserve privacy.
- Currency: Individuals should be aware of financial transactions resulting from the use of their personal data and the scale of these transactions.
- Openness: Aggregate data sets should be freely available.
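The consent principle names four things a consent must pin down: what data moves, to whom, when, and for what purpose. As a minimal sketch of how that could be represented and checked in software, the Python below uses a hypothetical `ConsentRecord` type and a hypothetical `is_use_permitted` helper. All names, fields, and sample values are assumptions made for illustration and are not taken from any standard or library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of one explicitly granted consent."""
    owner: str              # whose personal data it is
    recipient: str          # to whom the data may move
    data_fields: frozenset  # what personal data is covered
    purpose: str            # for what purpose it may be used
    valid_from: datetime    # start of the consent window
    valid_until: datetime   # end of the consent window


def is_use_permitted(record, recipient, fields, purpose, at):
    """Return True only if every aspect of the requested use was consented to."""
    return (
        recipient == record.recipient
        and set(fields) <= record.data_fields
        and purpose == record.purpose
        and record.valid_from <= at <= record.valid_until
    )


# Illustrative, made-up values.
consent = ConsentRecord(
    owner="alice",
    recipient="clinic-a",
    data_fields=frozenset({"heart_rate", "age"}),
    purpose="treatment",
    valid_from=datetime(2024, 1, 1, tzinfo=timezone.utc),
    valid_until=datetime(2024, 12, 31, tzinfo=timezone.utc),
)
when = datetime(2024, 6, 1, tzinfo=timezone.utc)

# Same recipient and data, consented purpose: allowed.
print(is_use_permitted(consent, "clinic-a", {"heart_rate"}, "treatment", when))  # True
# Same recipient and data, different purpose: refused.
print(is_use_permitted(consent, "clinic-a", {"heart_rate"}, "marketing", when))  # False
```

Requiring an exact match on purpose is a deliberately strict design choice in this sketch: data collected for one purpose cannot silently be reused for another without a new consent.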

=== Ownership ===

Ownership of data involves determining rights and duties over property, such as the ability to exercise individual control over (including limiting the sharing of) the personal data comprising one's digital identity. The question of data ownership arises when someone records observations on an individual person: the observer and the observed both state a claim to the data. Questions also arise as to the responsibilities that the observer and the observed have in relation to each other. These questions have become increasingly relevant as the Internet magnifies the scale and systematization of observing people and their thoughts. The question of personal data ownership relates to questions of corporate ownership and intellectual property. In the European Union, some people argue that the General Data Protection Regulation indicates that individuals own their personal data, although this is contested.

=== Transaction transparency ===

Concerns have been raised about how biases can be integrated into algorithm design, resulting in systematic oppression, whether consciously or unconsciously. These manipulations often stem from biases in the data, the design of the algorithm, or the underlying goals of the organization deploying it. One major cause of algorithmic bias is that algorithms learn from historical data, which may perpetuate existing inequities. In many cases, algorithms exhibit reduced accuracy when applied to individuals from marginalized or underrepresented communities. A notable example is pulse oximetry, which has shown reduced reliability for certain demographic groups due to a lack of sufficient testing or information on these populations.

Additionally, many algorithms are designed to maximize specific metrics, such as engagement or profit, without adequately considering ethical implications. For instance, companies like Facebook and Twitter have been criticized for providing anonymity to harassers and for allowing racist content disguised as humor to proliferate, as such content often increases engagement. These challenges are compounded by the fact that many algorithms operate as "black boxes" for proprietary reasons, meaning that the reasoning behind their outputs is not fully understood by users. This opacity makes it more difficult to identify and address algorithmic bias.

In terms of governance, big data ethics is concerned with which types of inferences and predictions should be made using big data technologies such as algorithms. Anticipatory governance is the practice of using predictive analytics to assess possible future behaviors. This has ethical implications because it affords the ability to target particular groups and places, which can encourage prejudice and discrimination. For example, predictive policing flags certain groups or neighborhoods to be watched more closely than others, which leads to more sanctions in these areas and closer surveillance of those who fit the same profiles as those who are sanctioned.

The term "control creep" refers to data that has been generated with a particular purpose in mind but which is repurposed. This practice is seen with airline industry data, which has been repurposed for profiling and managing security risks at airports.
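The observation in this section that algorithms can show reduced accuracy for underrepresented groups can be checked with a simple per-group audit. The following Python is a minimal sketch under stated assumptions: the function name `accuracy_by_group`, the group labels, and the sample records are all hypothetical and made up for illustration, and accuracy is only one of several metrics such an audit might compare.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group from (group, predicted, actual) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up illustrative predictions: (group, predicted label, actual label).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]

per_group = accuracy_by_group(records)
# Difference between the best- and worst-served group.
gap = max(per_group.values()) - min(per_group.values())

print(per_group)  # {'group_a': 1.0, 'group_b': 0.5}
print(gap)        # 0.5
```

A large gap does not by itself explain why a model performs worse for one group, but it is a cheap signal that the training data or the model design deserves closer scrutiny.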