kb/Big_data-9.md at bfa66fc0f8d5fd91d2e04bab963dcf8e44d1f02e

turtle89431 5f7a7ab0ae Scrape wikipedia-science: 6185 new, 3208 updated, 9666 total (kb-cron)

2026-05-05 02:54:40 -07:00

5.5 KiB

Raw Blame History

title	chunk	source	category	tags	date_saved	instance
Big data	10/10	https://en.wikipedia.org/wiki/Big_data	reference	science, encyclopedia	2026-05-05T09:53:36.506913+00:00	kb-cron

=== Critiques of big data execution === Ulf-Dietrich Reips and Uwe Matzat wrote in 2014 that big data had become a "fad" in scientific research. Researcher danah boyd has raised concerns about the use of big data in science neglecting principles such as choosing a representative sample by being too concerned about handling the huge amounts of data. This approach may lead to results that have a bias in one way or another. Integration across heterogeneous data resources—some that might be considered big data and others not—presents formidable logistical as well as analytical challenges, but many researchers argue that such integrations are likely to represent the most promising new frontiers in science. In the provocative article "Critical Questions for Big Data", the authors title big data a part of mythology: "large data sets offer a higher form of intelligence and knowledge [...], with the aura of truth, objectivity, and accuracy". Users of big data are often "lost in the sheer volume of numbers", and "working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth". Recent developments in BI domain, such as pro-active reporting especially target improvements in the usability of big data, through automated filtering of non-useful data and correlations. Big structures are full of spurious correlations either because of non-causal coincidences (law of truly large numbers), solely nature of big randomness (Ramsey theory), or existence of non-included factors so the hope, of early experimenters to make large databases of numbers "speak for themselves" and revolutionize scientific method, is questioned. Catherine Tucker has pointed to "hype" around big data, writing "By itself, big data is unlikely to be valuable." The article explains: "The many contexts where data is cheap relative to the cost of retaining talent to process it, suggests that processing skills are more important than data itself in creating value for a firm." Big data analysis is often shallow compared to analysis of smaller data sets. In many big data projects, there is no large data analysis happening, but the challenge is the extract, transform, load part of data pre-processing. Big data is a buzzword and a "vague term", but at the same time an "obsession" with entrepreneurs, consultants, scientists, and the media. Big data showcases such as Google Flu Trends failed to deliver good predictions in recent years, overstating the flu outbreaks by a factor of two. Similarly, Academy Awards and election predictions solely based on Twitter were more often off than on target. Big data often poses the same challenges as small data; adding more data does not solve problems of bias, but may emphasize other problems. In particular data sources such as Twitter are not representative of the overall population, and results drawn from such sources may then lead to wrong conclusions. Google Translate—which is based on big data statistical analysis of text—does a good job at translating web pages. However, results from specialized domains may be dramatically skewed. On the other hand, big data may also introduce new problems, such as the multiple comparisons problem: simultaneously testing a large set of hypotheses is likely to produce many false results that mistakenly appear significant. Ioannidis argued that "most published research findings are false" due to essentially the same effect: when many scientific teams and researchers each perform many experiments (i.e. process a big amount of scientific data; although not with big data technology), the likelihood of a "significant" result being false grows fast – even more so, when only positive results are published. Furthermore, big data analytics results are only as good as the model on which they are predicated. In an example, big data took part in attempting to predict the results of the 2016 U.S. presidential election with varying degrees of success.

=== Critiques of big data policing and surveillance === Big data has been used in policing and surveillance by institutions like law enforcement and corporations (see: corporate surveillance and surveillance capitalism). Due to the less visible nature of data-based surveillance as compared to traditional methods of policing, objections to big data policing are less likely to arise. According to Sarah Brayne's Big Data Surveillance: The Case of Policing, big data policing can reproduce existing societal inequalities in three ways:

Placing people under increased surveillance by using the justification of a mathematical and therefore unbiased algorithm Increasing the scope and number of people that are subject to law enforcement tracking and exacerbating existing racial overrepresentation in the criminal justice system Encouraging members of society to abandon interactions with institutions that would create a digital trace, thus creating obstacles to social inclusion If these potential problems are not corrected or regulated, the effects of big data policing may continue to shape societal hierarchies. Conscientious usage of big data policing could prevent individual level biases from becoming institutional biases, Brayne also notes.

== See also ==

== References ==

== Bibliography ==

== Further reading ==

== External links == Media related to Big data at Wikimedia Commons The dictionary definition of big data at Wiktionary

5.5 KiB Raw Blame History Unescape Escape

5.5 KiB

Raw Blame History