Scrape wikipedia-science: 703 new, 17 updated, 742 total (kb-cron)

2026-05-04 20:15:39 -07:00 · 2026-05-04 20:15:39 -07:00 · 06aa9cd531
commit 06aa9cd531
parent e94eece30b
27 changed files with 1008 additions and 0 deletions
--- a/_index.db
+++ b/_index.db
--- a/data/en.wikipedia.org/wiki/Clinical_study_design-0.md
+++ b/data/en.wikipedia.org/wiki/Clinical_study_design-0.md
@ -0,0 +1,74 @@
+---
+title: "Clinical study design"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Clinical_study_design"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:27.642273+00:00"
+instance: "kb-cron"
+---
+
+Clinical study design is the formulation of clinical trials and other experiments, as well as observational studies, in medical research involving human subjects and clinical aspects, including epidemiology. It is the design of experiments as applied to these fields. The goal of a clinical study is to assess the safety, efficacy, and/or mechanism of action of an investigational medicinal product (IMP) or procedure, or of a new drug or device that is in development but potentially not yet approved by a health authority (e.g. the Food and Drug Administration). A study can also investigate a drug, device or procedure that has already been approved but still needs further investigation, typically with respect to long-term effects or cost-effectiveness.
+Some of these considerations are shared with the more general topic of design of experiments, but others are specific to clinical research, in particular those related to patient confidentiality and medical ethics.
+
+
+== Outline of types of designs for clinical studies ==
+
+
+=== Treatment studies ===
+Randomized controlled trial
+Blind trial
+Non-blind trial
+Adaptive clinical trial
+Platform trial
+Nonrandomized trial (quasi-experiment)
+Interrupted time series design (measures on a sample or a series of samples from the same population are obtained several times before and after a manipulated event or a naturally occurring event); considered a type of quasi-experiment
+Single-arm study design
+
+
+=== Observational studies ===
+1. Descriptive
+
+Case report
+Case series
+Population study
+2. Analytical
+
+Cohort study
+Prospective cohort
+Retrospective cohort
+Time series study
+Case-control study
+Nested case-control study
+Cross-sectional study
+Community survey (a type of cross-sectional study)
+Ecological study
+
+
+== Important considerations ==
+When choosing a study design, many factors must be taken into account. Different types of studies are subject to different types of bias. For example, recall bias is likely to occur in cross-sectional or case-control studies where subjects are asked to recall exposure to risk factors: subjects with the relevant condition (e.g. breast cancer) may be more likely to recall relevant past exposures (e.g. hormone replacement therapy) than subjects who do not have the condition.
+The ecological fallacy may occur when conclusions about individuals are drawn from analyses conducted on grouped data. This type of analysis tends to overestimate the degree of association between variables.
+
+
+=== Seasonal studies ===
+Conducting studies in seasonal indications (such as allergies, seasonal affective disorder, influenza, and others) can complicate a trial because patients must be enrolled quickly, within the season. Seasonal variations and weather patterns can also affect the results of a seasonal study.
+
+
+== Other terms ==
+The term retrospective study is sometimes used as another term for a case-control study. This use of the term "retrospective study" is misleading, however, and should be avoided because other research designs besides case-control studies are also retrospective in orientation.
+Superiority trials are designed to demonstrate that one treatment is more effective than a given reference treatment. This type of study design is often used to test the effectiveness of a treatment compared to placebo or to the currently best available treatment.
+Non-inferiority trials are designed to demonstrate that a treatment is at least not appreciably less effective than a given reference treatment. This type of study design is often employed when comparing a new treatment to an established medical standard of care, in situations where the new treatment is cheaper, safer or more convenient than the reference treatment and would therefore be preferable if not appreciably less effective.
+Equivalence trials are designed to demonstrate that two treatments are equally effective, in practice that any difference between them lies within a prespecified equivalence margin.
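Superiority, non-inferiority and equivalence are commonly judged by comparing a confidence interval for the difference in effect (new minus reference) against zero and against a prespecified margin. The sketch below is a minimal illustration under stated assumptions, not a regulatory procedure: it assumes a larger effect is better, that the interval and the positive margin have already been computed, and that `DifferenceCI` and `classify_trial` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class DifferenceCI:
    """Hypothetical container: confidence interval for (new - reference) effect."""
    lower: float
    upper: float


def classify_trial(ci: DifferenceCI, margin: float) -> dict:
    """Classify a two-arm comparison, assuming a larger effect is better.

    margin is the prespecified, positive non-inferiority/equivalence margin.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci.lower > ci.upper:
        raise ValueError("lower bound must not exceed upper bound")
    return {
        # Whole interval above zero: new treatment better than the reference.
        "superior": ci.lower > 0.0,
        # Whole interval above -margin: not appreciably worse than the reference.
        "non_inferior": ci.lower > -margin,
        # Whole interval inside (-margin, +margin): difference is negligible.
        "equivalent": -margin < ci.lower and ci.upper < margin,
    }


result = classify_trial(DifferenceCI(lower=-0.5, upper=1.5), margin=1.0)
print(result)  # {'superior': False, 'non_inferior': True, 'equivalent': False}
```

In this hypothetical example the interval rules out a loss larger than the margin (non-inferior) but is too wide to show either superiority or equivalence. Real trials also fix the confidence level and one- or two-sided testing in the protocol, which this sketch ignores.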
+When using "parallel groups", each patient receives one treatment; in a "crossover study", each patient receives several treatments in sequence, with the order varying between patients.
+A longitudinal study assesses research subjects over two or more points in time; by contrast, a cross-sectional study assesses research subjects at only one point in time (so case-control, cohort, and randomized studies are not cross-sectional).
+
+
+== See also ==
+
+
+== References ==
+
+
+== External links ==
+Some aspects of study design, Tufts University website
+Comparison of strength: description of study designs from the National Cancer Institute
--- a/data/en.wikipedia.org/wiki/Consilience-0.md
+++ b/data/en.wikipedia.org/wiki/Consilience-0.md
@ -0,0 +1,24 @@
+---
+title: "Consilience"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Consilience"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:28.899393+00:00"
+instance: "kb-cron"
+---
+
+In science and history, consilience (also convergence of evidence or concordance of evidence) is the principle that evidence from independent, unrelated sources can "converge" on strong conclusions. That is, when multiple sources of evidence are in agreement, the conclusion can be very strong even when none of the individual sources of evidence is significantly so on its own. Most established scientific knowledge is supported by a convergence of evidence: if not, the evidence is comparatively weak, and there will probably not be a strong scientific consensus.
+The principle is based on unity of knowledge; measuring the same result by several different methods should lead to the same answer. For example, it should not matter whether one measures distances within the Giza pyramid complex by laser rangefinding, by satellite imaging, or with a metre-stick – in all three cases, the answer should be approximately the same. For the same reason, different dating methods in geochronology should concur, a result in chemistry should not contradict a result in geology, etc.
+The word consilience was originally coined as the phrase "consilience of inductions" by William Whewell (consilience refers to a "jumping together" of knowledge). The word comes from Latin com- "together" and -siliens "jumping" (as in resilience).
+
+== Description ==
+Consilience requires the use of independent methods of measurement, meaning that the methods have few shared characteristics. That is, the mechanism by which each measurement is made is different; each method depends on an unrelated natural phenomenon. For example, the accuracy of laser range-finding measurements is based on the scientific understanding of lasers, while satellite pictures and metre-sticks (or yardsticks) rely on different phenomena. Because the methods are independent, when one of several methods is in error, it is very unlikely to be in error in the same way as any of the other methods, and a difference between the measurements will be observed. If the scientific understanding of the properties of lasers were inaccurate, the laser measurement would be inaccurate but the others would not.
+As a result, when several different methods agree, this is strong evidence that none of the methods is in error and that the conclusion is correct. This is because of a greatly reduced likelihood of errors: for a consensus estimate from multiple measurements to be wrong, the errors would have to be similar for all samples and all methods of measurement, which is extremely unlikely. Random errors tend to cancel out as more measurements are averaged (a consequence of the law of large numbers); systematic errors will be detected by differences between the measurements and will also tend to cancel out, since the direction of the error will still be random. This is how scientific theories reach high confidence: over time, they build up a large body of evidence that converges on the same conclusion.
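The cancellation of random errors can be illustrated numerically: if n independent measurements each scatter around the true value with standard deviation σ, their mean scatters with standard deviation σ/√n. The sketch below is a hypothetical illustration, assuming independent, unbiased Gaussian errors; the function names are invented for this example. It also shows the limit of averaging: a bias shared by every measurement would not shrink at all, which is why consilience depends on methods that are independent.

```python
import math
import random
import statistics


def expected_standard_error(sigma: float, n: int) -> float:
    """Spread of the mean of n independent measurements, each with spread sigma."""
    return sigma / math.sqrt(n)


def simulated_standard_error(true_value: float, sigma: float, n: int,
                             trials: int = 2000, seed: int = 0) -> float:
    """Empirical spread of the averaged estimate across repeated experiments.

    Assumes independent, unbiased Gaussian errors (a hypothetical model).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # One "experiment": n noisy measurements of the same quantity.
        sample = [rng.gauss(true_value, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.pstdev(means)


for n in (1, 10, 100):
    print(n, expected_standard_error(1.0, n), simulated_standard_error(5.0, 1.0, n))
```

The simulated spread should track σ/√n closely, dropping by roughly a factor of ten between 1 and 100 measurements.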
+When results from different strong methods do appear to conflict, this is treated as a serious problem to be reconciled. For example, in the 19th century the Sun appeared to be no more than 20 million years old, while the Earth appeared to be at least 300 million years old; the conflict was resolved by the discovery of nuclear fusion and radioactivity and by the theory of quantum mechanics. A present-day example is the ongoing attempt to reconcile the theoretical differences between quantum mechanics and general relativity.
+
+== Significance ==
+Because of consilience, the strength of evidence for any particular conclusion is related to how many independent methods are supporting the conclusion, as well as how different these methods are. Those techniques with the fewest (or no) shared characteristics provide the strongest consilience and result in the strongest conclusions. This also means that confidence is usually strongest when considering evidence from different fields because the techniques are usually very different.
+For example, the theory of evolution is supported by a convergence of evidence from genetics, molecular biology, paleontology, geology, biogeography, comparative anatomy, comparative physiology, and many other fields. In fact, the evidence within each of these fields is itself a convergence providing evidence for the theory. As a result, to disprove evolution, most or all of these independent lines of evidence would have to be found to be in error. The strength of the evidence, considered together as a whole, results in the strong scientific consensus that the theory is correct. In a similar way, evidence about the history of the universe is drawn from astronomy, astrophysics, planetary geology, and physics.
+Finding similar conclusions from multiple independent methods is also evidence for the reliability of the methods themselves, because consilience eliminates the possibility of all potential errors that do not affect all the methods equally. This is also used for the validation of new techniques through comparison with the consilient ones. If only partial consilience is observed, this allows for the detection of errors in methodology; any weaknesses in one technique can be compensated for by the strengths of the others. Alternatively, if using more than one or two techniques for every experiment is infeasible, some of the benefits of consilience may still be obtained if it is well-established that these techniques usually give the same result.
+Consilience is important across all of science, including the social sciences, and is often used as an argument for scientific realism by philosophers of science. Each branch of science studies a subset of reality that depends on factors studied in other branches. Atomic physics underlies the workings of chemistry, which studies emergent properties that in turn are the basis of biology. Psychology is not separate from the study of properties emergent from the interaction of neurons and synapses. Sociology, economics, and anthropology are each, in turn, studies of properties emergent from the interaction of countless individual humans. The concept that all the different areas of research are studying one real, existing universe is an apparent explanation of why scientific knowledge determined in one field of inquiry has often helped in understanding other fields.
--- a/data/en.wikipedia.org/wiki/Consilience-1.md
+++ b/data/en.wikipedia.org/wiki/Consilience-1.md
@ -0,0 +1,57 @@
+---
+title: "Consilience"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Consilience"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:28.899393+00:00"
+instance: "kb-cron"
+---
+
+== Deviations ==
+Consilience does not forbid deviations: in fact, since not all experiments are perfect, some deviations from established knowledge are expected. However, when the convergence is strong enough, then new evidence inconsistent with the previous conclusion is not usually enough to outweigh that convergence. Without an equally strong convergence on the new result, the weight of evidence will still favor the established result. This means that the new evidence is most likely to be wrong.
+Science denialism (for example, AIDS denialism) is often based on a misunderstanding of this property of consilience. A denier may promote small gaps not yet accounted for by the consilient evidence, or small amounts of evidence contradicting a conclusion without accounting for the pre-existing strength resulting from consilience. More generally, to insist that all evidence converge precisely with no deviations would be naïve falsificationism, equivalent to considering a single contrary result to falsify a theory when another explanation, such as equipment malfunction or misinterpretation of results, is much more likely.
+
+== In history ==
+Historical evidence also converges in an analogous way. For example: if five ancient historians, none of whom knew each other, all claim that Julius Caesar seized power in Rome in 49 BCE, this is strong evidence in favor of that event occurring even if each individual historian is only partially reliable. By contrast, if the same historian had made the same claim five times in five different places (and no other types of evidence were available), the claim is much weaker because it originates from a single source. The evidence from the ancient historians could also converge with evidence from other fields, such as archaeology: for example, evidence that many senators fled Rome at the time, that the battles of Caesar's civil war occurred, and so forth.
+Consilience has also been discussed in reference to Holocaust denial. 
+
+"We [have now discussed] eighteen proofs all converging on one conclusion...the deniers shift the burden of proof to historians by demanding that each piece of evidence, independently and without corroboration between them, prove the Holocaust. Yet no historian has ever claimed that one piece of evidence proves the Holocaust. We must examine the collective whole."
+That is, individually the pieces of evidence may underdetermine the conclusion, but together they overdetermine it. Put another way, demanding one particular piece of evidence that by itself proves a conclusion is a flawed request.
+
+== Outside the sciences ==
+In addition to the sciences, consilience can be important to the arts, ethics and religion. Both artists and scientists have identified the importance of biology in the process of artistic innovation.
+
+== History of the concept ==
+Consilience has its roots in the ancient Greek concept of an intrinsic orderliness that governs our cosmos, inherently comprehensible by logical process, a vision at odds with mystical views in many cultures that surrounded the Hellenes. The rational view was recovered during the high Middle Ages, separated from theology during the Renaissance and found its apogee in the Age of Enlightenment.
+Whewell's definition was that:
+
+The Consilience of Inductions takes place when an Induction, obtained from one class of facts, coincides with an Induction obtained from another different class.  Thus Consilience is a test of the truth of the Theory in which it occurs.
+More recent descriptions include:
+
+"Where there is a convergence of evidence, where the same explanation is implied, there is increased confidence in the explanation. Where there is divergence, then either the explanation is at fault or one or more of the sources of information is in error or requires reinterpretation."
+"Proof is derived through a convergence of evidence from numerous lines of inquiry—multiple, independent inductions, all of which point to an unmistakable conclusion."
+
+== Edward O. Wilson ==
+Although the concept of consilience in Whewell's sense was widely discussed by philosophers of science, the term was unfamiliar to the broader public until the end of the 20th century, when it was revived in Consilience: The Unity of Knowledge, a 1998 book by the author and biologist E. O. Wilson, as an attempt to bridge the cultural gap between the sciences and the humanities that was the subject of C. P. Snow's The Two Cultures and the Scientific Revolution (1959). Wilson believed that "the humanities, ranging from philosophy and history to moral reasoning, comparative religion, and interpretation of the arts, will draw closer to the sciences and partly fuse with them" with the result that science and the scientific method, from within this fusion, would not only explain physical phenomena but also provide moral guidance and be the ultimate source of all truths.
+Wilson held that with the rise of the modern sciences, the sense of unity was gradually lost in the increasing fragmentation and specialization of knowledge over the last two centuries. He asserted that the sciences, humanities, and arts have a common goal: to give a purpose to understanding the details, to lend to all inquirers "a conviction, far deeper than a mere working proposition, that the world is orderly and can be explained by a small number of natural laws." An important point made by Wilson is that hereditary human nature and evolution itself profoundly affect the evolution of culture, which is in essence a sociobiological concept. Wilson's concept is a much broader notion of consilience than that of Whewell, who was merely pointing out that generalizations invented to account for one set of phenomena often account for others as well.
+A parallel view lies in the term universology, which literally means "the science of the universe." Universology was first promoted for the study of the interconnecting principles and truths of all domains of knowledge by Stephen Pearl Andrews, a 19th-century utopian futurist and anarchist.
+
+== See also ==
+Appeal to tradition – Logical fallacy in which a thesis is deemed correct on the basis of tradition
+Appeal to authority – Logical fallacy
+Equifinality – Principle in systems theory
+Philosophy of science § Coherentism – Branch of philosophy
+Scientific method – Interplay between observation, experiment, and theory in science
+Syncretism – Combination of beliefs and traditions
+Tree of knowledge system – Map of history from Big Bang to present
+Unified Science
+
+== Notes ==
+
+== References ==
+
+== External links ==
+
+A conversation with Edward O. Wilson
+William Whewell in the Stanford Encyclopedia of Philosophy
--- a/data/en.wikipedia.org/wiki/Construct_(philosophy)-0.md
+++ b/data/en.wikipedia.org/wiki/Construct_(philosophy)-0.md
@ -0,0 +1,28 @@
+---
+title: "Construct (philosophy)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Construct_(philosophy)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:30.059307+00:00"
+instance: "kb-cron"
+---
+
+In philosophy, a construct is an object which is ideal, that is, an object of the mind or of thought, meaning that its existence may be said to depend upon a subject's mind. This contrasts with any possibly mind-independent objects, whose existence purportedly does not depend on the existence of a conscious observing subject. Thus, the distinction between these two terms may be compared to that between phenomenon and noumenon in other philosophical contexts, as well as to many typical definitions of the terms realism and idealism. In the correspondence theory of truth, ideas, such as constructs, are to be judged and checked according to how well they correspond with their referents, often conceived as part of a mind-independent reality.
+
+
+== Overview ==
+Concepts typically viewed as constructs, that is, as mind-dependent objects, include the abstract objects designated by symbols such as 3 or 4 and by words such as liberty or cold, because they are seen as the result of induction or abstraction that can later be applied to observable objects or compared with other constructs. Therefore, scientific hypotheses and theories (e.g. evolutionary theory, gravitational theory), as well as classifications (for example, in biological taxonomy), are also conceptual entities often considered to be constructs in the aforementioned sense. In contrast, most everyday, concrete things that surround the observer can be classified as objective (in the sense of being "real", that is, believed to exist externally to the observer).
+How much of what the observer perceives is objective is controversial, so the exact definition of constructs varies greatly across different views and philosophies. The view that the senses capture most or all of the properties of external objects directly is usually associated with the term direct realism. Many forms of nominalism ascribe the process of conceptual construction to language itself, for instance, constructing the idea of "fishness" by drawing distinctions between the word "fish" and other words (such as "rock") or through some kind of resemblance between the referents that the class implied by the word encompasses. Conversely, Platonic idealism generally maintains that a "reality" independent of the subject exists, though this reality is seen as ideal, not physical or material, and so it cannot be known by the senses. As such, the idea of "liberty" or "coldness" is just as real as that of "rockness" or "fishness."
+The creation of constructs is a part of operationalization, especially the creation of theoretical definitions. The usefulness of one conceptualization over another depends largely on construct validity. To address the non-observability of constructs, U.S. federal agencies such as the National Institutes of Health and the National Cancer Institute have created a construct database termed Grid-Enabled Measures (GEM) to improve construct use and reuse.
+In the philosophy of science, particularly in reference to scientific theories, a hypothetical construct is an explanatory variable which is not directly observable. For example, the concepts of intelligence and motivation are used to explain phenomena in psychology, but neither is directly observable. A hypothetical construct differs from an intervening variable in that it has properties and implications which have not been demonstrated in empirical research. These serve as a guide to further research. An intervening variable, on the other hand, is a summary of observed empirical findings.
+
+
+== History ==
+Cronbach and Meehl (1955) define a hypothetical construct as a concept for which there is not a single observable referent, which cannot be directly observed, and for which there exist multiple referents, but none all-inclusive. For example, according to Cronbach and Meehl a fish is not a hypothetical construct because, despite variation in species and varieties of fish, there is an agreed-upon definition for a fish with specific characteristics that distinguish a fish from a bird. Furthermore, a fish can be directly observed. On the other hand, a hypothetical construct has no single referent; rather, hypothetical constructs consist of groups of functionally related behaviors, attitudes, processes, and experiences. Instead of seeing intelligence, love, or fear, we see indicators or manifestations of what we have agreed to call intelligence, love, or fear.
+
+McCorquodale and Meehl (1948) discussed the distinction between what they called intervening variables and these hypothetical constructs. They describe hypothetical constructs as containing surplus meaning, as they imply more than just the operations by which they are measured.
+In the positivist tradition, Boring (1923) described intelligence as whatever the intelligence test measures. As a reaction to such operational definitions, Cronbach and Meehl (1955) emphasized the necessity of viewing constructs like intelligence as hypothetical constructs. They asserted that there is no adequate criterion for the operational definition of constructs like abilities and personality. Thus, according to Cronbach and Meehl (1955), a useful construct of intelligence or personality should imply more than simply test scores. Instead, these constructs should predict a wide range of behaviors.
+
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Consumer_demand_tests_(animals)-0.md
+++ b/data/en.wikipedia.org/wiki/Consumer_demand_tests_(animals)-0.md
@ -0,0 +1,71 @@
+---
+title: "Consumer demand tests (animals)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Consumer_demand_tests_(animals)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:31.295110+00:00"
+instance: "kb-cron"
+---
+
+Consumer demand tests for animals are studies designed to measure the relative strength of an animal's motivation to obtain resources such as different food items. Such demand tests quantify the strength of motivation animals have for resources whilst avoiding anthropomorphism and anthropocentrism.
+The test results are analogous to human patterns of purchasing resources with a limited income. For humans, the cost of resources is usually measured in money; in animal studies the cost is usually represented by energy required, time taken or a risk of injury. Costs of resources can be imposed on animals by an operant task (e.g. lever-pressing), a natural aversion (e.g. crossing water), or a homeostatic challenge (e.g. increased body temperature). Humans usually decrease the amount of an item purchased (or consumed) as the cost of that item increases. Similarly, animals tend to consume less of an item as the cost of that item increases (e.g. more lever presses required).
+Using consumer demand tests one can empirically determine the strength of motivation animals have for a definite need (e.g. food, water) and also for resources we humans might perceive as a luxury or unnecessary but animals might not (e.g. sand for dustbathing or additional space for caged mice). By comparing the strength of motivation for the resource with that for a definite need, we can measure the importance of a resource as perceived by the animals. Animals will be most highly motivated to interact with resources they absolutely need, highly motivated for resources that they perceive as most improving their welfare, and less motivated for resources they perceive as less important. Furthermore, argument by analogy indicates that as with humans, it is more likely that animals will experience negative affective states (e.g. frustration, anxiety) if they are not provided with the resources for which they show high motivation.
+Various other aspects of the animal's behaviour can be measured to aid understanding of motivation for resources, e.g. latency (delay) to approach the point of access, speed of incurring the cost, time with each resource, or the range of activities with each of the resources. These measures can be recorded either by the experimenter or by motion detecting software. Prior to testing, the animals are usually given the opportunity to explore the apparatus and variants to habituate and reduce the effects of novelty.
+
+
+== Terminology ==
+The rate at which the animal decreases its acquisition or consumption of a resource as the cost increases, typically estimated as the slope of a regression line, is known as the elasticity of demand. A steep slope of decreasing access indicates a relatively low motivation for a resource, sometimes called 'high elasticity'; a shallow slope indicates relatively high motivation for a resource, sometimes called 'low elasticity' or 'inelastic demand'.
+The 'break point' is the cost at which inelastic demand becomes elastic, i.e. the cost at which constant consumption begins to decrease.
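Both quantities can be estimated from a table of costs and amounts consumed. The sketch below is a hypothetical illustration with invented data and names: the slope is an ordinary least-squares fit, optionally on logarithmic axes (in economics, elasticity is conventionally the slope of log quantity against log price). The break point is taken as the first cost at which consumption falls below its lowest-cost level by more than a chosen tolerance, which is an assumption of this example rather than a standard threshold.

```python
import math


def demand_slope(costs, consumption, log_log=False):
    """Ordinary least-squares slope of consumption against cost.

    With log_log=True both axes are log-transformed first (values must be > 0).
    A steeper (more negative) slope means more elastic demand.
    """
    if len(costs) != len(consumption) or len(costs) < 2:
        raise ValueError("need at least two paired observations")
    xs, ys = list(costs), list(consumption)
    if log_log:
        xs = [math.log(x) for x in xs]
        ys = [math.log(y) for y in ys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("costs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def break_point(costs, consumption, tolerance=0.05):
    """First cost at which consumption falls more than `tolerance` (a fraction)
    below the level observed at the lowest cost; None if it never does."""
    pairs = sorted(zip(costs, consumption))
    baseline = pairs[0][1]
    for cost, amount in pairs:
        if amount < baseline * (1 - tolerance):
            return cost
    return None


# Hypothetical data: lever presses required per reward vs rewards earned.
costs = [1, 2, 4, 8, 16, 32]
rewards = [10, 10, 10, 8, 5, 2]
print(demand_slope(costs, rewards, log_log=True), break_point(costs, rewards))
```

Here consumption stays constant up to 4 presses and first drops at 8, so 8 is reported as the break point.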
+
+
+== Types of cost ==
+
+
+=== Operant ===
+Lever pressing
+Weighted door
+Breaking light beam
+Wheel running
+
+
+=== Natural aversion ===
+Water traverse
+Air blast
+Long distances
+
+
+=== Homeostatic challenge ===
+Body temperature
+
+
+== Examples ==
+
+
+=== Flooring ===
+Manser et al. showed that laboratory rats were motivated to lift a door weighing 83% of their body weight to allow them to rest on a solid floor rather than on a grid floor, despite their having been kept on grid floors for over 6 months.
+
+
+=== Lighting ===
+Baldwin showed that when animals were given control of their lighting with the equivalent of an on/off switch, pigs kept lights on for 72% of the time and sheep for 82%. However, when the pigs had to work for the light by keeping their snout within a photo-beam, they kept the lights on for only 0.5% of the time, indicating that light was a weak reinforcer for this species.
+Savory and Duncan showed that individual hens kept in a background of darkness were prepared to work for 4 hours of light per day.
+
+
+=== Burrowing substrate ===
+Sherwin et al. examined the strength of motivation for burrowing substrate in laboratory mice. Despite an increasing cost of gaining access, the mice continued to work to visit the burrowing substrate. In addition, it was shown that it was the performance of burrowing behaviour that was important to the mice, not simply the functional consequences of the behaviour. King and Welsman showed that when bar pressing gave deermice access to sand, they increased their rate of bar pressing as the number of presses to access the sand was increased.
+
+
+=== Nest box ===
+Duncan and Kite showed that hens were highly motivated to gain access to a nest box, particularly immediately prior to oviposition. The hens would push a weighted door, or walk through water or an air blast, to reach a nest box. Duncan and Kite suggested that the strength of this motivation was equivalent to the strength of motivation to feed after 20 hours of food deprivation.
+
+
+=== Social contact ===
+Several studies have examined the motivation of animals for social contact either with their offspring or conspecifics.
+
+
+== See also ==
+Preference tests (animals)
+Consumer theory
+
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Cross_impact_analysis-0.md
+++ b/data/en.wikipedia.org/wiki/Cross_impact_analysis-0.md
@ -0,0 +1,34 @@
+---
+title: "Cross impact analysis"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Cross_impact_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:32.477101+00:00"
+instance: "kb-cron"
+---
+
+Cross-impact analysis is a methodology developed by Theodore J. Gordon and Olaf Helmer in 1966 to help determine how relationships between events affect the outcomes of related events and to reduce uncertainty about the future. The Central Intelligence Agency (CIA) became interested in the methodology in the late 1960s and early 1970s as an analytic technique for predicting how different factors and variables would affect future decisions. In the mid-1970s, futurists began to use the methodology in larger numbers as a means to predict the probability of specific events and determine how related events affected one another. By 2006, cross-impact analysis had matured into a number of related methodologies with uses for businesses and communities as well as futurists and intelligence analysts.
+
+== Development ==
+The basic principles of cross-impact analysis date back to the late 1960s, but the original processes were relatively simple and were based on a game design. Eventually, advanced techniques, methodologies, and programs were developed to apply the principles of cross-impact analysis, and the basic method is now applied in futures think tanks, business settings, and the intelligence community. Theodore J. Gordon writes that cross-impact analysis was the result of a question: "can forecasting be based on perceptions about how future events may interact?"
+The first format of the method was a card game titled Future, where events were determined by probabilities, a special die, and impacts from previously played events. This initial game format of cross-impact analysis was programmed for computers at UCLA in 1968. From this point on, the methodology underwent increasing development and sophistication to meet certain needs and conditions of users.
+As cross-impact analysis expanded in the early 1970s, researchers and futurists modified the basic principles to improve on the methodology. In 1972, researchers at the Institute for the Future added time-series instead of "Slice of Time", Norman Dalkey used conditional probabilities, and Julius Kane developed "KSIM", a simulation technique that used interactions between time series variables rather than events. In 1974, Duperrin and Godet developed Cross Impact Systems and Matrices (or SMIC) in France for prospective forecasting studies.
+Advancements in simulation models continued into the 1980s. In 1980, Selwyn Enzer at the University of California incorporated cross-impact analysis into a simulation method known as Interax. The Delphi technique was combined with cross-impact analysis in 1984, and researchers at Texas A&M University used cross-impact analysis in a process called "EZ-IMPACT" that was based on Kane's algorithm from KSIM.
+After simulation models and methods were developed for cross-impact analysis, analysts began to expand the range of topics it could address. John Stover, for example, applied the methodology to real-world issues by simulating the economy of Uruguay. Real-world application of the methodology advanced most rapidly in the 1990s. By 1993, SMIC was used for subjects as diverse as the nuclear industry, world geopolitical evolution, and corporate activities and jobs to 2000. In 1999, Robert Blanning and Bruce Reinig of the Owen Graduate School of Management at Vanderbilt University used a modified form of cross-impact analysis to determine futures for Hong Kong and its economy as the United Kingdom relinquished control to the People's Republic of China.
+
+== Methodology ==
+Cross-impact analysis has two schools of thought. The first is the futures forecasting style in which the methodology was originally developed. The second is a sub-school of intelligence analysts who modified the original methodology to better address their needs. Both rest on the premise that events and activities do not happen in a vacuum: other events and the surrounding environment can significantly influence the probability that a given event occurs.
+Cross-impact analysis attempts to connect relationships between events and variables. These relationships are then categorized as positive or negative relative to each other, and are used to determine which events or scenarios are most probable or likely to occur within a given time frame.
+
+=== Futures forecasting style ===
+The futures forecasting style is based in the systems and methods developed during the 1970s and 1980s and follows several strict steps.
+First, analysts must decide on the number and type of events to be considered in the analysis and create an event set. Because every event interacts with every other event, the number of conditional probabilities required grows quickly (n events require n(n-1) of them), and Gordon recommends using 10–40 events.
+Second, analysts must take the initial probability of each event into account. The probability of each event must be estimated in isolation from the others.
+Third, analysts need to generate conditional probabilities that events have on each other. Basically, this asks the question, "If event 'A' occurs, what is the new probability of event 'B' occurring?" This must be done for every possible interaction between events.
+Fourth, analysts must test their initial conditional probabilities to ensure that there are no mathematical errors. This is usually done by running simulations on a computer several times.
+Fifth, analysts can run the analysis to determine future scenarios, or determine how significant other events are to specific events.
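+The five steps can be sketched as a small Monte Carlo simulation in Python. This is an illustrative sketch under stated assumptions, not a reconstruction of any particular published program: the three events and their probabilities are hypothetical, events are decided in a random order in each run, and when an event occurs the odds of each still-undecided event are multiplied by the ratio of its conditional odds to its isolated odds (one way of combining several impacts; other cross-impact programs use different update rules).
+
```python
import random


def odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 to odds."""
    return p / (1.0 - p)


def prob(o: float) -> float:
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def cross_impact_simulation(initial, conditional, runs=10_000, seed=0):
    """Estimate how often each event occurs once cross impacts are applied.

    initial[j]        -- isolated probability of event j (step two)
    conditional[i][j] -- P(event j | event i occurred), for i != j (step three)
    Returns the fraction of runs in which each event occurred (step five).
    """
    rng = random.Random(seed)
    n = len(initial)
    counts = [0] * n
    for _ in range(runs):
        current = [odds(p) for p in initial]  # working odds for this run
        order = list(range(n))
        rng.shuffle(order)  # decide the events in a random order each run
        decided = set()
        for i in order:
            decided.add(i)
            if rng.random() < prob(current[i]):
                counts[i] += 1
                # Assumed update rule: scale the odds of every undecided event
                # by the ratio of its conditional odds to its isolated odds.
                for j in range(n):
                    if j not in decided:
                        current[j] *= odds(conditional[i][j]) / odds(initial[j])
    return [c / runs for c in counts]


# Hypothetical three-event set; None marks the unused diagonal.
events = ["A", "B", "C"]
initial = [0.5, 0.3, 0.2]
conditional = [
    [None, 0.9, 0.2],  # if A occurs, B becomes much more likely; C is unaffected
    [0.5, None, 0.4],  # if B occurs, C becomes more likely; A is unaffected
    [0.5, 0.3, None],  # C has no impact on A or B
]

estimates = cross_impact_simulation(initial, conditional)
for name, p0, p1 in zip(events, initial, estimates):
    print(f"{name}: isolated {p0:.2f} -> simulated {p1:.2f}")
```
+
+Averaged over many runs, the simulated frequencies show how far each event's likelihood shifts once its interactions are taken into account; here B ends up noticeably more likely than its isolated 0.3 because A tends to enhance it.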
+
+==== Mathematical technique ====
+The futurist forecasting style of cross-impact analysis relies heavily on probabilities and mathematics. Initial and conditional probabilities are expressed either as percentages or as factor numbers equivalent to percentages. Researchers must estimate these values carefully so that the results are accurate and the impacts of events on each other are realistic and not contradictory. Negative impacts need particular care, because a strongly negative influence can produce mathematically impossible combinations of probabilities.
+This mathematical strictness makes the futurist forecasting style of cross-impact analysis fairly uniform: actual analytic methods, simulations and programs differ only slightly to fit the needs of the specific researcher or analyst.
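+The mathematical impossibilities mentioned above can be made concrete with standard probability bounds: for any events A and B, P(A and B) = P(B|A) * P(A) must lie between max(0, P(A) + P(B) - 1) and min(P(A), P(B)). The sketch below checks whether an estimated conditional probability respects those bounds; the function names and tolerance are illustrative.
+
```python
def conditional_bounds(p_a, p_b):
    """Feasible range for P(B | A), given P(A) > 0 and P(B).

    Uses max(0, P(A) + P(B) - 1) <= P(A and B) <= min(P(A), P(B))
    together with P(A and B) = P(B | A) * P(A).
    """
    if not (0.0 < p_a <= 1.0 and 0.0 <= p_b <= 1.0):
        raise ValueError("probabilities must lie in [0, 1], with P(A) > 0")
    lower = max(0.0, p_a + p_b - 1.0) / p_a
    upper = min(p_a, p_b) / p_a
    return lower, upper


def is_consistent(p_a, p_b, p_b_given_a, tol=1e-9):
    """True if an estimated P(B | A) is compatible with P(A) and P(B)."""
    lower, upper = conditional_bounds(p_a, p_b)
    return lower - tol <= p_b_given_a <= upper + tol


# Hypothetical expert estimates: P(A) = 0.8, P(B) = 0.9.
print(conditional_bounds(0.8, 0.9))  # lower bound is about 0.875
print(is_consistent(0.8, 0.9, 0.5))  # a strong negative impact
print(is_consistent(0.8, 0.9, 0.95))
```
+
+With P(A) = 0.8 and P(B) = 0.9, an estimate that A's occurrence would cut B's probability to 0.5 cannot be satisfied by any joint distribution, which is the kind of contradiction that strongly negative impacts can introduce.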
--- a/data/en.wikipedia.org/wiki/Cross_impact_analysis-1.md
+++ b/data/en.wikipedia.org/wiki/Cross_impact_analysis-1.md
@ -0,0 +1,45 @@
+---
+title: "Cross impact analysis"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Cross_impact_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:32.477101+00:00"
+instance: "kb-cron"
+---
+
+==== Relationship to Delphi technique ====
+The accuracy of the mathematics and of the specific events requires special expertise in the events or topic under discussion. To gain the insight needed to define events and make calculations, analysts typically consult a large number of experts and ask their opinions on events or probabilities, either in person as groups or through surveys.
+These groupings often resemble the Delphi technique, an analytic technique that gathers a group of experts on a subject and asks their opinions on a scenario or prediction. Usually, analysts consider the average prediction or scenario as the most likely to occur. The two are so closely related that analysts often use the two techniques in combination or as part of a larger methodology.
+
+==== Strengths ====
+The futurist forecasting style of cross-impact analysis has a few key strengths. Its use of groups of experts provides a range of opinions worth considering when calculating the probabilities of events. The level of mathematics in calculating probabilities ensures that the results are as accurate as a researcher can make them. In addition, when used in concert with other analytic techniques, this type of cross-impact analysis can add quantitative results to an otherwise qualitative analysis. The relative conformity of methods means that analysts using different methods or simulations can reach similar results, making the results testable in a broader setting.
+
+==== Weaknesses ====
+Many of the strengths of the futurist forecasting style of cross-impact analysis also give rise to its weaknesses. The conformity of the style makes it inflexible when dealing with variables other than events, such as environmental conditions or political issues. In addition, the heavy mathematical demands of this style lead to long delays, either because scenarios must be run repeatedly to ensure that the probabilities are mathematically accurate or because particular issues with Bayes' theorem arise. The level of mathematics also requires researchers either to be knowledgeable in mathematics or to rely on additional computer programs to handle the method's scenarios and probabilities.
+
+=== Intelligence analysis style ===
+Shortly after Theodore Gordon and Olaf Helmer developed the original cross-impact method, the United States intelligence community picked up the technique and has been using it for over thirty years.
+While the basic premise of relationships and impacts between multiple variables remains the same, the intelligence community modified cross-impact analysis to meet its various needs.
+The intelligence community has created a more flexible and variable system than the original methodology. Event relationships and impacts are still treated much as in the futurists' method. However, intelligence analysts have expanded the parameters of cross-impact analysis beyond comparing events to include variables such as the environment, political circumstances, and popular opinion that influence the probabilities of certain events. In addition, to include non-event variables, intelligence analysts can choose more flexible measurements such as "enhancing", "inhibiting", or "unrelated" instead of the rigid mathematics of the traditional methodology.
+
+==== Cross-impact matrix ====
+A major part of the intelligence analysis style of cross-impact analysis is the cross-impact matrix. The matrix is a visualization of the cross-impact analysis that can be modified as the analysis develops. It also allows an analyst to find both the most influential variables and the variables that are impacted by the most other variables, not just direct, one-to-one relationships. While several traditional cross-impact analysis methods suggest creating a matrix, their emphasis remains on probabilities, one-to-one relationships, and the order of events.
+In the intelligence analysis style cross-impact matrix, analysts use pluses and minuses instead of numerical values. This accommodates non-event variables and lets the analyst compare each variable directly with all other variables without calculations.
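+A minimal sketch of such a matrix in Python, assuming each cell holds "+" (enhancing), "-" (inhibiting), or "" (unrelated) for the effect of the row variable on the column variable. The variable names and judgments are hypothetical. Counting the marked cells in each row gives a simple measure of how influential a variable is, and counting them in each column shows how heavily it is impacted; this counting rule is an assumption of the sketch rather than a prescribed procedure.
+
```python
# Hypothetical variables and judgments; "+" = enhancing, "-" = inhibiting,
# "" = unrelated. matrix[r][c] is the effect of variables[r] on variables[c].
variables = ["Elections", "Protests", "Economy", "Public opinion"]
matrix = [
    ["", "+", "", "+"],
    ["", "", "-", "+"],
    ["+", "-", "", "+"],
    ["+", "+", "", ""],
]

SIGNS = ("+", "-")  # tuple membership, so the empty string does not match


def influence_scores(matrix):
    """Number of other variables each row variable affects, of either sign."""
    return [sum(cell in SIGNS for cell in row) for row in matrix]


def impact_scores(matrix):
    """Number of other variables that affect each column variable."""
    n = len(matrix)
    return [sum(matrix[r][c] in SIGNS for r in range(n)) for c in range(n)]


influence = influence_scores(matrix)
impact = impact_scores(matrix)
for name, out_deg, in_deg in zip(variables, influence, impact):
    print(f"{name:15} influences {out_deg}, influenced by {in_deg}")

# Ties are broken by taking the first variable with the maximum score.
most_influential = variables[influence.index(max(influence))]
most_impacted = variables[impact.index(max(impact))]
print("most influential:", most_influential)
print("most impacted:", most_impacted)
```
+
+Following chains in the matrix (for example, from Economy to Protests to Public opinion) is one way to trace the indirect relationships that the matrix is meant to expose.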
+
+==== Strengths ====
+Intelligence analysis style cross-impact analysis has several key advantages. The flexibility of the model allows analysts to measure different types of variables against each other, not just probable events. In addition, dropping stringent mathematical criteria means that researchers do not need extensive mathematics training or specialized software to use cross-impact analysis. This also enables experts in a topic to use the methodology relatively quickly without having to cross-check the numerous calculations required by the futurist forecasting style.
+
+==== Weaknesses ====
+The lack of stringent procedures in the intelligence analysis style also brings considerable drawbacks. The flexibility of the style relies heavily on the opinions and knowledge of the analysts involved, and results are difficult to reproduce with a different group. In addition, the option to remove mathematics can harm analysts by producing results that have no numerical values to back them. This lack of mathematics may make the process easier at first, but less specialized software is available than for the futurist forecasting style, which makes the work more tedious as the number of variables increases.
+
+== Applications ==
+Researchers can use cross-impact analysis for a wide variety of applications. Futurists have already used the methodology for forecasting events in specific industries, politics, markets, and even entire communities.
+In intelligence analysis, analysts can use the method to predict events, conditions, or decisions based on a wide variety of variables and conditions at local, national, and international levels.
+
+== See also ==
+Futures techniques
+Probability
+Analysis of competing hypotheses
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Data_analysis-0.md
+++ b/data/en.wikipedia.org/wiki/Data_analysis-0.md
@ -0,0 +1,37 @@
+---
+title: "Data analysis"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:33.748083+00:00"
+instance: "kb-cron"
+---
+
+Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays an important role in making decisions more scientific and helping businesses operate more effectively. It is widely used in fields such as business analytics, healthcare, and artificial intelligence to extract meaningful insights from data.
+Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data, while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a variety of unstructured data. All of the above are varieties of data analysis.
+
+== Data analysis process ==
+
+Data analysis is a process for obtaining raw data and subsequently converting it into information useful for decision-making by users. Statistician John Tukey defined data analysis in 1961 as: "Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data."
+There are several phases, and they are iterative, in that feedback from later phases may result in additional work in earlier phases.
+
+=== Data requirements ===
+The data needed as inputs to the analysis are specified based on the requirements of those directing the analytics (or of the customers who will use the finished product of the analysis). The general type of entity upon which the data will be collected is referred to as an experimental unit (e.g., a person or population of people). Specific variables regarding a population (e.g., age and income) may be specified and obtained. Data may be numerical or categorical (i.e., a text label rather than a number).
+
+=== Data collection ===
+Data may be collected from a variety of sources. A list of data sources is available for study and research. The requirements may be communicated by analysts to custodians of the data, such as information technology personnel within an organization. Data collection, or data gathering, is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. The data may also be collected from sensors in the environment, including traffic cameras, satellites, and recording devices. It may also be obtained through interviews, downloads from online sources, or reading documentation.
+
+=== Data processing ===
+
+Data integration is a precursor to data analysis: Data, when initially obtained, must be processed or organized for analysis. For instance, this may involve placing data into rows and columns in a table format (known as structured data) for further analysis, often through the use of spreadsheet (e.g. Excel) or statistical software.
+
+=== Data cleaning ===
+
+Once processed and organized, the data may be incomplete, contain duplicates, or contain errors. The need for data cleaning arises from problems in the way that data are entered and stored. Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching, identifying inaccurate data, assessing the overall quality of existing data, deduplication, and column segmentation.
+Such data problems can also be identified through a variety of analytical techniques. For example, with financial information, the totals for particular variables may be compared against separately published numbers that are believed to be reliable. Unusual amounts, above or below predetermined thresholds, may also be reviewed. Several types of data cleaning depend on the type of data in the set, such as phone numbers, email addresses, employers, or other values. Quantitative outlier-detection methods can be used to remove data that appear likely to have been entered incorrectly. Spell checkers can reduce the number of mistyped words in text data; however, it is harder to tell whether the words are contextually (i.e., semantically and idiomatically) correct.
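+Two of the tasks above, deduplication and the review of amounts outside predetermined thresholds, can be sketched in a few lines of Python. The record fields, the matching key (a normalized name plus the amount), and the threshold values are all hypothetical choices for illustration.
+
```python
# Hypothetical records; "amount" is in some currency unit.
records = [
    {"id": 1, "name": "Acme Ltd", "amount": 120.0},
    {"id": 2, "name": "  acme LTD", "amount": 120.0},  # same customer, retyped
    {"id": 3, "name": "Beta Corp", "amount": 98_000.0},
    {"id": 4, "name": "Gamma Inc", "amount": -5.0},
]


def normalize_name(name):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def deduplicate(records):
    """Keep the first record for each (normalized name, amount) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_name(rec["name"]), rec["amount"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def flag_out_of_range(records, low, high):
    """Return records whose amount lies outside the predetermined thresholds."""
    return [rec for rec in records if not low <= rec["amount"] <= high]


clean = deduplicate(records)
flagged = flag_out_of_range(clean, low=0.0, high=10_000.0)
print("kept ids:", [rec["id"] for rec in clean])
print("ids to review:", [rec["id"] for rec in flagged])
```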
+
+=== Exploratory data analysis ===
+Once the datasets are cleaned, they can be analyzed using exploratory data analysis. Data exploration may lead to additional data cleaning or additional requests for data, which starts the iterative cycle of phases mentioned above. Descriptive statistics, such as the average, median, and standard deviation, are often used to broadly characterize the data. Data visualization is also used, in which the analyst examines the data in a graphical format to obtain additional insights about messages within the data.
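+A minimal sketch of this first descriptive pass using Python's standard `statistics` module; the values are hypothetical, and the text histogram is only a stand-in for a proper chart.
+
```python
import statistics

# Hypothetical cleaned measurements of one variable.
values = [12.0, 15.5, 9.0, 22.0, 15.5, 30.0, 11.0]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# A crude text histogram as a first visual check of the distribution.
bin_width = 5
for start in range(5, 35, bin_width):
    count = sum(start <= v < start + bin_width for v in values)
    print(f"[{start:>2}, {start + bin_width:>2}) " + "#" * count)
```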
+
+=== Modeling and algorithms ===
--- a/data/en.wikipedia.org/wiki/Data_analysis-1.md
+++ b/data/en.wikipedia.org/wiki/Data_analysis-1.md
@ -0,0 +1,45 @@
+---
+title: "Data analysis"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:33.748083+00:00"
+instance: "kb-cron"
+---
+
+Mathematical formulas or mathematical models (supported by algorithms) may be applied to the data to identify relationships among the variables, for example by checking for correlation and by determining whether causality is present. In general terms, models may be developed to evaluate a specific variable based on other variables contained within the dataset, with some residual error depending on the implemented model's accuracy (e.g., Data = Model + Error).
+Inferential statistics uses techniques that measure the relationships between particular variables. For example, regression analysis may be used to model whether a change in advertising (independent variable X) explains the variation in sales (dependent variable Y), i.e., whether Y is a function of X. This can be described as Y = aX + b + error, where a and b are chosen to minimize the error when the model predicts Y over a given range of values of X.
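+The model Y = aX + b + error can be fitted by ordinary least squares, which picks a and b to minimize the sum of squared errors. The sketch below implements the closed-form solution directly; the advertising and sales figures are hypothetical.
+
```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = a * xs + b; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx  # slope that minimizes the sum of squared errors
    b = mean_y - a * mean_x  # the fitted line passes through the means
    return a, b


# Hypothetical advertising spend (X) and sales (Y).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(advertising, sales)
residuals = [y - (a * x + b) for x, y in zip(advertising, sales)]  # Data = Model + Error
print(f"sales ~ {a:.3f} * advertising + {b:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```
+
+Because the fitted line includes an intercept, the residuals (the "Error" in Data = Model + Error) sum to approximately zero.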
+
+=== Data product ===
+A data product is a computer application that takes data inputs and generates outputs, feeding them back into the environment. It may be based on a model or algorithm. One example is an application that analyzes a customer's purchase history and uses the results to recommend other purchases the customer might enjoy.
+
+=== Communication ===
+
+Once data is analyzed, it may be reported in many formats to the users of the analysis to support their requirements. The users may have feedback, which results in additional analysis.
+When determining how to communicate the results, the analyst may consider implementing a variety of data visualization techniques to help communicate the message more clearly and efficiently to the audience. Data visualization uses information displays (graphics such as tables and charts) to help communicate key messages contained in the data. Tables are valuable because they let a user look up and focus on specific numbers, while charts (e.g., bar charts or line charts) may help explain the quantitative messages contained in the data.
+
+== Quantitative messages ==
+
+Stephen Few described eight types of quantitative messages that users may attempt to communicate from a set of data, including the associated graphs.
+
+Time-series: A single variable is captured over a period of time, such as the unemployment rate over a 10-year period. A line chart may be used to demonstrate the trend.
+Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the measure) by salespersons (the category, with each salesperson a categorical subdivision) during a single period. A bar chart may be used to show the comparison across the salespersons.
+Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%).  A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.
+Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period.  A bar chart can show the comparison of the actual versus the reference amount.
+Frequency distribution: Shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return is between intervals such as 0–10%, 11–20%, etc. A histogram, a type of bar chart, may be used for this analysis.
+Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.
+Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.
+Geographic or geo-spatial: Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is typically used.
+
+== Analyzing quantitative data in finance ==
+
+Author Jonathan Koomey has recommended a series of best practices for understanding quantitative data. These include:
+
+Check raw data for anomalies prior to performing an analysis;
+Re-perform important calculations, such as verifying columns of data that are formula-driven;
+Confirm main totals are the sum of subtotals;
+Check relationships between numbers that should be related in a predictable way, such as ratios over time;
+Normalize numbers to make comparisons easier, such as analyzing amounts per person or relative to GDP or as an index value relative to a base year;
+Break problems into component parts by analyzing factors that led to the results, such as DuPont analysis of return on equity.
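+Several of these practices reduce to simple arithmetic checks. A minimal sketch, with hypothetical regional figures, population counts, and a spending series: confirming that a main total equals the sum of its subtotals, normalizing amounts per person, and expressing a series as an index relative to a base year (base = 100).
+
```python
# Hypothetical regional subtotals and a separately reported total.
subtotals = {"North": 410.0, "South": 275.5, "West": 190.0}
reported_total = 875.5


def totals_match(subtotals, total, tol=1e-6):
    """Confirm that a main total equals the sum of its subtotals."""
    return abs(sum(subtotals.values()) - total) <= tol


def per_capita(amounts, population):
    """Normalize amounts to a per-person basis."""
    return {key: amounts[key] / population[key] for key in amounts}


def index_to_base(series, base_year):
    """Express a series as an index with the base year equal to 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


population = {"North": 2.0, "South": 1.1, "West": 0.8}  # hypothetical, in millions
spending = {2020: 200.0, 2021: 210.0, 2022: 230.0}  # hypothetical

print(totals_match(subtotals, reported_total))
print(per_capita(subtotals, population))
print(index_to_base(spending, 2020))
```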
+For the variables under examination, analysts typically obtain descriptive statistics, such as the mean (average), median, and standard deviation. They may also analyze the distribution of the key variables to see how the individual values cluster around the mean.
--- a/data/en.wikipedia.org/wiki/Data_analysis-2.md
+++ b/data/en.wikipedia.org/wiki/Data_analysis-2.md
@ -0,0 +1,46 @@
+---
+title: "Data analysis"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:33.748083+00:00"
+instance: "kb-cron"
+---
+
+McKinsey and Company named a technique for breaking down a quantitative problem into its component parts called the MECE principle. MECE means "Mutually Exclusive and Collectively Exhaustive". Each layer can be broken down into its components; each of the sub-components must be mutually exclusive of each other and collectively add up to the layer above them.  For example, profit by definition can be broken down into total revenue and total cost.
+Analysts may use robust statistical measurements to solve certain analytical problems. Hypothesis testing is used when the analyst makes a particular hypothesis about the true state of affairs and gathers data to determine whether that hypothesis is true or false. For example, the hypothesis might be that "Unemployment has no effect on inflation", which relates to an economics concept called the Phillips curve. Hypothesis testing involves considering the likelihood of Type I errors (rejecting a hypothesis that is actually true) and Type II errors (failing to reject a hypothesis that is actually false).
+Regression analysis may be used when the analyst is trying to determine the extent to which independent variable X affects dependent variable Y (e.g., "To what extent do changes in the unemployment rate (X) affect the inflation rate (Y)?").
+Necessary condition analysis (NCA) may be used when the analyst is trying to determine the extent to which independent variable X allows variable Y (e.g., "To what extent is a certain unemployment rate (X) necessary for a certain inflation rate (Y)?"). Whereas (multiple) regression analysis uses additive logic where each X-variable can produce the outcome and the X's can compensate for each other (they are sufficient but not necessary), necessary condition analysis (NCA) uses necessity logic, where one or more X-variables allow the outcome to exist, but may not produce it (they are necessary but not sufficient). Each single necessary condition must be present and compensation is not possible.
+
+== Analytical activities of data users ==
+
+Users may have particular data points of interest within a data set, as opposed to the general messaging outlined above. Such low-level user analytic activities are presented in the following table. The taxonomy can also be organized by three poles of activities: retrieving values, finding data points, and arranging data points.
+
+== Barriers to effective analysis ==
+Barriers to effective analysis may exist among the analysts performing the data analysis or among the audience. Distinguishing fact from opinion, cognitive biases, and innumeracy are all challenges to sound data analysis.
+
+=== Confusing fact and opinion ===
+
+Effective analysis requires obtaining relevant facts to answer questions, support a conclusion or formal opinion, or test hypotheses. Facts by definition are irrefutable, meaning that any person involved in the analysis should be able to agree upon them. The auditor of a public company must arrive at a formal opinion on whether financial statements of publicly traded corporations are "fairly stated, in all material respects". This requires extensive analysis of factual data and evidence to support their opinion.
+
+=== Cognitive biases ===
+There are a variety of cognitive biases that can adversely affect analysis. For example, confirmation bias is the tendency to search for or interpret information in a way that confirms one's preconceptions. In addition, individuals may discredit information that does not support their views.
+Analysts may be trained specifically to be aware of these biases and how to overcome them. In his book Psychology of Intelligence Analysis, retired CIA analyst Richards Heuer wrote that analysts should clearly delineate their assumptions and chains of inference and specify the degree and source of the uncertainty involved in the conclusions. He emphasized procedures to help surface and debate alternative points of view.
+
+=== Innumeracy ===
+Effective analysts are generally adept with a variety of numerical techniques. However, audiences may lack such literacy with numbers, or numeracy; they are said to be innumerate. Persons communicating the data may also be attempting to mislead or misinform, deliberately using bad numerical techniques.
+For example, whether a number is rising or falling may not be the key factor. More important may be the number relative to another number, such as the size of government revenue or spending relative to the size of the economy (GDP) or the amount of cost relative to revenue in corporate financial statements. This numerical technique is referred to as normalization or common-sizing. There are many such techniques employed by analysts, whether adjusting for inflation (i.e., comparing real vs. nominal data) or considering population increases, demographics, etc.
+Analysts may also analyze data under different assumptions or scenarios. For example, when analysts perform financial statement analysis, they will often recast the financial statements under different assumptions to help arrive at an estimate of future cash flow, which they then discount to present value based on some interest rate, to determine the valuation of the company or its stock. Similarly, the U.S. Congressional Budget Office (CBO) analyzes the effects of various policy options on the government's revenue, outlays and deficits, creating alternative future scenarios for key measures.
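+The discounting step can be sketched as a present-value calculation: each forecast cash flow is divided by (1 + r)^t, where r is the chosen interest (discount) rate and t is the number of years until the cash flow is received. The scenario names, cash flows, and 8% rate below are hypothetical, and end-of-year timing is an assumption of the sketch.
+
```python
def present_value(cash_flows, rate):
    """Discount cash flows received at the end of years 1, 2, ... to today."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


# Hypothetical recast forecasts of future cash flow under two assumption sets.
scenarios = {
    "base case": [100.0, 110.0, 120.0],
    "downside": [90.0, 90.0, 95.0],
}
discount_rate = 0.08  # hypothetical

for name, flows in scenarios.items():
    print(f"{name}: present value {present_value(flows, discount_rate):.1f}")
```
+
+Comparing the present values across scenarios shows how sensitive the valuation is to the assumptions behind each recast forecast.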
+
+== Other applications ==
+
+=== Analytics and business intelligence ===
+
+Analytics is the "extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions." It is a subset of business intelligence, which is a set of technologies and processes that uses data to understand and analyze business performance to drive decision-making.
+
+=== Education ===
+In education, most educators have access to a data system for the purpose of analyzing student data. These data systems present data to educators in an over-the-counter data format (embedding labels, supplemental documentation, and a help system and making key package/display and content decisions) to improve the accuracy of educators' data analyses.
+
+== Practitioner notes ==
+This section contains rather technical explanations that may assist practitioners but are beyond the typical scope of a Wikipedia article.
--- a/data/en.wikipedia.org/wiki/Data_analysis-3.md
+++ b/data/en.wikipedia.org/wiki/Data_analysis-3.md
@ -0,0 +1,85 @@
+---
+title: "Data analysis"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:33.748083+00:00"
+instance: "kb-cron"
+---
+
+=== Initial data analysis ===
+The most important distinction between the initial data analysis phase and the main analysis phase is that during initial data analysis one refrains from any analysis that is aimed at answering the original research question. The initial data analysis phase is guided by the following four questions:
+
+==== Quality of data ====
+The quality of the data should be checked as early as possible. Data quality can be assessed in several ways, using different types of analysis: frequency counts, descriptive statistics (mean, standard deviation, median), normality (skewness, kurtosis, frequency histograms), and an assessment of missing values to determine whether imputation is needed.
+
+Analysis of extreme observations: outlying observations in the data are analyzed to see if they seem to disturb the distribution.
+Comparison and correction of differences in coding schemes: variables are compared with coding schemes of variables external to the data set, and possibly corrected if coding schemes are not comparable.
+Test for common-method variance.
+
+The choice of analyses to assess the data quality during the initial data analysis phase depends on the analyses that will be conducted in the main analysis phase.
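+Two of these checks, normality and extreme observations, can be sketched with the standard library. The sketch uses moment-based estimators of skewness and excess kurtosis (one of several common definitions) and flags extreme observations with Tukey's fences at 1.5 times the interquartile range, a widely used rule of thumb that the text does not itself prescribe. The data are hypothetical.
+
```python
import statistics


def moments(xs):
    """Return the moment-based (biased) skewness and excess kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def tukey_outliers(xs, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return [x for x in xs if x < q1 - k * iqr or x > q3 + k * iqr]


data = [10, 11, 12, 12, 13, 14, 50]  # hypothetical; 50 looks suspicious
skewness, excess_kurtosis = moments(data)
print(f"skewness {skewness:.2f}, excess kurtosis {excess_kurtosis:.2f}")
print("possible outliers:", tukey_outliers(data))
```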
+
+==== Quality of measurements ====
+The quality of the measurement instruments should be checked during the initial data analysis phase only when this is not the focus or research question of the study. One should check whether the structure of the measurement instruments corresponds to the structure reported in the literature.
+There are two ways to assess measurement quality:
+
+Confirmatory factor analysis
+Analysis of homogeneity (internal consistency), which gives an indication of the reliability of a measurement instrument. During this analysis, one inspects the variances of the items and the scales, the Cronbach's alpha of the scales, and the change in Cronbach's alpha when an item is deleted from a scale.
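+Cronbach's alpha for k items is k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below computes it, together with the alpha obtained when each item is deleted in turn, for a hypothetical four-item scale answered by six respondents; sample variances are used throughout.
+
```python
import statistics


def cronbach_alpha(scores):
    """scores[r][i] is respondent r's answer to item i; needs >= 2 items."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    item_vars = [statistics.variance([row[i] for row in scores]) for i in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    k = len(scores[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in scores])
        for i in range(k)
    ]


# Hypothetical 5-point answers of six respondents to a four-item scale.
scores = [
    [4, 5, 4, 2],
    [3, 3, 4, 5],
    [5, 5, 5, 1],
    [2, 2, 3, 4],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
]

print(f"alpha = {cronbach_alpha(scores):.3f}")
for i, a in enumerate(alpha_if_item_deleted(scores), start=1):
    print(f"alpha without item {i} = {a:.3f}")
```
+
+In this made-up data set the fourth item runs against the other three, so alpha rises markedly when it is deleted, which is the pattern this analysis is meant to reveal.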
+
+==== Initial transformations ====
+After assessing the quality of the data and of the measurements, one might decide to impute missing data, or to perform initial transformations of one or more variables, although this can also be done during the main analysis phase.
+Possible transformations of variables are:
+
+Square root transformation (if the distribution differs moderately from normal)
+Log-transformation (if the distribution differs substantially from normal)
+Inverse transformation (if the distribution differs severely from normal)
+Make categorical (ordinal / dichotomous) (if the distribution differs severely from normal, and no transformations help)
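
As an illustration of how these transformations pull in a long right tail, the sketch below compares moment-based skewness before and after the square root and log transformations. The data are hypothetical and geometrically spaced, and for such data the log transformation removes the skew entirely. Only the first two transformations are shown. Both assume strictly positive values.

```python
import math


def skewness(values):
    """Moment-based skewness g1 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed, strictly positive data (each value doubles).
raw = [1, 2, 4, 8, 16, 32, 64]
transformed = {
    "raw": raw,
    "sqrt": [math.sqrt(x) for x in raw],
    "log": [math.log(x) for x in raw],  # requires x > 0
}
for name, values in transformed.items():
    print(f"{name:>5}: skewness = {skewness(values):.3f}")
```

Which transformation is appropriate depends on how far the distribution departs from normal, so the skewness should be re-checked after transforming.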
+
+==== Did the implementation of the study fulfill the intentions of the research design? ====
+One should check the success of the randomization procedure, for instance by checking whether background and substantive variables are equally distributed within and across groups. If the study did not need or use a randomization procedure, one should check the success of the non-random sampling, for instance by checking whether all subgroups of the population of interest are represented in the sample. Other possible data distortions that should be checked are:
+
+Dropout (this should be identified during the initial data analysis phase)
+Item non-response (whether this is random or not should be assessed during the initial data analysis phase)
+Treatment quality (using manipulation checks).
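
One common way to check whether a background variable is equally distributed across randomized groups is the standardized mean difference (SMD). This is the difference in group means divided by a pooled standard deviation. Neither this diagnostic nor the rule of thumb that |SMD| below about 0.1 indicates negligible imbalance comes from the text above. Both are conventions from the covariate-balance literature, and the baseline ages are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(group_a, group_b):
    """Mean difference divided by the pooled SD (square root of the average variance)."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    )
    return mean_diff / pooled_sd


# Hypothetical baseline ages in two randomized arms.
treatment_age = [34, 41, 29, 52, 47, 38]
control_age = [36, 44, 31, 49, 45, 40]
smd = standardized_mean_difference(treatment_age, control_age)
print(f"SMD for age: {smd:.3f}")
```

Unlike a significance test, the SMD does not shrink toward zero just because the sample is large. That is why it is often preferred for balance checks.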
+
+==== Characteristics of data sample ====
+In any report or article, the structure of the sample must be accurately described. It is especially important to determine the exact size of each subgroup when subgroup analyses will be performed during the main analysis phase. The characteristics of the data sample can be assessed by looking at:
+
+Basic statistics of important variables
+Scatter plots
+Correlations and associations
+Cross-tabulations
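
A cross-tabulation is also a quick way to determine subgroup sizes exactly. The sketch below counts observations for every combination of two categorical variables and flags small cells. The field names, the sample and the minimum cell size of 5 are hypothetical choices for illustration.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Cell counts for every combination of two categorical variables."""
    counts = Counter((rec[row_key], rec[col_key]) for rec in records)
    rows = sorted({rec[row_key] for rec in records})
    cols = sorted({rec[col_key] for rec in records})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


def small_cells(table, minimum):
    """(row, col, n) for every cell with fewer than `minimum` observations."""
    return [(r, c, n) for r, row in table.items() for c, n in row.items() if n < minimum]


# Hypothetical sample: study arm by age band.
sample = (
    [{"arm": "treatment", "age": "<40"}] * 12
    + [{"arm": "treatment", "age": "40+"}] * 3
    + [{"arm": "control", "age": "<40"}] * 10
    + [{"arm": "control", "age": "40+"}] * 9
)
table = crosstab(sample, "arm", "age")
print(table)
print(small_cells(table, minimum=5))
```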
+
+==== Final stage of the initial data analysis ====
+During the final stage, the findings of the initial data analysis are documented, and necessary, preferable, and possible corrective actions are taken. Also, the original plan for the main data analyses can and should be specified in more detail or rewritten. In order to do this, several decisions about the main data analyses can and should be made:
+
+In the case of non-normality: should one transform variables, make variables categorical (ordinal/dichotomous), or adapt the analysis method?
+In the case of missing data: should one neglect or impute the missing data; which imputation technique should be used?
+In the case of outliers: should one use robust analysis techniques?
+In case items do not fit the scale: should one adapt the measurement instrument by omitting items, or rather ensure comparability with other (uses of the) measurement instrument(s)?
+In the case of (too) small subgroups: should one drop the hypothesis about inter-group differences, or use small sample techniques, like exact tests or bootstrapping?
+In case the randomization procedure seems to be defective: can and should one calculate propensity scores and include them as covariates in the main analyses?
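
Fisher's exact test for a 2×2 table is one example of an exact test suited to small subgroups. With the row and column totals fixed, the count in the top-left cell follows a hypergeometric distribution. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one. The sketch below keeps the probability numerators as integers, so ties are compared exactly. The function name and the example table are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Numerator of P(top-left = x); the shared denominator is comb(n, col1).
        return comb(row1, x) * comb(n - row1, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)


# Hypothetical small-subgroup table: outcome (yes/no) by group.
print(fisher_exact_two_sided(1, 9, 11, 3))  # about 0.0028
```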
+
+==== Analysis ====
+Several analyses can be used during the initial data analysis phase:
+
+Univariate statistics (single variable)
+Bivariate associations (correlations)
+Graphical techniques (scatter plots)
+It is important to take the measurement levels of the variables into account for the analyses, as special statistical techniques are available for each level:
+
+Nominal and ordinal variables:
+Frequency counts (numbers and percentages)
+Associations: cross-tabulations; hierarchical loglinear analysis (restricted to a maximum of 8 variables); loglinear analysis (to identify relevant/important variables and possible confounders); exact tests or bootstrapping (in case subgroups are small)
+Computation of new variables
+
+Continuous variables:
+Distribution: statistics (M, SD, variance, skewness, kurtosis), stem-and-leaf displays, box plots
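
A stem-and-leaf display for continuous data can be produced in a few lines of Python. This sketch assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf. Empty stems are kept so that gaps in the distribution stay visible. The values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Text stem-and-leaf display for non-negative integers (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems to show gaps
        digits = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}".rstrip())
    return lines


for line in stem_and_leaf([12, 15, 21, 23, 23, 37, 58]):
    print(line)
```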
+
+=== Main data analysis ===
+In the main analysis phase, analyses aimed at answering the research question are performed as well as any other relevant analysis needed to write the first draft of the research report.
--- a/data/en.wikipedia.org/wiki/Data_analysis-4.md
+++ b/data/en.wikipedia.org/wiki/Data_analysis-4.md
@ -0,0 +1,66 @@
+---
+title: "Data analysis"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Data_analysis"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:33.748083+00:00"
+instance: "kb-cron"
+---
+
+==== Exploratory and confirmatory approaches ====
+In the main analysis phase, either an exploratory or confirmatory approach can be adopted. Usually the approach is decided before data is collected. In an exploratory analysis no clear hypothesis is stated before analysing the data, and the data is searched for models that describe the data well. In a confirmatory analysis, clear hypotheses about the data are tested.
+Exploratory data analysis should be interpreted carefully. When testing multiple models at once there is a high chance of finding at least one of them to be significant, but this can be due to a Type I error. It is important to always adjust the significance level when testing multiple models with, for example, a Bonferroni correction. Also, one should not follow up an exploratory analysis with a confirmatory analysis in the same dataset. An exploratory analysis is used to find ideas for a theory, but not to test that theory as well. When a model is found through exploratory analysis of a dataset, following up that analysis with a confirmatory analysis in the same dataset could simply mean that the results of the confirmatory analysis are due to the same Type I error that produced the exploratory model in the first place. The confirmatory analysis therefore will not be more informative than the original exploratory analysis.
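
The multiple-testing problem and the Bonferroni correction can both be made concrete in a short sketch. The family-wise calculation assumes independent tests with every null hypothesis true, which is a simplifying assumption. The p-values are hypothetical. Bonferroni compares each p-value with α/m, where m is the number of tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m tests.

    Returns the adjusted p-values (p * m, capped at 1) and, for each test,
    whether it is still significant at the family-wise level alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p <= alpha / m for p in p_values]
    return adjusted, reject


# Chance of at least one false positive among m independent tests at 0.05
# when every null hypothesis is true: 1 - (1 - 0.05) ** m.
m = 3
family_wise_error = 1 - (1 - 0.05) ** m
print(f"Uncorrected family-wise error for {m} tests: {family_wise_error:.3f}")

# Hypothetical p-values from three exploratory models fitted to one dataset.
adjusted, reject = bonferroni([0.01, 0.02, 0.04])
print(adjusted)
print(reject)  # [True, False, False]
```

Bonferroni is conservative. It controls the family-wise error rate at the cost of power, which grows more noticeable as the number of tests increases.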
+
+==== Stability of results ====
+It is important to obtain some indication about how generalizable the results are. While this is often difficult to check, one can look at the stability of the results. Are the results reliable and reproducible? There are two main ways of doing that.
+
+Cross-validation. By splitting the data into multiple parts, we can check if an analysis (like a fitted model) based on one part of the data generalizes to another part of the data as well. Cross-validation is generally inappropriate, though, if there are correlations within the data, e.g. with panel data. Hence other methods of validation sometimes need to be used. For more on this topic, see statistical model validation.
+Sensitivity analysis. A procedure to study the behavior of a system or model when global parameters are (systematically) varied. One way to do that is via bootstrapping.
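
Cross-validation can be illustrated with k-fold cross-validation of a simple model. The sketch below fits a straight line by ordinary least squares on k − 1 folds and measures the mean squared prediction error on the held-out fold. The choice of model, the contiguous folds and the data are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def k_fold_mse(xs, ys, k=5):
    """Mean squared prediction error averaged over k contiguous folds."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of observations")
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    fold_errors, start = [], 0
    for size in sizes:
        test = range(start, start + size)
        train = [i for i in range(n) if not start <= i < start + size]
        start += size
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


xs = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]  # hypothetical
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]
print(f"5-fold CV mean squared error: {k_fold_mse(xs, ys, k=5):.4f}")
```

Contiguous folds are used only for simplicity. In practice the rows are usually shuffled first. As noted above, neither approach is appropriate when observations are correlated, as with panel data.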
+
+== Free software for data analysis ==
+Free software for data analysis includes:
+
+DevInfo – A database system endorsed by the United Nations Development Group for monitoring and analyzing human development.
+ELKI – Data mining framework in Java with data mining oriented visualization functions.
+KNIME – The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
+Orange – A visual programming tool featuring interactive data visualization and methods for statistical data analysis, data mining, and machine learning.
+Pandas – Python library for data analysis.
+PAW – FORTRAN/C data analysis framework developed at CERN.
+R – A programming language and software environment for statistical computing and graphics.
+ROOT – C++ data analysis framework developed at CERN.
+SciPy – Python library for scientific computing.
+Julia – A programming language well-suited for numerical analysis and computational science.
+
+== Reproducible analysis ==
+The typical data analysis workflow involves collecting data, running analyses, creating visualizations, and writing reports. However, this workflow presents challenges, including a separation between analysis scripts and data, as well as a gap between analysis and documentation. Often, the correct order of running scripts is only described informally or resides in the data scientist's memory. The potential for losing this information creates issues for reproducibility.
+To address these challenges, it is essential to document both the content of the analysis scripts and the workflow that connects them. Overall documentation is also crucial, as are reports that both machines and humans can understand, and the documented workflow should continue to represent the analysis accurately as scripts evolve.
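
One lightweight way to record the run order and data that the text says often live only in the analyst's memory is to write a machine-readable manifest alongside the results. The manifest layout, the function name and the script names below are hypothetical. The sketch only shows the idea of storing the ordered steps, a SHA-256 fingerprint of each input and the interpreter version.

```python
import hashlib
import json
import platform


def build_manifest(steps, inputs):
    """Record the run order of analysis scripts and a fingerprint of each input.

    steps:  script names in the order they must be run (hypothetical names)
    inputs: mapping of input name -> raw bytes of that input file
    """
    return {
        "python_version": platform.python_version(),
        "steps": [{"order": i, "script": s} for i, s in enumerate(steps, start=1)],
        "inputs": {name: hashlib.sha256(data).hexdigest() for name, data in inputs.items()},
    }


# Hypothetical three-step workflow and a tiny in-memory input file.
manifest = build_manifest(
    ["01_clean.py", "02_model.py", "03_figures.py"],
    {"raw.csv": b"id,score\n1,4\n2,5\n"},
)
print(json.dumps(manifest, indent=2))
```

If an input file changes, its hash changes too. A reader can therefore tell whether a report was produced from the data now on disk.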
+
+== Data analysis contests ==
+Different companies and organizations hold data analysis contests to encourage researchers to utilize their data or to solve a particular question using data analysis. A few examples of well-known international data analysis contests are:
+
+Kaggle competitions; the Kaggle platform is owned and run by Google.
+LTPP data analysis contest held by FHWA and ASCE.
+
+== References ==
+
+=== Bibliography ===
+Adèr, Herman J. (2008a). "Chapter 14: Phases and initial steps in data analysis". In Adèr, Herman J.; Mellenbergh, Gideon J.; Hand, David J. (eds.). Advising on Research Methods: A Consultant's Companion. Huizen, Netherlands: Johannes van Kessel Pub. pp. 333–356. ISBN 9789079418015. OCLC 905799857.
+Adèr, Herman J. (2008b). "Chapter 15: The main analysis phase". In Adèr, Herman J.; Mellenbergh, Gideon J.; Hand, David J. (eds.). Advising on Research Methods: A Consultant's Companion. Huizen, Netherlands: Johannes van Kessel Pub. pp. 357–386. ISBN 9789079418015. OCLC 905799857.
+Tabachnick, B.G.; Fidell, L.S. (2007). "Chapter 4: Cleaning up your act. Screening data prior to analysis". In Tabachnick, B.G.; Fidell, L.S. (eds.). Using Multivariate Statistics, Fifth Edition. Boston: Pearson Education, Inc. / Allyn and Bacon. pp. 60–116.
+
+== Further reading ==
+
+Adèr, H.J. & Mellenbergh, G.J. (with contributions by D.J. Hand) (2008). Advising on Research Methods: A Consultant's Companion. Huizen, the Netherlands: Johannes van Kessel Publishing. ISBN 978-90-79418-01-5
+Chambers, John M.; Cleveland, William S.; Kleiner, Beat; Tukey, Paul A. (1983). Graphical Methods for Data Analysis. Wadsworth/Duxbury Press. ISBN 0-534-98052-X
+Fandango, Armando (2017). Python Data Analysis, 2nd Edition. Packt Publishers. ISBN 978-1787127487
+Juran, Joseph M.; Godfrey, A. Blanton (1999). Juran's Quality Handbook, 5th Edition. New York: McGraw Hill. ISBN 0-07-034003-X
+Lewis-Beck, Michael S. (1995). Data Analysis: An Introduction. Sage Publications Inc. ISBN 0-8039-5772-6
+NIST/SEMATECH (2008). Handbook of Statistical Methods.
+Pyzdek, T. (2003). Quality Engineering Handbook. ISBN 0-8247-4614-7
+Veryard, Richard (1984). Pragmatic Data Analysis. Oxford: Blackwell Scientific Publications. ISBN 0-632-01311-7
+Tabachnick, B.G.; Fidell, L.S. (2007). Using Multivariate Statistics, 5th Edition. Boston: Pearson Education, Inc. / Allyn and Bacon. ISBN 978-0-205-45938-4
--- a/data/en.wikipedia.org/wiki/Data_sharing-0.md
+++ b/data/en.wikipedia.org/wiki/Data_sharing-0.md
@ -0,0 +1,41 @@
+---
+title: "Data sharing"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Data_sharing"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:35.006801+00:00"
+instance: "kb-cron"
+---
+
+Data sharing denotes the dissemination of research datasets to enable access and use by other investigators. Policies governing this practice are increasingly instituted by funding agencies, academic institutions, and scholarly journals, reflecting the consensus that transparency and openness constitute foundational principles of the scientific method.
+A number of funding agencies and science journals require authors of peer-reviewed papers to share any supplemental information (raw data, statistical methods or source code) necessary to understand, develop or reproduce published research. A great deal of scientific research is not subject to data sharing requirements, and many of these policies have liberal exceptions. In the absence of any binding requirement, data sharing is at the discretion of the scientists themselves. In addition, in certain situations governments and institutions prohibit or severely limit data sharing to protect proprietary interests, national security, and subject/patient/victim confidentiality. Data sharing may also be restricted to protect institutions and scientists from use of data for political purposes.
+Data and methods may be requested from an author years after publication. To encourage data sharing and prevent the loss or corruption of data, a number of funding agencies and journals have established policies on data archiving. Access to publicly archived data is a recent development in the history of science made possible by technological advances in communications and information technology. Taking full advantage of modern rapid communication may require consensual agreement on the criteria underlying mutual recognition of respective contributions. Models recognized for the timely sharing of data for more effective response to emergent infectious disease threats include the data sharing mechanism introduced by the GISAID Initiative.
+Despite policies on data sharing and archiving, data withholding still happens. Authors may fail to archive data or may archive only a portion of it. Failure to archive data alone is not data withholding. When a researcher requests additional information, an author sometimes refuses to provide it. When authors withhold data like this, they run the risk of losing the trust of the science community. A 2022 study identified about 3,500 research papers containing statements that the data were available, but on requesting and further seeking the data, it found that they were unavailable for 94% of the papers.
+Data sharing may also indicate the sharing of personal information on a social media platform.
+
+== U.S. government policies ==
+
+=== Federal law ===
+On August 9, 2007, President Bush signed the America COMPETES Act (the "America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science Act"), requiring civilian federal agencies to provide guidelines, policies and procedures to facilitate and optimize the open exchange of data and research between agencies, the public and policymakers. See Section 1009.
+
+=== NIH data sharing policy ===
+The National Institutes of Health (NIH) Grants Policy Statement defines "data" as "recorded information, regardless of the form or medium on which it may be recorded, and includes writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data."
+The NIH Final Statement of Sharing of Research Data says:
+
+'NIH reaffirms its support for the concept of data sharing. We believe that data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health. The NIH endorses the sharing of final research data to serve these and other important scientific goals. The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.
+'NIH recognizes that the investigators who collect the data have a legitimate interest in benefiting from their investment of time and effort. We have therefore revised our definition of "the timely release and sharing" to be no later than the acceptance for publication of the main findings from the final data set. NIH continues to expect that the initial investigators may benefit from first and continuing use but not from prolonged exclusive use.' 
+
+=== NSF Policy from Grant General Conditions ===
+36. Sharing of Findings, Data, and Other Research Products
+a. NSF …expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages awardees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.
+
+b. Adjustments and, where essential, exceptions may be allowed to safeguard the rights of individuals and subjects, the validity of results, or the integrity of collections or to accommodate legitimate interests of investigators. 
+
+== Office of Research Integrity ==
+Allegations of misconduct in medical research carry severe consequences. The United States Department of Health and Human Services established an office to oversee investigations of allegations of misconduct, including data withholding. The website defines the mission:
+
+"The Office of Research Integrity (ORI) promotes integrity in biomedical and behavioral research supported by the U.S. Public Health Service (PHS) at about 4,000 institutions worldwide. ORI monitors institutional investigations of research misconduct and facilitates the responsible conduct of research (RCR) through educational, preventive, and regulatory activities." 
+
+== Ideals in data sharing ==
+Some research organizations feel particularly strongly about data sharing. Stanford University's WaveLab has a philosophy about reproducible research and disclosing all algorithms and source code necessary to reproduce the research. In a paper titled "WaveLab and Reproducible Research," the authors describe some of the problems they encountered in trying to reproduce their own research after a period of time. In many cases, it was so difficult they gave up the effort. These experiences are what convinced them of the importance of disclosing source code. The philosophy is described:
--- a/data/en.wikipedia.org/wiki/Data_sharing-1.md
+++ b/data/en.wikipedia.org/wiki/Data_sharing-1.md
@ -0,0 +1,65 @@
+---
+title: "Data sharing"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Data_sharing"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:35.006801+00:00"
+instance: "kb-cron"
+---
+
+The idea is: An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
+The Data Observation Network for Earth (DataONE) and Data Conservancy are projects supported by the National Science Foundation to encourage and facilitate data sharing among research scientists and better support meta-analysis. In environmental sciences, the research community is recognizing that major scientific advances involving integration of knowledge in and across fields will require that researchers overcome not only the technological barriers to data sharing but also the historically entrenched institutional and sociological barriers. Dr. Richard J. Hodes, director of the National Institute on Aging, has stated, "the old model in which researchers jealously guarded their data is no longer applicable".
+The Alliance for Taxpayer Access is a group of organizations that support open access to government-sponsored research. The group has expressed a "Statement of Principles" explaining why they believe open access is important. They also list a number of international public access policies. Open access is nowhere more important than in the timely communication of essential information needed to respond effectively to health emergencies. While public domain archives have been embraced for depositing data, mainly after formal publication, they have failed to encourage rapid data sharing during health emergencies, among them the Ebola and Zika outbreaks. More clearly defined principles are required to recognize the interests of those generating the data while permitting free, unencumbered access to and use of the data (pre-publication) for research and practical application, such as those adopted by the GISAID Initiative to counter emergent threats from influenza.
+
+== International policies ==
+Australia
+Austria
+Europe — Commission of European Communities
+Germany
+United Kingdom
+'Omic Data Sharing – a list of policies of major science funders
+FAIRsharing.org Catalogue of Data Policies
+India – National Data Sharing and Accessibility Policy – Government of India
+
+== Data sharing problems in academia ==
+
+=== Genetics ===
+Withholding of data has become so commonplace in genetics that researchers at Massachusetts General Hospital published a journal article on the subject. The study found that "Because they were denied access to data, 28% of geneticists reported that they had been unable to confirm published research."
+
+=== Psychology ===
+In a 2006 study, it was observed that, of 141 authors of empirical articles published in American Psychological Association (APA) journals, 103 (73%) did not share their data over a 6-month period. In a follow-up study published in 2015, 246 of 394 contacted authors of papers in APA journals (62%) did not share their data upon request.
+
+=== Archaeology ===
+A 2018 study examined a random sample of 48 articles published during February–May 2017 in the Journal of Archaeological Science and found openly available raw data for 18 papers (53%), with compositional and dating data being the most frequently shared types. The same study also emailed authors of articles on experiments with stone artifacts published between 2009 and 2015 to request data relating to the publications. They contacted the authors of 23 articles and received 15 replies, resulting in a 70% response rate. They received five responses that included data files, giving an overall sharing rate of 20%.
+
+=== Scientists in training ===
+A study of scientists in training indicated that many had already experienced data withholding. This study has raised the fear that the future generation of scientists will not abide by established practices.
+
+== Differing approaches in different fields ==
+Requirements for the sharing of data are more frequently mandated within the medical and biological sciences than within the physical sciences. These requirements differ considerably in terms of whether data must be shared at all, the parties with whom data must be shared, and the responsibility for covering the costs associated with sharing.
+Funding bodies such as the National Institutes of Health and the National Science Foundation generally impose stronger expectations for data sharing. However, even these policies acknowledge important considerations, including the protection of patient confidentiality, the financial burden of data dissemination, and the legitimacy of the request for access. Private interests and public agencies with national security interests (defense and law enforcement) often discourage sharing of data and methods through non-disclosure agreements.
+Data sharing poses specific challenges in participatory monitoring initiatives, for example where forest communities collect data on local social and environmental conditions. In this case, a rights-based approach to the development of data-sharing protocols can be based on principles of free, prior and informed consent, and prioritise the protection of the rights of those who generated the data, and/or those potentially affected by data-sharing.
+
+== See also ==
+Data archive
+Data dissemination
+Data privacy
+Data publishing
+Data citation
+FAIR data
+File sharing
+Information sharing
+Knowledge sharing
+Open data
+Registry of Research Data Repositories
+
+== Literature ==
+
+Committee on Issues in the Transborder Flow of Scientific Data, National Research Council (1997). Bits of Power: Issues in Global Access to Scientific Data. Washington, D.C.: National Academy Press. doi:10.17226/5504. ISBN 978-0-309-05635-9. — discusses the international exchange of data in the natural sciences.
+
+== External links ==
+"The Selfish Gene: Data Sharing and Withholding in Academic Genetics" by Eric Campbell and David Blumenthal, published May 31, 2002.
+Data sharing and data archiving ― American Psychological Association
+The Public Domain of Digital Research Data
--- a/data/en.wikipedia.org/wiki/Deductive-nomological_model-0.md
+++ b/data/en.wikipedia.org/wiki/Deductive-nomological_model-0.md
@ -0,0 +1,23 @@
+---
+title: "Deductive-nomological model"
+chunk: 1/6
+source: "https://en.wikipedia.org/wiki/Deductive-nomological_model"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:36.187998+00:00"
+instance: "kb-cron"
+---
+
+The deductive-nomological model (DN model) of scientific explanation, also known as Hempel's model, the Hempel–Oppenheim model, the Popper–Hempel model, or the covering law model, is a formal view of scientifically answering questions of the form "Why...?". The DN model poses scientific explanation as a deductive structure, one in which the truth of its premises entails the truth of its conclusion, hinging on accurate prediction or postdiction of the phenomenon to be explained.
+Because of problems concerning humans' ability to define, discover, and know causality, causality was omitted from initial formulations of the DN model. Causality was thought to be incidentally approximated by realistic selection of premises that derive the phenomenon of interest from observed starting conditions plus general laws. Still, the DN model formally permitted causally irrelevant factors. Also, derivability from observations and laws sometimes yielded absurd answers.
+When logical empiricism fell out of favor in the 1960s, the DN model was widely seen as a flawed or greatly incomplete model of scientific explanation. Nonetheless, it remained an idealized version of scientific explanation, and one that was rather accurate when applied to modern physics. In the early 1980s, a revision to the DN model emphasized maximal specificity for relevance of the conditions and axioms stated. Together with Hempel's inductive-statistical model, the DN model forms scientific explanation's covering law model, which is also termed, from critical angle, subsumption theory.
+
+== Form ==
+The term deductive distinguishes the DN model's intended determinism from the probabilism of inductive inferences. The term nomological is derived from the Greek word νόμος (nomos), meaning "law". The DN model holds to a view of scientific explanation whose conditions of adequacy (CA), semiformal but stated classically, are derivability (CA1), lawlikeness (CA2), empirical content (CA3), and truth (CA4).
+In the DN model, a law axiomatizes an unrestricted generalization from antecedent A to consequent B by a conditional proposition (If A, then B) and has testable empirical content. A law differs from a mere true regularity (for instance, George always carries only $1 bills in his wallet) by supporting counterfactual claims and thus suggesting what must be true, while following from a scientific theory's axiomatic structure.
+The phenomenon to be explained is the explanandum (an event, law, or theory), whereas the premises that explain it are the explanans: true or highly confirmed, containing at least one universal law, and entailing the explanandum. Thus, given the explanans as initial, specific conditions C1, C2, ..., Ck plus general laws L1, L2, ..., Lr, the phenomenon E as explanandum is a deductive consequence, and is thereby scientifically explained.
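
The explanatory schema can be written in the layout usually attributed to Hempel and Oppenheim's 1948 article. The premises (the explanans) sit above a horizontal rule and the explanandum below it. The number of conditions and the number of laws need not be equal, so they are indexed separately here by k and r.

$$
\begin{array}{ll}
C_1, C_2, \ldots, C_k & \text{statements of antecedent conditions} \\
L_1, L_2, \ldots, L_r & \text{general laws} \\
\hline
E & \text{description of the phenomenon to be explained}
\end{array}
$$

Read this way, derivability (CA1) is the requirement that E follow deductively from the lines above the rule. Lawlikeness (CA2) requires that at least one of the L_j be a genuine law rather than a mere regularity.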
+
+== Roots ==
+Aristotle's scientific explanation in Physics resembles the DN model, an idealized form of scientific explanation. The framework of Aristotelian physics (Aristotelian metaphysics) reflected the perspective of Aristotle, principally a biologist, who, amid living entities' undeniable purposiveness, formalized vitalism and teleology, an intrinsic morality in nature. With the emergence of Copernicanism, however, Descartes introduced mechanical philosophy, and then Newton rigorously posed lawlike explanation, both Descartes and especially Newton shunning teleology within natural philosophy. Around 1740, David Hume staked Hume's fork, highlighted the problem of induction, and found humans ignorant of either necessary or sufficient causality. Hume also highlighted the fact/value gap, as what is does not itself reveal what ought.
+Around 1780, countering Hume's ostensibly radical empiricism, Immanuel Kant highlighted extreme rationalism (as by Descartes or Spinoza) and sought middle ground. Inferring that the mind arranges experience of the world into substance, space, and time, Kant placed the mind as part of the causal constellation of experience and thereby found Newton's theory of motion universally true, yet knowledge of things in themselves impossible. Safeguarding science, then, Kant paradoxically stripped it of scientific realism. Aborting Francis Bacon's inductivist mission to dissolve the veil of appearance to uncover the noumena (a metaphysical view of nature's ultimate truths), Kant's transcendental idealism tasked science with simply modeling patterns of phenomena. Safeguarding metaphysics, too, it found the mind's constants holding also universal moral truths, and launched German idealism.
+Auguste Comte found the problem of induction rather irrelevant since enumerative induction is grounded on the empiricism available, while science's point is not metaphysical truth. Comte found human knowledge had evolved from theological to metaphysical to scientific (the ultimate stage), rejecting both theology and metaphysics as asking questions unanswerable and posing answers unverifiable. Comte in the 1830s expounded positivism (the first modern philosophy of science and simultaneously a political philosophy), rejecting conjectures about unobservables, thus rejecting search for causes. Positivism predicts observations, confirms the predictions, and states a law, thereupon applied to benefit human society. From the late 19th century into the early 20th century, the influence of positivism spanned the globe. Meanwhile, evolutionary theory's natural selection brought the Copernican Revolution into biology and eventuated in the first conceptual alternative to vitalism and teleology.
--- a/data/en.wikipedia.org/wiki/Deductive-nomological_model-1.md
+++ b/data/en.wikipedia.org/wiki/Deductive-nomological_model-1.md
@ -0,0 +1,27 @@
+---
+title: "Deductive-nomological model"
+chunk: 2/6
+source: "https://en.wikipedia.org/wiki/Deductive-nomological_model"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:36.187998+00:00"
+instance: "kb-cron"
+---
+
+== Growth ==
+Whereas Comtean positivism posed science as description, logical positivism emerged in the late 1920s and posed science as explanation, perhaps to better unify empirical sciences by covering not only fundamental science (that is, fundamental physics) but special sciences, too, such as biology, psychology, economics, and anthropology. After the defeat of National Socialism at the close of World War II in 1945, logical positivism shifted to a milder variant, logical empiricism. All variants of the movement, which lasted until 1965, are neopositivism, sharing the quest of verificationism.
+Neopositivists led the emergence of philosophy of science as a subdiscipline of philosophy, researching such questions and aspects of scientific theory and knowledge. Scientific realism takes scientific theory's statements at face value, thus according them either falsity or truth (probable, approximate, or actual). Neopositivists held scientific antirealism as instrumentalism, treating scientific theory as simply a device to predict observations and their course, while statements about nature's unobservable aspects are taken as elliptical for, or metaphorical of, its observable aspects.
+The DN model received its most detailed, influential statement from Carl G. Hempel, first in his 1942 article "The function of general laws in history", and more explicitly with Paul Oppenheim in their 1948 article "Studies in the logic of explanation". A leading logical empiricist, Hempel embraced the Humean empiricist view that humans observe sequences of sensory events, not cause and effect, as causal relations and causal mechanisms are unobservables. The DN model bypasses causality beyond mere constant conjunction: first an event like A, then always an event like B.
+Hempel held natural laws, that is, empirically confirmed regularities, to be satisfactory and, if included realistically, to approximate causality. In later articles, Hempel defended the DN model and proposed probabilistic explanation by the inductive-statistical model (IS model). The DN model and the IS model (whereby the probability must be high, such as at least 50%) together form the covering law model, as named by a critic, William Dray. Derivation of statistical laws from other statistical laws falls under the deductive-statistical model (DS model). Georg Henrik von Wright, another critic, named the totality subsumption theory.
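The two schemas are often written as follows. This is a common textbook rendering, and the letters are notational choices rather than the article's: in the DN model the explanandum $E$ is deduced ($\vdash$) from antecedent conditions $C_i$ and general laws $L_j$, whereas in the IS model the explanans only confers a high probability $r$ on the explanandum, marked here by $[r]$ in place of deductive entailment.

```latex
\begin{aligned}
&\textbf{DN:}\quad
  \underbrace{C_1, C_2, \ldots, C_k}_{\text{antecedent conditions}},\;
  \underbrace{L_1, L_2, \ldots, L_r}_{\text{general laws}}
  \;\vdash\; E
  \qquad (\text{explanans} \vdash \text{explanandum}) \\[1ex]
&\textbf{IS:}\quad
  F(a),\; P(G \mid F) = r
  \;\;\overset{[r]}{\Longrightarrow}\;\; G(a),
  \qquad r \text{ high (e.g. } r \ge 0.5)
\end{aligned}
```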
+
+== Decline ==
+Amid the failure of neopositivism's fundamental tenets, Hempel in 1965 abandoned verificationism, signaling neopositivism's demise. From 1930 onward, Karl Popper attacked positivism, although, paradoxically, Popper was commonly mistaken for a positivist. Even Popper's 1934 book embraces the DN model, which was widely accepted as the model of scientific explanation for as long as physics remained the model of science examined by philosophers of science.
+In the 1940s, filling the vast observational gap between cytology and biochemistry, cell biology arose and established the existence of cell organelles besides the nucleus. Launched in the late 1930s, the molecular biology research program cracked the genetic code in the early 1960s and then converged with cell biology as cell and molecular biology, whose breakthroughs and discoveries defied the DN model by arising from a quest not for lawlike explanation but for causal mechanisms. Biology became a new model of science, and special sciences were no longer thought defective for lacking the universal laws borne by physics.
+In 1948, when explicating the DN model and stating the semiformal conditions of adequacy for scientific explanation, Hempel and Oppenheim acknowledged that the third condition, empirical content, is redundant, being implied by the other three: derivability, lawlikeness, and truth. In the early 1980s, amid a widespread view that causality ensures the explanans' relevance, Wesley Salmon called for returning cause to because, and along with James Fetzer helped replace CA3, empirical content, with CA3', strict maximal specificity.
+Salmon introduced causal mechanical explanation, never clarifying how it proceeds, yet reviving philosophers' interest in such explanation. Prompted by shortcomings of Hempel's inductive-statistical model (IS model), Salmon introduced the statistical-relevance model (SR model). Although the DN model remained an idealized form of scientific explanation, especially in applied sciences, most philosophers of science consider it flawed because it excludes many types of explanation generally accepted as scientific.
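The core idea of the SR model, that a factor belongs in an explanation only if it changes the probability of the outcome within a reference class, can be sketched with relative frequencies. This is a minimal sketch, assuming relevance is judged by comparing P(outcome | class and factor) with P(outcome | class); the helper names and the records are invented toy material, not real data. It shows why birth control pills drop out of explaining a man's non-pregnancy, the irrelevance problem raised under Weaknesses.

```python
from fractions import Fraction

def conditional_probability(records, outcome, given):
    """Relative frequency of `outcome` among the records satisfying `given`."""
    matching = [r for r in records if given(r)]
    if not matching:
        raise ValueError("no records satisfy the conditioning predicate")
    hits = sum(1 for r in matching if outcome(r))
    return Fraction(hits, len(matching))

def statistically_relevant(records, outcome, reference_class, factor):
    """Salmon-style relevance test (hypothetical helper): `factor` is relevant
    to `outcome` within `reference_class` iff
    P(outcome | class and factor) != P(outcome | class)."""
    p_with = conditional_probability(
        records, outcome, lambda r: reference_class(r) and factor(r))
    p_class = conditional_probability(records, outcome, reference_class)
    return p_with != p_class

# Invented toy records for illustration only.
people = [
    {"sex": "male", "pill": True, "pregnant": False},
    {"sex": "male", "pill": False, "pregnant": False},
    {"sex": "male", "pill": False, "pregnant": False},
    {"sex": "female", "pill": True, "pregnant": False},
    {"sex": "female", "pill": True, "pregnant": False},
    {"sex": "female", "pill": False, "pregnant": True},
    {"sex": "female", "pill": False, "pregnant": False},
]
pregnant = lambda r: r["pregnant"]
took_pill = lambda r: r["pill"]
men = lambda r: r["sex"] == "male"
women = lambda r: r["sex"] == "female"

print(statistically_relevant(people, pregnant, men, took_pill))    # False
print(statistically_relevant(people, pregnant, women, took_pill))  # True
```

Among men the pill leaves the probability of pregnancy at 0, so it is screened out; among women in this toy table it changes the frequency, so it counts as relevant.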
+
+== Strengths ==
+As the theory of knowledge, epistemology differs from ontology, a subbranch of metaphysics, the theory of reality. Ontology proposes categories of being (what sorts of things exist), and so, although a scientific theory's ontological commitment can be modified in light of experience, an ontological commitment inevitably precedes empirical inquiry.
+So-called natural laws are statements of humans' observations, and thus are epistemological, concerning human knowledge: the epistemic. Causal mechanisms and structures existing putatively independently of minds exist, or would exist, in the natural world's structure itself, and thus are ontological: the ontic. Blurring the epistemic with the ontic (as by incautiously presuming a natural law to refer to a causal mechanism, to trace structures realistically during unobserved transitions, or to state true regularities that never vary) tends to generate a category mistake.
+Discarding ontic commitments, including causality per se, the DN model permits a theory's laws to be reduced to, that is, subsumed by, a more fundamental theory's laws. In the DN model, the higher theory's laws are explained by the lower theory's laws. Thus, the epistemic success of Newtonian theory's law of universal gravitation is reduced to, and so explained by, Albert Einstein's general theory of relativity, although Einstein's theory discards Newton's ontic claim that universal gravitation's epistemic success in predicting Kepler's laws of planetary motion comes through a causal mechanism: a straightly attractive force instantly traversing absolute space despite absolute time.
+The covering law model reflects neopositivism's vision of empirical science, a vision interpreting or presuming the unity of science, whereby all empirical sciences are either fundamental science (that is, fundamental physics) or special sciences, whether astrophysics, chemistry, biology, geology, psychology, economics, and so on. All special sciences would network via the covering law model. By stating boundary conditions while supplying bridge laws, any special law would reduce to a lower special law, ultimately reducing, in theory though generally not in practice, to fundamental science. (Boundary conditions are specified conditions under which the phenomena of interest occur. Bridge laws translate terms in one science into terms in another.)
--- a/data/en.wikipedia.org/wiki/Deductive-nomological_model-2.md
+++ b/data/en.wikipedia.org/wiki/Deductive-nomological_model-2.md
@ -0,0 +1,14 @@
+---
+title: "Deductive-nomological model"
+chunk: 3/6
+source: "https://en.wikipedia.org/wiki/Deductive-nomological_model"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:36.187998+00:00"
+instance: "kb-cron"
+---
+
+== Weaknesses ==
+Under the DN model, if one asks, "Why is that shadow 20 feet long?", another can answer, "Because that flagpole is 15 feet tall, the Sun is at x angle, and laws of electromagnetism". Yet by the problem of symmetry, if one instead asked, "Why is that flagpole 15 feet tall?", another could answer, "Because that shadow is 20 feet long, the Sun is at x angle, and laws of electromagnetism", likewise a deduction from observed conditions and scientific laws, but an answer clearly incorrect. By the problem of irrelevance, if one asks, "Why did that man not get pregnant?", one could in part answer, among the explanans, "Because he took birth control pills" (if he factually took them) together with the law of their preventing pregnancy, since the covering law model poses no restriction to bar that observation from the explanans.
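The problem of symmetry is easy to see numerically. The sketch below assumes flat, level ground and straight-line propagation of light (the geometric-optics content of the "laws of electromagnetism" cited above) and runs one trigonometric law in both directions. The function names are hypothetical, and the Sun's angle x, which the text leaves unspecified, is chosen only so that the numbers match the 15 ft and 20 ft of the example.

```python
import math

def shadow_length(height, sun_elevation_deg):
    """Deduce a shadow's length from a pole's height and the Sun's elevation
    (assumes level ground and rectilinear light propagation)."""
    return height / math.tan(math.radians(sun_elevation_deg))

def pole_height(shadow, sun_elevation_deg):
    """The same law run backwards: deduce the pole's height from its shadow."""
    return shadow * math.tan(math.radians(sun_elevation_deg))

# The article leaves the Sun's angle as "x"; pick the elevation at which a
# 15 ft pole casts a 20 ft shadow, purely so the numbers match the example.
x = math.degrees(math.atan2(15, 20))  # about 36.87 degrees

forward = shadow_length(15, x)   # ~20.0 ft: the intuitively explanatory direction
backward = pole_height(20, x)    # ~15.0 ft: an equally valid deduction
print(f"x = {x:.2f} deg, shadow = {forward:.2f} ft, height = {backward:.2f} ft")
```

Both deductions are equally valid, which is the point of the objection: the DN model by itself does not say which direction explains.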
+Many philosophers have concluded that causality is integral to scientific explanation. The DN model offers a necessary condition of a causal explanation, successful prediction, but not sufficient conditions, as a universal regularity can include spurious relations or mere correlations: for instance, Z always following Y, not because Y causes Z, but because X causes Y and then Z. By relating the pressure and volume of a gas held in a container at constant temperature, Boyle's law permits prediction of one from the other, but does not explain why to expect that value unless one adds, perhaps, the kinetic theory of gases.
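The contrast between prediction and explanation can be made concrete with standard textbook relations; the symbols are conventional, not the article's. For a fixed amount of ideal gas at fixed temperature $T$, Boyle's law predicts a new volume from a new pressure, while kinetic theory, with $N$ molecules of mass $m$ and mean squared speed $\langle v^2 \rangle$, derives that same regularity from molecular motion.

```latex
\begin{aligned}
&\text{Boyle's law (fixed } T, N\text{):} && P_1 V_1 = P_2 V_2
  \;\Rightarrow\; V_2 = \frac{P_1 V_1}{P_2} \\
&\text{Kinetic theory:} && PV = \tfrac{1}{3} N m \langle v^2 \rangle,
  \qquad \tfrac{1}{2} m \langle v^2 \rangle = \tfrac{3}{2} k_B T \\
&\Rightarrow && PV = N k_B T = \text{const. when } N \text{ and } T \text{ are fixed}
\end{aligned}
```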
+Scientific explanations increasingly pose not determinism's universal laws but probabilism's chance, ceteris paribus laws. Smoking's contribution to lung cancer fails even the inductive-statistical model (IS model), which requires a probability over 0.5 (50%). (Probability standardly ranges from 0, or 0%, to 1, or 100%.) Epidemiology, an applied science that uses statistics to search for associations between events, cannot show causality, but it consistently found a higher incidence of lung cancer in smokers than in otherwise similar nonsmokers, although the proportion of smokers who develop lung cancer is modest. Compared with nonsmokers, however, smokers as a group showed over 20 times the risk of lung cancer, and, in conjunction with basic research, a consensus followed that smoking had been scientifically explained as a cause of lung cancer, responsible for some cases that without smoking would not have occurred: a probabilistic counterfactual causality.
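The gap between a large relative risk and a low absolute probability takes only a few lines of arithmetic to show. This sketch assumes the IS criterion is a strict probability threshold of 0.5, as stated above; the function names are hypothetical, and the two probabilities are invented round numbers chosen for illustration, not epidemiological estimates.

```python
def relative_risk(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of the outcome probability in the exposed group to the unexposed group."""
    if p_unexposed <= 0:
        raise ValueError("baseline probability must be positive")
    return p_exposed / p_unexposed

def meets_is_threshold(p: float, threshold: float = 0.5) -> bool:
    """IS-model test as described above: the explanans must confer a
    probability strictly greater than the threshold on the explanandum."""
    return p > threshold

# Hypothetical probabilities, invented for illustration only;
# they are NOT epidemiological estimates.
p_smoker = 0.2
p_nonsmoker = 0.008

rr = relative_risk(p_smoker, p_nonsmoker)
print(f"relative risk = {rr:.1f}")
print("IS model satisfied:", meets_is_threshold(p_smoker))
```

With these numbers the exposed group's risk is 25 times the baseline, yet a probability of 0.2 still falls short of the IS threshold, which is the situation the paragraph describes.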
--- a/data/en.wikipedia.org/wiki/Deductive-nomological_model-3.md
+++ b/data/en.wikipedia.org/wiki/Deductive-nomological_model-3.md
@ -0,0 +1,37 @@
+---
+title: "Deductive-nomological model"
+chunk: 4/6
+source: "https://en.wikipedia.org/wiki/Deductive-nomological_model"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:36.187998+00:00"
+instance: "kb-cron"
+---
+
+== Covering action ==
+Through lawlike explanation, fundamental physics, often perceived as fundamental science, has proceeded through intertheory relation and theory reduction, thereby resolving experimental paradoxes with great historical success, in a manner resembling the covering law model. In the early 20th century, Ernst Mach and Wilhelm Ostwald had resisted Ludwig Boltzmann's reduction of thermodynamics (and thereby of Boyle's law) to statistical mechanics, partly because it rested on the kinetic theory of gases, which hinged on the atomic/molecular theory of matter. Both Mach and Ostwald viewed matter as a variant of energy, and molecules as mathematical illusions, as even Boltzmann thought possible.
+In 1905, via statistical mechanics, Albert Einstein predicted the phenomenon of Brownian motion, unexplained since botanist Robert Brown reported it in 1827. Soon, most physicists accepted that atoms and molecules were unobservable yet real. Also in 1905, Einstein explained the electromagnetic field's energy as distributed in particles, an idea doubted until it helped resolve atomic theory in the 1910s and 1920s. Meanwhile, all known physical phenomena were gravitational or electromagnetic, and the two theories were misaligned. Yet belief in aether as the source of all physical phenomena was virtually unanimous. When faced with experimental paradoxes, physicists modified the aether's hypothetical properties.
+Finding the luminiferous aether a useless hypothesis, Einstein in 1905 a priori unified all inertial reference frames to state the special principle of relativity, which, by omitting aether, converted space and time into relative phenomena whose relativity aligned electrodynamics with the Newtonian principle of Galilean relativity, or invariance. Originally epistemic or instrumental, this was interpreted as ontic or realist, that is, as a causal mechanical explanation, and the principle became a theory, refuting Newtonian gravitation. By its predictive success in 1919, general relativity apparently overthrew Newton's theory, a revolution in science resisted by many yet fulfilled around 1930.
+In 1925, Werner Heisenberg and Erwin Schrödinger independently formalized quantum mechanics (QM). Despite clashing explanations, the two theories made identical predictions. Paul Dirac's 1928 model of the electron was set to special relativity, launching QM into the first quantum field theory (QFT), quantum electrodynamics (QED). From it, Dirac interpreted and predicted the electron's antiparticle, soon discovered and termed the positron, but QED failed as electrodynamics at high energies. Elsewhere, the strong and weak nuclear forces were discovered.
+In 1941, Richard Feynman introduced QM's path integral formalism, which, if interpreted as a causal mechanical model, clashes with Heisenberg's matrix formalism and with Schrödinger's wave formalism, although all three are empirically identical, sharing predictions. Next, working on QED, Feynman sought to model particles without fields and to find the vacuum truly empty. As each known fundamental force is apparently an effect of a field, Feynman failed. Louis de Broglie's wave–particle duality had rendered atomism (indivisible particles in a void) untenable, and highlighted the very notion of discontinuous particles as self-contradictory.
+Meeting in 1947, Freeman Dyson, Richard Feynman, Julian Schwinger, and Sin-Itiro Tomonaga soon introduced renormalization, a procedure converting QED into physics' most predictively precise theory, subsuming chemistry, optics, and statistical mechanics. QED thus won physicists' general acceptance. Paul Dirac criticized its need for renormalization as a sign of its unnaturalness, and called for an aether. In 1947, Willis Lamb had found unexpected motion of electron orbitals, shifted because the vacuum is not truly empty. Yet emptiness was catchy, abolishing aether conceptually, and physics proceeded ostensibly without it, even suppressing it. Meanwhile, "sickened by untidy math, most philosophers of physics tend to neglect QED".
+Physicists have feared even mentioning aether, renamed vacuum, which, as such, is nonexistent. General philosophers of science commonly believe that aether, rather, is fictitious, "relegated to the dustbin of scientific history ever since" special relativity arrived in 1905. Einstein was noncommittal about aether's nonexistence and simply called it superfluous. Abolishing Newtonian motion for electrodynamic primacy, however, Einstein inadvertently reinforced aether, and to explain motion he was led back to aether in general relativity. Yet resistance to relativity theory became associated with earlier theories of aether, whose word and concept became taboo. Einstein explained special relativity's compatibility with an aether, but Einstein's aether, too, was opposed. Objects became conceived as pinned directly on space and time by abstract geometric relations lacking any ghostly or fluid medium.
+By 1970, QED along with the weak nuclear field was reduced to electroweak theory (EWT), and the strong nuclear field was modeled as quantum chromodynamics (QCD). Comprising EWT, QCD, and the Higgs field, this Standard Model of particle physics is an "effective theory", not truly fundamental. As QCD's particles are considered nonexistent in the everyday world, QCD especially suggests an aether, routinely found by physics experiments to exist and to exhibit relativistic symmetry. Confirmation of the Higgs particle, modeled as a condensation within the Higgs field, corroborates aether, although physics need not state or even include aether. Organizing regularities of observations, as in the covering law model, physicists find the quest to discover aether superfluous.
+In 1905, from special relativity, Einstein deduced mass–energy equivalence: particles are variant forms of distributed energy, which is how particles colliding at vast speed experience that energy's transformation into mass, producing heavier particles, although physicists' talk promotes confusion. As "the contemporary locus of metaphysical research", QFTs pose particles not as existing individually but as excitation modes of fields, the particles and their masses being states of aether, apparently unifying all physical phenomena as the more fundamental causal reality, as long ago foreseen. Yet a quantum field is an intricate abstraction, a mathematical field, virtually inconceivable in terms of a classical field's physical properties. Nature's deeper aspects, still unknown, might elude any possible field theory.
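The energy-to-mass conversion in collisions follows from rest energy together with the threshold condition of standard relativistic kinematics. This is a textbook sketch, not taken from the sources cited here: a collision can create new particles of masses $m_i$ only if the energy available in the centre-of-momentum frame, $E_{\text{cm}}$, covers their total rest energy. The electron–positron pair is used only as a familiar instance.

```latex
\begin{aligned}
E_0 &= m c^2 && \text{(rest energy of a particle of mass } m\text{)} \\
E_{\text{cm}} &\ge \sum_i m_i c^2 && \text{(threshold to create particles of masses } m_i\text{)} \\
E_{\text{cm}} &\ge 2 m_e c^2 \approx 2 \times 0.511~\text{MeV} = 1.022~\text{MeV}
  && \text{(electron–positron pair)}
\end{aligned}
```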
+Though the discovery of causality is popularly thought to be science's aim, the search for it was shunned by the Newtonian research program, which was even more Newtonian than Isaac Newton himself. By now, most theoretical physicists infer that the four known fundamental interactions would reduce to superstring theory, whereby atoms and molecules, after all, are energy vibrations holding mathematical, geometric forms. Given the uncertainties of scientific realism, some conclude that the concept of causality raises the comprehensibility of scientific explanation, and thus is key to folk science, but compromises the precision of scientific explanation and is dropped as a science matures. Even epidemiology is maturing to heed the severe difficulties with presumptions about causality. The covering law model is among Carl G Hempel's admired contributions to philosophy of science.
+
+== See also ==
+Types of inference
+
+Deductive reasoning
+Inductive reasoning
+Abductive reasoning
+Related subjects
+
+Explanandum and explanans
+Hypothetico-deductive model
+Models of scientific inquiry
+Philosophy of science
+Scientific method
+
+== Notes ==
--- a/data/en.wikipedia.org/wiki/Deductive-nomological_model-4.md
+++ b/data/en.wikipedia.org/wiki/Deductive-nomological_model-4.md
--- a/data/en.wikipedia.org/wiki/Deductive-nomological_model-5.md
+++ b/data/en.wikipedia.org/wiki/Deductive-nomological_model-5.md
@ -0,0 +1,17 @@
+---
+title: "Deductive-nomological model"
+chunk: 6/6
+source: "https://en.wikipedia.org/wiki/Deductive-nomological_model"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:36.187998+00:00"
+instance: "kb-cron"
+---
+
+Rowlands, Peter, Oliver Lodge and the Liverpool Physical Society (Liverpool: Liverpool University Press, 1990).
+Sarkar, Sahotra & Jessica Pfeifer, eds, The Philosophy of Science: An Encyclopedia, Volume 1: A–M (New York: Routledge, 2006).
+Schwarz, John H (1998). "Recent developments in superstring theory". Proceedings of the National Academy of Sciences of the United States of America. 95 (6): 2750–7. Bibcode:1998PNAS...95.2750S. doi:10.1073/pnas.95.6.2750. PMC 19640. PMID 9501161.
+Schweber, Silvan S, QED and the Men who Made it: Dyson, Feynman, Schwinger, and Tomonaga (Princeton: Princeton University Press, 1994).
+Schliesser, Eric, "Hume's Newtonianism and anti-Newtonianism", in Edward N Zalta, ed, The Stanford Encyclopedia of Philosophy, Winter 2008 edn.
+Spohn, Wolfgang, The Laws of Belief: Ranking Theory and Its Philosophical Applications (Oxford: Oxford University Press, 2012).
+Suppe, Frederick, ed, The Structure of Scientific Theories, 2nd edn (Urbana, Illinois: University of Illinois Press, 1977).
+Tavel, Morton, Contemporary Physics and the Limits of Knowledge (Piscataway, NJ: Rutgers University Press, 2002).
+Torretti, Roberto, The Philosophy of Physics (New York: Cambridge University Press, 1999).
+Vongehr, Sascha, "Higgs discovery rehabilitating despised Einstein Ether", Science 2.0: Alpha Meme website, 13 Dec 2011.
+Vongehr, Sascha, "Supporting abstract relational space-time as fundamental without doctrinism against emergence", arXiv (History and Philosophy of Physics):0912.3069, 2 Oct 2011 (last revised).
+von Wright, Georg Henrik, Explanation and Understanding (Ithaca, NY: Cornell University Press, 1971–2004).
+Wells, James D, Effective Theories in Physics: From Planetary Orbits to Elementary Particle Masses (Heidelberg, New York, Dordrecht, London: Springer, 2012).
+Wilczek, Frank, The Lightness of Being: Mass, Ether, and the Unification of Forces (New York: Basic Books, 2008).
+Whittaker, Edmund T, A History of the Theories of Aether and Electricity: From the Age of Descartes to the Close of the Nineteenth Century (London, New York, Bombay, Calcutta: Longmans, Green, and Co, 1910 / Dublin: Hodges, Figgis, & Co, 1910).
+Wilczek, Frank (Jan 1999). "The persistence of ether". Physics Today. 52 (1): 11–13. Bibcode:1999PhT....52a..11W. doi:10.1063/1.882562.
+Wolfson, Richard, Simply Einstein: Relativity Demystified (New York: W W Norton & Co, 2003).
+Woodward, James, "Scientific explanation", in Edward N Zalta, ed, The Stanford Encyclopedia of Philosophy, Winter 2011 edn.
+Wootton, David, ed, Modern Political Thought: Readings from Machiavelli to Nietzsche (Indianapolis: Hackett Publishing, 1996).
+
+== Further reading ==
+Carl G. Hempel, Aspects of Scientific Explanation and other Essays in the Philosophy of Science (New York: Free Press, 1965).
+Randolph G. Mayes, "Theories of explanation", in James Fieser & Bradley Dowden, eds, Internet Encyclopedia of Philosophy, 2006.
+Ilkka Niiniluoto, "Covering law model", in Robert Audi, ed., The Cambridge Dictionary of Philosophy, 2nd edn (New York: Cambridge University Press, 1996).
+Wesley C. Salmon, Four Decades of Scientific Explanation (Minneapolis: University of Minnesota Press, 1990 / Pittsburgh: University of Pittsburgh Press, 2006).
--- a/data/en.wikipedia.org/wiki/Discovery_science-0.md
+++ b/data/en.wikipedia.org/wiki/Discovery_science-0.md
@ -0,0 +1,30 @@
+---
+title: "Discovery science"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Discovery_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:37.488236+00:00"
+instance: "kb-cron"
+---
+
+Discovery science (also known as discovery-based science) is a scientific methodology that aims to find new patterns and correlations, and to form hypotheses, through the analysis of large-scale experimental data. The term “discovery science” encompasses various fields of study, including basic, translational, and computational science and research. Discovery-based methodologies are commonly contrasted with traditional scientific practice, in which hypotheses are formed before experimental data are closely examined. Discovery science involves inductive reasoning, that is, using observations to make generalisations, and can be applied to a range of science-related fields, e.g., medicine, proteomics, hydrology, psychology, and psychiatry.
+
+== Overview ==
+
+=== Purpose ===
+Discovery science places an emphasis on 'basic' discovery, which can fundamentally change the status quo. For example, in the early years of water resources research, the use of discovery science was demonstrated by seeking to elucidate phenomena that were, until that point, unexplained, however unusual the resulting ideas may have been perceived to be. In this sense, discovery science is based on the attitude that "we must not allow our concepts of the earth, in so far as they transcend the reach of observation, to root themselves so deeply and so firmly in our minds that the process of uprooting them causes mental discomfort" (as stated by Davis in 1926). For discovery science to be utilised, researchers need to return to creating and testing genuine hypotheses, rather than praising concepts that are already familiar. While researchers commonly feel that new hypotheses will emerge naturally and inductively from curiosity in the relevant field, it should be acknowledged that hypotheses can also be generated by models. Additionally, deductive testing must involve field observation, so that imperfect answers can be replaced with more clearly defined questions.
+
+=== Tools ===
+Hypothesis-driven studies can be transformed into discovery-driven studies with the help of newly available tools and technology-driven life science research. These tools have allowed new questions to be asked and new paradigms to be considered, particularly in biology. However, some of the required tools remain inaccessible or too costly because the related technology is still being developed.
+Data mining is the most common tool used in discovery science, and is applied to data from diverse fields of study such as DNA analysis, climate modelling, nuclear reaction modelling, and others. The use of data mining in discovery science follows a general trend of increasing use of computers and computational theory in all fields of science, and newer methods of data mining employ specialised machine learning algorithms for automated hypothesis forming and automated theorem proving.
+
+== Applications ==
+While computational methods are gaining interest, efforts to support critical care through basic and translational science, i.e., forms of discovery science that are essential for advancing the understanding of pathophysiology, are declining. A loss of interest in basic and translational science may lead to a failure to discover and develop new therapies, which could affect the critically ill. Within critical care, there is an aim to renew emphasis on basic and translational science through platforms such as medical journals and conferences, as well as critical care medical curricula. Advances in discovery-based science underlie key discoveries and developments in medicine, constituting a 'pipeline' for leading-edge medical development.
+
+=== Medicine ===
+According to the AACR Cancer Progress Report 2021, discovery science has the potential to drive clinical breakthroughs. Since discovery science underlies key discoveries and the development of new therapies, it remains important for advancing critical care. Numerous discoveries have increased life span and productivity and decreased health-related costs, revolutionising medical care; as a result, the return on investment for discovery science has proven to be high. For example, combining computational methods with knowledge of inflammatory and genomic pathways has resulted in optimised clinical trials. Discovery science is currently enabling a transition to the era of personalised medicine for treating complex syndromes, e.g., sepsis and ARDS. With a robust infrastructure, discovery science can revolutionise medical care and biological research.
+
+=== Genomics ===
+Discovery science has converged with clinical medicine and cancer genomics, and this convergence has been accelerated by recent advances in genome technologies and genomic information. The effect of cancer genomics has been noticeable in every area of cancer research. Most successful applications of genomic knowledge in today's clinical medicine draw on a wealth of knowledge gathered by a broad range of research over decades of work. Biological insights are required to inform drug discovery and to set a clear clinical path for development.
+Historically, the acquisition of such knowledge through functional and mechanistic studies has been uncoordinated, random, and inefficient. Moving from cancer genomic discoveries to personalised medicine involves major scientific, logistical, and regulatory hurdles. These include patient consent, sample acquisition, clinical annotation, and study design, all of which feed into data generation and computational analyses. Functional and mechanistic studies also remain a challenge; they in turn lead to drug and biomarker discovery and development, commercial challenges, and genomics-informed clinical trials. Importantly, these key scientific challenges are interdependent. Researchers seek to develop directed, streamlined approaches for rapidly generating biological discoveries, so that cancer genomic discoveries can translate to the clinic. Delivering personalised cancer medicine benefits from traditional, unconstrained, and non-directed academic exploration, with the goal of directing scientific inquiry to convert genomic discoveries into diagnostic and therapeutic targets.
--- a/data/en.wikipedia.org/wiki/Discovery_science-1.md
+++ b/data/en.wikipedia.org/wiki/Discovery_science-1.md
@ -0,0 +1,18 @@
+---
+title: "Discovery science"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Discovery_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:37.488236+00:00"
+instance: "kb-cron"
+---
+
+=== Proteomics ===
+Another example of discovery science is proteomics, a technology-driven and technology-limited discovery science. Technologies for proteomic analysis provide information that is useful in discovery science. Proteome analysis as a discovery science is applicable in biotechnology, e.g., it assists in 1) the discovery of biochemical pathways which can identify targets for therapies, 2) developing new processes for manufacturing biological materials, 3) monitoring manufacturing processes for the purpose of quality control, and 4) developing diagnostic tests and efficacious treatment strategies for clinical diseases. In the context of proteomics, current life-science research remains technology-limited; however, recently available tools have helped move such research from being hypothesis-driven to being discovery-driven.
+
+=== Hydrology ===
+Field hydrology has experienced a decline in progress due to a shift from discovery-based field work to the gathering of data for model parameterisation. In field hydrology, models are no more useful than one's understanding of how the systems work, and discovery science allows for this understanding. Several important examples of field-based inquiry and discovery have taken place in field hydrology. These include identifying spatial patterns of soil moisture and how they relate to topography; interrogating such data through the use of geostatistics; and discovering the importance of macropore flow and hydrological connectivity. Discovery-based questions that have been asked in field hydrology include 1) which parts of the watershed matter most in determining water delivery to the channel, 2) how the presence of 'old' water can be explained by groundwater travelling into the stream, and 3) how to explain flashy hydrographs when no overland flow is visible. Therefore, there is a need for discovery science in field hydrology, even when the hydrological hypotheses it produces seem unusual.
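A common first geostatistical step when interrogating spatial patterns such as soil moisture is the empirical semivariogram, which measures how dissimilar values become as the distance between sample points grows. This is a minimal sketch, assuming the classical (Matheron) estimator and lag classes formed by rounding each pair's distance to a multiple of `lag_width`; the function name, the probe coordinates, and the moisture values are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

def empirical_semivariogram(points, values, lag_width):
    """Classical (Matheron) estimator of the semivariogram:
        gamma(h) = sum((z_i - z_j)**2) / (2 * N(h))
    over the N(h) point pairs whose separation rounds to lag h.
    Returns {lag: gamma} for every lag class that has at least one pair."""
    sq_diff_sum = defaultdict(float)
    pair_count = defaultdict(int)
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        k = round(dist(p, q) / lag_width)  # lag class for this pair
        sq_diff_sum[k] += (values[i] - values[j]) ** 2
        pair_count[k] += 1
    return {k * lag_width: sq_diff_sum[k] / (2 * pair_count[k])
            for k in sorted(pair_count)}

# Hypothetical soil-moisture index at four probes along a 3 m transect.
probes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
moisture = [1.0, 2.0, 3.0, 4.0]
print(empirical_semivariogram(probes, moisture, lag_width=1.0))
# {1.0: 0.5, 2.0: 2.0, 3.0: 4.5}
```

A semivariogram that keeps rising with distance, as in this toy transect, suggests a spatial trend; one that levels off indicates the range beyond which samples are effectively uncorrelated.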
+
+=== Psychology ===
+An example of discovery science applied to human brain function is the 1000 Functional Connectomes Project (FCP), launched in 2009 to generate and collect functional magnetic resonance imaging (fMRI) data from over 1,000 individuals. As with decoding the human genome, mapping human brain function presents challenges to the functional neuroimaging community. The first phase of discovery science requires accumulating and sharing large-scale datasets for data mining. Traditionally, the neuroimaging community within psychology has focused on task-based and hypothesis-driven approaches; however, resting-state functional MRI (R-fMRI) has emerged as a powerful tool for discovery science. The potential of discovery science remains vast, e.g. 1) helping with decision-making and guiding clinical diagnoses by developing objective measures of brain functional integrity, 2) assessing the level of efficacy of treatment interventions, and 3) tracking responses to treatment. Within the scientific community, recruiting participation and achieving collaboration across a broad population is essential for successfully implementing discovery-based science in the context of human brain function.
--- a/data/en.wikipedia.org/wiki/Discovery_science-2.md
+++ b/data/en.wikipedia.org/wiki/Discovery_science-2.md
@ -0,0 +1,19 @@
+---
+title: "Discovery science"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Discovery_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:37.488236+00:00"
+instance: "kb-cron"
+---
+
+== Methodology ==
+Discovery-based methodologies are often viewed in contrast to traditional scientific practice, where hypotheses are formed before close examination of experimental data. However, from a philosophical perspective in which all or most of the observable "low-hanging fruit" has already been plucked, examining the phenomenological world more closely than the senses alone allow (even augmented senses, e.g. via microscopes, telescopes, bifocals, etc.) opens a new source of knowledge for hypothesis formation. This process is also known as inductive reasoning, or the use of specific observations to make generalisations.
+Discovery science is usually a complex process, and consequently does not follow a simple, linear cause-and-effect pattern. Outcomes are therefore uncertain, and disappointing results are expected as a fundamental part of discovery science. In particular, this may apply to medicine for the critically ill, where disease syndromes may be complex and multi-factorial. In psychiatry, studying complex relationships between brain and behaviour requires large-scale science. This calls for a conceptual switch from hypothesis-driven studies to hypothesis-generating research that is discovery-based. Discovery-based approaches are normally hypothesis-free at the outset; however, they can raise hypothesis testing to a new level that effectively supports traditional hypothesis-driven studies. Researchers hope that combining integrative analyses of data from a range of different levels can result in new classification approaches that enable personalised interventions. Some biologists, such as Leroy Hood, have suggested that certain research fields are heading towards the model of ‘discovery science’. For example, it is believed that more information about gene function can be discovered through the evolution of data-mining tools.
+Discovery-based approaches are often referred to as “big data” approaches because they involve the analysis of large-scale datasets. Big data includes large-scale homogeneous study designs and highly variant datasets, and can be further divided into different kinds of datasets. For example, in neuropsychiatric studies, big data can be categorised as ‘broad’ or ‘deep’ data. Broad data is complex and heterogeneous, as it is collected from multiple sources (e.g., labs and institutions) using different standards. Deep data, on the other hand, is collected at multiple levels, e.g., from genes to molecules, cells, circuits, behaviours, and symptoms. Broad data allows population-level inferences to be made; deep data is required for personalised medicine. However, combining broad and deep data and storing them in large-scale databases makes it practically impossible to rely on traditional statistical approaches. Instead, discovery-based big data approaches can generate hypotheses and offer a high-throughput analytical tool for pattern recognition and data mining. In this way, discovery-based approaches can provide insight into the causes and mechanisms underlying the area of study.
+Although discovery-based and data-driven big data approaches can inform understanding of the mechanisms behind the topic of concern, their success depends on integrated analyses of the various types of relevant data and on the resulting insight. For example, when researching psychiatric dysfunction, it is important to integrate vast and complex data such as brain imaging, genomic and behavioural data to uncover any brain-behaviour connections relevant to the dysfunction. Integrating such data and developing suitable mining tools are therefore significant challenges. Validation of results is a further major challenge for discovery-based science: although results can be statistically validated against independent datasets, ultimate validation depends on functional tests. Collaborative efforts are therefore critical for success.
+
+== References ==
+
+Chen, J.; Call, G. B.; Beyer, E.; Bui, C.; Cespedes, A.; Chan, A.; Chan, J.; Chan, S.; Chhabra, A. (February 2005). "Discovery-Based Science Education: Functional Genomic Dissection in Drosophila by Undergraduate Researchers". PLOS Biology. 3 (2): e59. doi:10.1371/journal.pbio.0030059. PMC 548953. PMID 15719063.
--- a/data/en.wikipedia.org/wiki/Evidence-based_practice-0.md
+++ b/data/en.wikipedia.org/wiki/Evidence-based_practice-0.md
@ -0,0 +1,29 @@
+---
+title: "Evidence-based practice"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Evidence-based_practice"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:38.733240+00:00"
+instance: "kb-cron"
+---
+
+Evidence-based practice (EBP) is the idea that occupational practices ought to be based on scientific evidence. The movement towards evidence-based practices attempts to encourage and, in some instances, require professionals and other decision-makers to pay more attention to evidence when making decisions. The goal of evidence-based practice is to eliminate unsound or outdated practices in favor of more effective ones by shifting the basis for decision-making from tradition, intuition, and unsystematic experience to firmly grounded scientific research. The proposal has been controversial, with some arguing that research results may not apply to individuals as well as traditional practices do.
+Evidence-based practices have been gaining ground since the introduction of evidence-based medicine and have spread to the allied health professions, education, management, law, public policy, architecture, and other fields. In light of studies showing problems in scientific research (such as the replication crisis), there is also a movement to apply evidence-based practices in scientific research itself. Research into the evidence-based practice of science is called metascience.
+An individual or organisation is justified in claiming that a specific practice is evidence-based if, and only if, three conditions are met. First, the individual or organisation possesses comparative evidence about the effects of the specific practice in comparison to the effects of at least one alternative practice. Second, the specific practice is supported by this evidence according to at least one of the individual's or organisation's preferences in the given practice area. Third, the individual or organisation can provide a sound account for this support by explaining the evidence and preferences that lay the foundation for the claim.
+
+== History ==
+For most of history, professions have based their practices on expertise derived from experience passed down in the form of tradition. Many of these practices have not been justified by evidence, which has sometimes enabled quackery and poor performance. Even when overt quackery is not present, the quality and efficiency of tradition-based practices may not be optimal. As the scientific method has become increasingly recognized as a sound means of evaluating practices, evidence-based practices have been adopted more widely.
+
+=== Medicine ===
+One of the earliest proponents of evidence-based practice was Archie Cochrane, an epidemiologist who authored the book Effectiveness and Efficiency: Random Reflections on Health Services in 1972. Cochrane's book argued for the importance of properly testing health care strategies, and was foundational to the evidence-based practice of medicine. Cochrane suggested that because resources would always be limited, they should be used to provide forms of health care which had been shown in properly designed evaluations to be effective. Cochrane maintained that the most reliable evidence was that which came from randomised controlled trials.
+The term "evidence-based medicine" was introduced by Gordon Guyatt in 1990 in an unpublished program description and first appeared in print in 1992. This marked the first evidence-based practice to be formally established. Some early experiments in evidence-based medicine involved testing primitive medical techniques such as bloodletting and studying the effectiveness of modern, accepted treatments. Insurance providers have pushed for evidence-based practices in medicine, sometimes refusing coverage of practices that lack systematic evidence of usefulness. Most clients now expect medical professionals to make decisions based on evidence and to stay informed about the most up-to-date information. Since its widespread adoption in medicine, evidence-based practice has rapidly spread to other fields.
+
+=== Education ===
+More recently, there has been a push for evidence-based education. The use of evidence-based learning techniques such as spaced repetition can improve students' rate of learning. Some commentators have suggested that the lack of any substantial progress in the field of education is attributable to practice resting on the unconnected and noncumulative experience of thousands of individual teachers, each re-inventing the wheel and failing to learn from hard scientific evidence about 'what works'. Opponents of this view argue that teaching methods are hard to assess because their success depends on a host of factors, not least the style, personality and beliefs of the teacher and the needs of the particular children. Others argue that teacher experience could be combined with research evidence, without the latter being treated as a privileged source. This is in line with a school of thought holding that evidence-based practice has limitations and that a better alternative is evidence-informed practice (EIP). This approach draws on quantitative evidence together with qualitative factors such as clinical experience and the discernment of practitioners and clients, while excluding non-scientific prejudices.
+
+== Versus tradition ==
+Evidence-based practice is a philosophical approach that is in opposition to tradition. Some degree of reliance on "the way it was always done" can be found in almost every profession, even when those practices are contradicted by new and better information.
+Some critics argue that since research is conducted at the population level, results may not generalise to each individual within the population. Therefore, evidence-based practices may fail to provide the best solution for each individual, and traditional practices may better accommodate individual differences. In response, researchers have tried to test whether particular practices work better for different subcultures, personality types, etc. Some authors have redefined evidence-based practice to include practice that incorporates common wisdom, tradition, and personal values alongside practices based on evidence.
+
+== Evaluating evidence ==
--- a/data/en.wikipedia.org/wiki/Evidence-based_practice-1.md
+++ b/data/en.wikipedia.org/wiki/Evidence-based_practice-1.md
@ -0,0 +1,37 @@
+---
+title: "Evidence-based practice"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Evidence-based_practice"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:38.733240+00:00"
+instance: "kb-cron"
+---
+
+Evaluating scientific research is extremely complex. The process can be greatly simplified by using a heuristic, called a hierarchy of evidence, that ranks the relative strengths of results obtained from scientific research. The design of the study and the endpoints measured (such as survival or quality of life) affect the strength of the evidence. Typically, systematic reviews and meta-analyses rank at the top of the hierarchy, randomized controlled trials rank above observational studies, and expert opinion and case reports rank at the bottom. There is broad agreement on the relative strength of the different types of studies, but there is no single, universally accepted hierarchy of evidence. More than 80 different hierarchies have been proposed for assessing medical evidence.
+
+== Applications ==
+
+=== Medicine ===
+
+Evidence-based medicine is an approach to medical practice intended to optimize decision-making by emphasizing the use of evidence from well-designed and well-conducted research. Although all medicine based on science has some degree of empirical support, evidence-based medicine goes further, classifying evidence by its epistemologic strength and requiring that only the strongest types (coming from meta-analyses, systematic reviews, and randomized controlled trials) can yield strong recommendations; weaker types (such as from case-control studies) can yield only weak recommendations. The term was originally used to describe an approach to teaching the practice of medicine and improving decisions by individual physicians about individual patients. Use of the term rapidly expanded to include a previously described approach that emphasized the use of evidence in the design of guidelines and policies that apply to groups of patients and populations ("evidence-based practice policies").
+Whether applied to medical education, decisions about individuals, guidelines and policies applied to populations, or administration of health services in general, evidence-based medicine advocates that, to the greatest extent possible, decisions and policies should be based on evidence, not just on the beliefs of practitioners, experts, or administrators. It thus tries to ensure that a clinician's opinion, which may be limited by knowledge gaps or biases, is supplemented with all available knowledge from the scientific literature so that best practice can be determined and applied. It promotes the use of formal, explicit methods to analyze evidence and to make it available to decision makers, and it promotes programs that teach these methods to medical students, practitioners, and policymakers.
+A process has been specified that provides a standardised route for those seeking to produce evidence of the effectiveness of interventions. Originally developed to establish processes for the production of evidence in the housing sector, the standard is general in nature and is applicable across a variety of practice areas and potential outcomes of interest.
+
+=== Mental health ===
+To improve the dissemination of evidence-based practices, the Association for Behavioral and Cognitive Therapies (ABCT) and the Society of Clinical Child and Adolescent Psychology (SCCAP, Division 53 of the American Psychological Association) maintain updated information on their websites on evidence-based practices in psychology for practitioners and the general public. An evidence-based practice consensus statement was developed at a summit on mental healthcare in 2018. As of June 23, 2019, the statement had been endorsed by 36 organizations.
+
+=== Metascience ===
+
+There has since been a movement for the use of evidence-based practice in conducting scientific research in an attempt to address the replication crisis and other major issues affecting scientific research. The application of evidence-based practices to research itself is called metascience, which seeks to increase the quality of scientific research while reducing waste. It is also known as "research on research" and "the science of science", as it uses research methods to study how research is done and where improvements can be made. The five main areas of research in metascience are methodology, reporting, reproducibility, evaluation, and incentives. Metascience has produced a number of reforms in science such as the use of study pre-registration and the implementation of reporting guidelines with the goal of bettering scientific research practices.
+
+=== Education ===
+
+Evidence-based education (EBE), also known as evidence-based interventions, is a model in which policy-makers and educators use empirical evidence to make informed decisions about education interventions (policies, practices, and programs). In other words, decisions are based on scientific evidence rather than opinion.
+EBE has gained attention since English author David H. Hargreaves suggested in 1996 that education would be more effective if teaching, like medicine, was a "research-based profession".
+Since 2000, studies in Australia, England, Scotland and the US have supported the use of research to improve educational practices in teaching reading.
+In 1997, the National Institute of Child Health and Human Development convened a national panel to assess the effectiveness of different approaches used to teach children to read. The resulting National Reading Panel examined quantitative research studies on many areas of reading instruction, including phonics and whole language. In 2000 it published a report entitled Teaching Children to Read: An Evidence-based Assessment of the Scientific Research Literature on Reading and its Implications for Reading Instruction that provided a comprehensive review of what was known about best practices in reading instruction in the U.S.
+This occurred around the same time as such international studies as the Programme for International Student Assessment in 2000 and the Progress in International Reading Literacy Study in 2001.
+Subsequently, evidence-based practice in education (also known as scientifically based research) came into prominence in the U.S. under the No Child Left Behind Act of 2001, which was replaced in 2015 by the Every Student Succeeds Act.
+In 2002 the U.S. Department of Education founded the Institute of Education Sciences to provide scientific evidence to guide education practice and policy.
+English author Ben Goldacre advocated in 2013 for systemic change and more randomized controlled trials to assess the effects of educational interventions. In 2014 the National Foundation for Educational Research, Berkshire, England, published a report entitled Using Evidence in the Classroom: What Works and Why. In 2014 the British Educational Research Association and the Royal Society of Arts advocated for a closer working partnership between teacher-researchers and the wider academic research community.
--- a/data/en.wikipedia.org/wiki/Evidence-based_practice-2.md
+++ b/data/en.wikipedia.org/wiki/Evidence-based_practice-2.md
@ -0,0 +1,27 @@
+---
+title: "Evidence-based practice"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Evidence-based_practice"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:38.733240+00:00"
+instance: "kb-cron"
+---
+
+==== Reviews of existing research on education ====
+The following websites offer free analysis and information on education research:
+
+The Best Evidence Encyclopedia is a free website created by the Johns Hopkins University School of Education's Center for Data-Driven Reform in Education (established in 2004) and funded by the Institute of Education Sciences, U.S. Department of Education. It gives educators and researchers reviews of the strength of the evidence supporting a variety of English programs available for students in grades K–12. The reviews cover programs in areas such as mathematics, reading, writing, science, comprehensive school reform, and early childhood education, and include such topics as the effectiveness of technology and struggling readers.
+The Education Endowment Foundation was established in 2011 by the Sutton Trust, as lead charity in partnership with Impetus Trust; together they form the government-designated What Works Centre for UK Education.
+Evidence for the Every Student Succeeds Act began in 2017 and is produced by the Center for Research and Reform in Education at Johns Hopkins University School of Education. It offers free up-to-date information on current PK-12 programs in reading, writing, math, science, and others that meet the standards of the Every Student Succeeds Act (the United States K–12 public education policy signed by President Obama in 2015). It also provides information on programs that do meet the Every Student Succeeds Act standards as well as those that do not.
+What Works Clearinghouse, established in 2002, evaluates numerous educational programs in twelve categories by the quality and quantity of their evidence and by their effectiveness. It is operated by the federal National Center for Education Evaluation and Regional Assistance, part of the Institute of Education Sciences.
+Social Programs That Work is administered by Arnold Ventures LLC's Evidence-Based Policy team. The team is composed of the former leadership of the Coalition for Evidence-Based Policy, a nonprofit, nonpartisan organization that advocated the use of well-conducted randomized controlled trials (RCTs) in policy decisions. It offers information on twelve types of social programs, including education.
+A variety of other organizations offer information on research and education.
+
+== See also ==
+
+== References ==
+
+== External links ==
+
+"Development of evidence-based medicine explored in oral history video, AMA, JAN 27, 2014". 27 January 2014.