Scrape wikipedia-science: 1199 new, 971 updated, 2223 total (kb-cron)
parent
9716c50981
commit
b948d093ff
@ -4,7 +4,7 @@ chunk: 1/4
|
||||
source: "https://en.wikipedia.org/wiki/Preregistration_(science)"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T03:50:00.803493+00:00"
|
||||
date_saved: "2026-05-05T04:26:22.218636+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
|
||||
@ -4,7 +4,7 @@ chunk: 2/4
|
||||
source: "https://en.wikipedia.org/wiki/Preregistration_(science)"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T03:50:00.803493+00:00"
|
||||
date_saved: "2026-05-05T04:26:22.218636+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
|
||||
@ -4,7 +4,7 @@ chunk: 3/4
|
||||
source: "https://en.wikipedia.org/wiki/Preregistration_(science)"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T03:50:00.803493+00:00"
|
||||
date_saved: "2026-05-05T04:26:22.218636+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
|
||||
@ -4,7 +4,7 @@ chunk: 4/4
|
||||
source: "https://en.wikipedia.org/wiki/Preregistration_(science)"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T03:50:00.803493+00:00"
|
||||
date_saved: "2026-05-05T04:26:22.218636+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
|
||||
29
data/en.wikipedia.org/wiki/Randomized_controlled_trial-0.md
Normal file
@ -0,0 +1,29 @@
|
||||
---
|
||||
title: "Randomized controlled trial"
|
||||
chunk: 1/7
|
||||
source: "https://en.wikipedia.org/wiki/Randomized_controlled_trial"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:23.353341+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
A randomized controlled trial (RCT) is a type of statistical experiment designed to evaluate the efficacy or safety of an intervention by minimizing bias through the random allocation of participants to one or more comparison groups.
|
||||
In this approach, at least one group receives the intervention or process under study (such as a drug, surgical procedure, medical device or diet), while the other groups receive an alternative treatment, a placebo, or standard care.
|
||||
RCTs are a fundamental methodology in modern clinical trials and have been widely considered one of the highest-quality sources of evidence in evidence-based medicine, due to their ability to reduce selection bias and the influence of confounding factors. However, they have also been criticized for failing to reduce bias in some cases.
|
||||
Participants who enroll in RCTs differ from one another in known and unknown ways that can influence study outcomes, and yet cannot be directly controlled. By randomly allocating participants among compared treatments, an RCT enables statistical control over these influences. Provided it is designed well, conducted properly, and enrolls enough participants, an RCT may achieve sufficient control over these confounding factors to deliver a useful comparison of the treatments studied.
|
||||
|
||||
== Definition and examples ==
|
||||
An RCT in clinical research typically compares a proposed new treatment against an existing standard of care; these are then termed the 'experimental' and 'control' treatments, respectively. When no such generally accepted treatment is available, a placebo may be used in the control group so that participants are blinded, or not given information, about their treatment allocations. This blinding principle is ideally also extended as much as possible to other parties including researchers, technicians, data analysts, and evaluators. Effective blinding experimentally isolates the physiological effects of treatments from various psychological sources of bias.
|
||||
Random assignment of participants to treatments reduces selection bias and allocation bias by balancing both known and unknown prognostic factors across the treatment groups. Blinding reduces other forms of experimenter and subject biases.
|
||||
A well-blinded RCT is considered the gold standard for clinical trials. Blinded RCTs are commonly used to test the efficacy of medical interventions and may additionally provide information about adverse effects, such as drug reactions. A randomized controlled trial can provide compelling evidence that the study treatment causes an effect on human health.
|
||||
The terms "RCT" and "randomized trial" are sometimes used synonymously, but the latter term omits mention of controls and can therefore describe studies that compare multiple treatment groups with each other in the absence of a control group. Similarly, the initialism is sometimes expanded as "randomized clinical trial" or "randomized comparative trial", leading to ambiguity in the scientific literature. Not all randomized clinical trials are randomized controlled trials (and some of them could never be, as in cases where controls would be impractical or unethical to use). The term randomized controlled clinical trial is an alternative term used in clinical research; however, RCTs are also employed in other research areas, including many of the social sciences.
|
||||
|
||||
== History ==
|
||||
In the posthumously published Ortus Medicinae (1648), Jan Baptist van Helmont made the first proposal of an RCT, to test two treatment regimes for fever. One treatment would be conducted by practitioners of Galenic medicine involving bloodletting and purging, and the other would be conducted by van Helmont. It is likely that he never conducted the trial, and merely proposed it as an experiment that could be conducted.
|
||||
The first reported clinical trial was conducted by James Lind in 1747 to identify a treatment for scurvy, and principles for conducting controlled trials were further elaborated by the Irish physician James Henry in 1843. The first blind experiment was conducted by the French Royal Commission on Animal Magnetism in 1784 to investigate the claims of mesmerism. An early essay advocating the blinding of researchers came from Claude Bernard in the latter half of the 19th century. Bernard recommended that the observer of an experiment should not have knowledge of the hypothesis being tested. This suggestion contrasted starkly with the prevalent Enlightenment-era attitude that scientific observation can only be objectively valid when undertaken by a well-educated, informed scientist. The first study recorded to have a blinded researcher was published in 1907 by W. H. R. Rivers and H. N. Webber to investigate the effects of caffeine.
|
||||
Randomized experiments first appeared in psychology, where they were introduced by Charles Sanders Peirce and Joseph Jastrow in the 1880s, and in education. The earliest experiments comparing treatment and control groups were published by Robert Woodworth and Edward Thorndike in 1901, and by John E. Coover and Frank Angell in 1907.
|
||||
In the early 20th century, randomized experiments appeared in agriculture, due to Jerzy Neyman and Ronald A. Fisher. Fisher's experimental research and his writings popularized randomized experiments.
|
||||
The first published randomized controlled trial in medicine appeared in the 1948 paper entitled "Streptomycin treatment of pulmonary tuberculosis", which described a Medical Research Council investigation. One of the authors of that paper was Austin Bradford Hill, who is credited as having conceived the modern RCT.
|
||||
Trial design was further influenced by the large-scale ISIS trials on heart attack treatments that were conducted in the 1980s.
|
||||
By the late 20th century, RCTs were recognized as the standard method for "rational therapeutics" in medicine. As of 2004, more than 150,000 RCTs were in the Cochrane Library. To improve the reporting of RCTs in the medical literature, an international group of scientists and editors published Consolidated Standards of Reporting Trials (CONSORT) Statements in 1996, 2001 and 2010, and these have become widely accepted.
|
||||
43
data/en.wikipedia.org/wiki/Randomized_controlled_trial-1.md
Normal file
@ -0,0 +1,43 @@
|
||||
---
|
||||
title: "Randomized controlled trial"
|
||||
chunk: 2/7
|
||||
source: "https://en.wikipedia.org/wiki/Randomized_controlled_trial"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:23.353341+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
== Ethics ==
|
||||
Although subjects almost always provide informed consent for their participation in an RCT, studies since 1982 have documented that RCT subjects may believe that they are certain to receive treatment that is best for them personally; that is, they do not understand the difference between research and treatment. Determining the amount of information required to ensure informed consent can be difficult, and further research is necessary to determine the prevalence of and ways to address therapeutic misconception.
|
||||
Placebo-controlled trials have been deemed unethical in instances where not receiving treatment may lead to harm for the patient, such as an aggravation of symptoms or risk of death. Crossover trials, active-controlled trials, and other approaches have been used to mitigate this issue, though these options may not always be suitable for study, and have received their own criticism.
|
||||
Active-controlled trials in particular may raise ethical considerations regarding clinical equipoise. Although the principle of equipoise ("genuine uncertainty within the expert medical community... about the preferred treatment") is common to clinical trials and has been applied to RCTs, equipoise may be difficult to ascertain, and the ethics of RCTs have special considerations. It has been argued that equipoise itself is insufficient to justify RCTs. "Collective equipoise" may also conflict with a lack of personal equipoise (i.e., a personal belief that an intervention is effective), including that of the patient. Zelen's design, which has been used for some RCTs, randomizes subjects before they provide informed consent, which may be ethical for RCTs of screening and selected therapies, but is likely unethical "for most therapeutic trials." While some randomisation approaches have been used to minimize the risk that patients are exposed to less effective treatment, such as randomising patients with unequal rates, or adapting the rates during the trial's duration based on outcomes, these solutions have been criticized for raising more ethical problems than they resolve.
|
||||
Whilst the above issues have resulted in robust practice guidelines around the conduct of RCTs, formulating balanced regulations tends to be difficult. Strict protections may act in favor of indigenous populations, but could fail in a globalised setting, as their imposition encourages the outsourcing of trials to countries with poorer standards and more economically vulnerable populations. Frameworks which place great emphasis on patient well-being have also been criticized by some as paternalistic.
|
||||
Variations in RCT methods may also create cultural effects that have not been well understood. For example, patients with terminal illness may join trials in the hope of being cured, even when treatments are unlikely to be successful.
|
||||
|
||||
=== Medical trial registration ===
|
||||
In 2004, the International Committee of Medical Journal Editors (ICMJE) announced that all trials starting enrolment after July 1, 2005, must be registered prior to consideration for publication in one of the 12 member journals of the committee. However, trial registration may still occur late or not at all.
|
||||
Medical journals have been slow in adopting policies requiring mandatory clinical trial registration as a prerequisite for publication.
|
||||
|
||||
== Classifications ==
|
||||
|
||||
=== By study design ===
|
||||
One way to classify RCTs is by study design. From most to least common in the healthcare literature, the major categories of RCT study designs are:
|
||||
|
||||
Parallel-group – each participant is randomly assigned to a group, and all the participants in the group receive (or do not receive) an intervention.
|
||||
Crossover – over time, each participant receives (or does not receive) an intervention in a random sequence.
|
||||
Stepped-wedge trial – "involves random and sequential crossover of clusters (of subjects) from control to intervention until all clusters are exposed." In the past, this design has been called a "waiting list design" or "phased implementation."
|
||||
Cluster – pre-existing groups of participants (e.g., villages, schools) are randomly selected to receive (or not receive) an intervention.
|
||||
Factorial – each participant is randomly assigned to a group that receives a particular combination of interventions or non-interventions (e.g., group 1 receives vitamin X and vitamin Y, group 2 receives vitamin X and placebo Y, group 3 receives placebo X and vitamin Y, and group 4 receives placebo X and placebo Y).
|
||||
An analysis of the 616 RCTs indexed in PubMed during December 2006 found that 78% were parallel-group trials, 16% were crossover, 2% were split-body, 2% were cluster, and 2% were factorial.
|
||||
|
||||
=== By outcome of interest (efficacy vs. effectiveness) ===
|
||||
|
||||
RCTs can be classified as "explanatory" or "pragmatic." Explanatory RCTs test efficacy in a research setting with highly selected participants and under highly controlled conditions. In contrast, pragmatic RCTs (pRCTs) test effectiveness in everyday practice with relatively unselected participants and under flexible conditions; in this way, pragmatic RCTs can "inform decisions about practice."
|
||||
|
||||
=== By hypothesis (superiority vs. noninferiority vs. equivalence) ===
|
||||
|
||||
Another classification of RCTs categorizes them as "superiority trials", "noninferiority trials", and "equivalence trials", which differ in methodology and reporting. Most RCTs are superiority trials, in which one intervention is hypothesized to be superior to another in a statistically significant way. Some RCTs are noninferiority trials "to determine whether a new treatment is no worse than a reference treatment." Other RCTs are equivalence trials in which the hypothesis is that two interventions are indistinguishable from each other.
|
||||
|
||||
== Randomization ==
|
||||
The advantages of proper randomization in RCTs include:
|
||||
41
data/en.wikipedia.org/wiki/Randomized_controlled_trial-2.md
Normal file
@ -0,0 +1,41 @@
|
||||
---
|
||||
title: "Randomized controlled trial"
|
||||
chunk: 3/7
|
||||
source: "https://en.wikipedia.org/wiki/Randomized_controlled_trial"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:23.353341+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
"It eliminates bias in treatment assignment," specifically selection bias and confounding.
|
||||
"It facilitates blinding (masking) of the identity of treatments from investigators, participants, and assessors."
|
||||
"It permits the use of probability theory to express the likelihood that any difference in outcome between treatment groups merely indicates chance."
|
||||
There are two processes involved in randomizing patients to different interventions. First is choosing a randomization procedure to generate an unpredictable sequence of allocations; this may be a simple random assignment of patients to any of the groups at equal probabilities, may be "restricted", or may be "adaptive." A second and more practical issue is allocation concealment, which refers to the stringent precautions taken to ensure that the group assignment of patients is not revealed prior to definitively allocating them to their respective groups. Non-random "systematic" methods of group assignment, such as alternating subjects between one group and the other, can cause "limitless contamination possibilities" and can cause a breach of allocation concealment.
|
||||
However, empirical evidence that adequate randomization changes outcomes relative to inadequate randomization has been difficult to detect.
|
||||
|
||||
=== Procedures ===
|
||||
The treatment allocation is the desired proportion of patients in each treatment arm.
|
||||
An ideal randomization procedure would achieve the following goals:
|
||||
|
||||
Maximize statistical power, especially in subgroup analyses. Generally, equal group sizes maximize statistical power; however, unequal group sizes may be more powerful for some analyses (e.g., multiple comparisons of placebo versus several doses using Dunnett's procedure), and are sometimes desired for non-analytic reasons (e.g., patients may be more motivated to enroll if there is a higher chance of getting the test treatment, or regulatory agencies may require a minimum number of patients exposed to treatment).
|
||||
Minimize selection bias. This may occur if investigators can, consciously or unconsciously, preferentially enroll patients into particular treatment arms. A good randomization procedure will be unpredictable so that investigators cannot guess the next subject's group assignment based on prior treatment assignments. The risk of selection bias is highest when previous treatment assignments are known (as in unblinded studies) or can be guessed (perhaps if a drug has distinctive side effects).
|
||||
Minimize allocation bias (or confounding). This may occur when covariates that affect the outcome are not equally distributed between treatment groups, and the treatment effect is confounded with the effect of the covariates (i.e., an "accidental bias"). If the randomization procedure causes an imbalance in covariates related to the outcome across groups, estimates of effect may be biased if not adjusted for the covariates (which may be unmeasured and therefore impossible to adjust for).
|
||||
However, no single randomization procedure meets those goals in every circumstance, so researchers must select a procedure for a given study based on its advantages and disadvantages.
|
||||
|
||||
==== Simple ====
|
||||
This is a commonly used and intuitive procedure, similar to "repeated fair coin-tossing." Also known as "complete" or "unrestricted" randomization, it is robust against both selection and accidental biases. However, its main drawback is the possibility of imbalanced group sizes in small RCTs. It is therefore recommended only for RCTs with over 200 subjects.
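The coin-tossing analogy and the group-size drawback can be illustrated with a short simulation. This is a minimal sketch, assuming two arms labelled "A" and "B" with equal allocation probability; the function names `simple_randomization` and `mean_imbalance` are hypothetical and chosen only for this illustration.

```python
import random

def simple_randomization(n_subjects, seed=None):
    """Assign each subject to arm 'A' or 'B' by an independent fair coin toss."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_subjects)]

def mean_imbalance(n_subjects, n_trials=2000, seed=0):
    """Average |size(A) - size(B)| as a fraction of n_subjects over simulated trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Count how many of the n_subjects coin tosses land in arm A.
        n_a = sum(rng.random() < 0.5 for _ in range(n_subjects))
        total += abs(2 * n_a - n_subjects) / n_subjects
    return total / n_trials
```

Calling `mean_imbalance(20)` and `mean_imbalance(2000)` shows the relative imbalance shrinking as the trial grows (roughly in proportion to one over the square root of the number of subjects), which is consistent with reserving simple randomization for larger RCTs.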
|
||||
|
||||
==== Restricted ====
|
||||
To balance group sizes in smaller RCTs, some form of "restricted" randomization is recommended. The major types of restricted randomization used in RCTs are:
|
||||
|
||||
Permuted-block randomization or blocked randomization: a "block size" and "allocation ratio" (number of subjects in one group versus the other group) are specified, and subjects are allocated randomly within each block. For example, a block size of 6 and an allocation ratio of 2:1 would lead to random assignment of 4 subjects to one group and 2 to the other. This type of randomization can be combined with "stratified randomization", for example by center in a multicenter trial, to "ensure good balance of participant characteristics in each group." A special case of permuted-block randomization is random allocation, in which the entire sample is treated as one block. The major disadvantage of permuted-block randomization is that even if the block sizes are large and randomly varied, the procedure can lead to selection bias. Another disadvantage is that "proper" analysis of data from permuted-block-randomized RCTs requires stratification by blocks.
|
||||
Adaptive biased-coin randomization methods (of which urn randomization is the most widely known type): In these relatively uncommon methods, the probability of being assigned to a group decreases if the group is overrepresented and increases if the group is underrepresented. The methods are thought to be less affected by selection bias than permuted-block randomization.
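The permuted-block procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: arms are labelled "A" and "B", the block size must be a multiple of the sum of the allocation ratio, and the function name `permuted_block_randomization` is hypothetical.

```python
import random

def permuted_block_randomization(n_subjects, block_size=6, ratio=(2, 1), seed=None):
    """Allocate subjects in shuffled blocks that each honour the allocation ratio."""
    if block_size % sum(ratio) != 0:
        raise ValueError("block_size must be a multiple of sum(ratio)")
    rng = random.Random(seed)
    unit = block_size // sum(ratio)
    # With block_size=6 and ratio=(2, 1), each block holds 4 'A' and 2 'B'.
    template = ["A"] * (ratio[0] * unit) + ["B"] * (ratio[1] * unit)
    allocations = []
    while len(allocations) < n_subjects:
        block = template[:]
        rng.shuffle(block)  # random order within the block
        allocations.extend(block)
    # The final block is truncated if n_subjects is not a multiple of block_size.
    return allocations[:n_subjects]
```

With the defaults, every complete block of six contains four subjects in one group and two in the other, matching the example in the text. The sketch also makes the predictability concern visible: once five assignments in a block are known, the sixth is determined.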
|
||||
|
||||
==== Adaptive ====
|
||||
At least two types of "adaptive" randomization procedures have been used in RCTs, but much less frequently than simple or restricted randomization:
|
||||
|
||||
Covariate-adaptive randomization, of which one type is minimization: The probability of being assigned to a group varies in order to minimize "covariate imbalance." Minimization is reported to have "supporters and detractors"; because only the first subject's group assignment is truly chosen at random, the method does not necessarily eliminate bias on unknown factors.
|
||||
Response-adaptive randomization, also known as outcome-adaptive randomization: The probability of being assigned to a group increases if the responses of the prior patients in the group were favorable. Although arguments have been made that this approach is more ethical than other types of randomization when the probability that a treatment is effective or ineffective increases during the course of an RCT, ethicists have not yet studied the approach in detail.
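To make the idea of covariate-adaptive allocation concrete, the following is a simplified sketch in the spirit of minimization. It is an assumption-laden illustration, not the procedure used by any particular trial: each subject is a dictionary of covariate levels, the imbalance score is a plain count of earlier subjects sharing those levels, ties are broken at random, and the name `minimization_assign` is hypothetical. Practical schemes often add a random element to every assignment.

```python
import random
from collections import defaultdict

def minimization_assign(subjects, groups=("A", "B"), seed=None):
    """Assign each subject to the group that currently has the fewest
    earlier subjects sharing the subject's covariate levels."""
    rng = random.Random(seed)
    # counts[group][(factor, level)] = subjects already in group with that level
    counts = {g: defaultdict(int) for g in groups}
    assignments = []
    for covariates in subjects:
        scores = {
            g: sum(counts[g][item] for item in covariates.items())
            for g in groups
        }
        best = min(scores.values())
        # Only ties are resolved by chance in this simplified version.
        chosen = rng.choice([g for g in groups if scores[g] == best])
        for item in covariates.items():
            counts[chosen][item] += 1
        assignments.append(chosen)
    return assignments
```

For example, `minimization_assign([{"sex": "F", "site": 1}, {"sex": "F", "site": 2}, {"sex": "M", "site": 1}])` returns one group label per subject. Because most assignments are determined by the running counts, the sketch also shows why critics note that little of the sequence is truly random.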
|
||||
|
||||
=== Allocation concealment ===
|
||||
42
data/en.wikipedia.org/wiki/Randomized_controlled_trial-3.md
Normal file
@ -0,0 +1,42 @@
|
||||
---
|
||||
title: "Randomized controlled trial"
|
||||
chunk: 4/7
|
||||
source: "https://en.wikipedia.org/wiki/Randomized_controlled_trial"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:23.353341+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
"Allocation concealment" (defined as "the procedure for protecting the randomization process so that the treatment to be allocated is not known before the patient is entered into the study") is important in RCTs. In practice, clinical investigators in RCTs often find it difficult to maintain impartiality. Stories abound of investigators holding up sealed envelopes to lights or ransacking offices to determine group assignments in order to dictate the assignment of their next patient. Such practices introduce selection bias and confounders (both of which should be minimized by randomization), possibly distorting the results of the study. Adequate allocation concealment should prevent patients and investigators from discovering treatment allocation once a study is underway and after the study has concluded. Treatment-related side effects or adverse events may be specific enough to reveal allocation to investigators or patients, thereby introducing bias or influencing any subjective parameters collected by investigators or requested from subjects.
|
||||
Some standard methods of ensuring allocation concealment include sequentially numbered, opaque, sealed envelopes (SNOSE); sequentially numbered containers; pharmacy controlled randomization; and central randomization. It is recommended that allocation concealment methods be included in an RCT's protocol, and that the allocation concealment methods should be reported in detail in a publication of an RCT's results; however, a 2005 study determined that most RCTs have unclear allocation concealment in their protocols, in their publications, or both. On the other hand, a 2008 study of 146 meta-analyses concluded that the results of RCTs with inadequate or unclear allocation concealment tended to be biased toward beneficial effects only if the RCTs' outcomes were subjective as opposed to objective.
|
||||
|
||||
=== Sample size ===
|
||||
|
||||
The number of treatment units (subjects or groups of subjects) assigned to control and treatment groups affects an RCT's reliability. If the effect of the treatment is small, the number of treatment units in either group may be insufficient for rejecting the null hypothesis in the respective statistical test. The failure to reject the null hypothesis would imply that the treatment shows no statistically significant effect on the treated in a given test. But as the sample size increases, the same RCT may be able to demonstrate a significant effect of the treatment, even if this effect is small.
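The relationship between effect size and required sample size can be illustrated with the standard normal-approximation formula for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})² σ² / δ². This is a sketch under assumptions not stated in the text: a two-sided test, equal group sizes, a continuous outcome with a common known standard deviation `sigma`, and a true difference `delta`. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm to detect a mean difference delta."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)
```

Under these assumptions, `n_per_group(0.5, 1.0)` gives 63 per arm and `n_per_group(0.2, 1.0)` gives 393 per arm: a smaller effect requires a much larger trial before the null hypothesis can be rejected.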
|
||||
|
||||
== Blinding ==
|
||||
|
||||
An RCT may be blinded (also called "masked") by "procedures that prevent study participants, caregivers, or outcome assessors from knowing which intervention was received." Unlike allocation concealment, blinding is sometimes inappropriate or impossible to perform in an RCT; for example, if an RCT involves a treatment in which active participation of the patient is necessary (e.g., physical therapy), participants cannot be blinded to the intervention.
|
||||
Traditionally, blinded RCTs have been classified as "single-blind", "double-blind", or "triple-blind"; however, in 2001 and 2006 two studies showed that these terms have different meanings for different people. The 2010 CONSORT Statement specifies that authors and editors should not use the terms "single-blind", "double-blind", and "triple-blind"; instead, reports of blinded RCTs should discuss "If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how."
|
||||
RCTs without blinding are referred to as "unblinded", "open", or (if the intervention is a medication) "open-label". In 2008 a study concluded that the results of unblinded RCTs tended to be biased toward beneficial effects only if the RCTs' outcomes were subjective as opposed to objective; for example, in an RCT of treatments for multiple sclerosis, unblinded neurologists (but not the blinded neurologists) felt that the treatments were beneficial. In pragmatic RCTs, although the participants and providers are often unblinded, it is "still desirable and often possible to blind the assessor or obtain an objective source of data for evaluation of outcomes."
|
||||
|
||||
== Analysis of data ==
|
||||
The types of statistical methods used in RCTs depend on the characteristics of the data and include:
|
||||
|
||||
For dichotomous (binary) outcome data, logistic regression (e.g., to predict sustained virological response after receipt of peginterferon alfa-2a for hepatitis C) and other methods can be used.
|
||||
For continuous outcome data, analysis of covariance (e.g., for changes in blood lipid levels after receipt of atorvastatin after acute coronary syndrome) tests the effects of predictor variables.
|
||||
For time-to-event outcome data that may be censored, survival analysis (e.g., Kaplan–Meier estimators and Cox proportional hazards models for time to coronary heart disease after receipt of hormone replacement therapy in menopause) is appropriate.
|
||||
Regardless of the statistical methods used, important considerations in the analysis of RCT data include:
|
||||
|
||||
Whether an RCT should be stopped early due to interim results. For example, RCTs may be stopped early if an intervention produces "larger than expected benefit or harm", or if "investigators find evidence of no important difference between experimental and control interventions."
|
||||
The extent to which the groups can be analyzed exactly as they existed upon randomization (i.e., whether a so-called "intention-to-treat analysis" is used). A "pure" intention-to-treat analysis is "possible only when complete outcome data are available" for all randomized subjects; when some outcome data are missing, options include analyzing only cases with known outcomes and using imputed data. Nevertheless, the more that analyses can include all participants in the groups to which they were randomized, the less bias that an RCT will be subject to.
|
||||
Whether subgroup analysis should be performed. These are "often discouraged" because multiple comparisons may produce false positive findings that cannot be confirmed by other studies.
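The concern about multiple comparisons in subgroup analyses can be quantified with a small calculation. This sketch assumes the subgroup tests are independent and that each uses the same significance level; the Bonferroni threshold is shown only as one common correction and is not prescribed by the text. Both function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests
```

With ten independent subgroup tests at the 0.05 level, `familywise_error_rate(10)` is about 0.40, so a spurious "significant" subgroup is far from unlikely even when the treatment has no effect in any subgroup.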
|
||||
|
||||
== Reporting of results ==
|
||||
The CONSORT 2010 Statement is "an evidence-based, minimum set of recommendations for reporting RCTs." The CONSORT 2010 checklist contains 25 items (many with sub-items) focusing on "individually randomised, two group, parallel trials" which are the most common type of RCT.
|
||||
For other RCT study designs, "CONSORT extensions" have been published; some examples are:
|
||||
|
||||
CONSORT 2010 Statement: Extension to Cluster Randomised Trials
|
||||
CONSORT 2010 Statement: Non-Pharmacologic Treatment Interventions
|
||||
"Reporting of surrogate endpoints in randomised controlled trial reports (CONSORT-Surrogate): extension checklist with explanation and elaboration"
|
||||
37
data/en.wikipedia.org/wiki/Randomized_controlled_trial-4.md
Normal file
@ -0,0 +1,37 @@
|
||||
---
|
||||
title: "Randomized controlled trial"
|
||||
chunk: 5/7
|
||||
source: "https://en.wikipedia.org/wiki/Randomized_controlled_trial"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:23.353341+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
=== Relative importance and observational studies ===
|
||||
Two studies published in The New England Journal of Medicine in 2000 found that observational studies and RCTs overall produced similar results. The authors of the 2000 findings questioned the belief that "observational studies should not be used for defining evidence-based medical care" and that RCTs' results are "evidence of the highest grade." However, a 2001 study published in the Journal of the American Medical Association concluded that "discrepancies beyond chance do occur and differences in estimated magnitude of treatment effect are very common" between observational studies and RCTs. According to a 2014 (updated in 2024) Cochrane review, there is little evidence for significant effect differences between observational studies and randomized controlled trials. To evaluate differences, it is necessary to consider things other than design, such as heterogeneity, population, intervention or comparator.
|
||||
Two other lines of reasoning question RCTs' contribution to scientific knowledge beyond other types of studies:
|
||||
|
||||
If study designs are ranked by their potential for new discoveries, then anecdotal evidence would be at the top of the list, followed by observational studies, followed by RCTs.
|
||||
RCTs may be unnecessary for treatments that have dramatic and rapid effects relative to the expected stable or progressively worse natural course of the condition treated. One example is combination chemotherapy including cisplatin for metastatic testicular cancer, which increased the cure rate from 5% to 60% in a 1977 non-randomized study.
|
||||
|
||||
=== Interpretation of statistical results ===
|
||||
Like all statistical methods, RCTs are subject to both type I ("false positive") and type II ("false negative") statistical errors. Regarding type I errors, a typical RCT will use 0.05 (i.e., 1 in 20) as the probability that the RCT will falsely find two equally effective treatments significantly different. Regarding type II errors, despite the publication of a 1978 paper noting that the sample sizes of many "negative" RCTs were too small to make definitive conclusions about the negative results, by 2005–2006 a sizeable proportion of RCTs still had inaccurate or incompletely reported sample size calculations.
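The meaning of the 0.05 type I error rate can be checked by simulation. The sketch below assumes two arms drawn from the same standard normal distribution (so the treatments are truly equally effective), a known standard deviation of 1, and a two-sided z-test; the function name `false_positive_rate` and all parameter defaults are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, mean

def false_positive_rate(n_per_arm=50, n_trials=2000, alpha=0.05, seed=0):
    """Fraction of simulated null trials that wrongly reject equal effectiveness."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_arm) ** 0.5  # standard error of the mean difference when sigma = 1
    rejections = 0
    for _ in range(n_trials):
        # Both arms come from the same distribution: any "effect" is chance.
        arm_a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        z = (mean(arm_a) - mean(arm_b)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials
```

Over many simulated trials the returned fraction should settle near `alpha`, which is the sense in which roughly 1 in 20 comparisons of equally effective treatments is expected to appear significant.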
|
||||
|
||||
=== Peer review ===
|
||||
Peer review of results is an important part of the scientific method. Reviewers examine the study results for potential problems with design that could lead to unreliable results (for example by creating a systematic bias), evaluate the study in the context of related studies and other evidence, and evaluate whether the study can be reasonably considered to have proven its conclusions. To underscore the need for peer review and the danger of overgeneralizing conclusions, two Boston-area medical researchers performed a randomized controlled trial in which they randomly assigned either a parachute or an empty backpack to 23 volunteers who jumped from either a biplane or a helicopter. The study was able to accurately report that parachutes fail to reduce injury compared to empty backpacks. The key context that limited the general applicability of this conclusion was that the aircraft were parked on the ground, and participants had only jumped about two feet.
|
||||
|
||||
== Advantages ==
|
||||
RCTs are considered to be the most reliable form of scientific evidence in the hierarchy of evidence that influences healthcare policy and practice because RCTs reduce spurious causality and bias. Results of RCTs may be combined in systematic reviews, which are increasingly being used in the conduct of evidence-based practice. Some examples of scientific organizations that consider RCTs or systematic reviews of RCTs to be the highest-quality evidence available are:
|
||||
|
||||
As of 1998, the National Health and Medical Research Council of Australia designated "Level I" evidence as that "obtained from a systematic review of all relevant randomised controlled trials" and "Level II" evidence as that "obtained from at least one properly designed randomised controlled trial."
|
||||
Since at least 2001, in making clinical practice guideline recommendations the United States Preventive Services Task Force has considered both a study's design and its internal validity as indicators of its quality. It has recognized "evidence obtained from at least one properly randomized controlled trial" with good internal validity (i.e., a rating of "I-good") as the highest quality evidence available to it.
|
||||
The GRADE Working Group concluded in 2008 that "randomised trials without important limitations constitute high quality evidence."
|
||||
For issues involving "Therapy/Prevention, Aetiology/Harm", the Oxford Centre for Evidence-based Medicine as of 2011 defined "Level 1a" evidence as a systematic review of RCTs that are consistent with each other, and "Level 1b" evidence as an "individual RCT (with narrow Confidence Interval)."
|
||||
Notable RCTs with unexpected results that contributed to changes in clinical practice include:
|
||||
|
||||
After Food and Drug Administration approval, the antiarrhythmic agents flecainide and encainide came to market in 1986 and 1987 respectively. The non-randomized studies concerning the drugs were characterized as "glowing", and their sales increased to a combined total of approximately 165,000 prescriptions per month in early 1989. In that year, however, a preliminary report of an RCT concluded that the two drugs increased mortality. Sales of the drugs then decreased.
|
||||
Prior to 2002, based on observational studies, it was routine for physicians to prescribe hormone replacement therapy for post-menopausal women to prevent myocardial infarction. In 2002 and 2004, however, published RCTs from the Women's Health Initiative claimed that women taking hormone replacement therapy with estrogen plus progestin had a higher rate of myocardial infarctions than women on a placebo, and that estrogen-only hormone replacement therapy caused no reduction in the incidence of coronary heart disease. Possible explanations for the discrepancy between the observational studies and the RCTs involved differences in methodology, in the hormone regimens used, and in the populations studied. The use of hormone replacement therapy decreased after publication of the RCTs.
|
||||
|
||||
== Disadvantages ==
|
||||
Many papers discuss the disadvantages of RCTs. Among the most frequently cited drawbacks are:
|
||||
28
data/en.wikipedia.org/wiki/Randomized_controlled_trial-5.md
Normal file
@ -0,0 +1,28 @@
|
||||
---
|
||||
title: "Randomized controlled trial"
|
||||
chunk: 6/7
|
||||
source: "https://en.wikipedia.org/wiki/Randomized_controlled_trial"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:23.353341+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
=== Time and costs ===
|
||||
RCTs can be expensive; one study found 28 Phase III RCTs funded by the National Institute of Neurological Disorders and Stroke prior to 2000 with a total cost of US$335 million, for a mean cost of US$12 million per RCT. Nevertheless, the return on investment of RCTs may be high, in that the same study projected that the 28 RCTs produced a "net benefit to society at 10-years" of 46 times the cost of the trials program, based on evaluating a quality-adjusted life year as equal to the prevailing mean per capita gross domestic product.
|
||||
An RCT can take several years from conduct to publication; during that time its data are unavailable to the medical community, and they may be less relevant by the time they are published.
|
||||
It is costly to maintain RCTs for the years or decades that would be ideal for evaluating some interventions.
|
||||
Interventions to prevent events that occur only infrequently (e.g., sudden infant death syndrome) and uncommon adverse outcomes (e.g., a rare side effect of a drug) would require RCTs with extremely large sample sizes and may, therefore, best be assessed by observational studies.
|
||||
Because RCTs are costly to run, they usually examine only one variable or very few variables and rarely reflect the full picture of a complicated medical situation, whereas a case report, for example, can detail many aspects of the patient's medical situation (e.g. patient history, physical examination, diagnosis, psychosocial aspects, follow-up).
|
||||
|
||||
=== Conflict of interest dangers ===
|
||||
A 2011 study examining the disclosure of possible conflicts of interest in the research underlying medical meta-analyses reviewed 29 meta-analyses and found that conflicts of interest in the underlying studies were rarely disclosed. The 29 meta-analyses included 11 from general medicine journals, 15 from specialty medicine journals, and 3 from the Cochrane Database of Systematic Reviews. The 29 meta-analyses reviewed an aggregate of 509 randomized controlled trials (RCTs). Of these, 318 RCTs reported funding sources, with 219 (69%) industry funded. Of the 509 RCTs, 132 reported author conflict of interest disclosures, with 91 studies (69%) disclosing industry financial ties with one or more authors. The information was, however, seldom reflected in the meta-analyses: only two (7%) reported RCT funding sources and none reported RCT author-industry ties. The authors concluded "without acknowledgment of COI due to industry funding or author industry financial ties from RCTs included in meta-analyses, readers' understanding and appraisal of the evidence from the meta-analysis may be compromised."
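The percentages quoted above can be re-derived from the reported counts. The short check below is only an arithmetic sketch; the helper name `percent` is an assumption introduced for illustration.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts as reported in the paragraph above.
total_rcts = 509
funding_reported, industry_funded = 318, 219
coi_reported, industry_ties = 132, 91
meta_analyses, reporting_funding = 29, 2

# The journal breakdown should add up to the 29 meta-analyses.
assert 11 + 15 + 3 == meta_analyses

print(percent(industry_funded, funding_reported))  # industry funded, among RCTs reporting funding
print(percent(industry_ties, coi_reported))        # industry ties, among RCTs with disclosures
print(percent(reporting_funding, meta_analyses))   # meta-analyses that reported RCT funding
print(percent(funding_reported, total_rcts))       # RCTs that reported any funding source
```

Note that both 69% figures are shares of the RCTs that reported the relevant information, not of all 509 trials.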
|
||||
Some RCTs are fully or partly funded by the health care industry (e.g., the pharmaceutical industry) as opposed to government, nonprofit, or other sources. A systematic review published in 2003 found four 1986–2002 articles comparing industry-sponsored and nonindustry-sponsored RCTs, and in all the articles there was a correlation of industry sponsorship and positive study outcome. A 2004 study of 1999–2001 RCTs published in leading medical and surgical journals determined that industry-funded RCTs "are more likely to be associated with statistically significant pro-industry findings." These results have been mirrored in trials in surgery, where, although industry funding did not affect the rate of trial discontinuation, it was associated with lower odds of publication for completed trials. One possible reason for the pro-industry results in industry-funded published RCTs is publication bias. Other authors have cited the differing goals of academic and industry-sponsored research as contributing to the difference. Commercial sponsors may be more focused on performing trials of drugs that have already shown promise in early-stage trials, and on replicating previous positive results to fulfill regulatory requirements for drug approval.
|
||||
|
||||
=== Ethics and feasibility ===
|
||||
Whilst RCTs are considered the gold standard of research in evidence-based medicine, they may be inappropriate in certain contexts. For instance, RCTs may be improper for studying medical interventions with "obvious" benefits to patients, as such practice would unethically deny the control group effective treatment. Challenges may also arise where a treatment requires the active participation of participants, such as psychotherapy or approaches based on community development.
|
||||
Historically, it has been difficult to effectively utilize RCTs for the study of surgical procedures. Unlike with the study of medication, where blinding tends to be relatively easy through placebos, blinding of the investigator-surgeon may be impossible within a surgical trial, and the evident physiological impacts of surgery may compromise blinding on the part of the subjects without the use of sham controls, which are only considered possible for a narrow range of surgical interventions.
|
||||
RCTs may also be considered infeasible or unethical for studying the mental health impacts of interventions with obvious physical effects, especially when those interventions are highly sought after by patients, such as abortion and adolescent transgender healthcare. Besides compromising masking, RCT study designs for some of these interventions would likely also result in a high likelihood of withdrawal, non-adherence, and response bias in the control groups, making RCTs potentially unreliable.
|
||||
|
||||
== In social science ==
|
||||
Due to the recent emergence of RCTs in social science, their application in these fields remains a contested issue among academics. Some writers from a medical or health background have argued that existing research in a range of social science disciplines lacks rigour and should be improved by greater use of randomized controlled trials. Similarly, many economists regard RCTs as the gold standard for ensuring that outcomes represent causation and not just correlation. Overall, the adoption of RCTs in social science has become significant in recent decades.
|
||||
55
data/en.wikipedia.org/wiki/Randomized_controlled_trial-6.md
Normal file
@ -0,0 +1,55 @@
|
||||
---
|
||||
title: "Randomized controlled trial"
|
||||
chunk: 7/7
|
||||
source: "https://en.wikipedia.org/wiki/Randomized_controlled_trial"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:23.353341+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
=== Economics ===
|
||||
RCTs have become a staple for identifying causal effects in microeconomic studies, particularly in development economics. In 1994, Paul Glewwe, Michael Kremer (later a Nobel Prize winner), and Sylvie Moulin started one of the earliest RCTs in an economic setting by conducting a long-run intervention in a school in Kenya, publishing the results fifteen years later. Three years later, in 1997, the largest field experiment in a developing context, the PROGRESA program in Mexico, was studied by a multitude of economic researchers. The impact of RCTs on the discipline has only grown, as economists have come to regard this method as a first-best approach to causal identification. While not at the forefront, the use of RCTs has helped to bolster the credibility revolution in empirical microeconomics and has been popularized by the need for more rigorous identification.
|
||||
Despite the shift towards using RCTs in research, economists remain divided on their use. John A. List, a proponent of field experiments, particularly RCTs, finds that this method differs significantly from lab experiments and therefore provides more robust measures for identification. RCTs also offer the advantage of providing true observational data that can be used where the absence of data would make it difficult to build a causal model.
|
||||
The American Economic Association (AEA) maintains a registry of all active and completed RCTs within the discipline. The registry is free to use and is designed to let researchers share information about ongoing field work, as well as failures or limitations of study settings. Since its founding in 2013, the registry has tracked over 7,400 field experiments across 100 countries, with annual RCT registrations growing year over year.
|
||||
|
||||
=== Transport science ===
|
||||
Researchers in transport science argue that public spending on programmes such as school travel plans cannot be justified unless their efficacy is demonstrated by randomized controlled trials. Graham-Rowe and colleagues reviewed 77 evaluations of transport interventions found in the literature, categorising them into 5 "quality levels". They concluded that most of the studies were of low quality and advocated the use of randomized controlled trials wherever possible in future transport research.
|
||||
Dr. Steve Melia took issue with these conclusions, arguing that claims about the advantages of RCTs, in establishing causality and avoiding bias, have been exaggerated. He proposed the following eight criteria for the use of RCTs in contexts where interventions must change human behaviour to be effective:
|
||||
The intervention:
|
||||
|
||||
Has not been applied to all members of a unique group of people (e.g. the population of a whole country, all employees of a unique organisation etc.)
|
||||
Is applied in a context or setting similar to that which applies to the control group
|
||||
Can be isolated from other activities—and the purpose of the study is to assess this isolated effect
|
||||
Has a short timescale between its implementation and maturity of its effects
|
||||
And the causal mechanisms:
|
||||
|
||||
Are either known to the researchers, or else all possible alternatives can be tested
|
||||
Do not involve significant feedback mechanisms between the intervention group and external environments
|
||||
Have a stable and predictable relationship to exogenous factors
|
||||
Would act in the same way if the control group and intervention group were reversed
|
||||
|
||||
=== Criminology ===
|
||||
A 2005 review found 83 randomized experiments in criminology published in 1982–2004, compared with only 35 published in 1957–1981. The authors classified the studies they found into five categories: "policing", "prevention", "corrections", "court", and "community". Focusing only on offending behavior programs, Hollin (2008) argued that RCTs may be difficult to implement (e.g., if an RCT required "passing sentences that would randomly assign offenders to programmes") and therefore that experiments with quasi-experimental design are still necessary.
|
||||
|
||||
=== Education ===
|
||||
RCTs have been used in evaluating a number of educational interventions. Between 1980 and 2016, over 1,000 reports of RCTs were published. For example, a 2009 study randomized 260 elementary school teachers' classrooms to receive or not receive a program of behavioral screening, classroom intervention, and parent training, and then measured the behavioral and academic performance of their students. Another 2009 study randomized classrooms for 678 first-grade children to receive a classroom-centered intervention, a parent-centered intervention, or no intervention, and then followed their academic outcomes through age 19.
|
||||
|
||||
== Criticism ==
|
||||
A 2018 review of the 10 most cited randomised controlled trials noted poor distribution of background traits, difficulties with blinding, and discussed other assumptions and biases inherent in randomised controlled trials. These include the "unique time period assessment bias", the "background traits remain constant assumption", the "average treatment effects limitation", the "simple treatment at the individual level limitation", the "all preconditions are fully met assumption", the "quantitative variable limitation" and the "placebo only or conventional treatment only limitation".
|
||||
|
||||
== See also ==
|
||||
Drug development
|
||||
Hypothesis testing
|
||||
Impact evaluation
|
||||
Jadad scale
|
||||
Pipeline planning
|
||||
Patient and public involvement
|
||||
Observational study
|
||||
Blinded experiment
|
||||
Statistical inference
|
||||
Royal Commission on Animal Magnetism – 1784 French scientific bodies' investigations involving systematic controlled trials
|
||||
|
||||
== References ==
|
||||
|
||||
== Further reading ==
|
||||
23
data/en.wikipedia.org/wiki/Rapid_reviews-0.md
Normal file
@ -0,0 +1,23 @@
|
||||
---
|
||||
title: "Rapid reviews"
|
||||
chunk: 1/1
|
||||
source: "https://en.wikipedia.org/wiki/Rapid_reviews"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:24.531048+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
Rapid reviews are a systematic survey of literature on a topic or question of interest. Compared to a systematic review of literature, in a rapid review, several design decisions and practical steps are undertaken to reduce the time it takes to identify, aggregate and answer the question of interest. The Cochrane Rapid Reviews Methods Group proposes that rapid reviews can take different forms, and they define rapid reviews as: "A form of knowledge synthesis that accelerates the process of conducting a traditional systematic review through streamlining or omitting specific methods to produce evidence for stakeholders in a resource-efficient manner".
|
||||
|
||||
|
||||
== In medicine and healthcare ==
|
||||
Rapid reviews are a form of evidence synthesis, similar to a systematic review, that can be used to inform decision-making and healthcare initiatives. The World Health Organization (WHO) considers rapid reviews a way of generating evidence in a short period using an abbreviated systematic review method. During the COVID-19 pandemic, rapid reviews were employed to answer pressing questions under strict time constraints.
|
||||
|
||||
|
||||
== In software engineering ==
|
||||
For Software Engineering, Rico et al. have recently adapted and extended the rapid review method. Their proposal takes into account the unique requirements of industry-academia collaboration in SE research. The extension proposed by them highlights the ways in which practitioners and researchers can collaborate in the planning, design and conduct of a rapid review.
|
||||
The guidelines by Rico et al. have been used in two rapid reviews, one on machine learning and another on software component selection.
|
||||
|
||||
|
||||
== References ==
|
||||
21
data/en.wikipedia.org/wiki/Realist_Evaluation-0.md
Normal file
@ -0,0 +1,21 @@
|
||||
---
|
||||
title: "Realist Evaluation"
|
||||
chunk: 1/1
|
||||
source: "https://en.wikipedia.org/wiki/Realist_Evaluation"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:25.759116+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
Realist evaluation or realist review (also realist synthesis) is a type of theory-driven evaluation used in evaluating social programmes. It was originally based on the epistemological foundations of critical realism. Ray Pawson, one of the originators of realist evaluation, was "initially impressed" by how critical realism explains generative causation in experimental science; however, he later criticised its "philosophical grandstanding" and "explain-all Marxism".
|
||||
Based on specific theories, realist evaluation provides an alternative lens to empiricist evaluation techniques for the study and understanding of programmes and policies. Some writers on realist evaluation argue that interventions are theories. This technique assumes that knowledge is a social and historical product; thus the social and political context, as well as theoretical mechanisms, needs consideration in the analysis of programme or policy effectiveness.
|
||||
Realist evaluation techniques recognise that there are many interwoven variables operative at different levels in society, thus this evaluation method suits complex social interventions, rather than traditional cause-effect, non-contextual methods of analysis. This realist technique acknowledges that intervention programmes and policy changes do not necessarily work for everyone, since people are different and are embedded in different contexts.
|
||||
Realist evaluation was popularised by the work of Ray Pawson and Nick Tilley in 1997. They described the procedure followed in implementing realist evaluation techniques in programme evaluation and emphasised that, once hypotheses have been generated and data collected, the outcomes of the programme are explored, focusing on the groups that the programme benefitted and those that did not benefit. The effectiveness of a programme is thus not dependent on the outcomes alone (cause–effect); rather, there is a consideration of the theoretical mechanisms that are applied and the socio-historical context in which the programmes were implemented. Thus, the final explanation of a programme considers context-mechanism-outcome.
|
||||
All research methods are applicable in realist evaluations, according to Pawson and Tilley (1997):
|
||||
|
||||
"... it is quite possible to carry out realistic evaluation using: strategies, quantitative and qualitative; timescales, contemporaneous or historical; viewpoints, cross-sectional or longitudinal; samples, large or small; goals, action-oriented or audit-centred; and so on and so forth."
|
||||
Later work emphasised that the realist evaluation approach was an attempt to introduce basic ideas from social science to evaluation. For example (Pawson & Tilley, 2001, p. 324): "We have argued that good evaluation is good social science. For us, this embraces the gallant aims of precision in articulation of theory, rigor in empirical testing, confederation in lines of inquiry, and cumulation in the body of findings. The “realist movement,” of which we are a part, is often considered the brash upstart of the evaluation schools. In fact, it depends on these rather venerable ideas. The future, for us, thus lies in keeping faith with some of the grand old principles of social science and in not forgetting the hard-won lessons of the old studies." Although context-mechanism-outcome configurations are often seen as the hallmark of realist evaluations, this too is argued to be pervasive across social and other sciences (Pawson, 2024, p. 42): "All scientific investigation utilises explanations relating mechanisms and contexts to empirical patterns." A 2024 book argues that it is possible to run realist randomized controlled trials, and Gill Westhorp and Simon Feeny (2024) explain the relevance of surveys and regression models (including interaction terms and covariate adjustment) to testing Context-Mechanism-Outcome configurations. This form of theory-driven evaluation has been increasingly used across a variety of different settings and research agendas including health systems and social policy. Guidelines and methodological resources on realist evaluation have been translated and made available in Spanish through the RAÍCES initiative.
|
||||
|
||||
|
||||
== References ==
|
||||
@ -4,7 +4,7 @@ chunk: 1/1
|
||||
source: "https://en.wikipedia.org/wiki/Secondary_research"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T03:45:52.254825+00:00"
|
||||
date_saved: "2026-05-05T04:26:26.994087+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
|
||||
35
data/en.wikipedia.org/wiki/Single-arm_study_design-0.md
Normal file
@ -0,0 +1,35 @@
|
||||
---
|
||||
title: "Single-arm study design"
|
||||
chunk: 1/1
|
||||
source: "https://en.wikipedia.org/wiki/Single-arm_study_design"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:28.254007+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
A single-arm study design is a type of clinical or experimental study where all participants receive the same intervention, with no control or placebo group. The term "single-arm" refers to the presence of only one participant group, in contrast to designs such as randomized controlled trials, which include both a treatment arm and control or placebo arm.
|
||||
|
||||
|
||||
== Overview ==
|
||||
Single-arm trials include one experimental group without a parallel control group. The design is open-label and does not involve randomization or blinding. This type of study design is commonly applied to advanced-stage cancer, rare diseases, emerging infectious diseases, new treatment methods, and medical devices.
|
||||
While randomized controlled trials are considered the “gold standard” in clinical research, they are not always feasible due to limitations in the study population, challenges in obtaining evidence, high costs, and ethical considerations. Single-arm trials are used to address these concerns.
|
||||
|
||||
|
||||
== Applications ==
|
||||
Since there is no randomization in single-arm trials, all patients receive the same intervention within a particular study/trial. This design is most commonly used in early-phase clinical trials or in studies of rare or serious diseases, where including a control group may be impractical or unethical.
|
||||
|
||||
Phase I studies primarily assess safety, tolerability, and pharmacokinetics over a relatively short duration. They are used to inform dose selection for subsequent studies.
|
||||
Phase II studies primarily assess preliminary efficacy over a longer duration (months to years). They are used to support proof-of-concept assessments.
|
||||
|
||||
|
||||
== Interpreting results ==
|
||||
In the absence of a control group, effectiveness is measured against an external standard. This can include benchmarks established from previous studies or historical control data, which are used to define minimum efficacy thresholds and expected performance ranges.
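One simple way to compare a single-arm result with an external benchmark is an exact one-sided binomial test against a historical response rate. The sketch below is illustrative only: the function name `binomial_upper_tail`, the trial size, the number of responders, and the 20% benchmark are all hypothetical assumptions, and real trials typically use pre-specified designs chosen with a statistician.

```python
from math import comb


def binomial_upper_tail(successes, n, p0):
    """P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical single-arm trial: 40 patients, 14 responders,
# compared against an assumed historical response rate of 20%.
p_value = binomial_upper_tail(14, 40, 0.20)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value here only says the observed response rate is unlikely under the benchmark rate; it does not rule out the selection and confounding problems that come with an external comparison.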
|
||||
In oncology, time-to-event outcomes such as progression-free survival and overall survival may be used to assess treatment effects.
|
||||
|
||||
|
||||
== Bias and limitations ==
|
||||
Without a control group or randomization there are several sources of bias. Differences in patient selection, baseline characteristics, and clinical management may affect observed outcomes. Confounding variables cannot be fully controlled, and causal relationships cannot be established with certainty.
|
||||
|
||||
|
||||
== References ==
|
||||
25
data/en.wikipedia.org/wiki/Spaced_repetition-0.md
Normal file
@ -0,0 +1,25 @@
|
||||
---
|
||||
title: "Spaced repetition"
|
||||
chunk: 1/3
|
||||
source: "https://en.wikipedia.org/wiki/Spaced_repetition"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:29.414379+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
Spaced repetition is an evidence-based learning technique that is usually performed with flashcards. Newly introduced and more difficult flashcards are shown more frequently, while older and less difficult flashcards are shown less frequently, in order to exploit the psychological spacing effect. The use of spaced repetition has been shown to increase the rate of learning.
|
||||
|
||||
Although the principle is useful in many contexts, spaced repetition is commonly applied where a learner must acquire many items and retain them indefinitely in memory. It is, therefore, well suited to the problem of vocabulary acquisition in the course of second-language learning. A number of spaced repetition software programs have been developed to aid the learning process. It is also possible to perform spaced repetition with physical flashcards using the Leitner system. The testing effect and spaced repetition can be combined to improve long-term memory, which can make memorization easier.
|
||||
|
||||
== History ==
|
||||
The method of spaced repetition was first conceived of in the 1880s by German scientist Hermann Ebbinghaus. Ebbinghaus created the 'forgetting curve'—a graph portraying the loss of learned information over time—and postulated that it can be curbed by reviewing such information at several intervals over a period of time.
|
||||
It was also tested by Thomas Landauer and Robert A. Bjork in 1978; they gathered a group of psychology students and showed them pictures of a certain individual followed by that individual's name, a task known as face-name association. By seeing the person's name and face repeatedly at expanding intervals, the students were able to associate the name with the face.
|
||||
In 1985, Schacter, Rich, and Stampp expanded the research to include people who have amnesia and other memory disorders. The findings showed that spaced repetition can help not only students with name-face association but also patients dealing with memory impairments.
|
||||
In 1989, C. J. Camp proposed that using this technique with Alzheimer's patients might increase how long they remember particular things. These results show that expanding the time interval yields the strongest benefits for memory.
|
||||
Spaced repetition is a method where the subject is asked to remember a certain fact with the time intervals increasing each time the fact is presented or said. If the subject is able to recall the information correctly the time is doubled to further help them keep the information fresh in their mind to recall in the future. With this method, the patient is able to place the information in their long-term memory. If they are unable to remember the information they go back to the previous step and continue to practice to help make the technique lasting (Vance & Farr, 2007).
|
||||
The expanding schedule is intended to ensure a high rate of successful recall on the first attempt, while the increasing intervals make the information long-lasting and keep it accessible.
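The doubling rule described above can be sketched in a few lines. This is a minimal illustration, not a published algorithm: the function names, the starting interval, and the reading of "go back to the previous step" as halving the interval are assumptions, and the interval is in arbitrary time units.

```python
def next_interval(current, recalled, minimum=1.0):
    """Return the next review interval under a doubling rule.

    On a correct recall the interval doubles; on a failure it falls back
    to the previous (half-length) interval, never below `minimum`.
    """
    if recalled:
        return current * 2
    return max(minimum, current / 2)


def schedule(outcomes, start=1.0):
    """Trace the interval after each recall outcome (True = recalled)."""
    intervals = []
    interval = start
    for recalled in outcomes:
        interval = next_interval(interval, recalled, minimum=start)
        intervals.append(interval)
    return intervals


# Hypothetical session: three successes, one lapse, then a success.
print(schedule([True, True, True, False, True]))
# → [2.0, 4.0, 8.0, 4.0, 8.0]
```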
|
||||
Throughout the development of spaced repetition, researchers have found that patients with dementia who use this technique are able to recall the information weeks, or even months, later. The technique has been successful in helping dementia patients remember particular objects' names, daily tasks, name-face associations, information about themselves, and many other facts and behaviors (Small, 2012). Sufficient test evidence shows that spaced repetition is valuable in learning new information and recalling information from the past.
|
||||
Small combines the works and findings of quite a few scientists to come up with five reasons why spaced repetition works: it helps show the relationship of routine memories, it shows the benefits of learning things with an expansion of time, it helps the patient with Alzheimer's dementia keep their brain active, it has a high success level with little to no errors, and the technique is meaningful for the patient to do and remember more things. Joltin et al. (2003) had a caregiver train a woman with Alzheimer's by giving her the name of her grandchild over the phone while asking her to associate it with the picture of the grandchild posted on the refrigerator. After training, the woman was able to recall the name of her grandchild five days later.
|
||||
|
||||
== Research and application ==
|
||||
23
data/en.wikipedia.org/wiki/Spaced_repetition-1.md
Normal file
@ -0,0 +1,23 @@
|
||||
---
|
||||
title: "Spaced repetition"
|
||||
chunk: 2/3
|
||||
source: "https://en.wikipedia.org/wiki/Spaced_repetition"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:29.414379+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
The notion that spaced repetition could be used for improving learning was first proposed in the book Psychology of Study by C. A. Mace in 1932: "Perhaps the most important discoveries are those which relate to the appropriate distribution of the periods of study... Acts of revision should be spaced in gradually increasing intervals, roughly intervals of one day, two days, four days, eight days, and so on."
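The doubling schedule in Mace's quotation can be sketched in a few lines. This is only an illustration of "one day, two days, four days, eight days, and so on"; the function name, the start date, and the number of reviews are assumptions chosen for the example.

```python
from datetime import date, timedelta


def doubling_schedule(start: date, reviews: int, first_gap_days: int = 1) -> list[date]:
    """Return review dates whose gaps double each time: 1, 2, 4, 8, ... days."""
    dates = []
    current = start
    gap = first_gap_days
    for _ in range(reviews):
        current = current + timedelta(days=gap)  # next review after the current gap
        dates.append(current)
        gap *= 2  # each gap is twice the previous one
    return dates


# Hypothetical start date, used only to make the example concrete.
start = date(2024, 1, 1)
schedule = doubling_schedule(start, reviews=4)
gaps = [(later - earlier).days for earlier, later in zip([start] + schedule, schedule)]
print(gaps)  # [1, 2, 4, 8]
```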
|
||||
In 1939, H. F. Spitzer tested the effects of a type of spaced repetition on more than 3,600 sixth-grade students in Iowa who were learning science facts, and showed that spaced repetition was effective. This early work went unnoticed, and the field was relatively quiet until the late 1960s, when cognitive psychologists, including Melton and Landauer and Bjork, explored manipulation of repetition timing as a means to improve recall. Around the same time, Pimsleur language courses pioneered the practical application of spaced repetition theory to language learning, and in 1973 Sebastian Leitner devised his "Leitner system", an all-purpose spaced repetition learning system based on flashcards.
|
||||
With the increase in access to personal computers in the 1980s, spaced repetition began to be implemented in computer-assisted language learning software (see § Software), which enabled automated scheduling and statistics gathering and scaled to thousands of individually scheduled cards. To enable the user to reach a target level of achievement (e.g. 90% of all material correctly recalled at any given time point), the software adjusts the repetition spacing interval. Material that is hard appears more often and material that is easy less often, with difficulty defined according to the ease with which the user is able to produce a correct response.
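One way to picture how software might pick an interval for a target recall level is sketched below. The text does not say which memory model any particular program uses, so the exponential forgetting curve here, the `stability_days` parameter, and the function name are all assumptions made purely for illustration.

```python
import math


def interval_for_target(stability_days: float, target_recall: float = 0.9) -> float:
    """Days until predicted recall falls to target_recall.

    Assumes (hypothetically) recall(t) = exp(-t / stability_days);
    real schedulers may use quite different models.
    """
    if not 0.0 < target_recall < 1.0:
        raise ValueError("target_recall must be strictly between 0 and 1")
    if stability_days <= 0:
        raise ValueError("stability_days must be positive")
    return -stability_days * math.log(target_recall)


# A "hard" item (low stability) comes back sooner than an "easy" one.
hard = interval_for_target(stability_days=2.0)
easy = interval_for_target(stability_days=20.0)
print(f"hard item: review in {hard:.2f} days; easy item: review in {easy:.2f} days")
```

Under this assumed model, raising the target recall level shortens every interval, which matches the qualitative behaviour described above.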
|
||||
The data behind this initial research indicated that an increasing space between rehearsals (expanding) would yield a greater percentage of accuracy at test points. Spaced repetition with expanding intervals is believed to be so effective because, with each expanded interval, the information becomes more difficult to retrieve owing to the time elapsed between test periods; this creates a deeper level of processing of the learned information in long-term memory at each point. Another reason the expanding repetition model is believed to work so effectively is that the first test happens early in the rehearsal process. The purpose of this early test is to increase the chance of a successful repetition: when the first test after initial learning is successful, people are more likely to recall the item on the following tests. Although expanding retrieval is commonly associated with spaced repetition, a uniform retrieval schedule is also a form of spaced repetition procedure.
|
||||
A study conducted by Bui et al. (2013) examined how the advantages of spaced repetition can be influenced by differences in working memory and by the complexity of the tasks that occur between the repetitions. The researchers found that participants with a higher working memory capacity benefited from spaced repetition and showed better performance on challenging tasks.
|
||||
Spaced repetition is typically studied through the memorization of facts. Traditionally, it has not been applied to fields that require manipulation or thought beyond simple factual or semantic information. A more recent study has shown that spaced repetition can benefit tasks such as solving math problems. In a study conducted by Pashler, Rohrer, Cepeda, and Carpenter, participants had to learn a simple math principle on either a spaced or a massed retrieval schedule. The participants given the spaced repetition learning tasks showed higher scores on a final test administered after their final practice session.
|
||||
This is notable because it shows that spaced repetition can be used not only to remember simple facts or contextual data but also in fields such as math, where manipulating and applying particular principles or formulas (e.g. y = mx + b) is necessary. These researchers also found that it is beneficial to give feedback when administering the tests. When a participant gave a wrong response, they were likely to get it correct on the following tests if the researcher gave them the correct answer after a delay.
|
||||
Building on this, more recent studies have applied spaced repetition to procedural skill acquisition in complex domains. For example, a pilot study in neurosurgery training found that incorporating spaced repetition into a six-week simulation module improved residents' proficiency in performing complex surgical procedures. Participants who engaged in structured, repeated practice showed significant improvements in objective performance metrics compared to those who trained using traditional methods alone. This suggests that spaced repetition can effectively facilitate the acquisition of procedural knowledge in surgical contexts, in addition to its demonstrated applications in other areas of medical training.
|
||||
Spaced repetition is a useful learning tool that is relevant to many domains, such as fact learning, mathematics, and procedural skills, and to different schedules (expanding or uniform retrieval). Many studies over the years have contributed to the use and implementation of spaced repetition, and it remains a subject of interest for many researchers.
|
||||
Over the years, techniques and tests have been developed to help patients with memory difficulties, and spaced repetition is one of them. Spaced repetition is used in many different areas of memory, from remembering facts to remembering how to ride a bike to remembering events from childhood. Recovery practice is used to see whether an individual is able to recall something immediately after they have seen or studied it. Increasing recovery practice is frequently used as a technique for improving long-term memory, especially for young children trying to learn and older individuals with memory diseases.
|
||||
|
||||
== Algorithms ==
|
||||
There are several families of spaced repetition algorithms:
|
||||
47
data/en.wikipedia.org/wiki/Spaced_repetition-2.md
Normal file
@ -0,0 +1,47 @@
|
||||
---
|
||||
title: "Spaced repetition"
|
||||
chunk: 3/3
|
||||
source: "https://en.wikipedia.org/wiki/Spaced_repetition"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:29.414379+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
Leitner system—a simple scheme that uses five levels and an arbitrary number of study stages
|
||||
Neural-network-based
|
||||
The SM family of algorithms (see SuperMemo § Algorithms), ranging from SM-0 (a paper-and-pencil prototype) to SM-18, which is built into SuperMemo 18 and 19.
|
||||
The DASH (Difficulty, Ability and Study History) family
|
||||
SSP-MMC (Stochastic Shortest Path Minimize Memorization Cost) and the closely related FSRS (Free Spaced Repetition Scheduler), which is available in Anki starting with release 23.10 and in RemNote starting with release 1.16
|
||||
|
||||
== Evidence and criticism ==
|
||||
Spaced repetition is widely accepted as an effective learning strategy in a number of domains, with many researchers suggesting that it be implemented in formal education. There is evidence that the popular method of "expanding intervals" (in which the interval between repetitions increases with each repetition) performs as well as or better than uniformly spaced repetitions. Some papers find expanding intervals to be beneficial for recall, while meta-analyses tend to conclude that both methods yield similar results and that "strong recommendations to teachers and students in favor of spaced retrieval practice are warranted".
|
||||
Several mechanisms have been suggested to explain why expanding intervals might provide an additional benefit. The most notable builds on a core tenet of spaced repetition, that spacing increases the effort needed for retrieval: expanding intervals would allow that difficulty to increase gradually. However, little evidence has been found to back this claim. It has been argued that the benefit observed for expanding intervals in some studies is due to other factors, such as the timing of the first retrieval, the number of repetitions, or the overall spacing between the tests. It has also been proposed that the best schedule is learner-dependent, making general recommendations irrelevant.
|
||||
|
||||
== Implementations ==
|
||||
|
||||
=== Software ===
|
||||
|
||||
Most spaced repetition software (SRS) is modeled after the manual style of learning with physical flashcards: items to memorize are entered into the program as question-answer pairs. When a pair is due to be reviewed, the question is displayed on a screen, and the user must attempt to answer. After answering, the user manually reveals the answer and then tells the program (subjectively) how difficult answering was. The program schedules pairs based on spaced repetition algorithms. Without a computer program, the user has to schedule physical flashcards; this is time-intensive and limits users to simple algorithms like the Leitner system.
|
||||
To optimize review schedules, recent development of spaced repetition algorithms has focused on predictive modeling. These algorithms use randomly determined equations to find the most effective timing for review sessions.
|
||||
Further refinements with regard to software:
|
||||
|
||||
Confidence-based repetition: A user rates their confidence in each digital flashcard, e.g. on a scale of 1–5; a lower-confidence card is repeated more frequently until the user upgrades their confidence rating in it.
|
||||
Questions and/or answers can be a sound file to train recognition of spoken words.
|
||||
Automatic generation of pairs (e.g. for vocabulary, it is useful to generate three question-pairs from a written foreign word, its pronunciation, and its meaning, while the data only has to be entered once).
|
||||
Automatic retrieval of additional information, such as example sentences containing a word.
|
||||
Opportunities to combine spaced repetition with online community functions, e.g. sharing courses.
|
||||
|
||||
=== Paper flash cards ===
|
||||
|
||||
The Leitner system is a widely used method of efficiently using flashcards that was proposed by the German science journalist Sebastian Leitner in the 1970s. It is a simple implementation of the principle of spaced repetition, where cards are reviewed at increasing intervals.
|
||||
In this method, flashcards are sorted into groups according to how well the learner knows each one in Leitner's learning box. The learners try to recall the solution written on a flashcard. If they succeed, they send the card to the next group. If they fail, they send it back to the first group. Each succeeding group has a longer period of time before the learner is required to revisit the cards. In Leitner's original method, published in his book So lernt man Lernen (How To Learn To Learn), the schedule of repetition was governed by the size of the partitions in the learning box. These were 1, 2, 5, 8 and 14 cm. Only when a partition became full was the learner to review some of the cards it contained, moving them forward or back, depending on whether they remembered them.
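The promote-and-demote rule described above can be sketched as follows. This is a minimal illustration, not Leitner's own procedure: it models only the movement of a card between groups, not the partition-size rule for deciding when to review, and the choice to keep a card in the last group after a further success is an assumption.

```python
NUM_BOXES = 5  # assumption: five groups, matching the five partitions mentioned above


def review(box: int, recalled: bool) -> int:
    """Return the new 0-based box index for a card after one review.

    Success moves the card to the next group; failure sends it back to the
    first group. Staying in the last group on success is an assumption.
    """
    if not 0 <= box < NUM_BOXES:
        raise ValueError("box index out of range")
    if recalled:
        return min(box + 1, NUM_BOXES - 1)
    return 0


# Follow one card through a hypothetical sequence of review outcomes.
box = 0
for outcome in [True, True, False, True]:
    box = review(box, outcome)
print(box)  # 1
```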
|
||||
|
||||
=== Audio instruction ===
|
||||
Graduated-interval recall is a type of spaced repetition published by Paul Pimsleur in 1967. It is used in the Pimsleur language learning system and is particularly suited to programmed audio instruction due to the very short times (measured in seconds or minutes) between the first few repetitions, as compared to other forms of spaced repetition which may not require such precise timings. The intervals published in Pimsleur's paper were: 5 seconds, 25 seconds, 2 minutes, 10 minutes, 1 hour, 5 hours, 1 day, 5 days, 25 days, 4 months, and 2 years.
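As an arithmetic observation, the listed intervals roughly track successive powers of five when measured in seconds (5, 25, 125, 625, ...). The sketch below prints the two side by side so the approximation can be inspected; the helper names are hypothetical, and the comparison is offered as a reading aid rather than as a statement of how the intervals were derived.

```python
# Intervals exactly as listed in the text above.
PUBLISHED = [
    "5 seconds", "25 seconds", "2 minutes", "10 minutes", "1 hour", "5 hours",
    "1 day", "5 days", "25 days", "4 months", "2 years",
]


def powers_of_five_seconds(count: int) -> list[int]:
    """Return 5**1, 5**2, ..., 5**count, interpreted as seconds."""
    return [5 ** n for n in range(1, count + 1)]


def humanize(seconds: int) -> str:
    """Render a duration in the largest convenient unit (rough, display only)."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds} seconds"


for label, seconds in zip(PUBLISHED, powers_of_five_seconds(len(PUBLISHED))):
    print(f"{label:>10} | 5^n seconds = {humanize(seconds)}")
```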
|
||||
|
||||
== References ==
|
||||
|
||||
== Further reading ==
|
||||
Kail, R. V., & Cavanaugh J. C. (2007). "Spaced Retrieval". Human Development: A Life-Span View (5th ed.). Belmont, CA: Wadsworth.
|
||||
Wozniak, Piotr (February 1999). "Effective learning: Twenty rules of formulating knowledge". – advice on making flashcards for spaced repetition.
|
||||
25
data/en.wikipedia.org/wiki/Systematic_review-0.md
Normal file
@ -0,0 +1,25 @@
|
||||
---
|
||||
title: "Systematic review"
|
||||
chunk: 1/6
|
||||
source: "https://en.wikipedia.org/wiki/Systematic_review"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:30.659047+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
A systematic review is a scholarly synthesis of the evidence on a clearly presented topic using critical methods to identify, define and assess research on the topic. A systematic review extracts and interprets data from published studies on the topic (in the scientific literature), then analyzes, describes, critically appraises and summarizes interpretations into a refined evidence-based conclusion. For example, a systematic review of randomized controlled trials is a way of summarizing and implementing evidence-based medicine. Systematic reviews, sometimes along with meta-analyses, are generally considered the highest level of evidence in medical research.
|
||||
While a systematic review may be applied in the biomedical or health care context, it may also be used where an assessment of a precisely defined subject can advance understanding in a field of research. A systematic review may examine clinical tests, public health interventions, environmental interventions, social interventions, adverse effects, qualitative evidence syntheses, methodological reviews, policy reviews, and economic evaluations.
|
||||
Systematic reviews are closely related to meta-analyses, and often the same instance will combine both (being published with a subtitle of "a systematic review and meta-analysis"). The distinction between the two is that a meta-analysis uses statistical methods to derive a single number from the pooled data set (such as an effect size), whereas the strict definition of a systematic review excludes that step. However, in practice, when one is mentioned, the other may often be involved, as it takes a systematic review to assemble the information that a meta-analysis analyzes, and people sometimes refer to an instance as a systematic review, even if it includes the meta-analytical component.
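To make the "single number" concrete, here is a minimal sketch of one common pooling approach, the fixed-effect inverse-variance method, in which each study is weighted by the inverse of its squared standard error. The text does not prescribe a particular method, and the effect sizes and standard errors below are made-up illustrative values, not data from any study.

```python
import math


def pooled_fixed_effect(effects: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Return (pooled effect, pooled standard error) by inverse-variance weighting."""
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect, and at least one study")
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors for three studies.
effects = [0.30, 0.10, 0.20]
std_errors = [0.10, 0.20, 0.10]
pooled, pooled_se = pooled_fixed_effect(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

A random-effects model, which also accounts for variation between studies, would give different weights; the fixed-effect version is shown only because it is the shortest to write down.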
|
||||
An understanding of systematic reviews and how to implement them in practice is common for professionals in health care, public health, and public policy.
|
||||
Systematic reviews contrast with a type of review often called a narrative review. Systematic reviews and narrative reviews both review the literature (the scientific literature), but the term literature review without further specification refers to a narrative review.
|
||||
|
||||
== Characteristics ==
|
||||
A systematic review can be designed to provide a thorough summary of current literature relevant to a research question. A systematic review uses a rigorous and transparent approach for research synthesis, with the aim of assessing and, where possible, minimizing bias in the findings. While many systematic reviews are based on an explicit quantitative meta-analysis of available data, there are also qualitative reviews and other types of mixed-methods reviews that adhere to standards for gathering, analyzing, and reporting evidence.
|
||||
Systematic reviews of quantitative data or mixed-method reviews sometimes use statistical techniques (meta-analysis) to combine results of eligible studies. Scoring levels are sometimes used to rate the quality of the evidence depending on the methodology used, although this is discouraged by the Cochrane Library. As evidence rating can be subjective, multiple people may be consulted to resolve any differences in how evidence is rated.
|
||||
The EPPI-Centre, Cochrane, and the Joanna Briggs Institute have been influential in developing methods for combining both qualitative and quantitative research in systematic reviews. Several reporting guidelines exist to standardise reporting about how systematic reviews are conducted. Such reporting guidelines are not quality assessment or appraisal tools. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement suggests a standardized way to ensure a transparent and complete reporting of systematic reviews, and is now required for this kind of research by more than 170 medical journals worldwide. The latest version of this commonly used statement corresponds to PRISMA 2020 (the respective article was published in 2021). Several specialized PRISMA guideline extensions have been developed to support particular types of studies or aspects of the review process, including PRISMA-P for review protocols and PRISMA-ScR for scoping reviews. A list of PRISMA guideline extensions is hosted by the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network. However, the PRISMA guidelines have been found to be limited to intervention research and the guidelines have to be changed in order to fit non-intervention research. As a result, Non-Interventional, Reproducible, and Open (NIRO) Systematic Reviews was created to counter this limitation.
|
||||
For qualitative reviews, reporting guidelines include ENTREQ (Enhancing transparency in reporting the synthesis of qualitative research) for qualitative evidence syntheses; RAMESES (Realist And MEta-narrative Evidence Syntheses: Evolving Standards) for meta-narrative and realist reviews; and eMERGe (Improving reporting of Meta-Ethnography) for meta-ethnography.
|
||||
Developments in systematic reviews during the 21st century included realist reviews and the meta-narrative approach, both of which addressed problems of variation in methods and heterogeneity existing on some subjects.
|
||||
|
||||
== Types ==
|
||||
There are over 30 types of systematic review, and Table 1 below non-exhaustively summarises some of these. There is not always consensus on the boundaries and distinctions between the approaches described below.
|
||||
32
data/en.wikipedia.org/wiki/Systematic_review-1.md
Normal file
@ -0,0 +1,32 @@
|
||||
---
|
||||
title: "Systematic review"
|
||||
chunk: 2/6
|
||||
source: "https://en.wikipedia.org/wiki/Systematic_review"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:30.659047+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
=== Scoping reviews ===
|
||||
Scoping reviews are distinct from systematic reviews in several ways. A scoping review is an attempt to search for concepts by mapping the language and data which surrounds those concepts and adjusting the search method iteratively to synthesize evidence and assess the scope of an area of inquiry. This can mean that the concept search and method (including data extraction, organisation and analysis) are refined throughout the process, sometimes requiring deviations from any protocol or original research plan. A scoping review may often be a preliminary stage before a systematic review, which 'scopes' out an area of inquiry and maps the language and key concepts to determine if a systematic review is possible or appropriate, or to lay the groundwork for a full systematic review. The goal can be to assess how much data or evidence is available regarding a certain area of interest. This process is further complicated if it is mapping concepts across multiple languages or cultures.
|
||||
As a scoping review should be systematically conducted and reported (with a transparent and repeatable method), some academic publishers categorize them as a kind of 'systematic review', which may cause confusion. Scoping reviews are helpful when it is not possible to carry out a systematic synthesis of research findings, for example, when there are no published clinical trials in the area of inquiry. Scoping reviews are helpful when determining if it is possible or appropriate to carry out a systematic review, and are a useful method when an area of inquiry is very broad, for example, exploring how the public are involved in all stages of systematic reviews.
|
||||
There is still a lack of clarity when defining the exact method of a scoping review, as it is both an iterative process and still relatively new. There have been several attempts to improve the standardisation of the method, for example via a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline extension for scoping reviews (PRISMA-ScR). PROSPERO (the International Prospective Register of Systematic Reviews) does not permit the submission of protocols of scoping reviews, although some journals will publish protocols for scoping reviews.
|
||||
|
||||
== Stages ==
|
||||
While there are multiple kinds of systematic review methods, the main stages of a review can be summarised as follows:
|
||||
|
||||
=== Defining the research question ===
|
||||
Reported 'best practices' involve 'defining an answerable question' and publishing the protocol of the review before initiating it, to reduce the risk of unplanned research duplication and to enable transparency and consistency between methodology and protocol. Clinical reviews of quantitative data are often structured using the mnemonic PICO, which stands for 'Population or Problem', 'Intervention or Exposure', 'Comparison', and 'Outcome', with other variations existing for other kinds of research. For qualitative reviews, PICo is 'Population or Problem', 'Interest', and 'Context'.
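The four PICO elements can be captured in a small record so that a review question is stated the same way in the protocol and in the report. The class name, the sentence template, and the example question below are hypothetical and serve only to show the structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """A review question split into the four PICO elements."""
    population: str    # Population or Problem
    intervention: str  # Intervention or Exposure
    comparison: str    # Comparison
    outcome: str       # Outcome

    def as_sentence(self) -> str:
        """Render the elements as one answerable question (template is illustrative)."""
        return (
            f"In {self.population}, does {self.intervention}, "
            f"compared with {self.comparison}, affect {self.outcome}?"
        )


# Hypothetical example question, not taken from any published protocol.
question = PicoQuestion(
    population="adults with mild dementia",
    intervention="spaced retrieval training",
    comparison="usual care",
    outcome="recall of names after four weeks",
)
print(question.as_sentence())
```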
|
||||
|
||||
=== Searching for sources ===
|
||||
Relevant criteria can include selecting research that is of good quality and answers the defined question. The search strategy should be designed to retrieve literature that matches the protocol's specified inclusion and exclusion criteria. The methodology section of a systematic review should list all of the databases and citation indices that were searched. The titles and abstracts of identified articles can be checked against predetermined criteria for eligibility and relevance. Each included study may be assigned an objective assessment of methodological quality, preferably by using methods conforming to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, or the standards of Cochrane.
|
||||
Common information sources used in searches include scholarly databases of peer-reviewed articles such as MEDLINE, Web of Science, Embase, and PubMed, as well as sources of unpublished literature such as clinical trial registries and grey literature collections. Key references can also be yielded through additional methods such as citation searching, reference list checking (related to a search method called 'pearl growing'), manually searching information sources not indexed in the major electronic databases (sometimes called 'hand-searching'), and directly contacting experts in the field.
|
||||
To be systematic, searchers must use a combination of search skills and tools such as database subject headings, keyword searching, Boolean operators, and proximity searching while attempting to balance sensitivity (systematicity) and precision (accuracy). Inviting and involving an experienced information professional or librarian can improve the quality of systematic review search strategies and reporting.
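A common way to combine keywords with Boolean operators is to join synonyms for one concept with OR and then join the concepts with AND. The sketch below builds such a string in a generic syntax; the function names and search terms are hypothetical, and each real database has its own rules for quoting, truncation, subject headings, and proximity operators, so the output would need adapting before use.

```python
def or_group(terms: list[str]) -> str:
    """Join synonyms for one concept with OR, quoting multi-word phrases."""
    if not terms:
        raise ValueError("each concept needs at least one term")
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts: list[list[str]]) -> str:
    """Join the concept groups with AND so that every concept must match."""
    return " AND ".join(or_group(terms) for terms in concepts)


# Hypothetical concepts and synonyms for illustration.
query = build_query([
    ["spaced repetition", "distributed practice"],
    ["dementia", "Alzheimer*"],
])
print(query)
# ("spaced repetition" OR "distributed practice") AND (dementia OR Alzheimer*)
```

Adding more synonyms inside a group tends to raise sensitivity, while adding more AND-ed concepts tends to raise precision, which is the balance described above.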
|
||||
|
||||
=== 'Extraction' of relevant data ===
|
||||
|
||||
Relevant data are 'extracted' from the data sources according to the review method. The data extraction method is specific to the kind of data, and data extracted on 'outcomes' is only relevant to certain types of reviews. For example, a systematic review of clinical trials might extract data about how the research was done (often called the method or 'intervention'), who participated in the research (including how many people), how it was paid for (for example, funding sources) and what happened (the outcomes). In an intervention effect review, where a meta-analysis is possible, the relevant data are extracted and 'combined'.
|
||||
|
||||
=== Assess the eligibility of the data ===
|
||||
This stage involves assessing the eligibility of data for inclusion in the review by judging it against criteria identified at the first stage. This can include assessing if a data source meets the eligibility criteria and recording why decisions about inclusion or exclusion in the review were made. Software programmes can be used to support the selection process, including text mining tools and machine learning, which can automate aspects of the process. The 'Systematic Review Toolbox' is a community-driven, web-based catalog of tools that helps reviewers choose appropriate tools for reviews.
|
||||
43
data/en.wikipedia.org/wiki/Systematic_review-2.md
Normal file
@ -0,0 +1,43 @@
|
||||
---
|
||||
title: "Systematic review"
|
||||
chunk: 3/6
|
||||
source: "https://en.wikipedia.org/wiki/Systematic_review"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:30.659047+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
=== Analyse and combine the data ===
|
||||
Analysing and combining data can provide an overall result from all the data. Because this combined result may use qualitative or quantitative data from all eligible sources, it is considered more reliable: the more data a review includes, the more confident we can be in its conclusions. When appropriate, some systematic reviews include a meta-analysis, which uses statistical methods to combine data from multiple sources. A review might use quantitative data, or might employ a qualitative meta-synthesis, which synthesises data from qualitative studies. A review may also bring together the findings from quantitative and qualitative studies in a mixed methods or overarching synthesis. The combination of data from a meta-analysis can sometimes be visualised. One method uses a forest plot (also called a blobbogram). In an intervention effect review, the diamond in the 'forest plot' represents the combined results of all the data included. An example of a 'forest plot' is the Cochrane Collaboration logo. The logo is a forest plot of one of the first reviews, which showed that corticosteroids given to women who are about to give birth prematurely can save the life of the newborn child.
|
||||
Recent visualisation innovations include the albatross plot, which plots p-values against sample sizes, with approximate effect-size contours superimposed to facilitate analysis. The contours can be used to infer effect sizes from studies that have been analysed and reported in diverse ways. Such visualisations may have advantages over other types when reviewing complex interventions.
|
||||
|
||||
=== Communication and dissemination ===
|
||||
Once these stages are complete, the review may be published, disseminated, and translated into practice after being adopted as evidence. The UK National Institute for Health Research (NIHR) defines dissemination as "getting the findings of research to the people who can make use of them to maximise the benefit of the research without delay".
|
||||
Some users do not have time to invest in reading large and complex documents and/or may lack awareness or be unable to access newly published research. Researchers are, therefore, developing skills to use creative communication methods such as illustrations, blogs, infographics, and board games to share the findings of systematic reviews.
|
||||
|
||||
== Automation ==
|
||||
Living systematic reviews are a newer kind of semi-automated, up-to-date online summaries of research that are updated as new research becomes available. The difference between a living systematic review and a conventional systematic review is the publication format. Living systematic reviews are "dynamic, persistent, online-only evidence summaries, which are updated rapidly and frequently".
|
||||
The automation or semi-automation of the systematic process itself is increasingly being explored. While little evidence exists to demonstrate it is as accurate or involves less manual effort, efforts that promote training and using artificial intelligence for the process are increasing. In particular, since 2023, there has been a growing emergence of tools powered by large language models designed to support, automate, or even generate literature reviews.
|
||||
|
||||
== Research fields ==
|
||||
|
||||
=== Health and medicine ===
|
||||
|
||||
==== Current use of systematic reviews in medicine ====
|
||||
Many organisations around the world use systematic reviews, with the methodology depending on the guidelines being followed. Organisations which use systematic reviews in medicine and human health include the National Institute for Health and Care Excellence (NICE, UK), the Agency for Healthcare Research and Quality (AHRQ, US), and the World Health Organization. Most notable among international organisations is Cochrane, a group of over 37,000 specialists in healthcare who systematically review randomised trials of the effects of prevention, treatments, and rehabilitation as well as health systems interventions. They sometimes also include the results of other types of research. Cochrane Reviews are published in The Cochrane Database of Systematic Reviews section of the Cochrane Library. The 2015 impact factor for The Cochrane Database of Systematic Reviews was 6.103, and it was ranked 12th in the Medicine, General & Internal category.
|
||||
There are several types of systematic reviews, including:
|
||||
|
||||
Intervention reviews assess the benefits and harms of interventions used in healthcare and health policy.
|
||||
Diagnostic test accuracy reviews assess how well a diagnostic test performs in diagnosing and detecting a particular disease. For conducting diagnostic test accuracy reviews, free software with a graphical user interface, such as MetaDTA and CAST-HSROC, is available.
|
||||
Methodology reviews address issues relevant to how systematic reviews and clinical trials are conducted and reported.
|
||||
Qualitative reviews synthesize qualitative evidence to address questions on aspects other than effectiveness.
|
||||
Prognosis reviews address the probable course or future outcome(s) of people with a health problem.
|
||||
Overviews of Systematic Reviews (OoRs), sometimes referred to as umbrella reviews, compile multiple pieces of evidence from systematic reviews into a single accessible document.
|
||||
Living systematic reviews are continually updated, incorporating relevant new evidence as it becomes available.
|
||||
Rapid reviews are a form of knowledge synthesis that "accelerates the process of conducting a traditional systematic review through streamlining or omitting specific methods to produce evidence for stakeholders in a resource-efficient manner".
|
||||
Reviews of complex health interventions in complex systems aim to improve evidence synthesis and guideline development.
|
||||
|
||||
==== Patient and public involvement in systematic reviews ====
|
||||
|
||||
There are various ways patients and the public can be involved in producing systematic reviews and other outputs. Tasks for public members can be organised as 'entry level' or higher. Tasks include:
|
||||
45
data/en.wikipedia.org/wiki/Systematic_review-3.md
Normal file
@ -0,0 +1,45 @@
|
||||
---
|
||||
title: "Systematic review"
|
||||
chunk: 4/6
|
||||
source: "https://en.wikipedia.org/wiki/Systematic_review"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:30.659047+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
Joining a collaborative volunteer effort to help categorise and summarise healthcare evidence
|
||||
Data extraction and risk of bias assessment
|
||||
Translation of reviews into other languages
|
||||
A systematic review of how people were involved in systematic reviews aimed to document the evidence-base relating to stakeholder involvement in systematic reviews and to use this evidence to describe how stakeholders have been involved in systematic reviews. Thirty percent involved patients and/or carers. The ACTIVE framework provides a way to describe how people are involved in systematic review and may be used as a way to support systematic review authors in planning people's involvement. Standardised Data on Initiatives (STARDIT) is another proposed way of reporting who has been involved in which tasks during research, including systematic reviews.
|
||||
There has been some criticism of how Cochrane prioritises systematic reviews. Cochrane has a project that involved people in helping identify research priorities to inform Cochrane Reviews. In 2014, the Cochrane–Wikipedia partnership was formalised.
|
||||
|
||||
==== Environmental health and toxicology ====
|
||||
Systematic reviews are a relatively recent innovation in the field of environmental health and toxicology. Although mooted in the mid-2000s, the first full frameworks for conduct of systematic reviews of environmental health evidence were published in 2014 by the US National Toxicology Program's Office of Health Assessment and Translation and the Navigation Guide at the University of California San Francisco's Program on Reproductive Health and the Environment. Uptake has since been rapid, with the estimated number of systematic reviews in the field doubling since 2016 and the first consensus recommendations on best practice, as a precursor to a more general standard, being published in 2020.
|
||||
|
||||
=== Social, behavioural, and educational ===
|
||||
In 1959, social scientist and social work educator Barbara Wootton published one of the first contemporary systematic reviews of literature on anti-social behavior as part of her work, Social Science and Social Pathology.
|
||||
Several organisations use systematic reviews in social, behavioural, and educational areas of evidence-based policy, including the National Institute for Health and Care Excellence (NICE, UK), Social Care Institute for Excellence (SCIE, UK), the Agency for Healthcare Research and Quality (AHRQ, US), the World Health Organization, the International Initiative for Impact Evaluation (3ie), the Joanna Briggs Institute, and the Campbell Collaboration. The quasi-standard for systematic review in the social sciences is based on the procedures proposed by the Campbell Collaboration, which is one of several groups promoting evidence-based policy in the social sciences.
|
||||
|
||||
=== Homelessness ===
|
||||
In 2020, the Campbell Collaboration published its first systematic reviews in homelessness, commissioned by the Centre for Homelessness Impact, in order to bring robust, policy-relevant evidence into a field with very little tradition of such an approach. The topics for three initial systematic reviews were chosen by using Evidence and Gap Maps published by the Centre, bringing together causal research on homelessness in one place and thereby identifying areas with sufficient research to conduct a synthesis of available evidence.
|
||||
These first systematic reviews on homelessness looked at accommodation-based interventions, institutional discharge and interventions to improve access to health and social care. The accommodation review, which evaluated the impact of different housing models, found that interventions that combine stable housing with high levels of personal support, such as Housing First, tend to be more effective than those offering accommodation with little or no support. The review on the effectiveness of discharge programmes that provide housing assistance for individuals leaving institutions such as hospital, the armed forces and prison, concluded that well-designed and delivered discharge programmes can significantly reduce the likelihood of homelessness. The third looked particularly at access to care for people with multiple and complex needs, and emphasised the importance of integrated, person-centred approaches.
|
||||
Further systematic reviews have since been conducted, including into the effectiveness of case management to support individuals impacted by homelessness and two reviews of the effectiveness of psychosocial interventions for people experiencing homelessness. Again, these were commissioned by the Centre for Homelessness Impact.
|
||||
|
||||
=== Others ===
|
||||
Some attempts to transfer the procedures from medicine to business research have been made, including a step-by-step approach, and developing a standard procedure for conducting systematic literature reviews in business and economics.
|
||||
Systematic reviews are increasingly prevalent in other fields, such as international development research. Subsequently, several donors (including the UK Department for International Development (DFID) and AusAid) are focusing more on testing the appropriateness of systematic reviews in assessing the impacts of development and humanitarian interventions.
|
||||
The Collaboration for Environmental Evidence (CEE) has a journal titled Environmental Evidence, which publishes systematic reviews, review protocols, and systematic maps on the impacts of human activity and the effectiveness of management interventions.
|
||||
|
||||
== Review tools ==
|
||||
A 2022 publication identified 24 systematic review tools and ranked them by inclusion of 30 features deemed most important when performing a systematic review in accordance with best practices. The top six software tools (with at least 21/30 key features) are all proprietary paid platforms, typically web-based, and include:
|
||||
|
||||
Giotto Compliance
|
||||
DistillerSR
|
||||
Nested Knowledge
|
||||
EPPI-Reviewer Web
|
||||
LitStream
|
||||
JBI SUMARI
|
||||
The Cochrane Collaboration provides a handbook for systematic reviewers of interventions which "provides guidance to authors for the preparation of Cochrane Intervention reviews." The Cochrane Handbook also outlines steps for preparing a systematic review and forms the basis of two sets of standards for the conduct and reporting of Cochrane Intervention Reviews (MECIR; Methodological Expectations of Cochrane Intervention Reviews). It also contains guidance on integrating patient-reported outcomes into reviews.
|
||||
|
||||
== Limitations ==
|
||||
43
data/en.wikipedia.org/wiki/Systematic_review-4.md
Normal file
@ -0,0 +1,43 @@
|
||||
---
|
||||
title: "Systematic review"
|
||||
chunk: 5/6
|
||||
source: "https://en.wikipedia.org/wiki/Systematic_review"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:30.659047+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
=== Out-dated or risk of bias ===
|
||||
While systematic reviews are regarded as the strongest form of evidence, a 2003 review of 300 studies found that not all systematic reviews were equally reliable, and that their reporting can be improved by a universally agreed upon set of standards and guidelines. A further study by the same group found that of 100 systematic reviews monitored, 7% needed updating at the time of publication, another 4% within a year, and another 11% within 2 years; this figure was higher in rapidly changing fields of medicine, especially cardiovascular medicine. A 2003 study suggested that extending searches beyond major databases, perhaps into grey literature, would increase the effectiveness of reviews. A 2005 analysis of 377 Cochrane systematic reviews of healthcare interventions found that, after four years, 9% of the updated reviews resulted in a changed conclusion compared to the original versions. Among Cochrane systematic reviews on medical interventions published in 2010 and updated by 2017, very few (4%) of the updated reviews reported a change in conclusion for a primary outcome.
|
||||
|
||||
Some authors have highlighted problems with systematic reviews, particularly those conducted by Cochrane, noting that published reviews are often biased, out of date, and excessively long. Cochrane reviews have been criticized as not being sufficiently critical in the selection of trials and including too many of low quality. These authors proposed several solutions, including limiting studies in meta-analyses and reviews to registered clinical trials, requiring that original data be made available for statistical checking, paying greater attention to sample size estimates, and eliminating dependence on only published data. Some of these difficulties were noted as early as 1994: "much poor research arises because researchers feel compelled for career reasons to carry out research that they are ill-equipped to perform, and nobody stops them."
|
||||
Methodological limitations of meta-analysis have also been noted. Another concern is that the methods used to conduct a systematic review are sometimes changed once researchers see the available trials they are going to include. Some websites have described retractions of systematic reviews and published reports of studies included in published systematic reviews. Arbitrary eligibility criteria may affect the perceived quality of the review.
|
||||
|
||||
=== Limited reporting of data from human studies ===
|
||||
The AllTrials campaign reports that around half of clinical trials have never reported results, and it works to improve reporting. 'Positive' trials were twice as likely to be published as those with 'negative' results.
|
||||
As of 2016, it is legal for for-profit companies to conduct clinical trials and not publish the results. For example, in the past 10 years, 8.7 million patients have taken part in trials that have not published results. These factors mean that it is likely there is a significant publication bias, with only 'positive' or perceived favourable results being published. A recent systematic review of industry sponsorship and research outcomes concluded that "sponsorship of drug and device studies by the manufacturing company leads to more favorable efficacy results and conclusions than sponsorship by other sources" and that there is an industry bias that cannot be explained by standard 'risk of bias' assessments.
|
||||
|
||||
=== Poor compliance with review reporting guidelines ===
|
||||
The rapid growth of systematic reviews in recent years has been accompanied by the attendant issue of poor compliance with guidelines, particularly in areas such as declaration of registered study protocols, funding source declaration, risk of bias data, issues resulting from data abstraction, and description of clear study objectives. A host of studies have identified weaknesses in the rigour and reproducibility of search strategies in systematic reviews. To remedy this issue, a new PRISMA guideline extension called PRISMA-S is being developed. Furthermore, tools and checklists for peer-reviewing search strategies have been created, such as the Peer Review of Electronic Search Strategies (PRESS) guidelines.
|
||||
A key challenge for using systematic reviews in clinical practice and healthcare policy is assessing the quality of a given review. Consequently, a range of appraisal tools to evaluate systematic reviews have been designed. The two most popular measurement instruments and scoring tools for systematic review quality assessment are AMSTAR 2 (a measurement tool to assess the methodological quality of systematic reviews) and ROBIS (Risk Of Bias In Systematic reviews); however, these are not appropriate for all systematic review types. Some recent peer-reviewed articles have carried out comparisons between AMSTAR 2 and ROBIS tools.
|
||||
|
||||
== History ==
|
||||
The first publication that is now recognized as equivalent to a modern systematic review was a 1753 paper by James Lind, which reviewed all of the previous publications about scurvy. Systematic reviews appeared only sporadically until the 1980s, and became common after 2000. More than 10,000 systematic reviews are published each year.
|
||||
|
||||
=== History in medicine ===
|
||||
A 1904 British Medical Journal paper by Karl Pearson collated data from several studies in the UK, India and South Africa of typhoid inoculation. He used a meta-analytic approach to aggregate the outcomes of multiple clinical studies. In 1972, Archie Cochrane wrote: "It is surely a great criticism of our profession that we have not organised a critical summary, by specialty or subspecialty, adapted periodically, of all relevant randomised controlled trials". Critical appraisal and synthesis of research findings in a systematic way emerged in 1975 under the term 'meta analysis'. Early syntheses were conducted in broad areas of public policy and social interventions, with systematic research synthesis applied to medicine and health. Inspired by his own personal experiences as a senior medical officer in prisoner of war camps, Archie Cochrane worked to improve the scientific method in medical evidence. His call for the increased use of randomised controlled trials and systematic reviews led to the creation of The Cochrane Collaboration, which was founded in 1993 and named after him, building on the work by Iain Chalmers and colleagues in the area of pregnancy and childbirth.
|
||||
|
||||
== See also ==
|
||||
Critical appraisal
|
||||
Further research is needed
|
||||
Systematic searching
|
||||
Horizon scanning
|
||||
Literature review
|
||||
Living review
|
||||
Meta-analysis
|
||||
Metascience
|
||||
Peer review
|
||||
Review journal
|
||||
Generalized model aggregation (GMA)
|
||||
Umbrella review
|
||||
26
data/en.wikipedia.org/wiki/Systematic_review-5.md
Normal file
@ -0,0 +1,26 @@
|
||||
---
|
||||
title: "Systematic review"
|
||||
chunk: 6/6
|
||||
source: "https://en.wikipedia.org/wiki/Systematic_review"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:30.659047+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
== References ==
|
||||
This article was submitted to WikiJournal of Medicine for external academic peer review in 2019 (reviewer reports). The updated content was reintegrated into the Wikipedia page under a CC-BY-SA-3.0 license (2020). The version of record as reviewed is: Jack Nunn; Steven Chang; et al. (9 November 2020). "What are Systematic Reviews?" (PDF). WikiJournal of Medicine. 7 (1): 5. doi:10.15347/WJM/2020.005. ISSN 2002-4436. Wikidata Q99440266.
|
||||
STARDIT report Q101116128.
|
||||
|
||||
== External links ==
|
||||
|
||||
Systematic Review Tools — Search and list of systematic review software tools
|
||||
Cochrane Collaboration
|
||||
MeSH: Review Literature—articles about the review process
|
||||
MeSH: Review [Publication Type] - limit search results to reviews
|
||||
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement Archived 27 July 2011 at the Wayback Machine, "an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses"
|
||||
PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and explanation
|
||||
Animated Storyboard: What Are Systematic Reviews? - Cochrane Consumers and Communication Group
|
||||
Sysrev - a free platform with open access systematic reviews.
|
||||
STARDIT - an open access data-sharing system to standardise the way that information about initiatives is reported.
|
||||
Cao, Christian; Arora, Rohit; Cento, Paul; Manta, Katherine; Farahani, Elina; Cecere, Matthew; Selemon, Anabel; Sang, Jason; Gong, Ling Xi (19 June 2025). "Automation of Systematic Reviews with Large Language Models". medRxiv 10.1101/2025.06.13.25329541.
|
||||
40
data/en.wikipedia.org/wiki/Theory-driven_evaluation-0.md
Normal file
@ -0,0 +1,40 @@
|
||||
---
|
||||
title: "Theory-driven evaluation"
|
||||
chunk: 1/2
|
||||
source: "https://en.wikipedia.org/wiki/Theory-driven_evaluation"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:31.832962+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
Theory-driven evaluation (also theory-based evaluation) is an umbrella term for any approach to program evaluation – quantitative, qualitative, or mixed method – that develops a theory of change and uses it to design, implement, analyze, and interpret findings from an evaluation. More specifically, an evaluation is theory-driven if it:
|
||||
|
||||
formulates a theory of change using some combination of social science, lived experience, and program-related professionals' expertise;
|
||||
develops and prioritizes evaluation questions using the theory;
|
||||
uses the theory to guide the design and implementation of the evaluation;
|
||||
uses the theory to operationalize contextual, process, and outcome variables;
|
||||
provides a causal explanation of how and why outcomes were achieved, including whether the program worked and/or had any unintended consequences (desirable or harmful); and
|
||||
explains what factors moderate outcomes.
|
||||
By investigating the mechanisms leading to outcomes, theory-driven approaches facilitate learning to improve programs and how they are implemented, and help knowledge to accumulate – including across ostensibly different programs. This is in contrast to methods-driven "black box" evaluations, which focus on following the steps of a method (for instance, randomized experiment or focus group) and only assess whether a program achieves its intended outcomes. Theory-driven approaches can also improve the validity of evaluations, for instance leading to more precise estimates of impact in randomized controlled trials.
|
||||
|
||||
== History ==
|
||||
Theory-driven evaluation emerged in the 1970s and 80s in response to the limitations of methods-driven "black box" evaluations. The term theory-driven evaluation was coined by Huey T. Chen and Peter H. Rossi. Chen (1990) wrote the first comprehensive introduction to conducting theory-driven evaluations, for example explaining how to develop a program theory of change and the different types of design. Its origins have been traced to a book by Carol Weiss (1972) and a rarely-cited article by Carol Taylor Fitz-Gibbon and Lynn Lyons Morris (1975). However, "the first published use of what we would recognize as program theory" was in an evaluation of training programs, by Don Kirkpatrick in 1959.
|
||||
Funnell and Rogers (2011, pp. 23–24) comment on the confused nomenclature of the field, enumerating 22 approaches such as theory-based evaluation and program theory-driven evaluation science that are equivalent to or overlap significantly with theory-driven evaluation. The first definition of theory-based evaluation, by Fitz-Gibbon and Morris (1975), is near-identical to theory-driven evaluation:
|
||||
|
||||
A theory-based evaluation of a program is one in which the selection of program features to evaluate is determined by an explicit conceptualization of the program in terms of a theory […] which attempts to explain how the program produces the desired effects. The theory might be psychological […] or social psychological […] or philosophical […]. The essential characteristic is that the theory points out a causal relationship between a process A and an outcome B.
|
||||
Consequently, the terms theory-driven and theory-based evaluation are often used interchangeably in the literature. However, theory-based evaluation is sometimes interpreted more narrowly to mean qualitative or small-n case study-based evaluations conducted without a comparison group, for example those using process tracing or qualitative comparative analysis. An example of this narrower meaning is present in the UK government handbook on evaluation, the Magenta Book.
|
||||
Theory‑driven evaluation is also closely related to evidential pluralism, an approach developed within the philosophy of science from around 2007 which argues that for scientists to make a causal claim, they must provide both evidence of an association between a putative cause and its effect and evidence of the mechanisms that connect the two. This dual‑requirement aligns naturally with theory‑driven evaluation, which similarly emphasises not only identifying whether an intervention is associated with outcomes but also understanding how those outcomes arise through underlying causal processes.
|
||||
|
||||
== What is meant by "theory"? ==
|
||||
The theory of theory-driven evaluation seeks to be as close as possible to the causes of a social problem and site of intervention. This is in contrast to a "global" or "grand" theory that tries to provide an overarching understanding of society, or a metaphysical theory about the nature of social reality. Chen and Rossi (1983) illustrate as follows:
|
||||
|
||||
It advances evaluation practice very little to adopt one or another of current global theories in attacking, say, the problem of juvenile delinquency, but it does help a great deal to understand the authority structure in schools and the mechanisms of peer group influence and parental discipline in designing and evaluating a program that is supposed to reduce disciplinary problems in schools. [...T]he theory-driven perspective is closer to what econometricians call "model specification" than are more complicated and more abstract and general theories.
|
||||
A distinction is also drawn between normative theory, concerning what a program is supposed to do and how it should be implemented, and causal theory, which specifies how the program is thought to work. There can then be two broad ways in which a program fails to lead to the desired outcomes: (1) a program may be implemented as intended according to the normative theory; however, it turns out that the causal theory is incorrect; or (2) the causal theory is correct; however, the program was not implemented correctly.
|
||||
Graphical causal models (GCMs) may be used to formalize causal theories and design, e.g., theory-driven quasi-experiments. One of the advantages of GCMs is that they can be used to automatically determine which variables need to be statistically adjusted or matched on, to estimate the causal effect of a program.
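How a graph can yield an adjustment set automatically can be sketched in a few lines. The example below is a minimal illustration, not the method of any particular GCM tool: it assumes the graph is acyclic and that every variable in it is observed, in which case adjusting for the direct causes (parents) of the program variable is one sufficient, though not necessarily minimal, choice. The variable names and the graph itself are hypothetical, invented for illustration.

```python
def parents(dag, node):
    """Return the direct causes of `node` in a DAG given as {cause: [effects]}."""
    return {cause for cause, effects in dag.items() if node in effects}


# Hypothetical program theory (an assumption for illustration only):
# enrolment in a tutoring program raises exam scores via study hours,
# while prior attainment and family income affect both enrolment and scores.
dag = {
    "prior_attainment": ["enrolment", "exam_score"],
    "family_income": ["enrolment", "exam_score"],
    "enrolment": ["study_hours"],
    "study_hours": ["exam_score"],
}

# Parents of the program variable: a sufficient adjustment set when all
# variables in the graph are observed.
adjustment_set = parents(dag, "enrolment")
print(sorted(adjustment_set))
```

Note that the mediator `study_hours` is not selected: it lies on the causal path from the program to the outcome, so adjusting for it would remove part of the effect being estimated.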
|
||||
|
||||
=== Chen's action model/change model schema ===
|
||||
Chen's action model/change model schema provides an example of how a program theory and its context are conceptualized. The elements of the schema are then completed for each particular program.
|
||||
|
||||
The change model specifies how an intervention of a program leads to outcomes via determinants, also known as intermediate or mediating variables.
|
||||
The action model specifies how staff and delivery organizations deliver the intervention to beneficiaries:
|
||||
35
data/en.wikipedia.org/wiki/Theory-driven_evaluation-1.md
Normal file
35
data/en.wikipedia.org/wiki/Theory-driven_evaluation-1.md
Normal file
@ -0,0 +1,35 @@
|
||||
---
|
||||
title: "Theory-driven evaluation"
|
||||
chunk: 2/2
|
||||
source: "https://en.wikipedia.org/wiki/Theory-driven_evaluation"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:31.832962+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
The target population includes a specification of who participants are and how they are recruited.
|
||||
The implementing organization (for instance a clinic or school) and its staff of implementers (for instance therapists or teachers) are responsible for allocating resources, training, and delivering the interventions.
|
||||
Intervention and service delivery protocols would include therapy manuals or subject curricula.
|
||||
Associated organizations and community partners refers to organisations other than the implementing organisation. In the case of a psychotherapy intervention, this may include schools or general practitioners who advertise the program or refer beneficiaries to it.
|
||||
Ecological context refers to aspects of the environment, for instance family, friends, co-workers, other students, etc., that may moderate the effects of a program.
|
||||
|
||||
== Theory-driven methods ==
|
||||
The full range of research methods has been argued to apply. Chen (2015) provides examples using randomized experiments, quasi-experimental designs, process and outcome monitoring, and qualitative methods. Although proponents of theory-driven evaluation are critical of "black box" experiments, Chen and Rossi (1983, p. 292) argue that theory-driven experiments are possible and desirable:
|
||||
|
||||
[A]dvocates of the black box experimental paradigm often neglect the fact that after randomization exogenous variables are still correlated with outcome variables. Knowing how such exogenous factors affect outcomes makes it possible to construct more precise estimates of experimental effects by controlling for such exogenous variables.
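The precision argument in this passage can be illustrated with a small simulation. The sketch below is an assumption-laden toy example rather than anything taken from Chen and Rossi: it invents a linear data-generating process with one exogenous covariate, a true effect of 1.0, and 200 participants per trial, then compares a plain difference in means with an estimate that controls for the covariate using the pooled within-arm regression slope.

```python
import random
import statistics


def simulate_trial(rng, n=200, effect=1.0, covariate_weight=2.0):
    """Simulate one randomized trial; return (unadjusted, adjusted) estimates."""
    x = [rng.gauss(0, 1) for _ in range(n)]      # exogenous covariate
    t = [1] * (n // 2) + [0] * (n - n // 2)      # half treated, half control
    rng.shuffle(t)                               # random allocation
    y = [effect * ti + covariate_weight * xi + rng.gauss(0, 1)
         for ti, xi in zip(t, x)]

    x1 = [xi for xi, ti in zip(x, t) if ti == 1]
    x0 = [xi for xi, ti in zip(x, t) if ti == 0]
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]

    # "Black box" estimate: simple difference in mean outcomes.
    unadjusted = statistics.fmean(y1) - statistics.fmean(y0)

    def cross_products(xs, ys):
        """Return the centred sums of x*y and x*x within one arm."""
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        return sxy, sxx

    # Pooled within-arm slope of the outcome on the covariate.
    sxy1, sxx1 = cross_products(x1, y1)
    sxy0, sxx0 = cross_products(x0, y0)
    slope = (sxy1 + sxy0) / (sxx1 + sxx0)

    # Covariate-adjusted estimate: remove the chance imbalance in x.
    adjusted = unadjusted - slope * (statistics.fmean(x1) - statistics.fmean(x0))
    return unadjusted, adjusted


rng = random.Random(0)
results = [simulate_trial(rng) for _ in range(500)]
sd_unadjusted = statistics.stdev([u for u, _ in results])
sd_adjusted = statistics.stdev([a for _, a in results])
print(f"spread of unadjusted estimates: {sd_unadjusted:.3f}")
print(f"spread of adjusted estimates:   {sd_adjusted:.3f}")
```

Both estimators centre on the true effect because allocation is random; under these assumptions the adjusted estimates vary less across repeated trials, which is the sense in which they are more precise.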
|
||||
It has been argued that theory-driven evaluation focusses too much on statistical approaches, such as randomized experiments, quasi-experiments, and structural equation modelling; however, a case has also been made for the importance of qualitative methods, particularly when developing program theories and understanding implementation.
|
||||
There is also methodological debate concerning whether realist evaluations, considered a particular kind of theory-driven approach, may include randomized controlled trials in any form. Some evaluators think they may and conduct what they call "realist trials". Others argue that a realist trial is an "oxymoron", and recommend instead calling them "theory-oriented trials". A 2023 review of trials described as realist concluded that whether they are really realist depends on "ontological and epistemological" commitments of evaluators and that differences "cannot be resolved" by reviewing studies conducted.
|
||||
|
||||
== Examples ==
|
||||
Examples discussed in a 2011 systematic review of 45 theory-driven evaluations include:
|
||||
|
||||
An evaluation of the Fort Bragg Child and Adolescent Mental Health Demonstration, a managed mental health care system with a single point of entry, which used individual interviews, focus groups, and document review to assist the development of a theory of change. The theory explained why it was thought that an integrated care system would be more cost-effective than a fragmented system.
|
||||
An evaluation of a board game created to help teach secondary school business education. This evaluation developed a theory of change and used it to select measures and design regression analyses of process and outcome.
|
||||
An evaluation of a garbage reduction program. The program attempted to encourage residents to reduce the volume of garbage they produce by reducing the frequency of collection; however, an unintended negative consequence identified by the evaluation was that residents produced the same volume as before, simply storing their garbage in their homes on non-collection days. This effect was identified using a comparative interrupted time series analysis with autoregressive integrated moving average (ARIMA).
|
||||
A 2014 review of theory-driven evaluation in school psychology highlighted two illustrative examples:
|
||||
|
||||
An evaluation of conjoint behavioral consultation, a "strength-based intervention focused on building behavioral and social competence in children". The evaluation tested a theory of change using a cluster-randomised controlled trial and mediation analysis.
|
||||
An evaluation of repeated reading and vocabulary previewing which tested causal theory using case study methodology, an adapted alternating treatments design with six students.
|
||||
|
||||
== References ==
|
||||
32
data/en.wikipedia.org/wiki/UK_Evaluation_Society-0.md
Normal file
@ -0,0 +1,32 @@
|
||||
---
|
||||
title: "UK Evaluation Society"
|
||||
chunk: 1/1
|
||||
source: "https://en.wikipedia.org/wiki/UK_Evaluation_Society"
|
||||
category: "reference"
|
||||
tags: "science, encyclopedia"
|
||||
date_saved: "2026-05-05T04:26:33.012095+00:00"
|
||||
instance: "kb-cron"
|
||||
---
|
||||
|
||||
The UK Evaluation Society (UKES) was founded in 1994 and is the principal professional organisation for evaluation of social policy and programmes in the UK. It is a member of the UK Academy of Social Sciences. Its president is Dr Kirstine Szifris (December 2023 until November 2026) and Executive Director is Nick Posford. UKES was registered as a charity on 27 November 2024.
|
||||
|
||||
|
||||
== Activities ==
|
||||
Developed a Manifesto for Evaluation, coinciding with the 2024 United Kingdom general election.
|
||||
Organises an annual conference, which in 2023 was sponsored by RSM UK and Kantar Public (now Verian). In 2023, keynote speakers were Emily Gates (Boston College), Kelly Beaver MBE (Ipsos), Urvashi Parashar (Department for Culture, Media and Sport), and Nigel Ball (Social Purpose Lab at University of the Arts London).
|
||||
Develops guidance for conducting evaluations, for example Guidelines for Good Practice in Evaluation (revised 2018) and Quality of Evidence Rubrics for Single Cases (2023).
|
||||
Offers training courses on evaluation, which are recommended by the UK government for civil servants.
|
||||
Provides peer review for the UK government Magenta Book, the HM Treasury guidance on key considerations when conducting evaluations of public policies and programmes, which was last updated in 2020. Also acts as a specialist network, which government draws on when developing evaluation approaches.
|
||||
Publishes a journal, Evaluative Practice.
|
||||
Offers prizes, which in 2023 were awarded for Data Analytics in Evaluation; impact and efforts of Early Career Evaluators; and novel and Innovative Methods. The prizes were sponsored by RSM UK and Ipsos UK.
|
||||
|
||||
|
||||
== Membership ==
|
||||
The most recent membership figures (up to June 2022) are presented in the table below.
|
||||
|
||||
|
||||
== External links ==
|
||||
UKES official website
|
||||
|
||||
|
||||
== References ==
|
||||