Scrape wikipedia-science: 684 new, 16 updated, 722 total (kb-cron)

2026-05-04 20:15:07 -07:00 · 2026-05-04 20:15:07 -07:00 · 1d902b83b8
commit 1d902b83b8
parent 05b2e404b6
21 changed files with 1080 additions and 0 deletions
--- a/_index.db
+++ b/_index.db
--- a/data/en.wikipedia.org/wiki/ReScience_C-0.md
+++ b/data/en.wikipedia.org/wiki/ReScience_C-0.md
@ -0,0 +1,28 @@
+---
+title: "ReScience C"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/ReScience_C"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:14:56.068754+00:00"
+instance: "kb-cron"
+---
+
+ReScience C is a scientific journal established in 2015 by Nicolas Rougier and Konrad Hinsen with the aim of publishing researchers' attempts to replicate computations made by other authors, using independently written, free and open-source software (FOSS), with an open process of peer review. The journal states that requiring the replication software to be free and open-source ensures the reproducibility of the original research.
+
+
+== Creation ==
+The journal was established in 2015 by Nicolas Rougier and Konrad Hinsen in the context of the replication crisis of the early 2010s, in which concern about difficulty in replicating (different data or details of method) or reproducing (same data, same method) peer-reviewed, published research papers was widely discussed. The journal's scope is computational research, with the motivation that journals rarely require the provision of source code, and when source code is provided, it is rarely checked against the results claimed in the research article.
+
+
+== Policies and methods ==
+The scope of the journal is mainly focussed on researchers' attempts to replicate computations made by other authors, using independently written, free and open-source software (FOSS). Articles are submitted using the "issues" feature of a git repository run by GitHub, together with other online archiving services, including Zenodo and Software Heritage. Peer review takes place publicly in the same "issues" online format.
+In 2020, Nature reported on the results of the journal's "Ten Years' Reproducibility Challenge", in which scientists were asked to try reproducing the results from peer-reviewed articles that they had published at least ten years earlier, using the same data and software if possible, updated to a modern software environment and free licensing. As of 24 August 2020, out of 35 researchers who had proposed to reproduce the results of 43 of their old articles, 28 reports had been written, 13 had been accepted after peer review and published, among which 11 documented successful reproductions.
+
+
+== References ==
+
+
+== External links ==
+Official website
+free-licensed images by Nicolas Rougier, co-editor of ReScience C
--- a/data/en.wikipedia.org/wiki/Reproducibility_Project-0.md
+++ b/data/en.wikipedia.org/wiki/Reproducibility_Project-0.md
@ -0,0 +1,40 @@
+---
+title: "Reproducibility Project"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Reproducibility_Project"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:14:54.886771+00:00"
+instance: "kb-cron"
+---
+
+The Reproducibility Project is a series of crowdsourced collaborations aiming to reproduce published scientific studies, finding high rates of results which could not be replicated. It has resulted in two major initiatives focusing on the fields of psychology and cancer biology. The project has brought attention to the replication crisis, and has contributed to shifts in scientific culture and publishing practices to address it.
+The project was led by the Center for Open Science and its co-founder, Brian Nosek, who started the project in November 2011.
+
+
+== Results ==
+Brian Nosek of the University of Virginia and colleagues set out to replicate 100 different studies, all published in 2008. The project drew these studies from three journals, Psychological Science, the Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory, and Cognition, to see whether replications would yield the same results as the initial findings. Of the original studies, 97 had significant effects, but only 36% of the replications of those 97 yielded significant findings (p value below 0.05), and the effects were often smaller than those in the original papers. The authors emphasized that the findings reflect a problem that affects all of science and not just psychology, and that there is room to improve reproducibility in psychology.
+In 2021, the project showed that of 193 experiments from 53 top papers about cancer published between 2010 and 2012, only 50 experiments from 23 papers could be replicated. Moreover, it showed that the effect sizes of that fraction were 85% smaller on average than the original findings. None of the papers had its experimental protocols fully described and 70% of experiments required asking for key reagents.
+
+
+== Impact ==
+The project, along with broader action in response to the replication crisis, has helped spur changes in scientific culture and publishing practices. The results of the Reproducibility Project might also affect public trust in psychology. Lay people who learned about the low replication rate found in the Reproducibility Project subsequently reported a lower trust in psychology, compared to people who were told that a high number of the studies had replicated.
+
+
+== See also ==
+Invalid science
+John Ioannidis
+Meta-analysis
+Metascience
+Proteus phenomenon
+Publication bias
+Replication crisis
+Scientific method
+
+
+== External links ==
+Official website for the Reproducibility Project: Psychology
+Official website for the Reproducibility Project: Cancer Biology
+
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Research_question-0.md
+++ b/data/en.wikipedia.org/wiki/Research_question-0.md
@ -0,0 +1,57 @@
+---
+title: "Research question"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Research_question"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:14:57.233428+00:00"
+instance: "kb-cron"
+---
+
+A research question is "a question that a research project sets out to answer". Choosing a research question is an essential element of both quantitative and qualitative research. Investigation will require data collection and analysis, and the methodology for this will vary widely. Good research questions seek to improve knowledge on an important topic, and are usually narrow and specific.
+To form a research question, one must determine what type of study will be conducted, such as a qualitative, quantitative, or mixed study. Additional factors, such as project funding, may affect not only the research question itself but also when and how it is formed during the research process. Literature suggests several variations on criteria selection for constructing a research question, such as the FINER or PICOT methods.
+
+== Definition ==
+The answer to a research question will help address a research problem or question. Specifying a research question, "the central issue to be resolved by a formal dissertation, thesis, or research project," is typically one of the first steps an investigator takes when undertaking research. Considerations such as project funding or methodological approaches may influence the research process, including when and how the research question is developed. Clearly and accurately defining the research question can become an iterative process. How the question is constructed can depend on the type of research or discipline.
+
+== Constructing a research question ==
+Specifying the research question is one of the first methodological steps the investigator has to take when undertaking research. Having an interest in or knowledge of a particular subject can be useful in the construction of a research question. Formation of the research question is largely determined by, and likewise influences, where and what kind of information will be sought. The research question must be accurately and clearly defined. Choosing a research question is the central element of both quantitative and qualitative research and in some cases it may precede construction of the conceptual framework of study; in all cases, it makes the theoretical assumptions in the framework more explicit and indicates what the researcher wants to know most and first. Therefore, the investigator must first identify the type of study (qualitative, quantitative, or mixed) before the research question is developed. Forming the research question may become an iterative process when parameters of the research process, such as field of study or methodology, do not fit the original question. Literature suggests several methods for selecting criteria in the development of a research question, two of which are the FINER and PICOT methods.
+
+=== Construction method examples ===
+
+==== FINER criteria ====
+The FINER method can be a useful tool for outlining research criteria used in the construction of a research question. Due to the flexibility of the criteria, this method may be used for a variety of research scenarios. The FINER method prompts researchers to determine whether one has the means and interest to conduct the study. It also asks one to consider the ethical ramifications, as well as the relevancy of the research.
+According to Farrugia et al., the FINER criteria "highlight useful points that may increase the chances of developing a successful research project". These criteria were first suggested in the book Designing Clinical Research by Hulley et al., detailed below.
+F – Feasible
+Adequate number of subjects
+Adequate technical expertise
+Affordable in time and money
+Manageable in scope
+I – Interesting
+
+Getting the answer intrigues investigator, peers and community
+N – Novel
+
+Confirms, refutes or extends previous findings
+E – Ethical
+
+Amenable to a study that institutional review board will approve
+R – Relevant
+
+To scientific knowledge
+To clinical and health policy
+To future research
+
+==== PICOT criteria ====
+PICOT criteria tend to be used to frame questions used in evidence-based studies, such as medical studies. Such research may focus on assessment or evaluation of patients or problems, as well as what may be the causal factor(s) with control and experimental groups.
+P – Patient (or Problem)
+I – Intervention (or Indicator)
+C – Comparison group
+O – Outcomes
+T – Time
+Continuing the research process, the investigator then carries out the research necessary to answer the research question, whether this involves reading secondary sources over a few days for an undergraduate term paper or carrying out primary research over years for a major project. When the research is complete and the researcher knows the (probable) answer to the research question, writing up can begin (as distinct from writing notes, which is a process that goes on through a research project). In term papers, the answer to the question is normally given in summary in the introduction in the form of a thesis statement.
+
+== Aggregated research questions and coordination ==
+
+Scientists often communicate open research questions. Sometimes such questions are crowdsourced and/or aggregated, sometimes supplemented with priorities or other details. A common way in which open research questions are identified, communicated, established/confirmed and prioritized is their inclusion in scientific reviews of a sub-field or specific research question, including in systematic reviews and meta-analyses. Other channels include reports by science journalists and dedicated (sub-)websites such as 80000hours.org's "research questions by discipline" or the Wikipedia articles of the lists of unsolved problems, aggregative/integrative studies, as well as unsolved online posts on Q&A websites and forums, sometimes categorized/marked as unsolved. Online surveys have been used to generate priority research topics, which were then classified into broader themes. Such surveys may improve research relevance and value, or strengthen the rationale for societal dedication of limited resources, for expansion of those resources, or for funding a specific study.
+
+=== Prioritization and evaluations ===
--- a/data/en.wikipedia.org/wiki/Research_question-1.md
+++ b/data/en.wikipedia.org/wiki/Research_question-1.md
@ -0,0 +1,39 @@
+---
+title: "Research question"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Research_question"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:14:57.233428+00:00"
+instance: "kb-cron"
+---
+
+In terms of priorities and related concepts, the proposed strategy of differential technological development suggests that research focus primarily on questions and tools that are thought to increase safety and mitigate issues, rather than on risky technologies, which are instead best delayed. Concerning control strategies for gene drives, however, researchers have cautioned that such strategies may lead to a counterproductive false sense of security. Not all technological progress may be beneficial in general or in contemporary contexts (environments or systems), and some research may, for example, result in engineered pandemics.
+Many studies "ask uninteresting research questions, [and] make only marginal contributions". One study suggests that while research on climate change "is valuable, it does not tackle head-on the most urgent question: how to change society to mitigate climate change right now". In the ethical framework of effective altruism, research questions with the greatest potential benefits from investments (not necessarily of financial nature) are identified to maximize research benefits. 80,000 Hours has compiled a small list of "Research questions that could have a big social impact, organised by discipline". In public health research, "it is vital that research questions posed are important and that funded research meets a research need or a gap in evidence".
+
+=== ICTs, participation and routine procedures ===
+
+Platforms, e.g. citizen science ones, can "support identification of problems, formulation of research questions, and study design". Participatory research can "improve study outcomes and foster greater data accessibility and utility as well as increase public transparency". Participants can have continued discussions and iterations regarding new questions. Research questions can be positioned at varying levels of detail, from broad to very specific questions, which are semantically nested or can be displayed as nested, for instance via category trees. In one platform, about invasion science and based on Wikidata, users "can zoom into the major research questions and hypotheses" of the field, "which are connected to the relevant studies published in the field and, if available, the underlying raw data" with tools like the Wikimedia project Scholia. Individuals "who can ask novel, field-altering questions" may differ from "those who can answer them", and both may vary per question. Translation of a (societal) problem "from its meaning in an everyday context into a scientifically valid research question means defining the goals of research in such a way that their contribution to practical solutions of a societal problem is narrow enough to be useful". Both everyday practical knowledge and scientific knowledge play a role in this process. In interdisciplinary research, integration "takes place at the level of the posing of research questions in the overlapping areas between various disciplines". There is research into enabling scholarly knowledge to be presented "flexibly enriched with contextual information" for specific research questions.
+Identification of open research questions may be useful for the adoption and application of science in society and for accelerating specific research and development. There has been a suggestion to establish a public non-profit organization that would identify "gaps in the science that need addressing", referring to the field of sustainable food systems.
+
+=== Examples and breadth of "research questions" ===
+Similar to outlining open research questions, there have also been proposals to e.g. combine specific fields or sources of data and knowledge as the subject or method of new research or to engage more and more scientifically in specific research topics along with the establishment of new high-quality data gathering systems. One approach for the generation of research questions is [identifying, highlighting, and] challenging assumptions of existing theories and studies.
+Sometimes research questions overlap with or also refer to challenges of a specific theory or field such as how to solve known problems with the Standard Model. Research issues and knowledge gaps can also overlap or be synonymous.
+Examples of lists of open significant research questions in reviews include a list of "major outstanding questions" for (applied) human life extension, "fundamental" research questions in subterranean biology, open research questions for digital twins (across fields), open questions in performance measurement of sustainable supply chains, knowledge gaps in antimicrobial resistance, and unaddressed or neglected questions in the literature about 100% renewable energy systems.
+
+== Types and purpose ==
+The research question serves two purposes:
+
+It determines where and what kind of research the writer will be looking for.
+It identifies the specific objectives the study or paper will address.
+Therefore, the writer must first identify the type of study (qualitative, quantitative, or mixed) before the research question is developed.
+
+=== Qualitative study ===
+A qualitative study seeks to learn why or how, so the writer's research must be directed at determining the what, why and how of the research topic. Therefore, when crafting a research question for a qualitative study, the writer will need to ask a why or how question about the topic. For example: How did the company successfully market its new product? The sources needed for qualitative research typically include print and internet texts (written words), audio and visual media.
+Here is Creswell's (2009) example of a script for a qualitative research central question:
+
+_________ (How or what) is the _________ ("story for" for narrative research; "meaning of" the phenomenon for phenomenology; "theory that explains the process of" for grounded theory; "culture-sharing pattern" for ethnography; "issue" in the "case" for case study) of _________ (central phenomenon) for _________ (participants) at _________ (research site).
+
+=== Quantitative study ===
+A quantitative study seeks to learn where or when, so the writer's research must be directed at determining the where or when of the research topic. Therefore, when crafting a research question for a quantitative study, the writer will need to ask a where or when question about the topic. For example: Where should the company market its new product? Unlike a qualitative study, a quantitative study is a mathematical analysis of the research topic, so the writer's research will consist of numbers and statistics.
+Here is Creswell's (2009) example of a script for a quantitative research question:
--- a/data/en.wikipedia.org/wiki/Research_question-2.md
+++ b/data/en.wikipedia.org/wiki/Research_question-2.md
@ -0,0 +1,41 @@
+---
+title: "Research question"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Research_question"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:14:57.233428+00:00"
+instance: "kb-cron"
+---
+
+Does _________ (name the theory) explain the relationship between _________ (independent variable) and _________ (dependent variable), controlling for the effects of _________ (control variable)?
+Alternatively, a script for a quantitative null hypothesis might be as follows:
+
+There is no significant difference between _________ (the control and experimental groups on the independent variable) on _________ (dependent variable).
+Quantitative studies also fall into two categories:
+
+Correlational studies: A correlational study is non-experimental, requiring the writer to research relationships without manipulating or randomly selecting the subjects of the research. The research question for a correlational study may look like this: What is the relationship between long-distance commuters and eating disorders?
+Experimental studies: An experimental study is experimental in that it requires the writer to manipulate and randomly select the subjects of the research. The research question for an experimental study may look like this: Does the consumption of fast food lead to eating disorders?
+
+=== Mixed study ===
+A mixed study integrates both qualitative and quantitative studies, so the writer's research must be directed at determining the why or how and the what, where, or when of the research topic. Therefore, the writer will need to craft a research question for each study required for the assignment. A typical study may be expected to have between 1 and 6 research questions.
+Once the writer has determined the type of study to be used and the specific objectives the paper will address, the writer must also consider whether the research question passes the "so what" test. The "so what" test means that the writer must construct evidence to convince the audience why the research is expected to add new or useful knowledge to the literature.
+
+== Related terms ==
+
+=== Problematique ===
+Problematique is a term that functions analogously to the research problem or question used typically when addressing global systemic problems. The term achieved prominence in 1970 when Hasan Özbekhan, Erich Jantsch and Alexander Christakis conceptualized the original prospectus of the Club of Rome titled "The Predicament of Mankind". In this prospectus the authors designated 49 Continuous Critical Problems facing humankind, saying "We find it virtually impossible to view them as problems that exist in isolation – or as problems capable of being solved in their own terms... It is this generalized meta system of problems, which we call the 'problematique' that inheres in our situation."
+Situations similar to the global problematique in their complexity are also called problematiques. These situations receive different designations from other authors. In organizational theory and related fields, researchers C. West Churchman, Horst Rittel and Melvin Webber, and Chris Argyris called these situations wicked problems; Russell Ackoff called them "messes".
+
+== See also ==
+
+== References ==
+
+== Further reading ==
+The Little, Brown Guide to Writing Research Papers
+White, Patrick (2017). Developing Research Questions (2nd ed.). Palgrave Macmillan. ISBN 978-1-137-49047-6.
+Creswell, John W. (2014). Research design: qualitative, quantitative, and mixed methods approaches (4th ed.). Thousand Oaks: SAGE Publications. pp. 131–133. ISBN 978-1-4522-2609-5.
+
+== External links ==
+
+Developing a Research Question (esc.edu)
--- a/data/en.wikipedia.org/wiki/Science_Exchange_(company)-0.md
+++ b/data/en.wikipedia.org/wiki/Science_Exchange_(company)-0.md
@ -0,0 +1,44 @@
+---
+title: "Science Exchange (company)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Science_Exchange_(company)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:14:58.417168+00:00"
+instance: "kb-cron"
+---
+
+Science Exchange is a cloud-based software company offering an R&D marketplace to buy and sell scientific services. The marketplace gives life sciences companies access to the outsourced research they need and the platform fully automates R&D outsourcing from source to pay. Commercial contract research organizations (CROs) and academic core facilities can sell their products and services directly through the marketplace.
+Science Exchange's enterprise clients include top pharma and emerging biotechnology companies, including Merck, Amgen, Gilead Sciences, Astellas Pharma, AbbVie, and Regeneron Pharmaceuticals.
+Science Exchange was founded in 2011 by Elizabeth Iorns, Ryan Abbott, and Dan Knox, taking part in the startup accelerator program Y Combinator in the summer of 2011.
+
+
+== History ==
+In 2011, as an assistant professor at the University of Miami Miller School of Medicine, Iorns came up with the idea for Science Exchange after needing to conduct immunology experiments, but having difficulty finding potential collaborators or providers to work with. Iorns formed Science Exchange with Knox and Abbott, and the company applied for a place in the Y Combinator startup accelerator program. The company was accepted into the Summer 2011 batch of Y Combinator and launched the first version of its website in August 2011. In 2012 Iorns was recognized by the Kauffman Foundation for her role in starting Science Exchange.
+
+
+== Business model ==
+Science Exchange operates on a software-as-a-service (SaaS) model, with customers paying an annual subscription fee.
+
+
+== Projects ==
+
+
+=== Reproducibility initiative ===
+In August 2012 Science Exchange partnered with the open-access scientific publisher Public Library of Science (PLOS) to launch the Reproducibility Initiative, a program developed to assist researchers in validating their findings by repeating their experiments through independent laboratories. The program is facilitated by the Science Exchange platform, which matches scientists with experimental service providers according to areas of expertise. Iorns has been a longtime spokesperson on the issue of reproducibility in academic research.
+In 2013 Science Exchange partnered with the Center for Open Science to reproduce findings from widely cited published research in the field of cancer research. The goal of the study, called the Reproducibility Project: Cancer Biology (RP:CB), is to find common reasons explaining why aspects of experiments are hard to reproduce by independent laboratories. In January 2017, the first five replication studies of the Reproducibility Project: Cancer Biology (RP:CB) were published. Three more RP:CB replication studies were published in June 2017.
+
+
+=== Independent validation program ===
+On 30 July 2013 Science Exchange launched a program with reagent supplier antibodies-online.com, based in Aachen, Germany, to independently validate research antibodies. 
+
+
+== Investors ==
+In June 2011, Science Exchange received a $100,000 investment from StartFunds' Yuri Milner, a $50,000 investment from angel investor Ron Conway, and a $20,000 investment from Y Combinator as part of participating in the startup accelerator program. In December 2011 the company announced it had closed a $1.5-million seed financing round led by Andreessen Horowitz. In May 2013 the company closed a $4-million Series A financing round led by Union Square Ventures, Tim O'Reilly's O'Reilly AlphaTech Ventures and several leading angel investors including Esther Dyson, Joshua Schachter, Lisa Gansky and Yuri Milner. In March 2016 the company announced it had closed a $25-million Series B financing round led by Lee Ainslie's Maverick Capital. In June 2017 Science Exchange raised $28 million in Series C funding, led by Norwest Venture Partners. In October 2019, the company announced it had raised an additional $20 million in financing from a combination of equity and debt sources, led by Maverick Ventures and Norwest Venture Partners.
+
+
+== References ==
+
+
+== External links ==
+Science Exchange website
--- a/data/en.wikipedia.org/wiki/Science_of_science_policy-0.md
+++ b/data/en.wikipedia.org/wiki/Science_of_science_policy-0.md
@ -0,0 +1,32 @@
+---
+title: "Science of science policy"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Science_of_science_policy"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:14:59.501630+00:00"
+instance: "kb-cron"
+---
+
+Science of science policy (SoSP) is an emerging interdisciplinary research area that seeks to develop theoretical and empirical models of the scientific enterprise. This scientific basis can be used to help government, and society in general, make better R&D management decisions by establishing a scientifically rigorous, quantitative basis from which policy makers and researchers may assess the impacts of the nation's scientific and engineering enterprise, improve their understanding of its dynamics, and assess the likely outcomes. Examples of research in the science of science policy include models to understand the production of science, qualitative, quantitative and computational methods to estimate the impact of science, and processes for choosing from alternative science portfolios.
+
+
+== Federal effort ==
+The federal government of the United States has long been a supporter of SoSP. In 2006, in response to Office of Science and Technology Policy Director John H. Marburger's challenge for a new "science of science policy", the National Science and Technology Council's Subcommittee on Social, Behavioral and Economic Sciences (SBE) established an Interagency Task Group on Science of Science Policy (ITG) to serve as part of the internal deliberative process of the Subcommittee. In 2008, the SoSP ITG developed and published "The Science of Science Policy: A Federal Research Roadmap", which outlined the Federal efforts necessary for the long-term development of a science of science policy, and presented this Roadmap to the SoSP Community. The ITG's subsequent work has been guided by the questions outlined in the Roadmap and the action steps developed at the workshop. Furthermore, since 2007, the National Science Foundation, in support of academic research to advance the field, has awarded grants from the Science of Science and Innovation Policy (SciSIP) program. The SciSIP research supports and complements the Federal SoSP efforts by providing new tools with immediate relevance to policymakers.
+
+
+== Science of Science and Innovation Policy program ==
+The Science of Science and Innovation Policy (SciSIP) program was established at the National Science Foundation in 2005 in response to a call from John Marburger for a "specialist scholarly community" to study the science of science policy. The program has three major goals: advancing evidence-based science and innovation policy decision making; building a scientific community to study science and innovation policy; and leveraging the experience of other countries. 
+Between 2007 and 2011, over one hundred and thirty awards were made in five rounds of funding. The awardees include economists, sociologists, political scientists, and psychologists. Some of these awards are already showing results in the form of papers, presentations, software, and data development.
+
+
+== See also ==
+Metascience
+Evidence-based policy
+Evidence-based practices
+
+
+== References ==
+
+
+== Further reading ==
--- a/data/en.wikipedia.org/wiki/Scientific_integrity-0.md
+++ b/data/en.wikipedia.org/wiki/Scientific_integrity-0.md
@ -0,0 +1,20 @@
+---
+title: "Scientific integrity"
+chunk: 1/7
+source: "https://en.wikipedia.org/wiki/Scientific_integrity"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:01.659613+00:00"
+instance: "kb-cron"
+---
+
+Research integrity or scientific integrity is an aspect of research ethics that deals with best practice or rules of professional practice of scientists.
+First introduced in the 19th century by Charles Babbage, the concept of research integrity came to the fore in the late 1970s. A series of publicized scandals in the United States led to heightened debate on the ethical norms of sciences and the limitations of the self-regulation processes implemented by scientific communities and institutions. Formalized definitions of scientific misconduct, and codes of conduct, became the main policy response after 1990. In the 21st century, codes of conduct or ethics codes for research integrity are widespread. Along with codes of conduct at institutional and national levels, major international texts include the European Charter for Researchers (2005), the Singapore Statement on Research Integrity (2010), the European Code of Conduct for Research Integrity (2011 & 2017), and the Hong Kong Principles for assessing researchers (2020).
+Scientific literature on research integrity falls mostly into two categories: first, mapping of the definitions and categories, especially in regard to scientific misconduct, and second, empirical surveys of the attitudes and practices of scientists. Following the development of codes of conduct, taxonomies of non-ethical uses have been significantly expanded, beyond the long-established forms of scientific fraud (plagiarism, falsification and fabrication of results). Definitions of "questionable research practices" and the debate over reproducibility also target a grey area of dubious scientific results, which may not be the outcome of voluntary manipulations.
+The concrete impact of codes of conduct and other measures put in place to ensure research integrity remains uncertain. Several case studies have highlighted that while the principles of typical codes of conduct adhere to common scientific ideals, they are seen as remote from actual work practices and their effectiveness is criticized.
+After 2010, debates on research integrity have been increasingly linked to open science. International codes of conduct and national legislation on research integrity have officially endorsed open sharing of scientific output (publications, data, and code used to perform statistical analyses on the data) as ways to limit questionable research practices and to enhance reproducibility. Having both the data and the actual code enables others to reproduce the results for themselves (or to uncover problems in the analyses when trying to do so). The European Code of Conduct for Research Integrity 2023 states, for example, the principles that, "Researchers, research institutions, and organisations ensure that access to data is as open as possible, as closed as necessary, and where appropriate in line with the FAIR Principles (Findable, Accessible, Interoperable and Reusable)
+for data management" and that "Researchers, research institutions, and organisations are transparent about how to access and gain permission to use data,
+metadata, protocols, code, software, and other research materials". References to open science have incidentally opened up the debate over scientific integrity beyond academic communities, as it increasingly concerns a wider audience of scientific readers.
+
+== Definition and history ==
+Research integrity or scientific integrity became an autonomous concept within scientific ethics in the late 1970s. In contrast with other forms of ethical misconduct, the debate over research integrity is focused on a "victimless offence" that only hurts "the robustness of scientific record and public trust in science". Infractions of research integrity include chiefly "data fabrication, falsification, or plagiarism". In that sense, research integrity mostly deals with the internal process of science. It can be treated as a community issue that should not involve external observers: "research integrity is more autonomously defined and regulated by the community, while research ethics (again, a narrow definition) has closer links to legislation".
--- a/data/en.wikipedia.org/wiki/Scientific_integrity-1.md
+++ b/data/en.wikipedia.org/wiki/Scientific_integrity-1.md
@ -0,0 +1,24 @@
+---
+title: "Scientific integrity"
+chunk: 2/7
+source: "https://en.wikipedia.org/wiki/Scientific_integrity"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:01.659613+00:00"
+instance: "kb-cron"
+---
+
+=== Emergence of the issue ===
+Before the 1970s, ethical issues were largely focused on the conduct of medical experiments, especially in regard to tests on humans. In 1803, the "code" of Thomas Percival created a moral foundation for experimental treatments that "was built upon fairly regularly" throughout the next two centuries, notably by Walter Reed in 1898 and by the Berlin code in 1900. After the Second World War, Nazi human experimentation motivated the development of international, widely acknowledged codes of research ethics, such as the Nuremberg code (1947) and the World Medical Association Declaration of Helsinki.
+According to Kenneth Pimple, Charles Babbage was the first author to single out the specific issue of scientific integrity. In the Reflections on the Decline of Science in England, and on Some of its Causes, first published in 1830, Babbage identified four classes of scientific fraud, from outright forgery to varied degrees of arrangement and cooking of the data or the methods.
+Research integrity became a major debated topic in biological sciences after 1970, due to a combination of factors: the development of advanced data analysis methods, the growing commercial relevance of fundamental research, and the increased focus of federal funding agencies in the context of big science. In 1974, the "painted mouse incident" attracted unprecedented media attention: William Summerlin inked a black dot on a mouse to claim a treatment had been a success. Between 1979 and 1981, several major cases of scientific fraud and plagiarism drew a greater focus on the issue from researchers and policymakers in the United States: as many as four important frauds occurred in the summer of 1980.
+At the time, the "scientific community responded to reports of 'scientific fraud' (as it was often called) by asserting that such cases are rare and that neither errors nor deception can be hidden for long because of science's self-correcting nature". A journalist at Science, William Broad, took the opposite position and made an influential contribution to the issue of research integrity. In an answer to the US House of Representatives Science and Technology subcommittee, he highlighted that "cheating in science was nothing new" but, until recently, "had been handled as an internal affair". In a detailed investigation co-signed with Nicholas Wade, Betrayers of the Truth, Broad described scientific fraud as a structural problem: "As more cases of frauds broke into public view (…) we wondered if fraud wasn't a quite regular minor feature of the scientific landscape (…) Logic, replication, peer review — all had been successfully defied by scientific forgers, often for extended periods of time." Other early assessments of the systematicity of scientific fraud presented a more nuanced picture. For Patricia Woolf, along with a few obvious manipulations, there was a wide range of grey areas, which were due to the complexity of fundamental research: "the boundaries between egregious self-deception, culpable carelessness, fraud, and just plain error, can be very blurred indeed". Characteristically, the debate led to a re-evaluation of past scientific practices. In 1913, a well-known scientific experiment on electron charge by Robert Millikan was explicitly based on discarding some results that would not agree with the underlying theory: while well received at the time, by the 1980s this work had come to be considered a textbook example of scientific manipulation.
+
+=== Formalization of research integrity ===
+By the end of the 1980s, the amplification of misconduct scandals and the heightened political and public scrutiny put scientists in a difficult position in the United States and elsewhere: "The tone of the 1988 US congressional oversight hearings, chaired by Rep. John Dingell (D-MI), that investigated how research institutions were responding to misconduct allegations reinforced many scientists' view that both they and scientific research itself were under siege." The main answer was procedural: research integrity has "been codified into numerous codes of conduct field specific, national, and international alike." This policy response largely stemmed from research communities, funders and scientific administrators. In the United States, the United States Public Health Service and the National Science Foundation adopted "similar definitions of misconduct in science" in 1989 and 1991. The concepts of research integrity and its reverse, scientific misconduct, were especially relevant from the perspective of funding bodies, since they made it possible to "delineate the research-related practices that merit intervention": lack of integrity led to research that was not only unethical but also inefficient, and funds would be better allocated elsewhere.
+After 1990, there was a "veritable explosion of scientific codes of conduct". In 2007 the OECD published a report on best practices for promoting scientific integrity and preventing misconduct in science (Global Science Forum). Such international texts include:
+
+European Charter for Researchers (2005)
+the Singapore statement on research integrity (2010)
+European Code of Conduct for Research Integrity of All European Academies (ALLEA) and the European Science Foundation (ESF) (2011 revised in 2017).
+There are no global estimates of the total number of codes of conduct related to research integrity. A UNESCO project, the Global Ethics Observatory (no longer accessible after 2021), referenced 155 codes of conduct but "this is probably just a fraction of the total number of codes produced in recent years." Codes have been created in highly diverse settings and show a wide variation in scale and ambition. Along with national-scale codes, there are codes for scientific societies, institutions and R&D services. While these normative texts may frequently share a core of common principles, there has been growing concern over "fragmentation, lack of interoperability and varying understandings of central terms".
--- a/data/en.wikipedia.org/wiki/Scientific_integrity-2.md
+++ b/data/en.wikipedia.org/wiki/Scientific_integrity-2.md
@ -0,0 +1,22 @@
+---
+title: "Scientific integrity"
+chunk: 3/7
+source: "https://en.wikipedia.org/wiki/Scientific_integrity"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:01.659613+00:00"
+instance: "kb-cron"
+---
+
+== Taxonomy and classification ==
+In codes of conduct, the definition of research integrity is usually negative: the collection of norms aims to single out different forms of unethical research and scientific misconduct with varying degrees of gravity.
+The multiplication of codes of conduct has also corresponded with an expansion of scope. While the initial debate was focused on "three deadly sins of scientific and scholarly research: fabrication, falsification and plagiarism", attention later shifted "to the lesser breaches of research integrity". In 1830, Charles Babbage introduced the first taxonomy of scientific fraud, which already covered some forms of questionable research practices: hoaxing (a voluntary fraud "far from justifiable"), forging ("whereas the forger is one who, wishing to acquire a reputation for science, records observations which he has never made"), trimming (which "consists in clipping off little bits here and there from those observations which differ most in excess from the mean") and cooking. Cooking is the main focus of Babbage as an "art of various forms, the object of which is to give to ordinary observations the appearance and character of those of the highest degree of accuracy". It falls into several sub-cases, such as data selection ("if a hundred observations are made, the cook must be very unlucky if he cannot pick out fifty or twenty to do the serving up"), model/algorithm selection ("another approved receipt (…) is to calculate them by two different formulae") or use of different constants.
+In the late 20th century, this classification was greatly expanded and came to encompass a wider range of deficiencies than intentional fraud. The formalization of research integrity entailed a structural change in the vocabulary and the concepts associated with it. By the end of the 1990s, use of the expression "scientific fraud" was discouraged in the United States, in favor of a "semi-legal term": scientific misconduct. The scope of scientific misconduct is expansive: along with data fabrication, falsification and plagiarism, it includes "other serious deviations" that are demonstrably done in bad faith. The associated concept of questionable research practice, first introduced in a 1992 report of the Committee on Science, Engineering, and Public Policy, has an even broader scope, as it also encompasses potentially non-intentional research failures (such as inadequacies in the research data management process). In 2016, a study identified as many as 34 questionable research practices or "degrees of freedom" that can occur at all the steps of a project (the initial hypothesis, the design of the study, the collection of the data, the analysis and the reporting).
+After 2005, research integrity has been additionally redefined through the perspective of research reproducibility and, more specifically, of the "reproducibility crisis". Studies of reproducibility suggest that there is a continuum between irreproducibility, questionable research practices and scientific misconduct: "Reproducibility is not just a scientific issue; it is also an ethical one. When scientists cannot reproduce a research result, they may suspect data fabrication or falsification." In this context, ethical debates are less focused on a few highly publicized scandals and more on the suspicion that the standard scientific process is broken and fails to meet its own standard.
+Another type of risk to scientific integrity is reduced political diversity among scientists, which can lead to conformity and political bias.
+
+== Current landscape and issues ==
+
+=== Prevalence of ethical issues ===
+In 2009, a meta-analysis of 18 surveys estimated that fewer than 2% of scientists "admitted to have fabricated, falsified or modified data or results at least once". Real prevalence may be under-estimated due to self-reporting: regarding "the behaviour of colleagues admission rates were 14.12%". Questionable research practices are more widespread, as more than one third of the respondents admitted to having engaged in them at least once. A large 2021 survey of 6,813 respondents in the Netherlands found significantly higher estimates, with 4% of the respondents engaging in data fabrication and more than half of the respondents engaging in questionable research practices. Higher rates can be attributed either to a deterioration of ethical norms or to "the increased awareness of research integrity in recent years". The highest rates of self-declared scientific misconduct are found in the medical and life sciences, with as many as 10.4% of respondents surveyed in the Netherlands admitting to scientific fraud (either fabrication or falsification of data).
+Other forms of scientific misconduct or questionable research practices are both less problematic and much more widespread. A 2012 survey of 2,000 psychologists found that "the percentage of respondents who have engaged in questionable practices was surprisingly high", especially in regard to selective reporting. A 2018 survey of 807 researchers in ecology and evolutionary biology showed that 64% "did not report results because they were not statistically significant", 42% decided to collect additional data "after inspecting whether results were statistically significant" and 51% "reported an unexpected finding as though it had been hypothesised from the start". As they come from self-reported surveys, these figures are likely to be underestimates.
--- a/data/en.wikipedia.org/wiki/Scientific_integrity-3.md
+++ b/data/en.wikipedia.org/wiki/Scientific_integrity-3.md
@ -0,0 +1,26 @@
+---
+title: "Scientific integrity"
+chunk: 4/7
+source: "https://en.wikipedia.org/wiki/Scientific_integrity"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:01.659613+00:00"
+instance: "kb-cron"
+---
+
+=== Implementation and assessment of codes of conduct ===
+Several case studies and retrospective analyses have been devoted to the reception of codes of conduct in scientific communities. They frequently highlight a discrepancy between the theoretical norms and the "lived morality of researchers".
+In 2004, Caroline Whitbeck underlined that the enforcement of a few formal rules has overall failed to address a structural "erosion or neglect" of scientific trust. In 2009, Schuurbiers, Osseweijer and Kinderlerer conducted a series of interviews in the aftermath of the Dutch code of conduct on research integrity, introduced in 2005. Overall, most respondents were unaware of the code and complementary ethical recommendations. While the principles "were seen to reflect the norms and values within science rather well", they seemed to be isolated from the actual work practices, which "may lead to morally complex situations". Respondents were also critical of the underlying individualist philosophy of the code, which shifted the entire blame to individual researchers without taking into account institutional or community-wide issues. In 2015, a survey of "64 faculty members at a large southwestern university" in the United States "yielded similar results": many of the respondents were not aware of the existing ethical guidelines, and the communication process remained poor. In 2019, a case study on Italian universities noted that the proliferation of research codes "has a reactive nature because codes of ethics are drawn up in response to scandals and as a result are punitive and negative, with lists of prohibitions".
+Codes of conduct on research integrity may have a more significant impact on professional identity. Development of research codes has been equated to an internalization of issues related to research integrity within scientific social circles and its close association with disputed results, which made it a typical form of "knowledge club" governance. In contrast to a wider range of ethical issues that may overlap with more general social debates (such as gender equality), research integrity belongs to a form of professional ethics analogous to the ethical standards applied by journalists or medical practitioners. As such, not only does it create a common moral framework but also, incidentally, "justifies the existence of the profession as separate from other professions". While the impact of codes on actual ethical practices remains difficult to assess, they have a more measurable impact on the professionalization of research, by transforming informal norms and customs into a set of predefined principles: "codes in general are supported both by those pursuing them as a vehicle to encourage the greater professionalization of biologists (e.g., an initial stage to introducing professional licensing) and those seeking them to forestall any further regulation."
+
+== Research integrity and open science ==
+In the 2000s and 2010s, scientific integrity was gradually reframed in the context of open science, and increased accessibility to scientific publications. The debate on research reproducibility has significantly contributed to this evolution.
+
+=== Ethics of open science ===
+The underlying ethical principles of open science predate the development of an organized open science movement. In 1973, Robert K. Merton theorized a normative "ethos of science" structured on a "norm of disclosure". This norm "was far from universally accepted" in the early development of scientific communities and has remained "one of the many ambivalent precepts contained in the institution of science." Disclosure was counterbalanced by the limitations of the publication and evaluation process, which tended to slow down the dissemination of research results. In the early 1990s, this norm of disclosure was reframed as a norm of "openness" or "open science".
+The early open access and open science movements emerged partly as a reaction against the large corporate model that has come to dominate scientific publishing since the Second World War. Open science was not framed as a radical transformation of scientific communication but as a realization of core underlying principles, already visible at the start of the scientific revolution of the 17th and 18th centuries: the autonomy and self-governance of scientific communities and the dissemination of research results.
+Since 2000, the open science movement has expanded beyond access to scientific outputs (publication, data or software) to encompass the entire process of scientific production. The reproducibility crisis has been an instrumental factor in this development, as it moved the debates over the definition of open science further from scientific publishing. In 2018, Vicente-Saez and Martinez-Fuentes attempted to map the common values shared by the standard definitions of open science in the English-speaking scientific literature indexed on Scopus and the Web of Science. Access is no longer the main dimension of open science, as it has been extended by more recent commitments toward transparency, collaborative work and social impact. These diverse conceptual dimensions "encompasses (Graph 5) the emerging trends on Open Science such as open code […] open notebooks, open lab books, science blogs, collaborative bibliographies, citizen science, open peer review, or pre-registration".
+Through this process, open science has been increasingly structured around a consistent set of ethical principles: "novel open science practices have developed in tandem with novel organising forms of conducting and sharing research through open repositories, open physical labs, and transdisciplinary research platforms. Together, these novel practices and organising forms are expanding the ethos of science at universities."
+
+=== Codification of open science ethics ===
+The translation of the ethical values of open science into applied recommendations was mostly undertaken by institutional and community initiatives until the 2010s. The TOP guidelines were elaborated in 2014 by a committee for Transparency and Openness Promotion that included "disciplinary leaders, journal editors, funding agency representatives, and disciplinary experts largely from the social and behavioral sciences". The guidelines rely on eight standards, with different levels of compliance. While the standards are modular, they also aim to articulate a consistent ethos of science as "they also complement each other, in that commitment to one standard may facilitate adoption of others." The highest levels of compliance for each standard include the following requirements:
--- a/data/en.wikipedia.org/wiki/Scientific_integrity-4.md
+++ b/data/en.wikipedia.org/wiki/Scientific_integrity-4.md
@ -0,0 +1,41 @@
+---
+title: "Scientific integrity"
+chunk: 5/7
+source: "https://en.wikipedia.org/wiki/Scientific_integrity"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:01.659613+00:00"
+instance: "kb-cron"
+---
+
+In 2018, Heidi Laine attempted to establish a nearly-exhaustive list of "ethical principles associated with open science":
+
+This categorization has to contend with the diversity of approaches and values associated with the open science movement and their ongoing evolutions, as the "term will likely remain as fluid as any other attempt to coin a complex system of practices, values and ideologies in one term". Laine identified a significant variation in the way open science principles have been embedded in four major codes of conduct and statements on research integrity: the Singapore Statement on Research Integrity (2010), the Montreal Statement on Research Integrity in Cross-Boundary Research Collaborations (2013), the Responsible Conduct of Research and Procedures for Handling Allegations of Misconduct in Finland (2012) and the European Code of Conduct for Research Integrity (2017). Access to research publications is recommended in all four codes. Integrations of data sharing and reproducibility practices are less obvious, and vary from a tacit approval to detailed support, in the case of the later European Code of Conduct: "The European code pays data management almost an equal amount of attention as publishing and is also in this sense the most advanced of the four CoCs." Yet important areas of open science are consistently ignored, especially regarding the development of open science infrastructure, increased transparency of evaluation or support for citizen science and wider social impact. Overall, Laine found "none of the evaluated CoCs to be in blatant contradiction with the ethical principles of open science, but only the European code of conduct can be said to actively support and give guidance on open science."
+After 2020, new forms of open science codes of conduct have explicitly claimed to "foster the ethos of open scientific practices". First adopted in July 2020, the Hong Kong principles for assessing researchers acknowledge open science as one of the five pillars of scientific integrity: "It seems clear that the various modalities of open science need to be rewarded in the assessment of researchers because these behaviors strongly increase transparency, which is a core principle of research integrity."
+
+=== Research integrity and society ===
+While there is still a continuum between the procedural norms of the codes of conduct and the range of values encompassed by open science, open science has significantly altered the setting and the context of the ethical debate. Open scientific productions can be universally shared in theory: their dissemination is not constrained to the classic membership model of the "knowledge club". Implications are wider as well, as potential misuse of scientific publications is no longer limited to professional scientists. The discrepancy was already visible in the late 2000s, although it was framed under "different buzzwords": in a case study on the implementation of the Dutch code of conduct, Schuurbiers, Osseweijer and Kinderlerer already identified a "shift in practices" that "goes by many names like Mode 2 science, post-normal science, or post-academic science" and that covers a diverse array of transformations, such as technological evolution in the management of research, increased involvement of private actors, open innovation or open access. These structural trends were not well covered by the existing codes of conduct.
+In the 1990s and the 2000s, discussions about research integrity became increasingly professionalized and detached from the public domain. The shift toward open science may potentially contradict this trend, as the range of interested parties and potential reusers of scientific production has expanded well beyond professional academic circles. In 2018, Heidi Laine underlined that established codes of conduct have not yet taken this decisive step: "The one aspect where even the European code falls short of a full recognition of open science is in crossing the traditional professional borders of the research community, i.e. citizen science, open collaboration and science communication." By not taking into account this new framework, existing codes of conduct risk becoming increasingly out of touch with the reality of scientific practices:
+
+If the ethical aspects of open science continue to be left out of RCR (Responsible Conduct of Research) guidance and ponderings, the research community risks losses on both fronts: open science as well as RI (Research integrity). Open science is just as much about values and ethics as it is about technology. Most of all it is about the role of science in society. It is perhaps the most all-encompassing value discussion that the research community has ever known, and the research integrity angle and community of experts risks being side-lined.
+The broadened discussion about scientific integrity led to an increased involvement of political institutions and representatives, beyond specialized scientific committees and funders. In 2021, the French government passed a decree on scientific integrity, which called for generalization of open science practices.
+
+== Initiatives ==
+In 2007 the OECD published a report on best practices for promoting scientific integrity and preventing misconduct in science (Global Science Forum).
+Main international texts in this field:
+
+European Charter for Researchers (2005)
+the Singapore statement on research integrity (2010)
+European Code of Conduct for Research Integrity of All European Academies (ALLEA) and the European Science Foundation (ESF) (2011 revised in 2017).
+
+=== In Europe ===
+The European Code of Conduct for Research Integrity, published in 2011 and revised in 2017, develops the concept of scientific integrity along four main lines:
+
+Reliability: concerns the quality and reproducibility of research.
+Honesty: concerns the transparency and objectivity of research.
+Respect: for the human, cultural, and ecological environment of research.
+Accountability: concerns the implications of publishing the research.
+
+=== In the USA ===
+
+==== US Department of Health and Human Services ====
--- a/data/en.wikipedia.org/wiki/Scientific_integrity-5.md
+++ b/data/en.wikipedia.org/wiki/Scientific_integrity-5.md
@ -0,0 +1,51 @@
+---
+title: "Scientific integrity"
+chunk: 6/7
+source: "https://en.wikipedia.org/wiki/Scientific_integrity"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:01.659613+00:00"
+instance: "kb-cron"
+---
+
+In a statement, the US Department of Health and Human Services (HHS) adopted the definition of scientific integrity stated below. This policy is currently being reviewed and will be officially published in early 2024. "Scientific integrity is the adherence to professional practices, ethical behavior, and the principles of honesty and objectivity when conducting, managing, using the results of, and communicating about science and scientific activities. Inclusivity, transparency, and protection from inappropriate influence are hallmarks of scientific integrity." (HHS) To promote a culture of scientific integrity, HHS has outlined its policy in seven specific areas:
+Protecting Scientific Processes
+Ensuring the Free Flow of Scientific Information
+Supporting Policymaking Processes
+Ensuring Accountability
+Protecting Scientists
+Professional Development for Government Scientists
+Federal Advisory Committees
+Through these areas, open science practices can be promoted to protect against bias, plagiarism, data fabrication and falsification, as well as inappropriate influence, political interference, and censorship.
+
+===== National Institutes of Health =====
+
+The National Institutes of Health (NIH) is a branch of the HHS. It acts as the nation's medical research agency, which focuses on making important discoveries that improve health and save lives. The mission of NIH is to provide a fundamental understanding of the nature and behavior of living systems and to apply that understanding to improve health, extend life, and reduce illness and disability. The NIH follows the definition of scientific integrity from the HHS Scientific Integrity Policy draft to ensure its scientific findings are objective, credible, transparent, and readily available to the public. All NIH staff are expected to:
+
+Foster an organizational Culture of Scientific Integrity
+Protect the Integrity of the Research Process
+Communicate Science with Integrity
+Safeguard Scientific Integrity
+
+== See also ==
+
+Academic careerism
+Academic integrity
+Replication crisis
+Research Integrity Risk Index
+
+== References ==
+
+== Bibliography ==
+
+=== Books & Theses ===
+Babbage, Charles (1830). Reflections on the Decline of Science in England: And on Some of Its Causes, by Charles Babbage (1830). To which is Added On the Alleged Decline of Science in England, by a Foreigner (Gerard Moll) with a Foreword by Michael Faraday (1831). B. Fellowes.
+Merton, Robert K. (1973). The Sociology of Science: Theoretical and Empirical Investigations. University of Chicago Press. ISBN 978-0-226-52092-6.
+Broad, William J.; Wade, Nicholas (1983). Betrayers of the Truth. Simon and Schuster. ISBN 978-0-671-44769-4.
+Suber, Peter (2012-07-20). Open Access. Cambridge, Mass: The MIT Press. ISBN 978-0-262-51763-8.
+Rentier, Bernard (2019-03-08). Open Science, the challenge of transparency (1st ed.). Académie royale de Belgique.
+Pimple, Kenneth D., ed. (2017-05-15). Research Ethics. Routledge. ISBN 978-1-351-90400-1.
+
+=== Reports ===
+Pauly, Gerhard (2021). OSCAR open science code of conduct (Report). Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V.
+Henriet, Pierre; Ouzoulias, Pierre (2021). Promouvoir et protéger une culture partagée de l'intégrité scientifique (Report). Assemblée nationale.
--- a/data/en.wikipedia.org/wiki/Scientific_integrity-6.md
+++ b/data/en.wikipedia.org/wiki/Scientific_integrity-6.md
@ -0,0 +1,41 @@
+---
+title: "Scientific integrity"
+chunk: 7/7
+source: "https://en.wikipedia.org/wiki/Scientific_integrity"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:01.659613+00:00"
+instance: "kb-cron"
+---
+
+=== Journal articles ===
+Woolf, Patricia K. (1988). "Deception in Scientific Research". Jurimetrics. 29 (1): 67–95. ISSN 0897-1277. JSTOR 29762108. PMID 11654908.
+Dasgupta, Partha; David, Paul A. (1994-09-01). "Toward a new economics of science". Research Policy. Special Issue in Honor of Nathan Rosenberg. 23 (5): 487–521. doi:10.1016/0048-7333(94)01002-1. ISSN 0048-7333.
+Pimple, Kenneth D. (2002-06-01). "Six domains of research ethics". Science and Engineering Ethics. 8 (2): 191–205. doi:10.1007/s11948-002-0018-1. ISSN 1471-5546. PMID 12092490. S2CID 25084326.
+Whitbeck, Caroline (2004). "Trust and the Future of Research". Physics Today. 57 (11): 48–53. Bibcode:2004PhT....57k..48W. doi:10.1063/1.1839377. ISSN 0031-9228. Retrieved 2022-02-12.
+Ioannidis, John P. A. (2005). "Why Most Published Research Findings Are False". PLOS Medicine. 2 (8): e124. doi:10.1371/journal.pmed.0020124. ISSN 1549-1676. PMC 1182327. PMID 16060722.
+Rappert, Brian (2007). "Codes of conduct and biological weapons: an in-process assessment". Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science. 5 (2): 145–154. doi:10.1089/bsp.2007.0003. hdl:10036/32132. ISSN 1538-7135. PMID 17608600.
+David, Paul A. (2008-10-24). "The Historical Origins of 'Open Science': An Essay on Patronage, Reputation and Common Agency Contracting in the Scientific Revolution". Capitalism and Society. 3 (2). doi:10.2202/1932-0213.1040. ISSN 1932-0213. S2CID 41478207. Retrieved 2021-11-11.
+Schuurbiers, Daan; Osseweijer, Patricia; Kinderlerer, Julian (2009). "Implementing the Netherlands code of conduct for scientific practice-a case study". Science and Engineering Ethics. 15 (2): 213–231. doi:10.1007/s11948-009-9114-9. ISSN 1353-3452. PMID 19156537.
+Fanelli, Daniele (2009). "How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data". PLOS ONE. 4 (5): e5738. Bibcode:2009PLoSO...4.5738F. doi:10.1371/journal.pone.0005738. ISSN 1932-6203. PMC 2685008. PMID 19478950.
+John, Leslie K.; Loewenstein, George; Prelec, Drazen (2012-05-01). "Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling". Psychological Science. 23 (5): 524–532. doi:10.1177/0956797611430953. ISSN 0956-7976. PMID 22508865. S2CID 8400625.
+Löppönen, Paavo; Vuorio, Eero (2013-02-21). "Tutkimusetiikka Suomessa 1980-luvulta tähän päivään". Tieteessä tapahtuu. 31 (1). ISSN 1239-6540. Retrieved 2022-02-12.
+Resnik, David B.; Rasmussen, Lisa M.; Kissling, Grace E. (2015-09-03). "An International Study of Research Misconduct Policies". Accountability in Research. 22 (5): 249–266. doi:10.1080/08989621.2014.958218. ISSN 0898-9621. PMC 4449617. PMID 25928177.
+Giorgini, Vincent; Mecca, Jensen T.; Gibson, Carter; Medeiros, Kelsey; Mumford, Michael D.; Connelly, Shane; Devenport, Lynn D. (2015). "Researcher Perceptions of Ethical Guidelines and Codes of Conduct". Accountability in Research. 22 (3): 123–138. doi:10.1080/08989621.2014.955607. ISSN 0898-9621. PMC 4313573. PMID 25635845.
+Nosek, B. A.; Alter, G.; Banks, G. C.; Borsboom, D.; Bowman, S. D.; Breckler, S. J.; Buck, S.; Chambers, C. D.; Chin, G.; Christensen, G.; Contestabile, M.; Dafoe, A.; Eich, E.; Freese, J.; Glennerster, R.; Goroff, D.; Green, D. P.; Hesse, B.; Humphreys, M.; Ishiyama, J.; Karlan, D.; Kraut, A.; Lupia, A.; Mabry, P.; Madon, T.; Malhotra, N.; Mayo-Wilson, E.; McNutt, M.; Miguel, E.; Paluck, E. Levy; Simonsohn, U.; Soderberg, C.; Spellman, B. A.; Turitto, J.; VandenBos, G.; Vazire, S.; Wagenmakers, E. J.; Wilson, R.; Yarkoni, T. (2015-06-26). "Promoting an open research culture". Science. 348 (6242): 1422–1425. Bibcode:2015Sci...348.1422N. doi:10.1126/science.aab2374. ISSN 0036-8075. PMC 4550299. PMID 26113702.
+Wicherts, Jelte M.; Veldkamp, Coosje L. S.; Augusteijn, Hilde E. M.; Bakker, Marjan; van Aert, Robbie C. M.; van Assen, Marcel A. L. M. (2016). "Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking". Frontiers in Psychology. 7: 1832. doi:10.3389/fpsyg.2016.01832. ISSN 1664-1078. PMC 5122713. PMID 27933012.
+Baker, Monya (2016-05-26). "1,500 scientists lift the lid on reproducibility". Nature News. 533 (7604): 452–454. Bibcode:2016Natur.533..452B. doi:10.1038/533452a. PMID 27225100. Retrieved 2020-02-08.
+Resnik, David B.; Shamoo, Adil E. (2017). "Reproducibility and Research Integrity". Accountability in Research. 24 (2): 116–123. doi:10.1080/08989621.2016.1257387. ISSN 0898-9621. PMC 5244822. PMID 27820655.
+Fraser, Hannah; Parker, Tim; Nakagawa, Shinichi; Barnett, Ashley; Fidler, Fiona (2018). "Questionable research practices in ecology and evolution". PLOS ONE. 13 (7): e0200303. Bibcode:2018PLoSO..1300303F. doi:10.1371/journal.pone.0200303. ISSN 1932-6203. PMC 6047784. PMID 30011289.
+Vicente-Saez, Ruben; Martinez-Fuentes, Clara (2018-07-01). "Open Science now: A systematic literature review for an integrated definition". Journal of Business Research. 88: 428–436. doi:10.1016/j.jbusres.2017.12.043. ISSN 0148-2963. S2CID 158229869. Retrieved 2022-04-18.
+Laine, Heidi (2018-12-31). "Open science and codes of conduct on research integrity". Informaatiotutkimus. 37 (4). doi:10.23978/inf.77414. hdl:10138/293054. ISSN 1797-9129.
+Fanelli, Daniele (2018-03-13). "Opinion: Is science really facing a reproducibility crisis, and do we need it to?". Proceedings of the National Academy of Sciences. 115 (11): 2628–2631. doi:10.1073/pnas.1708272114. ISSN 0027-8424. PMC 5856498. PMID 29531051.
+Mion, Giorgio; Broglia, Angela; Bonfanti, Angelo (2019). "Do Codes of Ethics Reveal a University's Commitment to Sustainable Development? Evidence from Italy". Sustainability. 11 (4): 1134. Bibcode:2019Sust...11.1134M. doi:10.3390/su11041134. ISSN 2071-1050.
+Moher, David; Bouter, Lex; Kleinert, Sabine; Glasziou, Paul; Sham, Mai Har; Barbour, Virginia; Coriat, Anne-Marie; Foeger, Nicole; Dirnagl, Ulrich (2020). "The Hong Kong Principles for assessing researchers: Fostering research integrity". PLOS Biology. 18 (7): e3000737. doi:10.1371/journal.pbio.3000737. ISSN 1545-7885. PMC 7365391. PMID 32673304.
+Vicente-Saez, Ruben; Gustafsson, Robin; Van den Brande, Lieve (2020-07-01). "The dawn of an open exploration era: Emergent principles and practices of open science and innovation of university research teams in a digital world". Technological Forecasting and Social Change. 156 120037. doi:10.1016/j.techfore.2020.120037. ISSN 0040-1625.
+Bouter, Lex (2020-08-01). "What Research Institutions Can Do to Foster Research Integrity". Science and Engineering Ethics. 26 (4): 2363–2369. doi:10.1007/s11948-020-00178-5. ISSN 1471-5546. PMC 7417389. PMID 31965429.
+Gopalakrishna, Gowri; Riet, Gerben ter; Vink, Gerko; Stoop, Ineke; Wicherts, Jelte; Bouter, Lex (2021-07-06). Prevalence of questionable research practices, research misconduct and their potential explanatory factors: a survey among academic researchers in The Netherlands. Retrieved 2022-02-18.
+
+=== Other sources ===
+"Public and Scientists' Views on Science and Society". Pew Research Center Science & Society. 2015-01-29. Retrieved 2021-11-11.
--- a/data/en.wikipedia.org/wiki/Scientometrics-0.md
+++ b/data/en.wikipedia.org/wiki/Scientometrics-0.md
@ -0,0 +1,39 @@
+---
+title: "Scientometrics"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Scientometrics"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:03.468548+00:00"
+instance: "kb-cron"
+---
+
+Scientometrics is a subfield of informetrics that studies science through mathematical and quantitative methods, analyzing both scientific output and the dynamics of research. Major research issues in scientometrics include the measurement of the impact of research papers and academic journals, the understanding of scientific citations, and the use of such measurements in policy and management contexts. 
+In practice, there is a substantial overlap between scientometrics and other scientific fields, including information systems, information science, the science of science policy, the sociology of science, and metascience. Critics have argued that overreliance on scientometrics has created a system of perverse incentives, producing a publish or perish environment that leads to low-quality research.
+
+== Historical development ==
+
+Modern scientometrics is mostly based on the work of Derek J. de Solla Price and Eugene Garfield. The latter created the Science Citation Index and founded the Institute for Scientific Information, which is heavily used for scientometric analysis. A dedicated academic journal, Scientometrics, was established in 1978. The industrialization of science increased the number of publications and research outcomes, and the rise of computers allowed effective analysis of these data. While the sociology of science focused on the behavior of scientists, scientometrics focused on the analysis of publications. Accordingly, scientometrics is also referred to as the scientific and empirical study of science and its outcomes.
+The development of modern scientometrics also had roots in the Soviet Union, emerging from the broader tradition of naukovedenie (science of science). The term scientometrics is a calque of the Russian term naukometriya, introduced by Vasily Nalimov in 1966 and popularized in his 1969 monograph co-authored with Zinaida Mulchenko, Naukometriya: Izuchenie razvitiya nauki kak informatsionnogo protsessa (Scientometrics: The Study of the Development of Science as an Information Process). Soviet approaches often treated science as an informational and system-level process, as reflected in the work of Nalimov and Mulchenko, whereas early Western scientometrics was strongly associated with citation indexing and the development of bibliometric indicators. Zinaida Mulchenko's role in the development of the field has received comparatively less attention in later Western accounts of scientometrics.
+The International Society for Scientometrics and Informetrics, founded in 1993, is an association of professionals in the field.
+Later, around the turn of the century, the evaluation and ranking of scientists and institutions came increasingly into the spotlight. Based on bibliometric analysis of scientific publications and citations, the Academic Ranking of World Universities ("Shanghai ranking") was first published in 2004 by the Shanghai Jiao Tong University. Impact factors became an important tool for choosing between different journals. Rankings such as the Academic Ranking of World Universities and the Times Higher Education World University Rankings (THE-ranking) became an indicator of the status of universities. The h-index became an important indicator of the productivity and impact of the work of a scientist. However, alternative author-level metrics have been proposed.
+Around the same time, governments became more interested in evaluating research in order to assess the impact of science funding. Because investments in scientific research were included in the U.S. American Recovery and Reinvestment Act of 2009 (ARRA), a major economic stimulus package, programs like STAR METRICS were set up to assess whether the expected positive impact on the economy would actually occur.
+
+== Methods and findings ==
+Methods of research include qualitative, quantitative and computational approaches. The main focus of studies has been on institutional productivity comparisons, institutional research rankings, journal rankings, establishing faculty productivity and tenure standards, assessing the influence of top scholarly articles, and developing profiles of top authors and institutions in terms of research performance.
+One significant finding in the field is a principle of cost escalation, to the effect that achieving further findings at a given level of importance grows exponentially more costly in the expenditure of effort and resources. However, new algorithmic methods in search, machine learning and data mining are showing that this is not the case for many information retrieval and extraction-based problems.
+More recent methods rely on open source and open data to ensure transparency and reproducibility in line with modern open science requirements. For instance, the Unpaywall index and attendant research on open access trends are based on data retrieved from OAI-PMH endpoints of thousands of open archives provided by libraries and institutions worldwide.
+Recommendations to avoid common errors in scientometrics include: select topics with sufficient data; use data mining and web scraping; combine methods; and eliminate "false positives". It is also necessary to understand the limits of search engines (e.g. Web of Science, Scopus and Google Scholar), which fail to index thousands of studies in small journals and underdeveloped countries.
+
+== Common scientometric indexes ==
+Indexes may be classified as article-level metrics, author-level metrics, and journal-level metrics depending on which feature they evaluate.
+
+=== Impact factor ===
+
+The impact factor (IF) or journal impact factor (JIF) of an academic journal is a measure reflecting the yearly average number of citations to recent articles published in that journal. It is frequently used as a proxy for the relative importance of a journal within its field; journals with higher impact factors are often deemed to be more important than those with lower ones. The impact factor was devised by Eugene Garfield, the founder of the Institute for Scientific Information (ISI).
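+
+As an illustration, the two-year definition commonly used for the JIF (an assumption here, since the text above does not spell out the citation window) can be written as below, where $C_y$ is the number of citations received in year $y$ by items the journal published in the two preceding years, and $N_{y-1}$ and $N_{y-2}$ are the numbers of citable items published in those years. These symbols are introduced for illustration, and the figures in the worked example are hypothetical.
+
+$$\text{IF}_y = \frac{C_y}{N_{y-1} + N_{y-2}}, \qquad \text{e.g.}\quad \frac{1500}{200 + 300} = 3.0$$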
+
+=== Science Citation Index ===
+
+The Science Citation Index (SCI) is a citation index originally produced by the Institute for Scientific Information (ISI) and created by Eugene Garfield. It was officially launched in 1964. It is now owned by Clarivate Analytics (previously the Intellectual Property and Science business of Thomson Reuters). The larger version (Science Citation Index Expanded) covers more than 8,500 notable and significant journals, across 150 disciplines, from 1900 to the present. These are alternatively described as the world's leading journals of science and technology, because of a rigorous selection process.
+
+=== Acknowledgment index ===
--- a/data/en.wikipedia.org/wiki/Scientometrics-1.md
+++ b/data/en.wikipedia.org/wiki/Scientometrics-1.md
@ -0,0 +1,37 @@
+---
+title: "Scientometrics"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Scientometrics"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:03.468548+00:00"
+instance: "kb-cron"
+---
+
+An acknowledgment index (British acknowledgement index) is a method for indexing and analyzing acknowledgments in the scientific literature and thus for quantifying the impact of acknowledgments. Typically, a scholarly article has a section in which the authors acknowledge entities such as funding, technical staff, colleagues, etc. that have contributed materials or knowledge or have influenced or inspired their work. Like a citation index, it measures influences on scientific work, but in a different sense; it measures institutional and economic influences as well as informal influences of individual people, ideas, and artifacts.
+Unlike the impact factor, it does not produce a single overall metric, but analyzes the components separately. However, the total number of acknowledgments to an acknowledged entity can be measured, and so can the number of citations to the papers in which the acknowledgment appears. The ratio of this total number of citations to the total number of papers in which the acknowledged entity appears can be construed as the impact of that acknowledged entity.
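+
+The ratio described above can be sketched as follows, with $P_e$ the set of papers that acknowledge entity $e$ and $c_p$ the citation count of paper $p$. Both symbols are introduced here for illustration and do not come from the source, and the three citation counts in the worked example are hypothetical.
+
+$$\text{impact}(e) = \frac{\sum_{p \in P_e} c_p}{|P_e|}, \qquad \text{e.g.}\quad \frac{10 + 4 + 1}{3} = 5$$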
+
+== Altmetrics ==
+
+In scholarly and scientific publishing, altmetrics are nontraditional bibliometrics proposed as an alternative or complement to more traditional citation impact metrics, such as impact factor and h-index. The term altmetrics was proposed in 2010, as a generalization of article-level metrics, and has its roots in the #altmetrics hashtag. Although altmetrics are often thought of as metrics about articles, they can be applied to people, journals, books, data sets, presentations, videos, source code repositories, web pages, etc. Altmetrics use public APIs across platforms to gather data with open scripts and algorithms. Altmetrics did not originally cover citation counts, but calculate scholarly impact based on diverse online research output, such as social media, online news media, online reference managers and so on. They demonstrate both the impact and the detailed composition of the impact. Altmetrics could be applied to research filtering, promotion and tenure dossiers, grant applications, and the ranking of newly published articles in academic search engines.
+
+== Criticisms ==
+Critics have argued that overreliance on scientometrics has created a publish or perish environment with perverse incentives that lead to low-quality research.
+
+== In popular culture ==
+The main character in Michael Frayn's novel Skios is a Professor of Scientometrics.
+
+== See also ==
+
+=== Journals ===
+Scientometrics
+Journal of the American Society for Information Science and Technology
+Journal of Informetrics
+
+== References and footnotes ==
+
+== External links ==
+Harnad, S. (2009). "Open Access Scientometrics and the UK Research Assessment Exercise". Scientometrics. 79 (1): 147–156. arXiv:cs/0703131. CiteSeerX 10.1.1.561.7204. doi:10.1007/s11192-009-0409-z. S2CID 3183215.
+Harnad, S (2008). "Validating Research Performance Metrics Against Peer Rankings". Ethics in Science and Environmental Politics. 8 (11): 103–107. doi:10.3354/esep00088.
+The Places & Spaces: Mapping Science exhibit at the American Museum of Science and Energy, September 7, 2007 – January 7, 2008.
+"Over-optimization of academic publishing metrics: observing Goodhart's Law in action". GigaScience. Volume 8, Issue 6, June 2019.
--- a/data/en.wikipedia.org/wiki/Statcheck-0.md
+++ b/data/en.wikipedia.org/wiki/Statcheck-0.md
@ -0,0 +1,36 @@
+---
+title: "Statcheck"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Statcheck"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:04.692667+00:00"
+instance: "kb-cron"
+---
+
+Statcheck is an R package designed to detect statistical errors in peer-reviewed psychology articles by searching papers for statistical results, redoing the calculations described in each paper, and comparing the two values to see if they match. It takes advantage of the fact that psychological research papers tend to report their results in accordance with the guidelines published by the American Psychological Association (APA). This leads to several disadvantages: it can only detect results reported completely and in exact accordance with the APA's guidelines, and it cannot detect statistics that are only included in tables in the paper. Another limitation is that Statcheck cannot deal with statistical corrections to test statistics, like Greenhouse–Geisser or Bonferroni corrections, which actually make tests more conservative. Some journals have begun piloting Statcheck as part of their peer review process. Statcheck is free software published under the GNU GPL v3.
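+
+As a rough sketch of the recomputation step, assume a paper reports a two-tailed z test. The choice of test and the numbers below are illustrative assumptions, not taken from Statcheck's documentation. With $\Phi$ the standard normal cumulative distribution function, the p-value implied by a reported statistic $z$ is:
+
+$$p = 2\left[1 - \Phi(|z|)\right], \qquad \text{e.g.}\quad z = 1.96 \;\Rightarrow\; p \approx 2(1 - 0.975) = 0.05$$
+
+Under this sketch, a reported result of "z = 1.96, p = .05" would count as consistent, whereas "z = 1.96, p = .01" would presumably be flagged as inconsistent.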
+
+
+== Validity ==
+In 2017, Statcheck's developers published a preprint paper concluding that the program accurately identified statistical errors over 95% of the time. This validity study comprised more than 1,000 hand-checked tests, among which 5.00% turned out to be inconsistent. The study found that Statcheck recognized 60% of all statistical tests. A reanalysis of these data found that if the program flagged a test as inconsistent, the decision was correct in 60.4% of cases. Conversely, if a test was truly inconsistent, Statcheck flagged it in an estimated 51.8% of cases (this estimate included the undetected tests and assumed that they had the same rate of inconsistencies as the detected tests). Overall, Statcheck's accuracy was 95.9%, half a percentage point higher than the chance level of 95.4% expected when all tests are simply taken at face value. Statcheck was conservatively biased (by about one standard deviation) against flagging tests.
+More recent research has used Statcheck on papers published in Canadian psychology journals and, based on a 30-year sample of such articles, found rates of statistical reporting errors similar to those reported by the original authors. The same study also found many typographical errors in online versions of relatively old papers, and that correcting for these reduced the estimated percentage of tests that were erroneously reported.
+
+
+== History ==
+Statcheck was first developed in 2015 by Michele Nuijten of Tilburg University and Sacha Epskamp of the University of Amsterdam. Later that year, Nuijten and her colleagues published a paper using Statcheck on over 30,000 psychology papers and reported that "half of all published psychology papers [...] contained at least one p-value that was inconsistent with its test". The study was subsequently written up favorably in Nature. In 2016, Nuijten and Epskamp both received the Leamer-Rosenthal Prize for Open Social Science from the Berkeley Initiative for Transparency in the Social Sciences for creating Statcheck.
+In 2016, Tilburg University researcher Chris Hartgerink used Statcheck to scan over 50,000 psychology papers and posted the results to PubPeer; they subsequently published the data they extracted from these papers in an article in the journal Data. Hartgerink told Motherboard that "We're checking how reliable is the actual science being presented by science". They also told Vox that they intended to use Statcheck to perform a function similar to a spell checker software program. Hartgerink's action also sent email alerts to every researcher who had authored or co-authored a paper that it had flagged. These flaggings, and their posting on a public forum, proved controversial, prompting the German Psychological Society to issue a statement condemning this use of Statcheck. Psychologist Dorothy V.M. Bishop, who had two of her own papers flagged by Statcheck, criticized the program for publicly flagging many papers (including one of her own) despite not having found any statistical errors in it. Other critics alleged that Statcheck had reported the presence of errors in papers that did not actually contain them, due to the tool's failure to correctly read statistics from certain papers.
+Journals that have begun piloting the use of Statcheck as part of their peer review process include Psychological Science, the Canadian Journal of Human Sexuality, and the Journal of Experimental Social Psychology. The open access publisher PsychOpen has also used it on all papers accepted for publication in their journals since 2017.
+
+
+== See also ==
+Abuse of statistics
+Misuse of p-values
+Metascience
+
+
+== References ==
+
+
+== External links ==
+Official website
+statcheck.io
--- a/data/en.wikipedia.org/wiki/Why_Most_Published_Research_Findings_Are_False-0.md
+++ b/data/en.wikipedia.org/wiki/Why_Most_Published_Research_Findings_Are_False-0.md
@ -0,0 +1,399 @@
+---
+title: "Why Most Published Research Findings Are False"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Why_Most_Published_Research_Findings_Are_False"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:05.983985+00:00"
+instance: "kb-cron"
+---
+
+"Why Most Published Research Findings Are False" is a 2005 essay written by John Ioannidis, a professor at the Stanford School of Medicine, and published in PLOS Medicine. It is considered foundational to the field of metascience.
+In the paper, Ioannidis argued that a large number, if not the majority, of published medical research papers contain results that cannot be replicated. In simple terms, the essay states that scientists use hypothesis testing to determine whether scientific discoveries are significant. Statistical significance is formalized in terms of probability, with its p-value measure being reported in the scientific literature as a screening mechanism. Ioannidis posited assumptions about the way people perform and report these tests; then he constructed a statistical model which indicates that most published findings are likely false positive results.
+While the general arguments in the paper recommending reforms in scientific research methodology were well received, Ioannidis received criticism for the validity of his model and his claim that the majority of scientific findings are false. Responses to the paper suggest lower false positive and false negative rates than those Ioannidis puts forth.
+
+== Argument ==
+Suppose that in a given scientific field there is a known baseline probability that a result is true, denoted by $\mathbb{P}(\text{True})$. When a study is conducted, the probability that a positive result is obtained is $\mathbb{P}(+)$. Given these two factors, we want to compute the conditional probability $\mathbb{P}(\text{True} \mid +)$, which is known as the positive predictive value (PPV). Bayes' theorem allows us to compute the PPV as:
+
+$$\mathbb{P}(\text{True} \mid +) = \frac{(1-\beta)\,\mathbb{P}(\text{True})}{(1-\beta)\,\mathbb{P}(\text{True}) + \alpha\left[1 - \mathbb{P}(\text{True})\right]}$$
+
+where $\alpha$ is the type I error rate (false positives) and $\beta$ is the type II error rate (false negatives); the statistical power is $1-\beta$. It is customary in most scientific research to desire $\alpha = 0.05$ and $\beta = 0.2$. If we assume $\mathbb{P}(\text{True}) = 0.1$ for a given scientific field, then we may compute the PPV for different values of $\alpha$ and $\beta$.
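+
+With the benchmark values quoted above, $\alpha = 0.05$, $\beta = 0.2$ and $\mathbb{P}(\text{True}) = 0.1$, the formula works out as follows. This is a plain substitution, shown only to make the arithmetic explicit:
+
+$$\mathbb{P}(\text{True} \mid +) = \frac{0.8 \times 0.1}{0.8 \times 0.1 + 0.05 \times 0.9} = \frac{0.08}{0.125} = 0.64$$
+
+Under these assumptions, about 36% of positive results would be false even in the absence of bias.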
+
+However, the simple formula for PPV derived from Bayes' theorem does not account for bias in study design or reporting. Some published findings would not have been presented as research findings if not for researcher bias. Let $u \in [0,1]$ be the probability that an analysis was only published due to researcher bias. Then the PPV is given by the more general expression:
+
+$$\mathbb{P}(\text{True} \mid +) = \frac{\left[1-(1-u)\beta\right]\mathbb{P}(\text{True})}{\left[1-(1-u)\beta\right]\mathbb{P}(\text{True}) + \left[(1-u)\alpha + u\right]\left[1 - \mathbb{P}(\text{True})\right]}$$
+
+The introduction of bias will tend to depress the PPV; in the extreme case when the bias of a study is maximized, $\mathbb{P}(\text{True} \mid +) = \mathbb{P}(\text{True})$. Even if a study meets the benchmark requirements for $\alpha$ and $\beta$, and is free of bias, there is still a 36% probability that a paper reporting a positive result will be incorrect; if the base probability of a true result is lower, then this will push the PPV lower too. Furthermore, there is strong evidence that the average statistical power of a study in many scientific fields is well below the benchmark level of 0.8.
+Given the realities of bias, low statistical power, and a small number of true hypotheses, Ioannidis concludes that the majority of studies in a variety of scientific fields are likely to report results that are false.
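The bias-adjusted PPV formula above can be checked numerically. The sketch below is illustrative only: the function name `ppv` and the prior of 0.1 are assumptions made here, not taken from the article text. The prior is chosen because, together with the benchmark values α = 0.05 and β = 0.2, it reproduces the 36% figure for an unbiased study.

```python
def ppv(prior, alpha=0.05, beta=0.2, u=0.0):
    """Bias-adjusted positive predictive value P(True | +).

    prior -- P(True), the base probability that a tested hypothesis is true
    alpha -- type I error rate (benchmark 0.05)
    beta  -- type II error rate (benchmark 0.2, i.e. power 0.8)
    u     -- bias parameter, between 0 (no bias) and 1 (maximal bias)
    """
    if not (0.0 <= prior <= 1.0 and 0.0 <= u <= 1.0):
        raise ValueError("prior and u must lie in [0, 1]")
    # Numerator: true hypotheses that end up reported as positive.
    true_pos = (1.0 - (1.0 - u) * beta) * prior
    # Second denominator term: false hypotheses reported as positive.
    false_pos = ((1.0 - u) * alpha + u) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    prior = 0.1  # assumed base rate of true hypotheses (illustrative)
    for u in (0.0, 0.2, 0.5, 1.0):
        print(f"u={u:.1f}  PPV={ppv(prior, u=u):.3f}")
```

With these assumed inputs, `ppv(0.1)` is about 0.64, so roughly 36% of positive reports would be incorrect, and `ppv(0.1, u=1.0)` falls back to the prior of 0.1, matching the extreme case described above.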
+
+=== Corollaries ===
+In addition to the main result, Ioannidis lists six corollaries for factors that can influence the reliability of published research.
+Research findings in a scientific field are less likely to be true, 
+
+the smaller the studies conducted.
+the smaller the effect sizes.
+the greater the number and the lesser the selection of tested relationships.
+the greater the flexibility in designs, definitions, outcomes, and analytical modes.
+the greater the financial and other interests and prejudices.
+the hotter the scientific field (with more scientific teams involved).
+Ioannidis has added to this work by contributing to a meta-epidemiological study which found that only 1 in 20 interventions tested in Cochrane Reviews have benefits that are supported by high-quality evidence. He also contributed to research suggesting that the quality of this evidence does not seem to improve over time.
--- a/data/en.wikipedia.org/wiki/Why_Most_Published_Research_Findings_Are_False-1.md
+++ b/data/en.wikipedia.org/wiki/Why_Most_Published_Research_Findings_Are_False-1.md
@ -0,0 +1,37 @@
+---
+title: "Why Most Published Research Findings Are False"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Why_Most_Published_Research_Findings_Are_False"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:05.983985+00:00"
+instance: "kb-cron"
+---
+
+== Reception ==
+Despite skepticism about extreme statements made in the paper, Ioannidis's broader argument and warnings have been accepted by a large number of researchers. The growth of metascience and the recognition of a scientific replication crisis have bolstered the paper's credibility, and led to calls for methodological reforms in scientific research.
+In commentaries and technical responses, statisticians Goodman and Greenland identified several weaknesses in Ioannidis's model. They rejected his dramatic and exaggerated language, namely the claims that he had "proved" that most research findings' claims are false and that "most research findings are false for most research designs and for most fields", yet they agreed with his paper's conclusions and recommendations.
+Biostatisticians Jager and Leek criticized the model as being based on justifiable but arbitrary assumptions rather than empirical data, and conducted an investigation of their own, which estimated the false positive rate in biomedical studies at around 14%, not over 50% as Ioannidis asserted. Their paper was published in a 2014 special edition of the journal Biostatistics along with extended, supporting critiques from other statisticians. Leek summarized the key points of agreement as: when talking about the science-wise false discovery rate one has to bring data; there are different frameworks for estimating the science-wise false discovery rate; and "it is pretty unlikely that most published research is false", but that probably varies by one's definition of "most" and "false".
+Statistician Ulrich Schimmack reinforced the importance of the empirical basis for models by noting the reported false discovery rate in some scientific fields is not the actual discovery rate because non-significant results are rarely reported. Ioannidis's theoretical model fails to account for that, but when a statistical method ("z-curve") to estimate the number of unpublished non-significant results is applied to two examples, the false positive rate is between 8% and 17%, not greater than 50%.
+
+== Causes of high false positive rate ==
+Despite these weaknesses, there is general agreement about the problem Ioannidis describes and with the recommendations he makes. His tone, however, has been described as "dramatic" and "alarmingly misleading", which runs the risk of making people unnecessarily skeptical or cynical about science.
+A lasting impact of this work has been awareness of the underlying drivers of the high false positive rate in clinical medicine and biomedical research, and efforts by journals and scientists to mitigate them. Ioannidis restated these drivers in 2016 as being: 
+
+Solo, siloed investigator limited to small sample sizes
+No preregistration of hypotheses being tested
+Post-hoc cherry picking of hypotheses with best P values
+Only requiring P < .05
+No replication
+No data sharing
+
+== References ==
+
+== Further reading ==
+Carnegie Mellon University, Statistics Journal Club: Summary and discussion of: "Why Most Published Research Findings Are False"
+Applications to Economics: De Long, J. Bradford; Lang, Kevin. "Are all Economic Hypotheses False?" Journal of Political Economy. 100 (6): 1257–1272, 1992
+Applications to Social Sciences: Hardwicke, Tom E.; Wallach, Joshua D.; Kidwell, Mallory C.; Bendixen, Theiss; Crüwell, Sophia; Ioannidis, John P. A. "An empirical assessment of transparency and reproducibility-related research practices in the social sciences (2014–2017)." Royal Society Open Science. 7: 190806, 2020.
+
+== External links ==
+YouTube video(s) from the Berkeley Initiative for Transparency in the Social Sciences, 2016, "Why Most Published Research Findings are False" (Part I, Part II, Part III)
+YouTube video of John Ioannidis at Talks at Google, 2014 "Reproducible Research: True or False?"
--- a/data/en.wikipedia.org/wiki/Yuasa_Phenomenon-0.md
+++ b/data/en.wikipedia.org/wiki/Yuasa_Phenomenon-0.md
@ -0,0 +1,26 @@
+---
+title: "Yuasa Phenomenon"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Yuasa_Phenomenon"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:15:07.343246+00:00"
+instance: "kb-cron"
+---
+
+The Yuasa Phenomenon, named after Japanese physicist and science historian Mitsutomo Yuasa (sometimes referred to as Mintomo Yuasa), suggests that, in the modern era, the world center of scientific activity (defined as producing more than 25% of the world's scientific achievements) moves from one country to another about every 80–100 years.
+
+
+== Analysis ==
+Analyzed data indicate that the "modern world science centre has shifted from Italy (1504–1610) to the United Kingdom (1660–1750), to France (1760–1840), to Germany (1875–1920), and to the United States (1920 to the present)."
+The phenomenon and the methodology used to study it form an emerging area of scientometrics. Indicators point to China's rise as a world center of scientific activity. The phenomenon is also described by other names, including "the Bernal–Yuasa phenomenon".
+Shigeo Minowa links Yuasa's finding to Joseph Ben-David's movements of Centers of Learning.
+Ben-David's Centers of Learning migration observations are discussed in various works.
+
+
+== See also ==
+Historic recurrence
+History of science
+
+
+== References ==