Scrape wikipedia-science: 74 new, 846 updated, 948 total (kb-cron)

2026-05-04 20:49:15 -07:00 · 2026-05-04 20:49:15 -07:00 · 4a921fdd56
commit 4a921fdd56
parent 0ade08bba7
29 changed files with 802 additions and 1 deletions
--- a/_index.db
+++ b/_index.db
--- a/data/en.wikipedia.org/wiki/Aled_Edwards-0.md
+++ b/data/en.wikipedia.org/wiki/Aled_Edwards-0.md
@ -0,0 +1,55 @@
+---
+title: "Aled Edwards"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Aled_Edwards"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:06.519735+00:00"
+instance: "kb-cron"
+---
+
+Aled Morgan Edwards (born June 1, 1962) is the founder and Chief Executive of the Structural Genomics Consortium (SGC), a charitable public-private partnership. He is Professor of Medical Genetics and Medical Biophysics at the University of Toronto, Visiting Professor of Chemical Biology at the University of Oxford, and adjunct professor at McGill University.
+
+
+== Early life ==
+Born in Holyhead, Wales, Edwards moved to Canada in 1965 with his parents, Undeg and Iwan Edwards. His father, a choral conductor, was awarded the Order of Canada in 1995 for his contributions to Canadian music.
+
+
+== Education ==
+Edwards earned his bachelor's degree (1983) and his Ph.D. (1988) in biochemistry from McGill University, where he was supervised by Peter Braun. He carried out postdoctoral studies at Stanford University in the laboratory of Roger Kornberg, where he first crystallized RNA polymerase II, the enzyme whose structure formed part of the work for which Kornberg was awarded the Nobel Prize in Chemistry in 2006.
+
+
+== Research contributions ==
+From 1992 to 1997, while a professor at McMaster University, Edwards became interested in developing structural biology methods and was among the first to use mass spectrometry to identify regions of proteins prone to crystallization. He used this technique to facilitate the crystallography of key proteins involved in DNA replication and repair, before becoming interested in applying this and other methods to carry out structural biology on a proteome scale.
+In 1997, now at the University of Toronto, Edwards and his colleague Cheryl Arrowsmith launched one of the first projects in structural genomics and soon published one of the papers that defined the new field. Their Toronto team, a central player in the Protein Structure Initiative, contributed more than a thousand new microbial protein structures over the next decade, developed new crystallization methods, and used structural methods to de-orphanize several nuclear receptors and to study ion transport across membranes.
+As of 2019, Edwards has co-contributed more datasets to the Protein Data Bank than any other scientist.
+
+
+== Business activities ==
+
+In the late 1990s, Edwards co-founded and served as CEO of Borealis Biosciences and Chalon Biotech, which he merged to form Affinium Pharmaceuticals, a Toronto-based company. Affinium developed afabicin, a novel narrow-spectrum antibiotic now in clinical development at Debiopharm. Several other companies have been spun out of his research programs, including Harbinger Biotechnology and Engineering, which was acquired by Epiphyte3, and 1DegreeBio, which was acquired by LabX Media Group.
+More recently, Edwards founded M4K Pharma, a company developing a brain-penetrant drug that targets the ALK2 kinase to treat children with incurable diffuse intrinsic pontine glioma (DIPG). M4K Pharma's novel open science business model allows its science to be disclosed on an ongoing basis and any approved drug to be priced affordably. Edwards serves on the Board of the Agora Open Science Trust.
+
+
+== Open science and science policy ==
+
+
+=== Open Science ===
+Edwards is considered one of the pioneers of open science, particularly as applied to biomedicine and drug discovery. Since 2003, all human protein structural information derived from the SGC has been placed into the public domain, prior to publication and without restriction on use. In 2007, he and Richard Gold created the SGC Open Science Principles, under which the SGC became the first biomedical research organization to adopt open science principles that mandated sharing and eschewed patenting on any activity, including novel chemistry. In 2016, Edwards spearheaded the Open Lab Notebook initiative, which now comprises over 20 scientists sharing their experiments as they are done, and collaborated with Guy Rouleau at the Montreal Neurological Institute (The Neuro) to develop the concept of an open research institute. That collaboration led to the formation of the Tanenbaum Open Science Institute and to The Neuro's broader institutional commitment to open science. In 2017, Edwards and colleagues conceptualized an open trust mechanism for sharing research reagents, which the SGC has used since. Edwards also helped launch YCharOS Inc., an open science antibody characterization agency. In 2018, Edwards, Max Morgan and Owen Roberts launched the world's first open science drug discovery company, M4K (Meds for Kids) Pharma, and developed a business model consistent with open science and affordable pricing. In 2019, Morgan and Edwards launched M4ND Pharma to tackle neurological diseases using open drug discovery approaches. For his leadership in promoting open access drug discovery, Edwards was named a Senior Ashoka Fellow in 2015.
+
+
+=== Science and innovation policy ===
+On the 10th anniversary of the publication of the draft sequence of the human genome, Edwards and colleagues were asked for their perspective. Their "Roads not Taken" paper, which quantified the "under-studied" parts of the human genome, has led to a number of funding initiatives, including the Illuminating the Druggable Genome initiative at the NIH, and presaged a number of studies that examine the unintended consequences of the peer-review system.
+Edwards is a champion of science reproducibility, focusing considerable attention on the quality of research reagents and the need for transparent standards. 
+Edwards has also contributed to science communication; he served as the scientific advisor for the Gemini Award-winning television series ReGenesis.
+
+
+=== Awards ===
+Edwards was named an Officer of the Order of Canada for "advancing Canada's global reputation as a leader in open science research by founding the groundbreaking Structural Genomics Consortium." He was elected a Fellow of the Royal Society in 2024.
+
+
+== Personal ==
+Edwards has been married to Elizabeth Edwards since 1985, and they have three children and five grandchildren. Elizabeth is director of BioZone and Professor of Chemical Engineering and Applied Chemistry at the University of Toronto, an Officer of the Order of Canada, and winner of Canada's 2016 Killam Prize for Engineering, among other awards. She is the daughter of Leonhard and Jeanne Wolfe; her mother was awarded the Order of Canada in 2009 for her contributions to Canadian planning. Aled's brother, Owain Edwards, is an entomologist at CSIRO in Australia.
+
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Dataverse-0.md
+++ b/data/en.wikipedia.org/wiki/Dataverse-0.md
@ -0,0 +1,72 @@
+---
+title: "Dataverse"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Dataverse"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:03.930930+00:00"
+instance: "kb-cron"
+---
+
+Dataverse is an open-source web application to share, preserve, cite, explore and analyze research data. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive appropriate credit via a data citation with a persistent identifier (e.g., a DOI or a Handle).
+A Dataverse repository hosts multiple dataverses. Each dataverse contains datasets or other dataverses, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data).
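The containment hierarchy just described (a repository holding dataverses, which hold datasets and further dataverses, which in turn hold files) can be pictured as a recursive tree. The sketch below is a hypothetical Python model for illustration only. The class names `Dataverse`, `Dataset` and `DataFile` and the `count_files` helper are assumptions, not the Dataverse software's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, simplified model of the hierarchy; not Dataverse's real schema.
@dataclass
class DataFile:
    name: str


@dataclass
class Dataset:
    title: str
    metadata: dict[str, str] = field(default_factory=dict)  # descriptive metadata
    files: list[DataFile] = field(default_factory=list)     # data, documentation, code


@dataclass
class Dataverse:
    name: str
    datasets: list[Dataset] = field(default_factory=list)
    children: list[Dataverse] = field(default_factory=list)  # nested dataverses

    def count_files(self) -> int:
        """Count data files in this dataverse and, recursively, in its children."""
        own = sum(len(ds.files) for ds in self.datasets)
        return own + sum(child.count_files() for child in self.children)


# Example: a repository root holding one dataverse, which nests another.
labor = Dataverse("Labor", datasets=[Dataset("Wages panel", files=[DataFile("wages.csv")])])
economics = Dataverse(
    "Economics",
    datasets=[Dataset("Survey 2020", {"author": "A. Researcher"},
                      [DataFile("survey.csv"), DataFile("codebook.pdf")])],
    children=[labor],
)
root = Dataverse("Repository root", children=[economics])
print(root.count_files())  # 3
```

The recursion mirrors the fact that a dataverse may contain other dataverses to any depth.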
+In 2019, Dataverse won the Duke's Choice Award for university and higher education.
+
+
+== Background ==
+The Dataverse Project is hosted and developed by the Dataverse Team at the Institute for Quantitative Social Science (IQSS) at Harvard University. Coding of the Dataverse software (previously known as Dataverse Network) began in 2006 under the leadership of Mercè Crosas and Gary King. The earlier Virtual Data Center (VDC) project, which spanned 1999–2006, was organized by Micah Altman, Gary King, and Sidney Verba as a collaboration between the Harvard-MIT Data Center (now part of IQSS) and the Harvard University Library. Precursors to the VDC date to 1987 and included a stand-alone software guide to local data, pre-web software, and tools that automatically transferred cataloging information by FTP to other sites across campus at designated times.
+
+
+== Installations ==
+
+
+=== Harvard Dataverse ===
+A collaboration between the Institute for Quantitative Social Science (IQSS), the Harvard Library, and Harvard University Information Technology (HUIT), the Harvard Dataverse is a repository for sharing, citing, analyzing, and preserving research data. It is open to scientific data from all disciplines worldwide.
+
+
+=== Dataverse in Europe ===
+Dataverse is also installed in several countries of the European Union to preserve data collected by research communities in the Netherlands, Germany, France and Finland. The largest Dataverse repository in Europe, DataverseNL, is located in the Netherlands and provides data management services for 11 Dutch universities. A similar service has been established in Norway (DataverseNO).
+
+
+=== Dataverse in Canada ===
+In Canada, Borealis is a national instance of the Dataverse repository hosted by OCUL's Scholars Portal at the University of Toronto. Borealis allows institutions to offer a Dataverse service without operating and maintaining the software themselves. Most academic institutions offering a Dataverse service in Canada subscribe to the Borealis service. The associated community of practice is organized through the Digital Research Alliance of Canada's Network of Experts via the Dataverse North Expert Group, which serves as a venue for coordination, collaboration and communication.
+
+
+=== Dataverse installations around the world ===
+Several other Dataverse repositories are installed at universities and organizations around the world, including:
+
+The Austrian Social Science Data Archive (AUSSDA)
+Odum Institute
+Dutch Universities (dataverse.nl operated by DANS)
+Fudan University
+University of Alberta Libraries
+Department of Cross Cultural and Regional Studies, University of Copenhagen (ToRS)
+ABACUS - British Columbia Research Libraries' Data Services
+Borealis, the Canadian Dataverse Repository - Scholars Portal - Ontario Council of University Libraries (OCUL)
+HeiDATA - Heidelberg University
+DataverseNO (Norwegian universities)
+CIRAD Dataverse (France)
+DataSuds (France)
+The Australian Data Archive
+Florida International University (Research Data Portal)
+
+
+== APIs and interoperability ==
+Dataverse provides multiple open APIs for searching, depositing and accessing data.
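As an illustration of how these APIs are addressed, the sketch below builds request URLs for two endpoints described in the Dataverse API Guide. The first is the Search API (`GET /api/search`). The second is the native API's dataset lookup by persistent identifier (`GET /api/datasets/:persistentId/?persistentId=...`). The base URL (Harvard Dataverse), the helper names and the example DOI are assumptions chosen for illustration. Supported parameters may also vary between Dataverse versions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed installation; other Dataverse servers expose the same API paths.
BASE_URL = "https://dataverse.harvard.edu"


def search_url(query: str, kind: str = "dataset", per_page: int = 10, start: int = 0) -> str:
    """URL for the Search API; `kind` is one of "dataverse", "dataset" or "file"."""
    params = {"q": query, "type": kind, "per_page": per_page, "start": start}
    return f"{BASE_URL}/api/search?{urlencode(params)}"


def dataset_url(persistent_id: str) -> str:
    """URL for fetching a dataset's metadata by its persistent identifier (e.g. a DOI)."""
    return f"{BASE_URL}/api/datasets/:persistentId/?{urlencode({'persistentId': persistent_id})}"


def fetch_json(url: str) -> dict:
    """Perform the GET request and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

With network access, calling `fetch_json(search_url("open science"))` should return a JSON object whose `data.items` list holds the matching records. The URL builders are kept separate from the network call so they can be checked offline.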
+
+
+== Alternatives and similar projects ==
+DSpace is often compared with Dataverse and is used for storing scientific data. CKAN provides similar functions and is widely used for open data.
+
+
+== See also ==
+Data citation
+Data sharing
+
+
+== References ==
+
+
+== External links ==
+Official website 
+dataverse on GitHub
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-0.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-0.md
@ -0,0 +1,23 @@
+---
+title: "Economics of open science"
+chunk: 1/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+The economics of open science describes the economic aspects of making a wide range of scientific outputs (e.g., publications, data, software) available to all levels of society.
+Open science involves a plurality of economic models and goods. Historically, academic journals and other academic institutions, such as learned societies, have favored a knowledge club or toll access model: publications are managed as a community service for the selective benefit of academic readers and authors. During the second half of the 20th century, the "big 5" largest publishers (Elsevier, Springer, Wiley, Taylor & Francis, and the American Chemical Society) partly absorbed or outcompeted non-profit structures and applied an industrial approach to scholarly publishing.
+The development of the web shifted the focus of scholarly communication from publications to a wide variety of outputs (data, software, metrics). It also challenged the values and organization of existing actors through the development of international initiatives in favor of open access and open science. Although initially outpaced by new competitors, the main commercial publishers started to flip to author-pays models after 2000, funded through article processing charges and the negotiation of transformative deals. Actors like Elsevier and Wiley have diversified their activities from journal ownership to data analytics by vertically integrating tools, databases and metrics that monitor academic activities. The structuring of a global open science movement, the enlargement of scientific readership beyond professional researchers and increasing concerns about the sustainability of key infrastructures have enabled the development of open science commons. Journals, platforms, infrastructures and repositories have increasingly been structured around a shared ecosystem of services and self-governance principles.
+The costs and benefits of open science are difficult to assess because several economic models coexist and the diffusion of open outputs is hard to trace. Open publishing is less costly overall than subscription models, on account of reduced externalities and economies of scale. Yet the conversion of leading publishers to open science has entailed a significant increase in article processing charges, as the prestige of well-known journals makes it possible to extract a high willingness to pay. Open science brings significant efficiency gains to academic research, especially regarding bibliographic and data search, identification of previous findings, and text and data mining projects. These benefits extend to non-academic research, as open access to data and publications eases the development of new commercial services and products. Although the overall economic and social impact of open science could be high, it has rarely been estimated.
+The development of open science has created new forms of economic regulation of scientific publishing, as funders and institutions have come to acknowledge that this sector no longer operates under normal market conditions. International coalitions like cOAlition S attempt to set up global rules and norms to manage the transition to open science.
+
+== Economic models ==
+Debates on the economic theory of open science have been largely influenced by the classic typology of economic goods: private goods, public goods, club goods and common-pool resources. According to a common definition matrix gradually developed by Paul Samuelson, Richard Musgrave and Elinor Ostrom, private goods and club goods are excludable (they cannot be freely shared and are used exclusively by owners or members), while private goods and common-pool resources are rivalrous (one person's consumption reduces what remains available to others).
+In theory, the outputs of open science could be defined as public goods: they are not excludable (freely licensed publications, data or software can be shared without restriction) and they are not subtractive (they can be copied indefinitely). In 2017, an OECD report underlined that research data "exhibit public good characteristics" as "it is not exhausted in consumption (i.e. it can be consumed many times without being diminished), and it may be inefficient to exclude potential users". For Elinor Ostrom and Charlotte Hess, this approach does not fit the actual uses and constraints of knowledge online. Like shared natural resources, the outputs of open science can be polluted, exhausted or enclosed: "The parallel, yet contradictory trends, where, on the one hand, there is unprecedented access to information through the Internet but where, on the other, there are ever-greater restrictions on access (...) indicate the deep and perplexing characteristics of this resource". Additionally, in contrast to other forms of knowledge commons, open science actors continue to enforce exclusion rules for the creation, curation and administration of resources: "the scientific and scholarly commons furnishes information input into a scientific discovery but the Mertonian norms of priority award the property rights in the claim to whoever is first to publish."
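The two criteria in this matrix, excludability and rivalry, jointly determine the category of a good. A minimal Python sketch of the classification follows. The function name `classify` and the example goods in the comments are illustrative assumptions and are not part of the cited typology.

```python
from enum import Enum


class Good(Enum):
    PRIVATE = "private good"
    CLUB = "club good"
    COMMON_POOL = "common-pool resource"
    PUBLIC = "public good"


def classify(excludable: bool, rivalrous: bool) -> Good:
    """Place a good in the excludability/rivalry matrix."""
    if excludable and rivalrous:
        return Good.PRIVATE
    if excludable:
        return Good.CLUB         # e.g. a subscription journal: access restricted, copies not depleted
    if rivalrous:
        return Good.COMMON_POOL  # e.g. a shared resource that can be exhausted
    return Good.PUBLIC           # e.g. a freely licensed dataset


print(classify(excludable=False, rivalrous=False).value)  # public good
```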
+The leading definitions of open access and open science are sufficiently ambiguous to allow for a plurality of allocation systems: "open access is a boundary object that does not refer to a common set of practices, assumptions or principles." Consequently, uses and models of open science can span the entire typology of economic goods:
+
+The coexistence of the differing economic models of open science remains an evolving process. Competing narratives about the future of open access involve all the potential axes of open science goods: they include the disruption of legacy scientific publishers by new competitors, the transformation of private scientific goods into public goods and the rehabilitation of community-led governance. Nikos Koutras has argued for a structural shift in the role of commercial publishers, which would act more as editorial services than as gatekeepers, as "it is feasible for authors to not rely on [them]".
+Models of open science are embedded in wider socio-economic structures. North-South inequalities remain a major structural factor, affecting not only access to and use of open science outputs but also discourses on and representations of open science.
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-1.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-1.md
@ -0,0 +1,20 @@
+---
+title: "Economics of open science"
+chunk: 2/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+=== Open science club ===
+The economic theory of club goods was originally developed in 1965 by James Buchanan to complement the distinction between private and public goods. While clubs are private organizations, they also manage the allocation of resources among their individual members, in a manner similar to a public service. Membership criteria are a fundamental feature of clubs and affect their efficiency: "The central question in a theory of clubs is that of determining the membership margin, so to speak, the size of the most desirable cost and consumption sharing arrangement."
+
+==== Definitions of knowledge club ====
+
+Before the Second World War, academic publishing was mostly characterized by a wide range of community-driven scholarly structures with little concern for profitability. They relied on informal community norms rather than commercial regulations. These structures have been described as knowledge clubs: "until the second part of the twentieth century, most journals could be assimilated to a club model".
+While managed by a community and publicly available, knowledge clubs are demic and mostly used for the benefit of their members. As a defining feature of the club model, scientific authors are not paid for their publications: "ever since the first scientific journals were founded in 1665 in London and Paris, journals have not paid authors for articles." Acknowledgment and recognition by the relevant community of peers is the main incentive: "intangible rewards (made nearly tangible in tenure and promotion) compensate scholars for relinquishing royalties on their journal articles".
+Producers and consumers of club goods are basically the same population, as the core readers of scientific journals are also their core contributors: "the set of potential producers and the set of incumbent consumers are the same set". Determining relevant membership and exclusion criteria plays a fundamental role in the management of the club. In contrast with other forms of clubs (such as health clubs), membership criteria of knowledge clubs are not strictly enforced but stem from widespread conventions: it "happens quite naturally (i.e. culturally) in scholarly knowledge clubs by simple cost of access in time and language." As there is no formal admission process, knowledge clubs can be joined by unreliable members, so long as they are willing to devote the necessary time to demonstrate that they adhere to common cultural values and customs: "Hostile pranks, such as the Sokal hoax/fraud, demonstrate that clubs may be hoodwinked by outsiders who apparently 'speak their language' but are in fact using it to challenge their knowledge."
+The concept of the knowledge club has highlighted the continuities between scientific publications and other forms of restrictive association. Journals are strongly embedded in wider institutional networks and communities and cannot be dissociated from them: "More specialized journals appeared in the 18th and 19th centuries, most of which were published by learned societies. Only at the end of the 19th century did university presses gain importance as publishers of scholarly journals". While in their daily management knowledge clubs are not strictly separated from other economic actors, the interests of the community take precedence over any other economic incentive: "we see a journal as a club in which access to these services is internalised as a membership benefit. While the services might still be outsourced, in practice it can be seen that such a shift potentially has substantial political and economic consequences as to how we see the relations among players." As membership in one club does not exclude membership in others, researchers are usually part of a complex network of clubs: "Membership of an academic institution with the relevant benefits, including access to subscription content, is another parallel club (...) Further work will be needed to define those situations which are better analysed as complex clubs, with differential membership contributions, and those situations where multiple clubs are interacting."
+Community-led journals were progressively acquired or outcompeted by large international publishers after the Second World War: "The small society presses, struggling to cope with growing scale, were supported and then largely supplanted by the 'Big 5' commercial presses". While the knowledge club has receded, some of its conventions have persisted: "academic journals have retained their club-like qualities through blind peer review (even more through open review), and via editorial boards that are carefully constructed to 'send the right signals' in order to build prestige and quality assurance". The evaluation of scientific journals has remained largely a community service, with researchers performing peer review for free. Journals continued to be officially managed by editorial committees, although, under the ownership of large industrial structures, their authority and their ability to set the policy of the publication are limited. Both the authors and the audience of academic publications have primarily non-commercial incentives: "When publishing articles in academic journals, most scholars are predominantly motivated by curiosity, priority and the expected gain in reputation, and much less so by any monetary rewards for the actual publications."
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-10.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-10.md
@ -0,0 +1,17 @@
+---
+title: "Economics of open science"
+chunk: 11/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+=== Transaction costs ===
+In a market economy, the diffusion of private goods is usually accompanied by transaction costs. These costs cover all the services required to manage the commoditization of a product, on both the producer's and the consumer's side. In a wider sense, they encompass all the work and time that the stakeholders of a transaction devote to carrying it out, such as benchmarking, negotiation or contracting.
+Savings on transaction costs are frequently cited as a significant advantage of non-excludable resource systems (common goods, public goods) over private markets. Since access to the resource is weakly restricted and conditioned by informal rules, its allocation is less costly overall: "a community could (...) produce better quality outcomes at lower economic costs, because of lower transaction costs, than alternative institutional systems involving private property". According to a 2017 OECD report, market allocation of open knowledge outputs is not sufficiently efficient: as the marginal cost of distributing a digital public good converges to zero, any attempt at setting a price above it will result in excessive exclusion and reduced collective benefits.
+
+The market may not be the best mechanism for the allocation of a public good, as any price above the marginal cost (of copying and distribution) will reduce net welfare – by locking out users and uses that do not have the capacity to pay or are not willing to pay. For digital information made available online the marginal cost is very low – close to zero. However, a price set at zero or close to zero will not be sufficient to cover full costs (…) To be sustainable, data repositories need to generate sufficient revenue to cover their costs, but setting a price above the marginal cost of copying and distribution will reduce net welfare.
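The welfare argument in the quoted passage can be made concrete with a textbook deadweight-loss calculation. The sketch below assumes a linear demand curve Q(p) = a - b*p and uses hypothetical numbers. It illustrates the general economic reasoning and is not a model taken from the OECD report.

```python
def quantity(price: float, a: float, b: float) -> float:
    """Linear demand Q(p) = a - b*p, floored at zero."""
    return max(a - b * price, 0.0)


def revenue(price: float, a: float, b: float) -> float:
    """Income raised at a given access price."""
    return price * quantity(price, a, b)


def deadweight_loss(price: float, marginal_cost: float, a: float, b: float) -> float:
    """Welfare lost when uses valued above marginal cost are priced out."""
    choke_price = a / b  # price at which demand falls to zero
    p = min(price, choke_price)
    if p <= marginal_cost:
        return 0.0
    lost_uses = quantity(marginal_cost, a, b) - quantity(p, a, b)
    return 0.5 * (p - marginal_cost) * lost_uses  # area of the welfare-loss triangle


# Hypothetical repository: 1,000 potential uses at zero price, 100 fewer per £1,
# and a marginal cost of copying close to zero.
for price in (0.0, 2.0, 5.0):
    print(price, revenue(price, 1000, 100), deadweight_loss(price, 0.0, 1000, 100))
```

At a zero price no welfare is lost, but no revenue is raised to cover fixed costs. Any positive price raises revenue only by excluding some uses, which is the trade-off the quoted passage describes.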
+All the models of open science have a direct impact on one subset of transaction costs: exclusion costs. There is no need to maintain systems that enforce rules ensuring a publication will not be used by an unauthorized reader: "This exclusion costs the excluder money. One cost is digital-rights management or DRM, the software lock that opens for authorized users and blocks access to the unauthorized. A second cost is writing and enforcing the licensing agreement that binds subscribers." In addition to DRM systems, large commercial publishers have also developed intrusive methods to track subsequent uses of a publication.
+Non-commercial open science journals and open infrastructures can do without a wider range of transactional services and their costs. Author-pays journals still have to maintain transactional activities, as the management of article processing charges becomes a core business activity. Moreover, the real cost of large commercial agreements with leading publishers is not well documented, as the terms of big deals are not public. Even in the context of journal flipping, the negotiation of complex licenses may represent a significant investment of time for libraries and research institutions.
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-11.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-11.md
@ -0,0 +1,14 @@
+---
+title: "Economics of open science"
+chunk: 12/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+=== Access costs ===
+Access to pre-existing research outputs such as publications, datasets or code is a major condition of research. The subscription model relied on the assumption that researchers only need to access a few specialized publications. In practice, research is more unpredictable and frequently relies on combining methods and observations from different fields or even different disciplines. In 2011, a JISC survey showed that 68% of UK researchers felt they did have sufficiently wide access to journal and conference papers. Non-academic professional audiences experience difficulties in accessing directly relevant research: "a quarter of those in industry/commerce described their current level of access as fairly/very difficult" as "the main barrier was unwillingness to pay". A survey of business use of research in Denmark highlighted a wide range of strategies to avoid the high cost of pay-per-view, especially collaboration with academics who have institutional access. The impact of open access is amplified by the high degree of internationalization of research: "83% of the economic return on cancer research is drawn from research from non-UK sources", which are less likely to be accessible in a subscription-based model.
+Beyond the extended coverage, open science can enhance the efficiency of bibliographic search: "It can take longer for people to access closed research outputs than when access is open." Access to the full text is also a common bibliographic search strategy, since the full text will often reference other relevant publications. A case study on knowledge workers shows that access restrictions translate into significant costs in work time: "knowledge-based SME employees spent on average 51 min to access the last research article they had difficulty accessing, and this rose to 63 min for university researchers."
+The open science movement has also entailed the diffusion of new resources: open research data and software. In these cases, access was not limited or constrained by high prices but generally non-existent, as these outputs were at most shared across research teams or institutions. Economic estimates of the impact of opening new research outputs are consequently more difficult, as there is no prior market. Several studies by Houghton and Beagrie on the commercial use of major open data portals (Economic and Social Data Service, Archaeology Data Service, British Atmospheric Data Service and the European Bioinformatics Institute) attempted to circumvent the issue by estimating the "willingness to pay" as a proxy for the positive economic impact: how much companies would agree to pay if the service became accessible only through subscriptions. In all cases, this "consumer surplus" was much higher than the cost of running the service (for instance, £21m per year for the Economic and Social Data Service against an operating cost of £3m, or £322m per year for the European Bioinformatics Institute against an operating cost of £47m). For large repositories of data or publications, the consumer surplus may be even more significant in the long term, as the value of the infrastructure and the potential benefits it brings grow as the range of hosted outputs continues to expand: "data archives are appreciating rather than depreciating assets. Most of the economic impact is cumulative and it grows in value over time, whereas most infrastructure (such as ships or buildings) has a declining value as it ages. Like libraries, data collections become more valuable as they grow and the longer one invests in them, provided that the data remain accessible, usable, and used."
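The two examples can be compared by computing the benefit-to-cost ratio each one implies. This quick sketch assumes that all four figures are annual amounts in £ millions, as the "£21m" and "£3m" notation indicates. The `surplus_ratio` helper is a hypothetical name used only for illustration.

```python
# Annual consumer surplus and operating cost, in £ millions, as quoted in the text.
services = {
    "Economic and Social Data Service": (21, 3),
    "European Bioinformatics Institute": (322, 47),
}


def surplus_ratio(surplus: float, cost: float) -> float:
    """Pounds of estimated consumer surplus per pound of operating cost."""
    return surplus / cost


for name, (surplus, cost) in services.items():
    print(f"{name}: {surplus_ratio(surplus, cost):.1f} times its operating cost")
```

Under these assumptions, both services return roughly seven pounds of estimated consumer surplus per pound spent on running them.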
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-12.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-12.md
@ -0,0 +1,13 @@
+---
+title: "Economics of open science"
+chunk: 13/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+=== Research efficiency ===
+Impacts of open science on research efficiency stem from the benefits of enhanced access to previous work. Due to the complexity of bibliographic search, closed subscription systems "can lead to high levels of duplication—that is, where separate teams work on the same thing unbeknownst to each other." The issue is not limited to academic research but affects industrial R&D as well: "an analysis of pharmaceutical patents by 18 large companies showed that 86% of target compounds were investigated by two or more companies". Non-publication of data or intermediate results can also have cascading effects on the overall quality of research. Meta-analyses rely on the reproducibility of pre-existing observations and experiments to identify the scientific consensus on a specific topic or field of research. They can be affected by statistical errors and bias, as well as by the preferential selection of statistically significant results. Extensive opening of final and intermediate data sources makes it easier to spot potential mistakes.
+Text and data mining projects have more recently become a major focus of studies on potential gains in research efficiency. Unlike standard literature reviews, text mining projects process very large corpora and are therefore constrained by the collections available in academic libraries. Additionally, special authorization must be obtained from publishers unless the corpus is published under a free license, as proper use of automated analyses requires making copies accessible among project members. Access procedures may represent a significant investment for text mining projects: "As well as the costs and time required to reach such agreements, it also introduces significant uncertainty into such projects as it is possible that some agreements may not be reached". In 2021, a quantitative analysis of text and data mining research showed that "there is strong evidence that the share of DM research in total research output increases, where researchers do not need to acquire specific consent by rights holders". The restrictive effect of the lack of open access or of a text and data mining exception is noticeable enough to point to "an adverse net effect of IP on innovation, in the sense that there is strong evidence for stricter copyright hindering the wide adoption of novel ways to build on copyright works and generate derivative works." In 2012, a JISC report estimated that easier use of text and data mining tools, notably for bibliographic search, could generate significant productivity gains: "if text mining enabled just a 2% increase in productivity – corresponding to only 45 minutes per academic per working week (...) this would imply over 4.7 million working hours and additional productivity worth between £123.5m and £156.8m in working time per year."
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-13.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-13.md
@ -0,0 +1,15 @@
+---
+title: "Economics of open science"
+chunk: 14/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+=== Economic and social development ===
+The potential economic impact of open science research outputs is significant. Innovation commons are a major, although overlooked, source of economic growth: "The innovation commons is the true origin of innovation. It is the source from which the subsequent markers of innovation emerge — the entrepreneurial actors, the innovating firms, the new markets, and so on." Recent developments like the growth of data analytics services across a wide variety of economic sectors have created further needs for research data: "There are many other values (...) that are promoted through the longterm stewardship and open availability of research data. The rapidly expanding area of artificial intelligence (AI) relies to a great extent on saved data." In 2019, the combined data market of the 27 countries of the European Union and the United Kingdom was estimated at 400 billion euros, with sustained growth of 7.6% per year. Although no estimate was given of the specific value of research data, research institutions were identified as important stakeholders in the emerging ecosystem of "data commons".
+In 2011, a JISC report estimated that there were 1.8 million knowledge workers in the United Kingdom working in R&D, IT and engineering services, most of them "unaffiliated, without corporate library or information center support." Among a representative sample of English knowledge workers, 25% stated that access to the literature was fairly difficult or very difficult, and 17% had recently experienced an access problem that was never resolved. A 2011 survey of Danish businesses highlighted a significant dependence of R&D on academic research: "Forty-eight per cent rated research articles as very or extremely important". Consequently, lack of access or difficulty of access affects the development of commercial services and products: "It would have taken an average of 2.2 years longer to develop or introduce the new products or processes in the absence of contributing academic research. For new products, a 2.2 years delay would cost around DKK 36 million per firm in lost sales, and for new processes it would cost around DKK 211 000 per firm in lost savings." Research data repositories have also experimented with efficient data management workflows that can serve as a valuable inspiration for commercial organizations: "properly designed data commons can serve to R&D processes as an active and accessible repository for research data".
+Estimates of the global business impact of open science are complicated by another of its positive economic features: weak or even nonexistent transaction costs. Commercial use of open publications, data or software occurs informally and is hard to identify: "use of open science outputs (e.g., by firms) often leaves no obvious trace, so most evidence of impacts is based on interviews, surveys, inference based on existing costs, and modelling approaches." The concrete impact of open science on commercial products and activities has been measured for only a few major projects. The Human Genome Project made the results of human sequencing progressively available within 24 hours of discovery from 1990 to 2003. A retrospective assessment showed a very high return on investment: "a $3.8 billion project drove $796 billion in economic impact [and] created 310,000 jobs". Another case study focused on the effect of opening data on a pharmaceutical compound, JQ1: 105 patents were filed in the following years, compared with fewer than 30 for similar compounds.
+Social impact became an important focus of open science infrastructure in the late 2010s. Access to non-academic audiences has created a new potential justification for their funding and maintenance. Potential groups that may benefit from open access "include citizen scientists, medical patients and their supporting networks, health advocates, NGOs, and those who benefit from translation and transformation (e.g., sight-impaired people)."
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-14.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-14.md
@ -0,0 +1,19 @@
+---
+title: "Economics of open science"
+chunk: 15/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+== Economic regulation of open science ==
+Economic regulation of scientific publishing was long stuck in a "collective action dilemma" due to the lack of coordination among stakeholders: "To truly reduce their costs, librarians would have to build a shared online collection of scholarly resources jointly managed by the academic community as a whole, but individual academic institutions lack the private incentives necessary to invest in a shared collection."
+Although some economists initially expected academic publishing markets to be structurally disrupted by new open access competitors, change was mostly driven by scientific communities, scientific institutions and, more recently, coalitions of funders. In the 2000s, forms of regulation appeared at a local scale to address obvious market failures in the management of open science outputs. The development of research data repositories has been sustained by the implementation of "government or funder mandates for open data that require the producers of data to make them openly accessible". Mandates were less easily extended to other scientific outputs such as publications, which could not be covered by open data programs and were already dominated by large commercial players. During the Great Recession, scientific institutions and libraries had to cope with significantly reduced budgets, which entailed a first wave of big deal cancellations and also "promoted the search for alternatives to this model". This specific context created a precedent for a second wave of big deal cancellations, no longer motivated solely by funding cuts but also by "the advance of open science".
+In the early 2010s, leading publishers came under heightened pressure to convert to open access. Along with the mobilization of researchers, the realization that academic publishing no longer operated under normal market conditions redefined the position of scientific funders and policy-makers. According to Robert-Jan Smits, "if we really want OA to become a reality, we just have to make it obligatory, I thought: no more friendly requests, but rules and implications." Freedom of information requests have revealed the real cost of big deals in several countries. On July 17, 2012, the European Union issued a recommendation on Access to and Preservation of Scientific Information that called for efforts to "define clear policies" on open access. This approach "was a major shift compared with the previous EU 7th Framework Programme (2007–13), which had defined OA merely as a pilot action in select areas." It initiated a new cycle of regulatory policies toward large academic publishers. The Horizon 2020 research program made open access a requirement for funding.
+Plan S was originally "a simple plan" mostly addressed to funding agencies: "any researcher who receives a grant from one of them must only publish in an OA journal under a CC BY licence". The early draft included a mechanism to cap the price of article processing charges, which was not retained in the final version. The first official version, released in September 2018, favored transformative agreements: "transformative agreements, where subscription costs are offset by publication costs, can help to accelerate the transition to open access." While Plan S has been criticized for favoring commercial open access and perpetuating high publishing costs, it has facilitated global coordination in negotiations with large publishers.
+
+== References ==
+
+== Bibliography ==
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-2.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-2.md
@ -0,0 +1,24 @@
+---
+title: "Economics of open science"
+chunk: 3/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+==== Knowledge club and open science ====
+The OA Diamond Study states that non-commercial journals in open access "maintain a secular tradition of "club" journals, set up for the uses and interests of a specific closed community of knowledge." While access to read is no longer a material condition for membership, non-commercial journals are still largely managed for the benefit of a community. They are "still strongly embedded in institutional environments (from a legal and governance perspective)."
+Club journals gained new relevance with the development of electronic publishing and played a fundamental role in the early development of online open science in the early 1990s. The pioneers of open access electronic publishing were non-commercial, community-driven initiatives that built on a trend of grassroots publishing innovation in the social sciences and the humanities: "In the late '80s and early '90s, a host of new journal titles launched on listservs and (later) the Web. Journals such as Postmodern Cultures, Surfaces, the Bryn Mawr Classical Review and the Public-Access Computer Systems Review were all managed by scholars and library workers rather than publishing professionals."
+The development of the web and of free editing tools for scientific publications like Open Journal Systems made it possible to run non-commercial journals and apply automated editorial routines at limited cost: "model made economic sense as outsourced specialisation, but technological change has upended that logic by dramatically lowering the cost of in-house production." The spread of personal computers further reinforced the free exchange of services among club members, as contributions could extend beyond peer review: "The wide availability of desktop hardware and software enabled new capabilities among authors, and an expectation from publishers that authors would self-manage much of the layout and editing of articles." Unless they had largely turned their publishing into a commercial activity, knowledge clubs and historical scientific societies have embraced open access: since 2014, the Royal Society has published the journal Royal Society Open Science.
+New forms of research infrastructure developed in the context of open science have also retained some features of a "club model". While they create and manage a common or public good, research data repositories are frequently developed for the particular benefit and prestige of a few institutions: "In the case of research data repositories, such a "privileged" situation may arise for research funders, research centres, universities, or a disciplinary group or society that may gain recognition and further funding from supporting or hosting an open data repository." Yet a full membership or club model, which would also restrict access, remains rare among research data repositories, as "it is difficult to develop and maintain a group large enough to cover costs, affecting both scale and sustainability."
+Despite these continuities, reconciling the historical values of the club (exclusion, internal management, centrality) with the new values of open science remains challenging. Scientific clubs have long maintained an ambiguous position on the value of openness. While openness was held to be a fundamental scientific principle enabling the free exchange of ideas, knowledge clubs have also relied on structural mechanisms of exclusion: "A claim of openness, and a narrative that this openness sits at the core of the value system, that is not quite realized in practice. The building of institutions that seek to enhance openness – the Royal Society holding formalized meetings, open to members, in the place of private demonstrations – that are nonetheless exclusive (...) Yet what is passed down to us today, is less that exclusive gentleman's club and more the core values that it sought to express."
+Recent developments in open science and citizen science create a new source of tension that is still unresolved: "There are profound challenges to adapting our institutions to interact productively with differing knowledge systems, but we are perhaps for the first time well placed to do so." According to Samuel Moore, the main discourses on open science and scientific commons continue to mask exclusionary practices reminiscent of historical knowledge clubs: "many uses of the term commons in scholarly communications are themselves ill- or un-defined and intend to evoke a kind of participatory, inclusive or freely accessible resource."
+
+=== Open Science market ===
+
+==== Scientific publishing: a hybrid market ====
+In Western Europe and North America, direct ownership of journals by academic communities and institutions started to wane in the 1950s. The historical model of scientific periodicals seemed unable to keep up with the rapidly increasing volume of publications in the context of big science. In 1959, Robert Maxwell created one of the first giants of scientific publishing, Pergamon, and over the following decades acquired hundreds of journals from small university presses and scientific societies. While these journals were not very profitable individually, such a high concentration made Pergamon "too big to fail" and able to impose its own conditions on academic libraries and other potential customers. Springer and Elsevier applied the same approach. Scientific publishing in the second half of the 20th century has been described as a two-sided market with "significant network externalities" since "authors prefer to publish in academic journals with the largest readership, and readers prefer the journals with the best authors".
+Due to high market concentration, scholarly journals are not "substitutable": they cannot be replaced by equivalent products on the market, which hinders competition. The development of bibliometric indices has reinforced this lock-in, as highly cited journals receive more submissions.
+By the 1980s, Elsevier's CEO, Pierre Vinken, aimed for an annual growth rate of 20%, mostly through uncontrolled increases in subscription prices. From 1985 to 2010, the budget allocated by American research libraries to periodicals increased five-fold.
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-3.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-3.md
@ -0,0 +1,15 @@
+---
+title: "Economics of open science"
+chunk: 4/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+The small society presses, struggling to cope with growing scale, were supported and then largely supplanted by the 'Big 5' commercial presses: Elsevier (which acquired Pergamon in 1991), Wiley, Springer, Taylor & Francis and Sage. These newly-empowered players brought an industrial approach to the publication and dissemination process, for the first time realising the benefits that these specialised capital and skills could provide by operating at a scale that was unprecedented to that date.
+As it became a private good, the scientific journal was also transformed into an industrial product, with increased standardization of publishing norms, peer-review processes and copyright. In contrast, smaller scientific publishers participate in a more typical publishing market, with persistent competition. Four out of the five most costly journals "are published by big five publishers".
+With the conversion to electronic journals, scientific publishing became a hybrid market: alongside individual subscriptions, leading publishers introduced big deals, or multi-year licenses to large bundles of journal titles. Big deals were typically negotiated with national networks of research libraries and academic institutions. Big deals also proved advantageous to publishers, as they limited the "administrative costs" of managing a large number of contracts between journals and buyers. Bundling thousands of journal titles made it possible to ensure the commercial viability of journals that would otherwise have had limited success. In 2011, David Colquhoun showed that 60% of the journals included in the Elsevier licenses granted to University College London were accessed fewer than 300 times per year and 251 journals were never accessed at all. Even though the inflation of individual subscription costs has slowed, the total amount allocated to scientific periodicals has continued to rise: "While the North-American research libraries spent about a third more on journals than on monographs in 1987, this ratio had risen to about four to one by 2011."
+Following the generalization of the big deal model, the main transactions between large publishers and scientific institutions no longer operated under normal market conditions with fixed public prices: "An optimal pricing strategy when bundling electronic information still does not exist (...) Prices will be determined in bilateral negotiations and every library pays a different price according to its institutional willingness to pay." Opting out of a big deal is a nearly impossible choice for major scientific institutions, as "big deals of different publishers are complementary and not substitutes". Big deal licenses are usually covered by non-disclosure agreements, so that prices can be determined on the basis of the financial capacity of the buyer: "these practices give publishers pricing flexibility that allows them to try to charge the highest price that each institution is willing to pay and make it hard for new publishers to compete." For Jason Potts et al., this deviation from market norms shows that the market model is fundamentally less efficient than the knowledge club in the context of scientific publishing. It creates higher ecosystem-wide costs than direct management of journals and other scientific outputs by the community: "If our argument is that clubs and communities are capable of acting together to solve collective action provisioning problems in ways that are more efficient than either markets or the state, then the dissolution of clubs, or their inability to coordinate, will lead to inefficient or non-existent provisioning." Due to increased concentration, large publishers also became a powerful lobby: in the United States, Elsevier had long influenced some key policy issues relating to the economics of publishing and was able to significantly slow down the transition to open access.
+The scientific market is structured by large-scale inequalities. Overpriced subscriptions and paywalls have been "a major barrier to progress in developing countries". While leading publishers have initiated reduced-cost programs for developing countries, their impact has been limited by the complexity of subscription procedures: "the library or consortia have to go through a procedure to request the discount, assuming they know it exists and can navigate the bureaucracy involved."
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-4.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-4.md
@ -0,0 +1,21 @@
+---
+title: "Economics of open science"
+chunk: 5/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+==== From readers to authors: hybrid and full open access ====
+The early development of open science and the open sharing of scientific publications on the web was at first a challenge for leading commercial publishers: the executive board of Elsevier "had failed to grasp the significance of electronic publishing altogether, and therefore the deadly danger that it posed—the danger, namely, that scientists would be able to manage without the journal". Initial experiments with online scientific journals were mostly led by non-commercial initiatives.
+Commercially viable open access journals appeared from the late 1990s, led by new competitors such as the Public Library of Science (PLOS), MDPI or Hindawi. They generally rely on the author-pays model: the journal provides an editorial service to the author of a publication and charges an article processing charge (APC) to cover editing costs. This practice predates open access, as subscription-based journals run by scientific societies occasionally charged for additional services (such as color photographs). For Peter Suber, this economic model is similar to broadcast television: "If advertisers can pay all the costs of production, then a TV studio can broadcast a show without charging viewers. In the case of scholarly research articles, the model works because authors are willing to relinquish royalties to get their message across and a growing number of institutions that employ researchers or fund research are willing to consider the cost of dissemination".
+After 2000, large commercial publishers started to adopt a hybrid business model, also termed open choice: authors can either publish a paywalled article for free or pay to make it freely available. This practice has been increasingly criticized as double dipping: due to the complexity of the publishing market, largely structured around big deal licenses, scientific publishers were in a position of "collecting money twice". Those overseeing open access policies have grown skeptical of the supposedly transitional nature of hybrid models. According to the lead coordinator of Plan S, Robert-Jan Smits, "when I then asked [the large publishers] when this transition would be completed, they were silent. The reason for this was clear: they saw hybrids as a way to continue the status quo." To address this risk, Plan S stated that transformative journals would no longer be compliant after a transition period ending in 2023.
+While they were originally devised for a subscription-based model, big deals have been repurposed as large-scale agreements to generalize commercial open access. Since subscription costs were already bundled at a national level, they could be converted into publication licenses or article processing charge licenses as part of journal flipping. In 2015, the Max Planck Society issued a White Paper on the economic cost of the transformation to open access: "All the indications are that the money already invested in the research publishing system is sufficient to enable a transformation that will be sustainable for the future." In this context, the APC would become the default business model for all journals: "Our own data analysis shows that there is enough money already circulating in the global market – money that is currently spent on scientific journals in the subscription system and that could be redirected and re-invested into open access business models to pay for APCs." The economic debate over journal flipping has largely set aside the issue of potential savings: negotiations instead aim to ensure a global conversion to open science at the same cost as existing subscription licenses. Several national negotiations with big publishers attempted to implement the journal flipping approach, with limited success.
+Commercial open access models based on article processing charges create new structural inequalities, no longer in access to read but in access to publish. High APC prices mean that, in practice, authors from the Global South are likely to be cut off from major journals: "APCs also present a problem for researchers in the Global South, who typically have much smaller budgets to work with than their northern counterparts."
+
+==== From publishing to analytics: diversification of revenue streams ====
+
+In parallel with the open science movement, leading scientific publishers have diversified their activities beyond publishing and moved "from a content-provision to a data analytics business". The ubiquitous use of digital technologies in research activities and in the institutional management of science and higher education has been "creating new income streams". Large publishers have been well positioned in this new market, as they already had the know-how, infrastructure and intellectual property for a wide range of scientific outputs. Additionally, they have the resources for long-term investments thanks to the high margins accumulated from journal subscriptions. The negotiation of "big deals" also created a favorable framework, as access to subscriptions or APCs could easily be tied to exclusive contracts for other databases and tools.
+In the 2010s, leading publishers developed or acquired new key infrastructures for the management of scientific and pedagogic activities: "Elsevier has acquired and launched products that extend its influence and its ownership of the infrastructure to all stages of the academic knowledge production process". In the past two decades, there have been 340 mergers and acquisitions for Elsevier, 240 for Informa (Taylor & Francis) and 80 for Wiley. While most of these transactions were linked to academic content until the 2010s, they quickly expanded to academic "services" and other data analytics tools. Although all leading publishers pursue vertical integration of existing and new services, it can take different shapes. For instance, there is "a clear attempt by Wiley to enhance its control over the university decision-making process in education, as Elsevier has for academic knowledge production." By 2019, Elsevier had either acquired or built a large portfolio of platforms, tools, databases and indicators covering all aspects and stages of scientific research:
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-5.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-5.md
@ -0,0 +1,17 @@
+---
+title: "Economics of open science"
+chunk: 6/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+the largest supplier of academic journals is also in charge of evaluating and validating research quality and impact (e.g., Pure, Plum Analytics, Sci Val), identifying academic experts for potential employers (e.g., Expert Lookup), managing the research networking platforms through which to collaborate (e.g., SSRN, Hivebench, Mendeley), managing the tools through which to find funding (e.g., Plum X, Mendeley, Sci Val), and controlling the platforms through which to analyze and store researchers' data (e.g., Hivebench, Mendeley).
+Since it has expanded beyond publishing, the vertical integration of privately owned infrastructures has become extensively integrated into daily research activities: "the privatised control of scholarly infrastructures is especially noticeable in the context of 'vertical integration' that publishers such as Elsevier and SpringerNature are seeking by controlling all aspects of the research lifecycle, from submission to publication and beyond". In contrast with publication, which was an outsourced business separated from institutional and community activities, the new services developed by large publishers are embedded in the infrastructure of universities and create potentially stronger dependency links: "Pure embeds Elsevier within the university workflow process through its abilities to manage research at the university level, including the provision of a dashboard to facilitate decision making by university research administrators (Elsevier, "Features")."
+Metrics and indicators are key components of vertical integration: "Elsevier's further move to offering metrics-based decision making is simultaneously a move to gain further influence in the entirety of the knowledge production process, as well as to further monetize its disproportionate ownership of content". While dependency on subscription journals has been weakened by the open science movement, metrics can create a new lock-in situation for scientific institutions: "For universities keen on raising or maintaining their rankings, publishing in Elsevier high impact journals may help them gain the advantage (...) vertical integration and the promotion of citation metrics and algorithmic recommendations may, in fact, constitute rent-seeking behavior designed to increase." Consequently, a shift of leading publishers to data analytics is not incompatible with the parallel development of a large APC market for open science publishing. For Samuel Moore, it is even "incentivised by the governmental policies for OA through APCs, repository services" as it creates a new "need to track compliance".
+The emerging open science market has been compared with the business models of social networks, search engines and other forms of platform capitalism. While content access is free, it is indirectly paid through data extraction and surveillance. "If the primary negative manifestation of market power in the publishing sector is high paywall price (lack of access, therefore), the result of monopolistic competition in academic data analytics will be the combination of dependence and surveillance that we might associate with, e.g., Facebook." Increasing similarities with other digital platforms may have contributed to the increased regulations on the academic publishing market in Europe in the 2010s: "It's why Facebook, Apple and Google are now dominant: once they are controlling X per cent of the market, it's almost impossible for a competitor to come up."
+
+=== Open Science Commons ===
+The concept of commons was originally developed to describe the management of "a resource shared by a group of people" and the establishment of common governance rules to ensure that the resource is not overused or polluted (which would result in a tragedy of the commons). Like club goods, common goods are used and maintained by a community. Yet membership is no longer exclusive: "toll goods (also called club goods) share with private goods the relative ease of exclusion". Typical forms of commons include shared natural resources like timber, berries or fish. They are managed by informal local associations. Governance rules are neither rigid nor pre-existing but have to be adapted to the specific requirements of the resource and the local environment: "One of the central findings was that an extremely rich variety of specific rules were used in systems sustainable over a long time period. No single set of specific rules, on the other hand, had a clear association with success." Common goods are also distinguished from public goods (such as air or radio waves) by their subtractability: natural resources can be depleted, so rules have to be put in place to ensure they are not overused.
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-6.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-6.md
@ -0,0 +1,22 @@
+---
+title: "Economics of open science"
+chunk: 7/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+==== The emergence of digital knowledge commons ====
+Until the 1990s, open knowledge was not considered a commons in economic theory but a "classic example of a pure public good, a good available to all and where one person's use does not subtract from another's use". For Elinor Ostrom and Charlotte Hess, this framework is no longer viable, as the principle of non-excludability has been significantly weakened: "new technologies can enable the capture of what were once free and open public goods (...) Knowledge, which can seem so ubiquitous in digital form, is, in reality, more vulnerable than ever before". In a scientific context, examples of new enclosures of public goods may include the data surveillance systems put in place by Elsevier, Springer or academic social networks, which capture activities such as social interactions and reference collections. The uncontrolled development of the early web highlighted the need for common management of knowledge resources: "People started to notice behaviors and conditions on the web — congestion, free riding, conflict, overuse, and pollution — that had long been identified with other types of commons."
+The open access movement of the 1990s and early 2000s aimed to ensure that science would be a public good, freely usable by all. The unlimited potential circulation of online content has turned historical forms of non-commercial open access into a commons, at least from a reader's point of view: "By definition, OA literature excludes no one, or at least no one with an Internet connection. By contrast, non-OA electronic journals try very hard to exclude nonsubscribers from reading the articles".
+While "knowledge commons is not synonymous with open access", the process of making open access a reality has also incidentally created a global "community network of the open-access movement": decisions had to be made regarding a commonly accepted definition of open access, free licenses and potential exclusions of non-open access initiatives that are embodied in the Budapest Open Access Initiative.
+The early open science infrastructures aimed to ensure the circulation of scientific publications as a common good. Archives or institutional repositories were conceived as local or global community services. In August 1991, Paul Ginsparg created the first version of the arXiv project at the Los Alamos National Laboratory in response to recurring storage issues in academic mailboxes caused by the increasing sharing of scientific articles. Repositories embody numerous characteristics of common resources under Elinor Ostrom's definition: they maintain and protect a scientific resource, they set weak requirements for membership (submissions are not peer-reviewed) and they prioritize coordination and shared management over competition. By the early 2000s, numerous repositories strove to "comply with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) of 1999, which ensures the interoperability of different repositories for the purpose of locating their contents".
+
+==== Enclosures of open science commons ====
+Archive repositories and other forms of open science infrastructures were originally individual initiatives. As such, their status as scientific commons was not always institutionalized and they were hardly protected against potential privatization: "one recent strategy of traditional commercial journal publishers to avoid negative externalities from green OA is to acquire successful OA repositories". The acquisition of Digital Commons and SSRN by Elsevier has highlighted the lack of reliability of critical scientific infrastructure for open science, which creates the conditions for a tragedy of the commons. The SPARC report on European Infrastructures underlines that "a number of important infrastructures at risk and as a consequence, the products and services that comprise open infrastructure are increasingly being tempted by buyout offers from large commercial enterprises. This threat affects both not-for-profit open infrastructure as well as closed, and is evidenced by the buyout in recent years of commonly relied on tools and platforms such as SSRN, bepress, Mendeley, and GitHub." Weak definitions of scientific commons, and of the requirements and expectations of commons governance, may have facilitated this takeover: "many self-described 'commons' projects for open access publishing simply restate the values of commercial publishing (...) while relying on the language of a more progressive politics."
+In contrast with the consolidation of privately owned infrastructure, the open science movement "has tended to overlook the importance of social structures and systemic constraints in the design of new forms of knowledge infrastructures." It remained mostly focused on the content of scientific research, with little integration of technical tools and few large community initiatives. "common pool of resources is not governed or managed by the current scholarly commons initiative. There is no dedicated hard infrastructure and though there may be a nascent community, there is no formal membership." In 2015, the Principles for Open Scholarly Infrastructures underlined the discrepancy between the increasing openness of scientific publications or datasets and the closed nature of the infrastructures that control their circulation:
+
+Over the past decade, we have made real progress to further ensure the availability of data that supports research claims. This work is far from complete. We believe that data about the research process itself deserves exactly the same level of respect and care. The scholarly community does not own or control most of this information. For example, we could have built or taken on the infrastructure to collect bibliographic data and citations but that task was left to private enterprise.
+The fragility of open science commons until the 2010s contrasts with the dynamism of collaborative projects beyond the scope of research and scientific activities. Wikipedia, OpenStreetMap and Wikidata are open communities with a low threshold for admission and membership that came to typify online knowledge commons. Their management is analogous to that of natural common-pool resource systems, where local uses and participation are rarely restricted a priori, although repeated abuses can lead to exclusion.
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-7.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-7.md
@ -0,0 +1,20 @@
+---
+title: "Economics of open science"
+chunk: 8/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+==== Consolidation of the common ecosystem (2015-) ====
+Since 2015, open science infrastructures, platforms and journals have converged toward the creation of digital academic commons. While they were initially conceived either as a public good (a "grid" that ensures the distribution of a non-excludable resource) or as a club good (with limited reach beyond specific communities), open science infrastructures have increasingly been structured around a shared ecosystem of services and standards, which has emerged through the network of dependencies from one infrastructure to another. Open science infrastructures face issues similar to those of other open institutions, such as open data repositories or large-scale collaborative projects like Wikipedia: "When we study contemporary knowledge infrastructures we find values of openness often embedded there, but translating the values of openness into the design of infrastructures and the practices of infrastructuring is a complex and contingent process".
+The conceptual definition of open science infrastructures has been largely influenced by Elinor Ostrom's analysis of the commons, and more specifically of the knowledge commons. Following Ostrom, Cameron Neylon stresses that open infrastructures are not only a public good characterized by the management of a pool of common resources, but are also characterized by the elaboration of common governance and norms. The economic theory of the commons makes it possible to expand beyond the limited scope of scholarly associations toward large-scale community-led initiatives: "Ostrom's work (...) provide a template (...) to make the transition from a local club to a community-wide infrastructure." Open science infrastructures tend to favor a not-for-profit, publicly funded model with strong involvement from scientific communities, which distinguishes them from privately owned closed infrastructures: "open infrastructures are often scholar-led and run by non-profit organisations, making them mission-driven instead of profit-driven." This status aims to ensure the autonomy of the infrastructures and prevent their incorporation into commercial infrastructure. It has wide-ranging implications for the way the organization is managed: "the differences between commercial services and non-profit services permeated almost every aspect of their responses to their environment".
+As of 2022, major actors that have formally adopted the core principles of open science infrastructures (or POSI) include Crossref, CORE, OpenAIRE, and OpenCitations.
+Consolidation of the commons ecosystem has also been visible in non-commercial journals, which moved from a knowledge club paradigm to more global commons initiatives. While the daily management of non-commercial journals fits the definition of a knowledge club better, more innovative models of governance "tend to bridge the secular heritage of scientific societies with the new wave of digitized knowledge commons such as Wikipedia or OpenStreetMap". New forms of commons-based regulation and distributed decision-making processes have been gradually introduced: "The ascending role of the editorial committee and volunteers brings OA diamond journals closer to community-run projects where contributors are constantly self-learning and appropriating tasks". The integration of the specific perspectives of the Global South has redefined the common "understandings of the commons" beyond the perspectives of "more powerful stakeholders, wealthy disciplines and countries in the Global North".
+The future of scientific commons remains a debated issue. The OA Diamond Study highlights the Open Access Commons as a potential future path of development for non-commercial open access journals and beyond: "The OA Commons will be a new more integrated international OA publishing system and ecosystem that serves the research community." Fragmentation has hindered the development of non-commercial structures. Reliance on small local communities results in low visibility to potential readers or funders: most of the estimated 17,000 to 29,000 non-commercial journals are currently absent from scientific publishing indicators. The creation of common services and infrastructures, as well as inter-disciplinary and inter-community coordination, may help overcome the built-in limitations of the knowledge club model: "The OA Commons will be community-driven and will bring communities together who already are or want to work together to become more effective." Alternative visions of scientific commons include more decentralized models of "small, semi-autonomous projects that are loosely affiliated but mutually reliant", as large platforms and infrastructures could be "unable to account for nuanced relational practices of commoning in local communities and a variety of contexts."
+
+== Cost ==
+Due to the coexistence of several economic models, there can be no single estimate of the cost of open science. Cost estimates frequently rely on different "scenarios" that match the different models of open science. In 2021, Grossmann and Brembs considered seven different scenarios, including outsourcing to a leading commercial publisher, small-scale non-commercial journals supported by free software and volunteer contributions, and a hypothetical "decentralized, federated platform solution where all scholarly articles are published without being divided into journals". Economies of scale are also a significant factor, as large platforms and infrastructures can benefit from bundled expenses in numerous areas.
+According to Grossmann and Brembs, the total costs of scholarly publication range from $194.89 to $723.16 per article. Despite the wide variation between models and potential economies of scale, even the highest estimate of publication costs in open science is low: "publication costs only cover 15% of the subscription price (...) assuming a conservative profit margin of 30% (i.e., US$1,200 per article) for one of the large publishers there remains a sizeable gap of about US$2,200 in non-publication costs, or 55% of the price of a scholarly subscription article".
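+The quoted shares can be checked with a short worked calculation. It assumes, as the quotation implies but does not state explicitly, that the 15%, 30% and 55% shares all refer to the same subscription price per article, written here as $P$. The value of $P$ is therefore a derived figure, not one given in the source:
+
+$$
+\begin{aligned}
+P &= \frac{1200}{0.30} = 4000 \text{ US\$} \\
+\text{publication costs} &= 0.15 \times 4000 = 600 \text{ US\$} \\
+\text{non-publication costs} &= 0.55 \times 4000 = 2200 \text{ US\$} \\
+0.15 + 0.30 + 0.55 &= 1
+\end{aligned}
+$$
+
+Under this assumption, the implied publication cost of about US$600 per article falls within the US$194.89 to US$723.16 range estimated by Grossmann and Brembs.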
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-8.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-8.md
@ -0,0 +1,21 @@
+---
+title: "Economics of open science"
+chunk: 9/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+=== Editorial work and evaluation ===
+Editorial work and the management of peer review remain the core activity of academic journals and their most recognized contribution and service to scientific communities. They are the main expenses of the non-commercial journals surveyed by the OA Diamond Study: "the five main expenses/payables of the journal are editing (531), copy-editing (463), technical and software support (393), typesetting (384), and design (336)". Translation is a frequently cited additional cost, which open science may have encouraged, as the potential audience of local non-academic readers creates incentives to maintain a multilingual website. Even before their conversion to electronic publishing, not-for-profit journals maintained affordable prices due to a nearly exclusive focus on editorial services: "in 2013, the mean price per article in for-profit journals was 3.2 times higher than in non-for-profit journals, and that the corresponding ratio for the median price per article was even 4.33:1". With Internet publication, costs can be significantly lower, due to additional cuts in transaction and labor costs, the use of shared platforms and infrastructure, or reliance on voluntary work: "over 60% of journals reported annual costs in the previous year under $/€10,000, including in-kind contributions."
+In contrast with the more technical aspects of scientific publication, editorial work cannot easily be scaled. Unless it relies on volunteer work, its cost in time and expertise is roughly the same across providers: "For certain tasks, for example copyediting or typesetting, there are hundreds of individual companies worldwide providing those services (...) having compared the pricing of those service providers with others, we found only a very small variation of cost for such tasks". The only potential saving comes from shifting this work to the scientific authors: "The wide availability of desktop hardware and software [has created] an expectation from publishers that authors would self-manage much of the layout and editing of articles."
+For Peter Suber, of all the expenses of open science journals, "peer review is the most significant." As a legacy of the historical knowledge club model, the main costs of peer review are not directly borne by journals: evaluation is performed free of charge by researchers. Following the expansion of commercial journals since the late 20th century, free peer review services have increasingly become an over-exploited resource. The conversion of subscription journals to the author-pays model of open access has added new pressure to a strained practice: as authors are the main customers of what has essentially become an editorial service, fast-track evaluation is in high demand. While the development of integrated editorial systems has streamlined the process of receiving and managing reviews, locating competent reviewers is a major issue that creates additional costs and work for journal editors: "finding, recruiting and retaining reviewers" remain a major concern of non-commercial journal editors.
+The development of new open science platforms and infrastructures makes it possible to unbundle the academic editorial workflow: "costs are reduced by eliminating the need for type-setting and copy-editing, with web-hosting costing only $15/year, and a total operating cost of between $6.50–$10.50 per article." Through these mechanisms, "open access has the opportunity to become a cost-reducing mechanism for scholarly publishing."
+
+=== Technical infrastructure ===
+Conversion to electronic publishing has created significant economies of scale. Large-scale publishers have been among the first beneficiaries of reduced editorial and technical costs, through the concentration and standardization of the publishing infrastructure: "These newly empowered players brought an industrial approach to the publication and dissemination process, for the first time realising the benefits that these specialised capital and skills could provide by operating at a scale that was unprecedented to that date." This process started before the development of electronic publishing, with the creation of internal databases to manage peer review and other key aspects of editorial management. Large academic search engines completed this process: "As the dominant publishers build databases, discovery systems, and online platforms to house large and integrated collections of journals, it is more difficult for small publishers to compete with them. Building an effective platform for publishing e-journals is expensive (...) After a platform has been created, it is much cheaper and easier to add new journals to it than to build new and redundant platforms."
+After 2000, non-commercial publishers and infrastructures gradually benefited from the same economies of scale, due to the development of open-source software tools dedicated to academic production such as Open Journal Systems, which facilitated the creation and administration of journal websites and the digital conversion of existing journals. Among the non-commercial journals registered in the Directory of Open Access Journals, the number created annually rose from 100 at the end of the 1990s to 800 around 2010. By 2021, Open Journal Systems had become "a widespread solution in the peer review management of journals".
+Many open science infrastructures run "at a relatively low cost", and small infrastructures are an important part of the open science ecosystem. Sixteen research data repositories surveyed by the OECD in 2017 cited technical infrastructure and shared services among the costs "most likely to be susceptible to cost optimisation". In 2020, 21 out of 53 surveyed European infrastructures "report spending less than €50,000". Overall, European infrastructures were financially sustainable in 2020, in contrast with the situation ten years earlier: in 2010, European infrastructures had much less visibility, usually lacked "a long-term perspective" and struggled "with securing the funding for more than 5 years".
+Beyond economies of scale, technical infrastructure also created fixed costs for standard publishing services such as article identification (DOI), plagiarism checks, long-term digital preservation and standardized XML. While most of these services are covered by modest flat fees, they can still significantly affect the tight budgets of small non-commercial journals.
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-9.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-9.md
@ -0,0 +1,18 @@
+---
+title: "Economics of open science"
+chunk: 10/15
+source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:05.253185+00:00"
+instance: "kb-cron"
+---
+
+=== Commercial services and prestige ===
+The overall price per article of commercial publishers is consistently higher than in the non-commercial sector. This discrepancy has been partly accounted for by the maintenance of several services necessary to run a business (pricing, transaction management, marketing...), which non-commercial structures can drop entirely with little impact. Leading commercial journals claim to be more selective in regard to article submissions, which creates a more complex and time-consuming acceptance workflow: "The more effort a publisher invests in each paper, and the more articles a journal rejects after peer review, the more costly is each accepted article to publish [although] the key question is whether the extra effort adds useful value". Besides, "costs vary widely in this sector". Without any transformation of the editorial workflow or efficiency gains, a standard, well-established subscription journal could "charge about $3,700 per paper to cover costs". Yet at a publisher like Nature, the costs would be "at £20,000–30,000 ($30,000–40,000) per paper". At the higher end of the commercial spectrum, costs per article are more likely to include services that are not directly related to publishing: "One possible reason for such variation between journals and publishers is that it is generally unclear whether proposed costs relate to those directly involved in article processing or those required in order for a publisher to 'break even' if they receive zero subscription income for an article made OA."
+Early projections suggested that commercial models of open science would result in lower editorial costs than subscription journals. On the basis of the APCs commonly charged by leading open access publishers like PLOS, Houghton & Oppenheim identified a potential saving of about £800 per article (£1,525 instead of £2,335 for subscription publishing). Taken globally, this would result in "savings of around £500 million per annum nationally in the UK in a worldwide open access system". Critics at the time focused on the unrealistic prospect of a global conversion to open science: "many of the savings hypothesised would depend on the rest of the world adopting author-pays or self-archiving models." In 2012, David Lewis characterized commercial open access based on article processing charges as a "disruptive innovation" bringing a radical "shift in the nature of scholarly journal publishing". New commercial publishers seemed able to significantly lower editorial expenses: by 2013, "some emerging players (...) say that their real internal costs are extremely low", with Hindawi publishing "22,000 articles at a cost of $290 per article".
+Prestige continues to be a significant driver of pricing in the commercial open access market: "In the academic environment, prestige and reputation have a lot of staying power (...) So far, leading firms in the academic industry have been remarkably resistant to disruptive innovators." The evolution of article processing charges (APCs) and the concentration of the commercial open access market have challenged this assumption. Due to the prestige of some publishers or the integration of new editorial services (like fast-track peer review), the mean price of open access articles has consistently risen: "there is no standard price, and largely no regulation of APCs, which results in some publishers demanding very large amounts of money from authors for the privilege of publishing OA." In France, the mean APC of open access journals rose significantly between 2013 (€1,395) and 2020 (€1,745). The projected scenarios include one in which the total cost of APCs approaches the cost of subscriptions by 2030 (€68.7M vs. €97.5M), while a full journal flip from subscriptions to APCs would be much costlier (€168.7M).
+High APC prices are less related to the measurable quality of the journal or of the editorial service than to the capacity of well-known actors to impose high prices: "several studies reported only very weak or no correlation between quality of journals (measured in journal impact factors) and the level of APC. Contrarily, the level of APCs for publishing an article is more related to the market power of specific academic publishing companies". The risk of uncontrolled growth in APCs had been clearly identified at the start of the Plan S initiative: its coordinator, Robert-Jan Smits, was "determined to introduce a limit on APCs of €2,000", but in the end, "the cap rejected, on account of too many members of the Plan S Coalition being against its enforcement."
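+The rounded "£800" figure can be checked against the two per-article costs attributed to Houghton & Oppenheim. The relative saving on the second line is a derived figure, not one stated in the source:
+
+$$
+\begin{aligned}
+\text{saving per article} &= 2335 - 1525 = 810 \text{ GBP} \\
+\frac{810}{2335} &\approx 0.347
+\end{aligned}
+$$
+
+In other words, the projected APC-based cost per article was roughly a third lower than the subscription-based cost.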
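+The French figures can be restated as growth rates and ratios. This assumes, as the text implies but does not state, that the 2013 and 2020 means are directly comparable and that the 2030 amounts cover the same scope. The values below are derived, not taken from the source:
+
+$$
+\begin{aligned}
+\frac{1745 - 1395}{1395} &= \frac{350}{1395} \approx 0.251 \\
+\left(\frac{1745}{1395}\right)^{1/7} - 1 &\approx 0.032 \\
+\frac{68.7}{97.5} &\approx 0.705 \\
+\frac{168.7}{97.5} &\approx 1.73
+\end{aligned}
+$$
+
+That is, the mean APC rose by about 25% over seven years, or roughly 3.2% per year compounded. In the APC scenario, the total cost reaches about 70% of the subscription cost, whereas a full flip would cost about 1.73 times as much as subscriptions.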
+
+== Benefits ==
+The economic contribution of open science to scientific publishing, to non-academic economic sectors or to society remains poorly documented. In 2019, the economist Michael J. Fell underlined that, while open science policies usually advanced the claim that opening research can bring "significant social and economic benefits", "no systematic attempt has yet been made to identify and synthesize evidence relating to this claim and present a clear picture of the economic impacts that open science might have, how these comes about, and how benefits might be maximized." In his assessment of the state of the art, Fell identified 21 empirical studies that aimed to evaluate the "direct economic impacts in which open science has been a contributory factor", with a focus on Anglo-American countries (United Kingdom, United States and Canada) and Scandinavian countries (Denmark and Finland). Estimates are complicated by the fact that open science is both a scientific and a social movement: the specific scope of academic publishing is too narrow, and yet it is harder to develop global macroeconomic indicators, as has been done for open data.
--- a/data/en.wikipedia.org/wiki/European_Open_Science_Cloud-0.md
+++ b/data/en.wikipedia.org/wiki/European_Open_Science_Cloud-0.md
@ -0,0 +1,56 @@
+---
+title: "European Open Science Cloud"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/European_Open_Science_Cloud"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:07.702753+00:00"
+instance: "kb-cron"
+---
+
+The European Open Science Cloud (EOSC) is a European Commission initiative that aims to develop an infrastructure providing its users with services that promote open science practices.
+Besides being oriented towards open science, the envisaged infrastructure is built by aggregating services from several providers, following a system-of-systems approach. Its goal is to establish a federation of infrastructures facilitating seamless access to interoperable research assets and enhanced services across geographical boundaries and academic fields.
+The initiative started in 2015, with its organizers planning to complete it by 2020. A European Union committee on research endorsed a plan for the cloud's development in May 2018. The European Open Science Cloud officially launched in November 2018, providing access to services via the EOSC Portal.
+Public meetings about the project have emphasized the ideological motivations for promoting open science.
+
+
+== EOSC Governance ==
+The development of EOSC is governed by three bodies: an Executive Board, tasked with ensuring implementation and accountability; a Governance Board, an institutional group gathering representatives from the Member States, the Associated Countries and the Commission to ensure effective supervision of the EOSC implementation; and a Stakeholder Forum, the community actively contributing to and participating in the EOSC.
+
+
+== EC funded Projects contributing to EOSC development ==
+The EOSC infrastructure is built by leveraging efforts and services developed and operated by several providers. In addition, the European Commission specifically funded several projects contributing to the development of EOSC, including:
+
+projects started in 2017
+EOSCpilot - The European Open Science Cloud for Research Pilot Project (January 2017 - May 2019)
+eInfraCentral - European E-Infrastructure Services Gateway (January 2017 - 30 June 2019)
+projects started in 2018
+EOSC-hub - Integrating and managing services for the European Open Science Cloud (January 2018 - December 2020)
+OpenAIRE-Advance - OpenAIRE Advancing Open Scholarship (January 2018 - December 2020)
+PaNOSC - Photon and Neutron Open Science Cloud (December 2018 - November 2022)
+projects started in 2019
+EOSC Enhance - Enhancing the EOSC portal and connecting thematic clouds (December 2019 - November 2021)
+EOSC-Life - Providing an open collaborative space for digital biology in Europe (March 2019 - February 2023)
+EOSC-Nordic (September 2019 - August 2022)
+EOSC-pillar - Coordination and Harmonisation of National Initiatives, Infrastructures and Data services in Central and Western Europe (July 2019 - June 2022)
+EOSC-synergy - European Open Science Cloud - Expanding Capacities by building Capabilities (September 2019 - February 2022)
+EOSCsecretariat.eu (January 2019 - June 2021)
+ExPaNDS - EOSC Photon and Neutron Data Services (September 2019 - August 2022)
+FAIRsFAIR - Fostering FAIR Data Practices in Europe (March 2019 - February 2022)
+NI4OS-Europe - National Initiatives for Open Science in Europe (September 2019 - August 2022)
+NEANIAS - Novel EOSC services for Emerging Atmosphere, Underwater and Space Challenges (November 2019 - October 2022)
+
+
+== References ==
+
+
+== Further reading ==
+Ekin, Annette (26 November 2018). "Digital 'coffeehouse' to spark new scientific ideas now ready for use". Horizon: the EU Research & Innovation magazine.
+Candela, Leonardo; Castelli, Donatella; Zoppi, Franco (2019). Final EOSC Service Architecture (Technical report). EOSC-pilot.
+Jones, Sarah; Abramatic, Jean-François, eds. (2019). European Open Science Cloud (EOSC) Strategic Implementation Plan (Technical report). European Commission.
+
+
+== External links ==
+EOSC portal
+Official website
+Open Access Tracking Project. European Open Science Cloud
--- a/data/en.wikipedia.org/wiki/Expression_Atlas-0.md
+++ b/data/en.wikipedia.org/wiki/Expression_Atlas-0.md
@ -0,0 +1,37 @@
+---
+title: "Expression Atlas"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Expression_Atlas"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:08.798435+00:00"
+instance: "kb-cron"
+---
+
+The Expression Atlas is a database maintained by the European Bioinformatics Institute that provides information on gene expression patterns from RNA-seq and microarray studies, and on protein expression from proteomics studies. The Expression Atlas allows searches by gene, splice variant, protein attribute, disease, treatment or organism part (cell types/tissues). Individual genes or gene sets can be searched for. All datasets in Expression Atlas have their metadata manually curated and their data analysed through standardised analysis pipelines. The Expression Atlas has two components, the Baseline Atlas and the Differential Atlas:
+
+
+== Baseline Atlas ==
+The Baseline Atlas provides information about which gene products are present (and at what abundance) under "normal" conditions. This component of the Expression Atlas consists of RNA-seq experiments from the ArrayExpress repository. It aims to answer questions such as:
+
+Which genes are specifically expressed in kidney?
+What is the expression pattern for gene SAA4 in normal tissues?
+
+
+== Differential Atlas ==
+The Differential Atlas allows users to identify genes that are up- or down-regulated in different experimental conditions.
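The Atlas itself analyses data through standardised statistical pipelines; the sketch below only illustrates the basic idea behind an up- or down-regulation call, using a log2 fold-change threshold. The gene names, expression values, two-fold threshold and pseudocount are illustrative assumptions, and a real differential analysis would also test statistical significance.

```python
import math


def classify_regulation(control, treatment, threshold=1.0, pseudocount=1.0):
    """Label genes 'up', 'down' or 'unchanged' from a log2 fold change.

    control, treatment: dicts mapping a gene name to an expression level
    (for example normalised read counts) in each condition.
    threshold: minimum absolute log2 fold change to call a change;
    1.0 corresponds to a two-fold difference.
    pseudocount: added to both levels so zero values do not break the log.
    """
    calls = {}
    # Only genes measured in both conditions can be compared.
    for gene in sorted(control.keys() & treatment.keys()):
        lfc = math.log2((treatment[gene] + pseudocount) / (control[gene] + pseudocount))
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical expression levels for three made-up genes.
control = {"GENE_A": 10, "GENE_B": 50, "GENE_C": 20}
treatment = {"GENE_A": 45, "GENE_B": 11, "GENE_C": 22}
calls = classify_regulation(control, treatment)
print(calls)
```

Here GENE_A rises roughly four-fold and GENE_B falls roughly four-fold, while GENE_C changes by about 10%, well under the assumed two-fold cut-off.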
+
+
+== See also ==
+Human Protein Atlas
+The Cancer Genome Atlas
+
+
+== References ==
+
+
+== Further reading ==
+
+
+== External links ==
+"About Expression Atlas". European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI).
--- a/data/en.wikipedia.org/wiki/FAIR_data-0.md
+++ b/data/en.wikipedia.org/wiki/FAIR_data-0.md
@ -0,0 +1,74 @@
+---
+title: "FAIR data"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/FAIR_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:10.096945+00:00"
+instance: "kb-cron"
+---
+
+FAIR data is data which meets the 2016 FAIR principles of findability, accessibility, interoperability, and reusability (FAIR). 
+The FAIR principles emphasize machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention), because humans increasingly rely on computational support to deal with data as their volume, complexity, and rate of production increase.
+The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the FAIR principles and also carries an explicit data‑capable open license.
+
+
+== FAIR principles published by GO FAIR ==
+Findable
+The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so they are a key component of the FAIRification process.
+F1. (Meta)data are assigned a globally unique and persistent identifier
+F2. Data are described with rich metadata (defined by R1 below)
+F3. Metadata clearly and explicitly include the identifier of the data they describe
+F4. (Meta)data are registered or indexed in a searchable resource
+Accessible
+Once the user finds the required data, they need to know how they can be accessed, possibly including authentication and authorisation.
+A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
+A1.1 The protocol is open, free, and universally implementable
+A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
+A2. Metadata are accessible, even when the data are no longer available
+Interoperable
+The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
+I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
+I2. (Meta)data use vocabularies that follow FAIR principles
+I3. (Meta)data include qualified references to other (meta)data
+Reusable
+The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
+R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
+R1.1. (Meta)data are released with a clear and accessible data usage license
+R1.2. (Meta)data are associated with detailed provenance
+R1.3. (Meta)data meet domain-relevant community standards
+
+The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component).
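Most of the principles concern practices and infrastructure that cannot be verified from a single record, but a few correspond to visible metadata fields. The sketch below is a hypothetical and deliberately shallow presence check for four of them (F1, F3, R1.1 and R1.2). The field names and the placeholder DOI-style strings are assumptions, since the principles do not prescribe a schema, and passing the check says nothing about whether an identifier is actually persistent or a license actually clear.

```python
# Hypothetical metadata field names: the FAIR principles do not prescribe a
# schema, so this mapping is an illustrative assumption.
REQUIRED_FIELDS = {
    "F1": "identifier",       # globally unique and persistent identifier
    "F3": "data_identifier",  # metadata name the data they describe
    "R1.1": "license",        # clear and accessible data usage license
    "R1.2": "provenance",     # detailed provenance
}


def missing_principles(record):
    """Return the codes of principles whose field is absent or empty."""
    return [code for code, field in REQUIRED_FIELDS.items() if not record.get(field)]


# Placeholder record; the DOI-style strings are not real identifiers.
record = {
    "identifier": "doi:10.0000/example-metadata",
    "data_identifier": "doi:10.0000/example-dataset",
    "license": "CC0-1.0",
    "provenance": "",
}
print(missing_principles(record))
```

With the empty provenance field, only R1.2 is reported as missing.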
+
+
+=== Acceptance and implementation ===
+Before FAIR, a 2007 OECD report was the most influential paper discussing similar ideas related to data accessibility. In January 2014, the Lorentz Centre at Leiden University hosted a workshop entitled "Jointly designing a data FAIRPORT" where the participants first formulated the FAIR principles. After further discussions, they were published in the March 2016 issue of Scientific Data.
+At the 2016 G20 Hangzhou summit, the G20 leaders issued a statement endorsing the application of FAIR principles to research. Also in 2016, a group of Australian organisations developed a Statement on FAIR Access to Australia's Research Outputs, which aimed to extend the principles to research outputs more generally. In 2017, Germany, the Netherlands and France agreed to establish an international office to support the FAIR initiative, the GO FAIR International Support and Coordination Office.
+
+Other international organisations active in the research data ecosystem, such as CODATA and the Research Data Alliance (RDA), also support FAIR implementation by their communities. The RDA's FAIR Data Maturity Model Working Group has explored how to assess the implementation of the FAIR principles, and CODATA's strategic Decadal Programme "Data for Planet: Making data work for cross-domain challenges" mentions the FAIR data principles as a fundamental enabler of data-driven science. The Association of European Research Libraries recommends the use of FAIR principles.
+A 2017 paper by advocates of FAIR data reported that awareness of the FAIR concept was increasing among researchers and institutes, but also that understanding of the concept was becoming confused as different people applied their own differing perspectives to it.
+Guides on implementing FAIR data practices state that the cost of a data management plan in compliance with FAIR data practices should be 5% of the total research budget.
+In 2019 the Global Indigenous Data Alliance (GIDA) released the CARE Principles for Indigenous Data Governance as a complementary guide. The CARE principles extend principles outlined in FAIR data to include Collective benefit, Authority to control, Responsibility, and Ethics to ensure data guidelines address historical contexts and power differentials. The CARE Principles for Indigenous Data Governance were drafted at the International Data Week and Research Data Alliance Plenary co-hosted event, "Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop", held 8 November 2018, in Gaborone, Botswana.
+The lack of information on how to implement the guidelines has led to inconsistent interpretations of them.
+In January 2020, representatives of nine groups of universities around the world produced the Sorbonne declaration on research data rights, which included a commitment to FAIR data and called on governments to provide support to enable it. In 2021, researchers identified the FAIR principles as a conceptual component of data catalog software tools, with the other components being metadata management, business context and data responsibility roles. In April 2022, Matthias Scheffler and colleagues argued in Nature that FAIR principles are "a must" so that data mining and artificial intelligence can extract useful scientific information from the data. There have also been moves in the geosciences to establish FAIR data through the use of decimal georeferencing.
+However, making data (and research outcomes) FAIR is challenging, and so is assessing how FAIR they are. In 2020, the FAIR Data Maturity Model Working Group published a set of guidelines for assessing "FAIRness".
+
+
+== See also ==
+Data management
+Open access
+Open data – datasets and databases carrying an explicit data‑capable open license
+Open science
+Remix culture
+
+
+== References ==
+
+
+== External links ==
+
+FAIR Data and Semantic Publishing, a statement from the lab of the first author of the original paper
+Guide to FAIR Data from Dutch Techcentre for Life Sciences
+GO FAIR initiative website
+FAIR Principles with detailed description of each of the guiding principles by the GO FAIR initiative
+A FAIRy tale explaining the FAIR principles, published by the FAIR project
--- a/data/en.wikipedia.org/wiki/Figshare-0.md
+++ b/data/en.wikipedia.org/wiki/Figshare-0.md
@ -0,0 +1,41 @@
+---
+title: "Figshare"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Figshare"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:11.518948+00:00"
+instance: "kb-cron"
+---
+
+Figshare is an online open access repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos. It is free to upload content and free to access, in adherence to the principle of open data. Figshare is one of a number of portfolio businesses supported by Digital Science, a subsidiary of Springer Nature.
+
+
+== History ==
+Figshare was launched in January 2011 by Mark Hahnel and has been supported by Digital Science since a January 2012 relaunch. Hahnel first developed the platform as a personal solution for organizing and publishing the diverse research products generated during his PhD in stem cell biology. In January 2013, Figshare announced a partnership with PLOS to integrate Figshare data hosting, access, and visualization with the associated PLOS articles. In September 2013, Figshare launched a hosted institutional repository service. In December 2013, it announced integration with ImpactStory to support the collection of altmetrics.
+Figshare also hosts the Reproducibility Collection as a founding member of The Reproducibility Initiative, which acts as an independent and blinded validator for replication of submitted data.
+Figshare releases 'The State of Open Data' each year to assess the changing academic landscape around open research.
+
+
+== Concept ==
+Researchers can upload all of their research outputs to Figshare, thus making them publicly available. Users can upload files in any format, and items are assigned a DOI. The current 'types' that can be chosen are figures, datasets, media (including video), papers (including pre-prints), posters, code, and filesets (groups of files). All files are released under a Creative Commons license: CC-BY for most files and CC0 (public domain) for datasets. Figshare allows researchers to publish negative data. The withholding of negative results is a widely known phenomenon that leads to significant bias, often referred to as the file drawer effect. Publishing figures, charts, and data on their own, rather than only as part of a traditional complete paper, allows knowledge to be shared more quickly and effectively. Figshare also tracks download statistics for hosted materials, in turn acting as a source for altmetrics. The main hosting mechanism for the platform is Amazon S3, with CLOCKSS serving as an additional host for public content. Both of these resources support backup and preservation via a distributed cloud computing network.
+
+
+== Integration with other platforms ==
+Figshare integrates with ORCID and Symplectic Elements, can import items from GitHub, and is a source tracked by Altmetric.com.
+
+
+== See also ==
+Dryad
+List of preprint repositories
+Open Science Framework
+Zenodo
+
+
+== References ==
+
+
+== External links ==
+figshare.com
+List of organisations, universities and other entities using Figshare
+ Media related to Figshare at Wikimedia Commons
--- a/data/en.wikipedia.org/wiki/Galaxy_Zoo-0.md
+++ b/data/en.wikipedia.org/wiki/Galaxy_Zoo-0.md
@ -0,0 +1,19 @@
+---
+title: "Galaxy Zoo"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Galaxy_Zoo"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:12.688307+00:00"
+instance: "kb-cron"
+---
+
+Galaxy Zoo is a crowdsourced astronomy project which invites people to assist in the morphological classification of large numbers of galaxies. It is an example of citizen science, as it enlists members of the public to help with scientific research.
+As of July 2017 there had been 15 versions. Galaxy Zoo is part of the Zooniverse, a group of citizen science projects. One aim of the project is to better determine the different aspects of objects and to sort them into classes.
+
+== Origins ==
+
+A key factor leading to the creation of the project was the problem of what has been referred to as data deluge, where research produces vast sets of information to the extent that research teams are not able to analyse and process much of it. Kevin Schawinski, previously an astrophysicist at Oxford University and co-founder of Galaxy Zoo, described the problem that led to Galaxy Zoo's creation when he was set the task of classifying the morphology of more than 900,000 galaxies by eye that had been imaged by the Sloan Digital Sky Survey at the Apache Point Observatory in New Mexico, USA. "I classified 50,000 galaxies myself in a week, it was mind-numbing." Chris Lintott, a co-founder of the project and a professor of astrophysics at the University of Oxford, stated: "In many parts of science, we're not constrained by what data we can get, we're constrained by what we can do with the data we have. Citizen science is a very powerful way of solving that problem."
+The Galaxy Zoo concept was inspired by others such as Stardust@home, where the public was asked by NASA to search images obtained from a mission to a comet for interstellar dust impacts. Unlike earlier internet-based citizen science projects such as SETI@home, which used spare computer processing power to analyse data (also known as distributed or volunteer computing), Stardust@home involved the active participation of human volunteers to complete the research task. In August 2014, the Stardust team reported the discovery of the first potential interstellar space particles after citizen scientists had looked through more than a million images.
+In 2007, when Galaxy Zoo first started, the science team hoped that 20,000–30,000 people would take part in classifying the 900,000 galaxies that made up the sample. It had been estimated that a perfect graduate student working 24 hours a day, 7 days a week would take 3–5 years to classify all the galaxies in the sample once. However, in the first Galaxy Zoo, more than 40 million classifications were made in approximately 175 days by more than 100,000 volunteers, providing an average of 38 classifications per galaxy.
+Chris Lintott commented: "One advantage is that you get to see parts of space that have never been seen before. These images were taken by a robotic telescope and processed automatically, so the odds are that when you log on, that first galaxy you see will be one that no human has seen before." This was confirmed by Kevin Schawinski: "Most of these galaxies have been photographed by a robotic telescope, and then processed by computer. So this is the first time they will have been seen by human eyes."
--- a/data/en.wikipedia.org/wiki/Galaxy_Zoo-1.md
+++ b/data/en.wikipedia.org/wiki/Galaxy_Zoo-1.md
@ -0,0 +1,20 @@
+---
+title: "Galaxy Zoo"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Galaxy_Zoo"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:12.688307+00:00"
+instance: "kb-cron"
+---
+
+== Volunteers ==
+Galaxy Zoo recruited volunteers to help with the largest galaxy census ever carried out. Opening the project to the general public spared professional astronomers the task of studying all the galaxies themselves: the 900,000 galaxies were classified in months rather than the years a small research team would have needed. Computer programs had been unable to classify galaxies reliably, although several groups had attempted to develop image-analysis programs. Kevin Schawinski stated: "The human brain is actually much better than a computer at these pattern recognition tasks." Volunteers astonished the project's organizers by classifying the entire catalog years ahead of schedule. An online forum was set up two weeks after launch, partly because of the large volume of emails being sent, which had become troublesome for recipients to process and answer. On the forum, volunteers pointed out anomalies that, on closer inspection, turned out to be new astronomical objects such as 'Hanny's Voorwerp' and 'the Green Pea galaxies'. "I'm incredibly impressed by what they've managed to achieve," says University of Oxford astronomer Roger Davies, former president of the Royal Astronomical Society. "They've made it possible to do things with a huge survey."
+The Galaxy Zoo forum became a hotbed for the discussion of the SDSS images and more general science questions. Its 'global moderator', volunteer community manager and UK astronomy enthusiast Alice Sheppard, said of it: "I don't quite know what it is, but Galaxy Zoo does something to people. The contributions, both creative and academic, that people have made to the forum are as stunning as the sight of any spiral, and never fail to move me." Author Michael Nielsen wrote in his book Reinventing Discovery: "But Galaxy Zoo can go beyond computers, because it can also apply human intelligence in the analysis, the kind of intelligence that recognizes that Hanny's Voorwerp or a Pea galaxy is out of the ordinary, and deserves further investigation. Galaxy Zoo is thus a hybrid, able to do deep analyses of large data sets that are impossible in any other way." A community feeling was also created. Roger Davies stated: "The community of Galaxy Zoo gives them the opportunity to participate that they're looking for." This community became known as the 'Zooites'. Aida Berges, a homemaker living in Puerto Rico who has classified hundreds of thousands of galaxies, stated: "Every galaxy has a story to tell. They are beautiful, mysterious, and show how amazing our universe is. It was love at first sight when I started in Galaxy Zoo ... It is a magical place, and it feels like coming home at last." The Galaxy Zoo Forum became a read-only archive in July 2014. After seven years online and over 650,000 posts, it continues to generate science.
+As of July 2017, 60 scientific papers had been published as a direct result of Galaxy Zoo and its hundreds of thousands of volunteers. Earlier studies, however, had found that data produced by volunteers were more likely to contain bias or mistakes. Chris Lintott nonetheless says that crowdsourced results are reliable, as shown by the fact that they are being used and published in peer-reviewed science papers. Other scientists have questioned crowdsourcing and crowdsourced studies. Steven Bamford, a Galaxy Zoo research scientist, stated: "As a professional researcher you take pride in the work that you do. And the idea that anybody off the street could come and do something better sounds threatening but also implausible." David Anderson, the founder of BOINC, stated: [For many sceptical scientists] "There's this idea that they're giving up control somehow, and that their importance would be diminished". The continuing goodwill of citizen scientists is also questioned. Chris Lintott stated: "Rather than letting anyone pitch for volunteers, we'd like to be a place where people can come and expect a certain level of commitment".
+A conference was held from 10 to 12 July 2017 at St. Catherine's College, Oxford, to mark the tenth anniversary of the start of Galaxy Zoo in July 2007. Co-founder Chris Lintott stated: "What started as a small project has been completely transformed by the enthusiasm and efforts of the volunteers... It has had a real impact on our understanding of galaxy evolution." Since July 2007, 125 million galaxy classifications have been made across at least 15 different projects, resulting in 60 peer-reviewed academic papers. Discoveries include Hanny's Voorwerp, Green Pea galaxies and, more recently, objects known as 'Yellow Balls'. The conference Twitter feed, #GZ10, stated that 10 of the 60 papers had over 100 citations [within the Astrophysics Data System] in 10 years. Karen Masters, an astrophysicist at Portsmouth University and project scientist for GZ, stated: "We're genuinely asking for help with something we cannot do ourselves and the results have made a big contribution to the field." As a result of GZ's success, the citizen science web portal Zooniverse was started, which has since hosted 100 projects.
+
+== Retired projects ==
+
+=== Galaxy Zoo 1 ===
+The original Galaxy Zoo consisted of 100,000 galaxies imaged by the Sloan Digital Sky Survey. With so many galaxies, it had been assumed that it would take years for visitors to the site to work through them all, but within 24 hours of launch, the website was receiving almost 70,000 classifications an hour. In the end, more than 50 million classifications were received by the project during its first year, contributed by more than 150,000 people. The project started in July 2007 and was retired in 2009.
--- a/data/en.wikipedia.org/wiki/Galaxy_Zoo-2.md
+++ b/data/en.wikipedia.org/wiki/Galaxy_Zoo-2.md
@ -0,0 +1,41 @@
+---
+title: "Galaxy Zoo"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Galaxy_Zoo"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:12.688307+00:00"
+instance: "kb-cron"
+---
+
+=== Galaxy Zoo 2 ===
+This consisted of some 250,000 of the brightest galaxies from the Sloan Digital Sky Survey. Galaxy Zoo 2 allowed for a much more detailed classification, by shape and by the intensity or dimness of the galactic core, and with a special section for oddities like mergers or ring galaxies. The sample also contained fewer optical oddities. The project collected classifications from February 2009 to April 2010 and closed with some 60 million classifications.
+
+=== Galaxy Zoo mergers ===
+This studied the role of interacting galaxies. Interacting galaxies are galaxies that exhibit a gravitational influence on one another. This influence is exhibited over the course of millions or even billions of years as two or more galaxies pass nearby one another. The near passage of two massive structures can cause the galaxies to be distorted and possibly merge. The Galaxy Zoo Mergers aimed to provide a set of tools that allowed users to randomly sample various sets of simulation parameters in rapid succession by showing 8 simulation outputs at a time. This started in November 2009 and was retired in June 2012.
+
+=== Galaxy Zoo supernovae ===
+Galaxy Zoo used images from its partner, the Palomar Transient Factory, to find supernovae. The task in this Galaxy Zoo project was to help catch exploding stars – supernovae. Data for the site was provided by an automatic survey in California at the Palomar Observatory. Astronomers followed up on the best candidates at telescopes around the world. This started in August 2009 and was retired in August 2012.
+
+=== Galaxy Zoo Hubble ===
+The site's third incarnation, Galaxy Zoo Hubble, drew from surveys conducted by the Hubble Space Telescope to view earlier epochs of galaxy formation. In these surveys, which involve many days of dedicated observing time, we can see light from galaxies which has taken billions of years to reach us. The idea behind Galaxy Zoo Hubble was to be able to compare galaxies then to galaxies now, giving us a clear understanding of what factors influence their growth, whether through mergers, active black holes or simply star formation. This started in April 2010 and was retired in September 2012.
+In October 2016, a study titled: "Galaxy Zoo: Morphological Classifications for 120,000 Galaxies in HST Legacy Imaging" was accepted for publication by the journal Monthly Notices of the Royal Astronomical Society. The abstract begins: "We present the data release paper for the Galaxy Zoo: Hubble project. This is the third phase in a large effort to measure reliable, detailed morphologies of galaxies by using crowdsourced visual classifications of colour composite images. Images in Galaxy Zoo Hubble were selected from various publicly-released Hubble Space Telescope Legacy programs conducted with the Advanced Camera for Surveys, with filters that probe the rest-frame optical emission from galaxies out to z ≈1."
+
+=== Galaxy Zoo 4 ===
+Galaxy Zoo 4 combined new imaging from the Sloan Digital Sky Survey with the most distant images yet from the Hubble Space Telescope CANDELS survey. The CANDELS survey makes use of the new Wide Field Camera 3 to take ultra-deep images of the universe. The project also includes images taken with the United Kingdom Infrared Telescope in Hawaii, for the recently completed UKIDSS project. UKIDSS is the largest, deepest survey of the sky at infrared wavelengths. Kevin Schawinski explained that: "The two sources of data work together perfectly: the new images from Sloan give us our most detailed view of the local universe, while the CANDELS survey from the Hubble telescope allows us to look deeper into the universe's past than ever before."
+In October 2016, a paper titled "Galaxy Zoo: Quantitative Visual Morphological Classifications for 48,000 galaxies from CANDELS" was accepted for publication in MNRAS. Quoting: "We present quantified visual morphologies of approximately 48,000 galaxies observed in three Hubble Space Telescope legacy fields by the Cosmic And Near-infrared Deep Extragalactic Legacy Survey (CANDELS) and classified by participants in the Galaxy Zoo project. 90% of galaxies have z < 3 and are observed in rest-frame optical wavelengths by CANDELS. Each galaxy received an average of 40 independent classifications, which we combine into detailed morphological information on galaxy features such as clumpiness, bar instabilities, spiral structure, and merger and tidal signatures."
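Some way of combining each galaxy's independent classifications is needed. The sketch below shows the simplest possible approach, unweighted vote fractions with an assumed 80% agreement threshold for a consensus label. It is not the weighting or debiasing scheme described in the Galaxy Zoo papers, and the answer labels and vote counts are hypothetical.

```python
from collections import Counter


def vote_fractions(votes):
    """Map each answer to the fraction of volunteers who gave it."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def consensus(votes, min_fraction=0.8):
    """Return the leading answer if enough volunteers agree, else None."""
    fractions = vote_fractions(votes)
    if not fractions:
        return None  # no classifications yet
    answer, fraction = max(fractions.items(), key=lambda item: item[1])
    return answer if fraction >= min_fraction else None


# Hypothetical answers from 40 volunteers for each of two galaxies.
clear_spiral = ["spiral"] * 34 + ["elliptical"] * 4 + ["artifact"] * 2
ambiguous = ["spiral"] * 22 + ["elliptical"] * 18
print(consensus(clear_spiral))  # 34 of 40 agree (0.85)
print(consensus(ambiguous))     # 22 of 40 agree (0.55)
```

The first galaxy clears the assumed threshold and is labelled "spiral"; the second returns None, which in practice would flag it for more classifications or expert review. Keeping the full fractions, rather than only a label, preserves the information about how uncertain volunteers were.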
+
+=== Radio Galaxy Zoo ===
+
+On 17 December 2013, Galaxy Zoo opened a project called Radio Galaxy Zoo. It uses observations from the Australia Telescope Large Area Survey in Radio, and compares them to the Spitzer Space Telescope's infrared data. There are about 6000 images to look through. The CSIRO press release states that Radio Galaxy Zoo is a new citizen science project that lets anyone become a cosmic explorer. It continues that by matching galaxy images with radio images from CSIRO's Australia Telescope, a participant can work out if a galaxy has a supermassive black hole.
+
+=== Other projects ===
+Another project that uses data from volunteer classifications is Galaxy Zoo Quench, which studies interactions between galaxies and their effect on starbursts (among other things).
+
+== Active projects ==
+
+=== Galaxy Zoo James Webb Space Telescope ===
+The current incarnation of Galaxy Zoo uses volunteers to classify hundreds of thousands of images taken by James Webb Space Telescope's COSMOS-Web survey, which is a large extragalactic survey imaging galaxies in the COSMOS field at extremely early cosmic times. An earlier iteration of Galaxy Zoo using data from JWST's Cosmic Evolution Early Research (CEERS) survey was used as a pilot for this project, and helped to establish the existence of stable disc galaxies at redshift 7.4, or just 700 million years after the Big Bang.
+
+=== Complete list of projects ===
+As of March 2026, the full list of Galaxy Zoo projects is: Galaxy Zoo 1, Galaxy Zoo 2, Galaxy Zoo Mergers, Galaxy Zoo Supernovae, Galaxy Zoo Hubble, Galaxy Zoo CANDELS, Radio Galaxy Zoo, Galaxy Zoo Quench, Galaxy Zoo DECALS 1, Galaxy Zoo DECALS2 + SDSS, Illustris, UKIDSS, Galaxy Zoo Bar Lengths, FERENGI, GAMA, Cosmic Dawn, Euclid, CEERS, and JWST.
--- a/data/en.wikipedia.org/wiki/Galaxy_Zoo-3.md
+++ b/data/en.wikipedia.org/wiki/Galaxy_Zoo-3.md
@ -0,0 +1,24 @@
+---
+title: "Galaxy Zoo"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Galaxy_Zoo"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:12.688307+00:00"
+instance: "kb-cron"
+---
+
+=== Related ===
+In June 2019, citizen scientists through Galaxy Zoo reported that the usual Hubble classification, particularly concerning spiral galaxies, may not be supported, and may need updating.
+
+== Rotation of galaxies ==
+
+One of the original aims for Galaxy Zoo was to explore which way galaxies rotated. Cosmologist Kate Land stated: "Some people have argued that galaxies are rotating all in agreement with each other, not randomly as we'd expect. We want people to classify the galaxies according to which way they're rotating and I'll be able to go and see if there's anything bizarre going on. If there are any patterns that we're not expecting, it could really turn up some surprises." In Galaxy Zoo 1, volunteers were asked to judge from the SDSS images whether the galaxies were elliptical or spiral and, if spiral, whether they were rotating in a clockwise (Z-wise) or anti-clockwise (S-wise) direction. The rotation, also called the chirality, of galaxies has been examined in several Galaxy Zoo related papers.
+Among the results, a psychological bias was demonstrated. Galaxy Zoo scientists wanted to determine whether spiral galaxies were evenly distributed, or whether an intrinsic property of the universe caused them to rotate one way or the other. When the science team came to analyse the results, they found an excess of anticlockwise-spinning spiral galaxies. But when the team asked volunteers to classify the same images after they had been mirror-reversed, there was still an excess of anticlockwise classifications, demonstrating that the human brain has real difficulty discerning whether something is rotating clockwise or anticlockwise. Having measured this effect, the team could adjust for it, and established that spirals near each other tended to rotate in the same direction.
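The mirrored-image test is what makes such an adjustment possible. Under a simplified model assumed here (not necessarily the one used in the published analysis), volunteers add the same excess b to the anticlockwise fraction whatever the image shows. If f is the true anticlockwise fraction, the original images give an observed fraction p_o = f + b, and the mirrored images, in which the true fraction becomes 1 - f, give p_m = 1 - f + b. Solving the pair gives f = (p_o - p_m + 1)/2 and b = (p_o + p_m - 1)/2. The observed fractions below are hypothetical.

```python
def separate_bias(p_original, p_mirrored):
    """Split observed anticlockwise fractions into true fraction and bias.

    Assumes an additive bias b on the anticlockwise fraction:
        p_original = f + b        (original images)
        p_mirrored = (1 - f) + b  (mirror-reversed images)
    Returns (f, b).
    """
    true_fraction = (p_original - p_mirrored + 1) / 2
    bias = (p_original + p_mirrored - 1) / 2
    return true_fraction, bias


# Hypothetical observations: anticlockwise is over-reported in both sets.
f, b = separate_bias(p_original=0.55, p_mirrored=0.53)
print(f"true anticlockwise fraction ~ {f:.2f}, bias ~ {b:.2f}")
```

With these made-up inputs, most of the apparent excess (0.04 of the 0.05 above one half) is attributed to the classification bias rather than to the galaxies themselves.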
+
+== Blue ellipticals and red spirals ==
+Mainstream astronomical theory before Galaxy Zoo held that elliptical (or 'early type') galaxies were red in color and spiral (or 'late type') galaxies were blue in color; several papers published as a result of Galaxy Zoo have shown otherwise. A population of blue ellipticals was found. These are galaxies which have changed their shape from spiral to oval, but still have young stars in them. Indeed, Galaxy Zoo came about through Schawinski's search for blue elliptical galaxies: near the end of 2006, he had spent most of his waking hours trying to find these rare galaxies. Blueness in galaxies means that new stars are forming. However, ellipticals are almost always red, indicating that they are full of old and dead stars. Thus, blue ellipticals are paradoxical, but give clues to star formation in different types of galaxies.
+A population of red spirals was also found. These follow a different evolutionary path from normal spiral galaxies, showing that spiral galaxies can stop making new stars without changing their shape. Using Galaxy Zoo data for their sample, Tojeiro et al. (2013, p. 5) found 13,959 red ellipticals, 381 blue ellipticals, 5,139 blue late-type spirals, 294 red late-type spirals, 1,144 blue early-type spirals, and 1,265 red early-type spirals. Chris Lintott stated: "These red spiral galaxies had been lurking in the data and no-one had spotted them. They were staring us in the face. Now we know that a third of spirals around the edges of some clusters of galaxies are red." He also stated: "These results are possible thanks to a major scientific contribution from our many volunteer armchair astronomers. No group of professionals could have classified this many galaxies alone." A team using the Hubble Space Telescope has independently verified the existence of red spirals. Meghan Gray stated: "Our two projects have approached the problem from very different directions. It is gratifying to see that we each provide independent pieces of the puzzle pointing to the same conclusion."
+Red spirals are thought to be galaxies in the process of transition from young to old. They are more massive than blue spirals and are found on the outskirts of large clusters of galaxies. Chris Lintott stated: "We think what we're seeing is galaxies that have been gently strangled, so to speak, where somehow the gas supply for star formation has been cut off, but that they've been strangled so gently that the arms are still there." The cause might be the red spirals' gentle interaction with a galaxy cluster. He further explained: "The kind of thing we're imagining [is that] as the galaxy moves into a denser environment, there's lot of gas in clusters as well as galaxies, and it's possible the gas from the galaxy just gets stripped off by the denser medium it's plowing into."
+
+== Dust in galaxies ==
--- a/data/en.wikipedia.org/wiki/Galaxy_Zoo-4.md
+++ b/data/en.wikipedia.org/wiki/Galaxy_Zoo-4.md
@ -0,0 +1,50 @@
+---
+title: "Galaxy Zoo"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Galaxy_Zoo"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:12.688307+00:00"
+instance: "kb-cron"
+---
+
+The properties of galactic dust have been examined in several Galaxy Zoo papers. The interstellar medium of spiral galaxies is filled with gas and small solid particles called dust grains. Despite constituting only a minor fraction of the galactic mass (between 0.01% and 0.1% for the Milky Way), dust grains have a major role in shaping the appearance of a galaxy. Because of their size (typically smaller than a few tenths of a micron), they are very effective at absorbing and scattering the ultraviolet, optical and near-infrared radiation emitted by stars. Although interstellar space is emptier than any vacuum artificially created on Earth, it is not completely empty. These regions have very low densities and consist of roughly 99% gas, with dust making up the remainder. In total, approximately 15% of the visible matter in the Milky Way is composed of interstellar gas and dust.
+The study of dust in galaxies is interesting for many reasons. For example, the dimming effects of dust need to be corrected for to estimate the total mass of a galaxy from measurements of its light. Standard candles used to measure the expansion history of the Universe also need to be corrected for dust extinction.
+In 2013, a catalogue of 1,990 overlapping galaxies, collected by volunteers on the Galaxy Zoo forum from SDSS images, was published. The abstract states: 'Analysis of galaxies with overlapping images offers a direct way to probe the distribution of dust extinction and its effects on the background light.' This catalogue was also used in a study of ultraviolet attenuation laws.
+
+== Galactic bars and bulges ==
+
+Some spiral galaxies have central bar-shaped structures composed of stars. These galaxies are called 'barred spirals' and have been investigated by Galaxy Zoo in several studies. It is unclear why some spiral galaxies have bars and some do not. Galaxy Zoo research has shown that red spirals are about twice as likely to host bars as blue spirals. These colours are significant. Blue galaxies get their hue from the hot young stars they contain, implying that they are forming stars in large numbers. In red galaxies, this star formation has stopped, leaving behind the cooler, long-lived stars that give them their red colour. Karen Masters, a scientist involved in the studies, stated: "For some time data have hinted that spirals with more old stars are more likely to have bars, but with such a large number of bar classifications we're much more confident about our results. It's not yet clear whether the bars are some side effect of an external process that turns spiral galaxies red, or if they alone can cause this transformation."
+Spiral galaxies usually have 'bulges' at their centers. These bulges are huge, tightly packed groups of stars. However, using Galaxy Zoo volunteer classifications, it has been found that some spiral galaxies do not have bulges. Many galactic bulges are thought to host a supermassive black hole at their centers; however, pure disk galaxies with no bulges but with growing central black holes have also been found. These pure disk galaxies and their central black holes may be consistent with a relation derived from elliptical and bulge-dominated galaxies, which have very different formation histories; this implies that the details of stellar galaxy evolution and dynamics may not be fundamental to the co-evolution of galaxies and black holes. These bulgeless galaxies appear to have formed in environments isolated from other galaxies. It is hypothesised that the black hole mass may be more tightly tied to the overall gravitational potential of a galaxy, and therefore to its dark matter halo, than to the dynamical bulge component.
+In September 2014, a paper titled "Galaxy Zoo: CANDELS Barred Disks and Bar Fractions" was accepted for publication by Monthly Notices of the Royal Astronomical Society (MNRAS). This was the first set of results from the part of Galaxy Zoo 4 that used Hubble Space Telescope images from the CANDELS survey. The study reports "the discovery of strong barred structures in massive disk galaxies at z ≈1.5 in deep rest-frame optical images from CANDELS". From a sample of 876 disk galaxies identified by visual classification in Galaxy Zoo 4, 123 barred galaxies were examined, and the bar fraction across the redshift range 0.5 < z < 2 was found not to evolve significantly.
+
+== Galaxy mergers and interactions ==
+
+(See also under Retired projects above.)
+Galaxy Zoo Mergers was a Galaxy Zoo project started in November 2009 and retired in June 2012. There have also been a number of studies on galaxy mergers, among them a survey of about 3,000 merging galaxies, which presented "the largest, most homogeneous catalogue of merging galaxies in the nearby universe". This catalogue was spread over two papers and resulted from volunteers selecting likely candidates from Galaxy Zoo 1 and posting them on the Galaxy Zoo forum. Other papers using Galaxy Zoo data have led to follow-up observations, including some taken with the Chandra X-ray Observatory.
+
+== Literature ==
+Lintott, Chris: The Crowd and the Cosmos: Adventures in the Zooniverse. Oxford University Press 2020. ISBN 978-0-19-884222-4
+
+== See also ==
+
+Amateur astronomy – Hobby of watching the sky and stars
+Blueberry galaxy – Small and very active galaxies
+Gems of the Galaxy Zoos – Astronomy project
+List of astronomy websites
+List of citizen science projects
+Participatory monitoring – Collection of measurements undertaken by citizens
+Wisdom of the crowd – Collective perception of a group of people
+Virtual volunteering – Online volunteering
+Zooniverse projects:
+
+The Daily Minor Planet – The Catalina Sky Survey's NASA-funded citizen science project
+Asteroid Zoo – Citizen science project
+Backyard Worlds – NASA-funded citizen science project
+Disk Detective – NASA-citizen science project
+Old Weather – Citizen science project transcribing historical weather observations recorded at sea
+Planet Hunters – Citizen science project to find exoplanets
+SETILive
+The Milky Way Project
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Global_Urban_Evolution_Project-0.md
+++ b/data/en.wikipedia.org/wiki/Global_Urban_Evolution_Project-0.md
@ -0,0 +1,33 @@
+---
+title: "Global Urban Evolution Project"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Global_Urban_Evolution_Project"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:13.929625+00:00"
+instance: "kb-cron"
+---
+
+The Global Urban Evolution Project is an international collaborative project which was started by Marc T. J. Johnson at the Centre for Urban Environments of the University of Toronto Mississauga (UTM). It includes partners from at least 5 continents, 26 countries, and 160 cities. As a field study of evolution, and as a global study of the effects of urbanization on evolution, its scale is unprecedented. It has been described as "the best replicated test of parallel evolution, on the largest scale ever attempted".
+The project uses white clover as a model organism for studying global urbanization and urban evolution. White clover was chosen because it already grows in most cities worldwide. The project examines the plant's production of hydrogen cyanide (HCN) in urban and more rural environments ("urban-rural clines"). Hydrogen cyanide deters herbivores and increases clover's tolerance for water stress.
+ 
+The project has demonstrated that urban environments are altering the ways in which plants evolve locally, and that similar changes are occurring globally, a demonstration of parallel evolution. It enables researchers to better understand the nature of urban environments, the adaptive capacity of species, and their ability to deal with rapid global environmental changes.
+
+
+== Project history ==
+In 2018, lead scientist Marc Johnson announced the project by tweeting "We are seeking collaborators to participate in the Global Urban Evolution (GLUE) project, a global study that seeks to understand whether urbanization drives parallel evolution in cities around the world." The co-leaders of the project were Rob Ness and PhD student James Santangelo; all three were at the University of Toronto Mississauga.
+The resulting project has involved 287 scientists and over 550 people at various academic levels worldwide.
+It is an example of inclusive science, with a team that includes equal numbers of women and men from around the world. Sonja Knapp categorizes it as an experimental network and a global experiment with a shared methodology.
+
+
+== Results ==
+The project has collected over 110,000 clover samples and sequenced over 2,500 clover genomes, creating a large dataset for the study of the species around the globe.
+Analyzing urban-rural clines, scientists found that cyanide production tended to increase with distance from the center of cities, suggesting that clover populations were adapting to factors commonly found in urban centers worldwide. Possible factors include temperature (freezing is related to cyanide content), herbivory pressures, and drought stress. The research suggests that, as clover habitats, the downtowns of cities such as Boston may more closely resemble far-flung cities such as Beijing than they resemble nearby rural areas.
+
+
+== References ==
+
+
+== External links ==
+Global Urban Evolution Project
+Global Urban Evolution Project on GitHub
--- a/data/en.wikipedia.org/wiki/HARKing-0.md
+++ b/data/en.wikipedia.org/wiki/HARKing-0.md
@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/HARKing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T03:43:51.697921+00:00"
+date_saved: "2026-05-05T03:49:15.371909+00:00"
 instance: "kb-cron"
 ---