Scrape wikipedia-science: 92 new, 848 updated, 968 total (kb-cron)

2026-05-04 20:49:48 -07:00 · 2026-05-04 20:49:48 -07:00 · f1f480b165
commit f1f480b165
parent 2c1b4c161f
44 changed files with 1242 additions and 5 deletions
--- a/_index.db
+++ b/_index.db
--- a/data/en.wikipedia.org/wiki/Open-notebook_science-0.md
+++ b/data/en.wikipedia.org/wiki/Open-notebook_science-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/Open-notebook_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T03:44:33.661868+00:00"
+date_saved: "2026-05-05T03:49:46.721658+00:00"
 instance: "kb-cron"
 ---

--- a/data/en.wikipedia.org/wiki/Open-notebook_science-1.md
+++ b/data/en.wikipedia.org/wiki/Open-notebook_science-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/Open-notebook_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T03:44:33.661868+00:00"
+date_saved: "2026-05-05T03:49:46.721658+00:00"
 instance: "kb-cron"
 ---

--- a/data/en.wikipedia.org/wiki/OpenBCI-0.md
+++ b/data/en.wikipedia.org/wiki/OpenBCI-0.md
@@ -0,0 +1,58 @@
+---
+title: "OpenBCI"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/OpenBCI"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:47.934225+00:00"
+instance: "kb-cron"
+---
+
+OpenBCI is an open-source brain–computer interface platform created by Joel Murphy and Conor Russomanno after a successful Kickstarter campaign in late 2013. The company is headquartered in Brooklyn, NY.
+OpenBCI boards are low-cost biometric amplifiers used to measure and record electrical activity produced by the brain (EEG), muscles (EMG), and heart (EKG). The boards are compatible with standard EEG electrodes. They can be used with the open-source OpenBCI GUI software, or they can be integrated with other open-source EEG signal processing tools. OpenBCI boards have been scientifically validated in numerous research studies.
+
+
+== Hardware ==
+The OpenBCI 32-bit board uses the ADS1299, an IC developed by Texas Instruments for biopotential measurements. The OpenBCI uses a microcontroller for on-board processing (the 8-bit version, now deprecated, uses an Arduino-compatible ATmega328P IC, while the 32-bit board uses a PIC microcontroller) and can write the EEG data to an SD card or transmit it to software on a computer over a Bluetooth link.
+In 2015, OpenBCI announced the Ganglion board with a second Kickstarter campaign. It has 4 input channels for measuring EEG, EMG, and EKG, and is also Bluetooth-enabled. It was initially offered on Kickstarter at $99, later listed at $249.99, and subsequently increased in price to $499.00.
+
+
+== Software ==
+
+OpenBCI has released an open-source application for use with the OpenBCI, written in Processing. Display and processing software written in Node.js and Python is also available.
+
+
+== 3D Printed Headset ==
+
+Design files for a 3D printed headset for pre-production OpenBCI boards have been released on GitHub. The headset, known as the Ultracortex, holds the electrodes in place, and makes it easy to configure their placement using the 10–20 System. The current iteration of the Ultracortex is called the Ultracortex "Mark IV".
+The headset design files are available for free download from OpenBCI's GitHub account, or the headset can be purchased preprinted from the OpenBCI online store. The headsets are manufactured by Voodoo Manufacturing.
+
+
+== Applications ==
+OpenBCI technology has been used in various applications, such as controlling a HexBug robot using steady-state visually evoked potentials (SSVEPs). Locked-in graffiti artist Tempt One has used the OpenBCI and the low-cost Eyewriter eye-tracking system to continue to draw after being diagnosed with the degenerative nerve disorder ALS.
+In 2023, OpenBCI's CEO and cofounder, Conor Russomanno, presented the talk "A powerful new neurotech tool for augmenting your mind" on the TED main stage alongside Christian Bayerlein, a web developer and accessibility activist based in Koblenz, Germany. In the talk, Russomanno and Bayerlein presented the Neurofly project, which combined OpenBCI technology and a Varjo HMD to enable Bayerlein to pilot a drone around the stage completely hands-free. Bayerlein used EEG and EMG signals to control the drone's speed and direction.
+
+
+== Galea ==
+
+In late 2020, the OpenBCI team unveiled a new product: Galea. Galea is a hardware and software platform that merges next-generation biometrics with mixed reality. It is the first device that integrates EEG, EMG, EDA, PPG, and eye-tracking into a single headset. 
+After several years of development and manufacturing, OpenBCI began shipping Galea Beta unit pre-orders in August 2024. The current lead time for a Galea unit is 2 weeks.
+
+
+== Awards & Recognition ==
+OpenBCI has received several technology and innovation awards, including two Consumer Electronics Show (CES) 2023 Innovation Awards. Their latest product, Galea, was recognized in both the Virtual & Augmented Reality and the Wearable Technologies categories. Galea has also previously won a Unity Aerospace & Defense award and an AWE Auggie Award for Best Interaction Product.
+The 2023 TED talk was met with a standing ovation.
+NPR subsequently featured OpenBCI and Galea in a TED Radio Hour episode called "Brain Hacks: The beginning of mind-reading technology? No, it's not science fiction".
+OpenBCI has also had several other notable media features, including the episode "The tech harnessing the power of thought" of CNN's Decoded series.
+They were also featured in a Lenovo customer success video called "Neural interfaces: The future of brain-computer interaction".
+
+
+== See also ==
+List of open-source hardware projects
+
+
+== References ==
+
+
+== External links ==
+Official website
--- a/data/en.wikipedia.org/wiki/Open_Science_Award_for_Open_Source_Research_Software-0.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Award_for_Open_Source_Research_Software-0.md
@@ -0,0 +1,35 @@
+---
+title: "Open Science Award for Open Source Research Software"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open_Science_Award_for_Open_Source_Research_Software"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:39.274982+00:00"
+instance: "kb-cron"
+---
+
+The Open Science Award for Open Source Research Software (French: Prix science ouverte du logiciel libre de la recherche) is a French scientific award given since 2022.
+The prize is part of the second National Plan for Open Science and rewards projects, teams and young researchers engaged in exemplary practices of management, dissemination and reuse of research data. It is awarded by the French Ministry of Higher Education, Research and Space.
+The prize has four categories:
+
+The "Scientific and Technical" (Scientifique et technique) category rewards software that stands out for its scientific quality, methodological rigour and technical excellence.
+The "Community" (Communauté) category highlights projects that have successfully built and nurtured an active community of users and contributors.
+The "Documentation" category recognises software that offers exemplary documentation facilitating use, adoption and contribution to the project.
+The "Jury's Favourite" (Coup de cœur du jury) category rewards an exemplary project combining several of these dimensions.
+In each category, the jury awards a main prize as well as a "rising star" (espoir) award for promising projects, typically started less than five years ago.
+
+
+== Winners ==
+
+
+== Notes and references ==
+
+
+== See also ==
+Open science
+Open Science Award for Research Data (Prix science ouverte des données de la recherche)
+
+
+== External links ==
+Open Science: Research Open Source Software Prizes (Official Website)
+Establishing a national research software award (Open Research Europe article)
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-0.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-0.md
@@ -0,0 +1,26 @@
+---
+title: "Open Science Infrastructure"
+chunk: 1/8
+source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:40.430585+00:00"
+instance: "kb-cron"
+---
+
+Open Science Infrastructure (or open scholarly infrastructure) is information infrastructure that supports the open sharing of scientific productions such as publications, datasets, metadata or code. The UNESCO Recommendation on Open Science, adopted in November 2021, describes it as "shared research infrastructures that are needed to support open science and serve the needs of different communities".
+Open science infrastructures are a form of scientific infrastructure (also called cyberinfrastructure, e-Science or e-infrastructure) that supports the production of open knowledge. Beyond the management of common resources, they are frequently structured as community-led initiatives with a set of collective norms and governance regulations, which also makes them a form of knowledge commons. The definition of open science infrastructures usually excludes privately owned scientific infrastructures run by leading commercial publishers. Conversely, it may include actors not always characterized as scientific infrastructures that play a critical role in the ecosystem of open science, such as open access publishing platforms (open scholarly communication services).
+Computing infrastructures and online services have played a key role in the production and diffusion of scientific knowledge since the 1960s. While these early scientific infrastructures were initially envisioned as community initiatives, they could not be openly used due to the lack of interconnectivity and the cost of network connection. The creation of the World Wide Web made it possible to share data and publications on a large scale. The sustainability of online research projects and services became a critical policy issue and entailed the development of major infrastructures in the 2000s.
+The concept of open science infrastructure emerged after 2015, following a scientific policy debate over the expansion of commercial and privately owned infrastructures in numerous research activities and the publication of the Principles for Open Scholarly Infrastructures. Since the 2010s, large ecosystems of interconnected scientific infrastructures have emerged in Europe and in South and North America through the development of new open science projects and the conversion of legacy infrastructures to open science principles.
+
+== Definitions and terminology ==
+Open science infrastructure is a form of knowledge infrastructure that makes it possible to create, publish and maintain open scientific outputs such as publications, data or software.
+A UNESCO recommendation on open science approved in November 2021 defines open science infrastructures as "shared research infrastructures that are needed to support open science and serve the needs of different communities". A SPARC report on European open science infrastructure includes the following activities within the range of open science infrastructures: "We define Open Access & Open Science Infrastructure as sets of services, protocols, standards and software contributing to the research lifecycle – from collaboration and experimentation through data collection and storage, data organization, data analysis and computation, authorship, submission, review and annotation, copyediting, publishing, archiving, citation, discovery and more".
+
+=== Infrastructure ===
+The use of the term "infrastructure" is an explicit reference to the physical infrastructures and networks, such as power grids, road networks or telecommunications, that made it possible to run complex economic and social systems after the industrial revolution: "The term infrastructure has been used since the 1920s to refer collectively to the roads, power grids, telephone systems, bridges, rail lines, and similar public works that are required for an industrial economy to function (...) If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy". The concept of infrastructure was notably extended in 1996 to forms of computer-mediated knowledge production by Susan Leigh Star and Karen Ruhleder, through an empirical observation of an early form of open science infrastructure, the Worm Community System. This definition remained influential over the next two decades in science and technology studies and has affected the policy debate over the building of scientific infrastructure since the early 2000s.
+Open science infrastructures have specific properties that distinguish them from other forms of open science projects or initiatives:
+
+Open science infrastructures are not simply a technical product but embed a set of tools, institutions and social norms. Consequently, infrastructures are not always visible, as they can be largely hidden under the routine of normal activities. The resilience and tacitness of the infrastructures make it especially difficult to identify the real contributions and "labour cost" of open science work, as it remains "invisible in the university system". This also makes it difficult to allocate funding effectively, as critical infrastructure may remain undetected by funding bodies.
+Open science infrastructures are durable and resilient. They are expected to run on a long-term basis, and multiple research programs rely on them. To some extent, infrastructures are successful when they are forgotten and become an integral part of routine research activities: "Infrastructure at its best is invisible. We tend to only notice it when it fails."
+Open science infrastructures can be shared and used by different actors and communities. They must be sufficiently consistent to remain coordinated, and yet they have to welcome a diverse array of local uses: "an infrastructure occurs when the tension between local and global is resolved". Prior agreement among all stakeholders on the scope and the governance of the infrastructure is a critical step.
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-1.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-1.md
@@ -0,0 +1,31 @@
+---
+title: "Open Science Infrastructure"
+chunk: 2/8
+source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:40.430585+00:00"
+instance: "kb-cron"
+---
+
+=== Openness and the commons ===
+Open science infrastructures are open, which differentiates them from other scientific and knowledge infrastructures and, more specifically, from subscription-based commercial infrastructures. Openness is both a core value and a guiding principle that affects the aims, the governance and the management of the infrastructure. Open science infrastructures face issues similar to those met by other open institutions such as open data repositories or large-scale collaborative projects such as Wikipedia: "When we study contemporary knowledge infrastructures we find values of openness often embedded there, but translating the values of openness into the design of infrastructures and the practices of infrastructuring is a complex and contingent process".
+The conceptual definition of open science infrastructures has been largely influenced by the analysis of Elinor Ostrom on the commons and more specifically on the knowledge commons. In accordance with Ostrom, Cameron Neylon underlines that open infrastructures are characterized not only by the management of a pool of common resources but also by the elaboration of common governance and norms. The economic theory of the commons makes it possible to expand beyond the limited scope of scholarly associations toward large-scale community-led initiatives: "Ostrom's work (…) provides a template (…) to make the transition from a local club to a community-wide infrastructure." Open science infrastructures tend to favor a not-for-profit, publicly funded model with strong involvement from scientific communities, which dissociates them from privately owned closed infrastructures: "open infrastructures are often scholar-led and run by non-profit organisations, making them mission-driven instead of profit-driven." This status aims to ensure the autonomy of the infrastructures and prevent their incorporation into commercial infrastructure. It has wide-ranging implications for the way the organization is managed: "the differences between commercial services and non-profit services permeated almost every aspect of their responses to their environment".
+Open science infrastructures are not only a more specific subset of scientific infrastructures and cyberinfrastructures but may also include actors that would not fall into this definition. "Open access publication platforms" such as Scielo, OpenEdition or the Open Library of Humanities are considered an integral part of open science infrastructures in the UNESCO definition and in several literature reviews and policy reports, whereas they were usually considered separate entities in the policy debate on cyberinfrastructure and e-infrastructures. In the 2010 report of the European Commission on e-infrastructure, scientific publishing platforms are "not e-Infrastructures but closely related to it".
+Open science infrastructures may also incorporate additional values and ethical principles. Samuel Moore has theorized a form of care-full scholarly commons that does not exist yet but would incorporate latent forms of open science infrastructure and communities: "In addition to sharing resources with other projects, commoning also requires commoners to adopt an outwardly-focused, generous attitude to other commons projects, redirecting their labour away from proprietary." In 2018, Okune et al. introduced a similar concept of "inclusive knowledge infrastructures" that "deliberately allow for multiple forms of participation amongst a diverse set of actors (…) and seek to redress power relations within a given context."
+
+=== Principles for open science infrastructures ===
+In 2015, the Principles for Open Scholarly Infrastructures laid out an influential prescriptive definition of open science infrastructures. Subsequent definitions and terminologies of open science infrastructures have largely been elaborated on this basis. The text has also influenced the definition of open science infrastructure retained by UNESCO in November 2021.
+The Principles attempt to hybridize the framework of infrastructure studies with the analysis of the commons initiated by Elinor Ostrom. The principles develop a series of recommendations in three areas critical to the success of open infrastructures:
+
+Governance: the governance of the infrastructure should be open and accountable to the scientific communities it aims to serve. Specific measures should ensure that the management of the organization is transparent and diverse.
+Sustainability: the core activities of the organization should be covered by recurring funds. Short-term subventions should be limited to short-term projects. While the organization could charge for services, this should not extend to the data, which should remain "a community property".
+Insurance: the technical infrastructure and the output of the organization are open. This ensures that the infrastructure can be recreated if necessary (in the jargon of open source, it becomes "forkable").
+The text ends by mentioning several potential consequences of the principles. The authors advocate for a responsible centralization that embodies a different model than the large commercial web platforms like Google and Facebook while still maintaining the important benefits of centralized infrastructures: "we will be able to build accountable and trusted organisations that manage this centralization responsibly". Existing examples of large open infrastructures include ORCID, the Wikimedia Foundation or CERN.
+A more critical reception has focused on the underlying political philosophy of the Principles. While the scientific community is a key part of the governance of open science infrastructure, Samuel Moore underlines that it is never precisely defined, which raises potential issues of under-representation of minority groups:
+
+[this] raises questions over who is the community that gets to govern and exclude, and what gives them the right to decide the conditions. These questions are especially relevant for understandings of the commons that are all-encompassing or operate on a large scale, which tend to favour more powerful stakeholders, wealthy disciplines and countries in the Global North. Such commons treat subjects in a political vacuum rather than embedded in a particular situation and entangled in a number of different relationships and projects with asymmetrical power structures.
+
+== History ==
+
+=== Early developments (1950–1990) ===
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-2.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-2.md
@@ -0,0 +1,20 @@
+---
+title: "Open Science Infrastructure"
+chunk: 3/8
+source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:40.430585+00:00"
+instance: "kb-cron"
+---
+
+Scientific projects have been among the earliest use cases for digital infrastructure. The theorization of scientific knowledge infrastructure even predates the development of computing technologies. The knowledge networks envisioned by Paul Otlet or Vannevar Bush already incorporated numerous features of online scientific infrastructures.
+After the Second World War, the United States faced a "periodical crisis": existing journals could not keep up with the rapidly increasing scientific output. The issue became politically relevant after the successful launch of Sputnik: "The Sputnik crisis turned the librarians' problem of bibliographic control into a national information crisis." The emerging computing technologies were immediately considered as a potential solution to make a larger amount of scientific output readable and searchable. Access to foreign-language publications was also a key issue that was expected to be solved by machine translation: in the 1950s, a significant number of scientific publications were not available in English, especially those coming from the Soviet bloc.
+Influential members of the National Science Foundation like Joshua Lederberg advocated for the creation of a "centralized information system", SCITEL, that would at first coexist with printed journals and gradually replace them altogether on account of its efficiency. In the plan laid out by Lederberg to Eugene Garfield in November 1961, the deposit would index as many as 1,000,000 scientific articles per year. Beyond full-text searching, the infrastructure would also ensure the indexing of citations and other metadata, as well as the automated translation of foreign-language articles.
+Although it anticipated key features of online scientific platforms, the SCITEL plan was technically unrealistic at the time. The first working prototype of an online retrieval system, developed in 1963 by Doug Engelbart and Charles Bourne at the Stanford Research Institute, was heavily constrained by memory issues: no more than 10,000 words of a few documents could be indexed.
+
+Instead of a general-purpose publishing platform, the early scientific computing infrastructures focused on specific research areas, such as MEDLINE for medicine, NASA/RECON for space engineering or OCLC Worldcat for library search: "most of the earliest online retrieval system provided access to a bibliographic database and the rest used a file containing another sort of information—encyclopedia articles, inventory data, or chemical compounds." This early development of scientific computing affected a large variety of disciplines and communities, including the social sciences: "The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection". Yet these infrastructures were mostly invisible to researchers, as most searches were performed by professional librarians. Not only were the search systems complicated to use, but searches had to be performed very efficiently given the prohibitive cost of long-distance telecommunication. To become technically feasible, scientific infrastructures could not be open and became fundamentally hidden from their end users:
+
+The designers of the first online systems had presumed that searching would be done by end users; that assumption undergirded system design. MEDLINE was intended to be used by medical researchers and clinicians, NASA/RECON was designed for aerospace engineers and scientists. For many reasons, however, most users through the seventies were librarians and trained intermediaries working on behalf of end users. In fact, some professional searchers worried that even allowing eager end users to get at the terminals was a bad idea.
+The development of digital infrastructure for scientific publication was largely undertaken by private companies. In 1963, Eugene Garfield created the Institute for Scientific Information, which aimed to transform the projects initially envisioned with Lederberg into a profitable business. The Science Citation Index relied on computational processing of citation data. It had a massive and lasting influence on the structuring of global scientific publication in the last decades of the 20th century, as its most important metric, the Journal Impact Factor, "ultimately came to provide the metric tool needed to structure a competitive market among journals." Garfield also successfully launched Current Contents, a periodic compilation of scientific abstracts that acted as a simplified commercial version of the central deposit envisioned within SCITEL. Rather than being replaced by a centralized information system, leading scientific publishers were able to develop their own information infrastructure, which ultimately reinforced their business position. By the end of the 1960s, the Dutch publisher Elsevier and the German publisher Springer had started to computerize their internal data, as well as the management of journal reviews.
+Until the advent of the web, the landscape of scientific infrastructures remained fragmented. Projects and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols". The birthplace of the World Wide Web, CERN, had its own version of the Internet, CERN-Net, and also supported its own protocol for e-mail exchange. The European Space Agency used its own iteration (ESRO/RECON) of the RECON system also used by NASA engineers. These isolated scientific infrastructures could hardly be connected before the advent of the web. Communication between scientific infrastructures was challenging not only across space, but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated were likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-3.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-3.md
@@ -0,0 +1,20 @@
+---
+title: "Open Science Infrastructure"
+chunk: 4/8
+source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:40.430585+00:00"
+instance: "kb-cron"
+---
+
+=== The Web Revolution (1990–1995) ===
+The World Wide Web was originally framed as an open scientific infrastructure. The project was inspired by ENQUIRE, an information management software commissioned from Tim Berners-Lee by CERN for the specific needs of high energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected "nodes" that "could refer to a person, a software module, etc. and that could be interlined with various relations such as made, include, describes and so forth". While it "facilitated some random linkage between information" Enquire was not able to "facilitate the collaboration that was desired for in the international high-energy physics research community". Like any significant computing scientific infrastructure before the 1990s, the development of ENQUIRE was ultimately impeded by the lack of interoperability and the complexity of managing network communications: "although Enquire provided a way to link documents and databases, and hypertext provided a common format in which to display them, there was still the problem of getting different computers with different operating systems to communicate with each other".
+Sharing of data and data documentation was a major focus in the initial communication of the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data".
+The web rapidly superseded pre-existing online infrastructures, even when they included more advanced computing features. From 1991 to 1994, users of the Worm Community System, a major biology database on worms, switched to the Web and Gopher. While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the Worm Community System could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources (…) The World-Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services."
+The Web and similar protocols developed at the time had a similar impact on scientific publications. Early forms of open access publishing were not developed by large-scale institutional infrastructures but through small initiatives. Universal access, regardless of the operating system, made it possible to maintain and share community-driven electronic journals years before commercial online scientific publishing became viable:
+
+In the late '80s and early '90s, a host of new journal titles launched on listservs and (later) the Web. Journals such as Postmodern Culture, Surfaces, the Bryn Mawr Classical Review and the Public-Access Computer Systems Review were all managed by scholars and library workers rather than publishing professionals.
+The first open-access repositories were individual or community initiatives as well. In August 1991, Paul Ginsparg created the first version of the arXiv project at the Los Alamos National Laboratory in response to recurring storage issues with academic mailboxes caused by the increasing sharing of scientific articles.
+
+=== Building scientific infrastructures for the web (1995–2015) ===
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-4.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-4.md
@@ -0,0 +1,18 @@
+---
+title: "Open Science Infrastructure"
+chunk: 5/8
+source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:40.430585+00:00"
+instance: "kb-cron"
+---
+
+The development of the World-Wide Web rendered many pre-existing scientific infrastructures obsolete. It also lifted numerous restrictions and obstacles to online contribution and network management, which made it possible to attempt more ambitious projects. By the end of the 1990s, the creation of public scientific computing infrastructure became a major policy issue. The first wave of web-based scientific projects in the 1990s and the early 2000s revealed critical issues of sustainability. As funding was allocated for a specific time period, critical databases, online tools or publishing platforms could hardly be maintained, and project managers were faced with a valley of death "between grant funding and ongoing operational funding".
+Several competing terms appeared to fill this need. In the United States, the term cyberinfrastructure was used in a scientific context by a US National Science Foundation (NSF) blue-ribbon committee in 2003: "The newer term cyberinfrastructure refers to infrastructure based upon distributed computer, information and communication technology. If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy." E-infrastructure or e-science were used with a similar meaning in the United Kingdom and European countries.
+Thanks to "sizable investments", major national and international infrastructures were launched between the initial policy discussions of the early 2000s and the economic crisis of 2007–2008, such as the Open Science Grid, BioGRID, the JISC, DARIAH or the Project Bamboo. Specialized free software for scientific publishing like Open Journal Systems became available after 2000. This development entailed a significant expansion of non-commercial open access journals by facilitating the creation and the administration of journal websites and the digital conversion of existing journals. Among the non-commercial journals registered in the Directory of Open Access Journals, the number of journals created annually rose from 100 at the end of the 1990s to 800 around 2010, and has not changed significantly since then.
+By 2010, infrastructures were "no longer in infancy" and yet "they are also not yet fully mature". While the development of the web solved a large range of technical issues regarding network management, building scientific infrastructure remained challenging. Governance, communication across all involved stakeholders, and strategic divergences were major factors of success or failure. One of the first major infrastructures for the humanities and the social sciences, Project Bamboo, was ultimately unable to achieve its ambitious aims: "From the early planning workshops to the Mellon Foundation's rejection of the project's final proposal attempt, Bamboo was dogged by its reluctance and/or inability to concretely define itself". This lack of clarity was further aggravated by recurring communication missteps between the project initiators and the community it aimed to serve. "The community had spoken and made it clear that continuing to emphasize Service-oriented architecture would alienate the very members of the community Bamboo was intended to benefit most: the scholars themselves". Budget cuts following the economic crisis of 2007–2008 underlined the fragility of ambitious infrastructure plans relying on significant recurring funds.
+
+Leading commercial publishers were initially outpaced by the unexpected rise of the Web for academic publication: the executive board of Elsevier "had failed to grasp the significance of electronic publishing altogether, and therefore the deadly danger that it posed—the danger, namely, that scientists would be able to manage without the journal". The persistence of high revenues from subscription and the consolidation of the sector made it possible to fund the conversion of the pre-existing online services to the web as well as the digitization of past collections. By the 2010s, leading publishers had been "moving from a content-provision to a data analytics business" and had developed or acquired new key infrastructures for the management of scientific and pedagogic activities: "Elsevier has acquired and launched products that extend its influence and its ownership of the infrastructure to all stages of the academic knowledge production process". As it has expanded beyond publishing, the vertical integration of privately owned infrastructures has become deeply embedded in daily research activities.
+
+The privatised control of scholarly infrastructures is especially noticeable in the context of 'vertical integration' that publishers such as Elsevier and SpringerNature are seeking by controlling all aspects of the research life cycle, from submission to publication and beyond. For example, this vertical integration is represented in a number of Elsevier's business acquisitions, such as Mendeley (a reference manager), SSRN (a pre-print repository) and Bepress (a provider of repository and publishing software for universities).
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-5.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-5.md
@ -0,0 +1,24 @@
+---
+title: "Open Science Infrastructure"
+chunk: 6/8
+source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:40.430585+00:00"
+instance: "kb-cron"
+---
+
+=== Toward open science infrastructures (2015-…) ===
+The consolidation and expansion of commercial scientific infrastructure prompted renewed calls to secure "community-controlled infrastructure". The acquisition of the open repositories Digital Commons and SSRN by Elsevier highlighted the unreliability of critical scientific infrastructure for open science. The SPARC report on European Infrastructures underlines that "a number of important infrastructures at risk and as a consequence, the products and services that comprise open infrastructure are increasingly being tempted by buyout offers from large commercial enterprises. This threat affects both not-for-profit open infrastructure as well as closed, and is evidenced by the buyout in recent years of commonly relied on tools and platforms such as SSRN, bepress, Mendeley, and Github."
+In contrast with the consolidation of privately owned infrastructure, the open science movement "has tended to overlook the importance of social structures and systemic constraints in the design of new forms of knowledge infrastructures". It remained mostly focused on the content of scientific research, with little integration of technical tools and few large community initiatives. "Common pool of resources is not governed or managed by the current scholarly commons initiative. There is no dedicated hard infrastructure and though there may be a nascent community, there is no formal membership."
+More precise concepts were needed to embed ethical principles of openness, community service and autonomous governance in the building of infrastructure and to ensure the transformation of small localized scholarly networks into large, "community-wide" structures. In 2013, Cameron Neylon underlined that the lack of common infrastructure was one of the main weaknesses of the open science ecosystem: "in a world where it can be cheaper to re-do an analysis than to store the data, we need to consider seriously the social, physical, and material infrastructure that might support the sharing of the material outputs of research". Two years later, Neylon, Geoffrey Bilder and Jennifer Lin defined a series of Principles for Open Scholarly Infrastructure that reacted primarily to the discrepancy between the increasing openness of scientific publications or datasets and the closed nature of the infrastructures that control their circulation.
+
+Over the past decade, we have made real progress to further ensure the availability of data that supports research claims. This work is far from complete. We believe that data about the research process itself deserves exactly the same level of respect and care. The scholarly community does not own or control most of this information. For example, we could have built or taken on the infrastructure to collect bibliographic data and citations but that task was left to private enterprise.
+Since 2015, these principles have become the most influential definition of Open Science Infrastructures. They have been endorsed by leading infrastructures such as Crossref, OpenCitations or Data Dryad and have become a common basis for the institutional evaluation of existing open infrastructures. The main focus of the Principles is to build "trustworthy institutions" with significant commitments in terms of governance, financial sustainability and technical efficiency so that they can be durably relied on by scientific communities.
+By 2021, public services and infrastructures for research had largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer." According to the 2021 Roadmap of the European Strategy Forum on Research Infrastructures (ESFRI), major legacy infrastructures in Europe have embraced open science principles. "Most of the Research Infrastructures on the ESFRI Roadmap are at the forefront of Open Science movement and make important contributions to the digital transformation by transforming the whole research process according to the Open Science paradigm." Examples of extensive data sharing programs include the European Social Survey (in social science), ECRIN ERIC (for clinical data) or the Cherenkov Telescope Array (in astronomy).
+In agreement with the original intent of the Principles, open science infrastructures are "seen as an antidote to the increased market concentration observed in the scholarly communication space." In November 2021, the UNESCO Recommendation for Open Science acknowledged open science infrastructure as one of the four pillars of open science, along with open science knowledge, open engagement of societal actors and open dialog with other knowledge systems, and called for sustained investment and funding: "open science infrastructures are often the result of community-building efforts, which are crucial for their longterm sustainability and therefore should be not-for-profit and guarantee permanent and unrestricted access to all public to the largest extent possible."
+The development of open scientific infrastructure has become a debated topic regarding the future of online scientific research. In January 2021, a collective of researchers called for a Plan I or Plan Infrastructure in reaction to perceived shortcomings of the international initiative for open science of the cOAlition S, the Plan S. In contrast with the focus of Plan S on scientific publication, Plan I aims to integrate all research outputs on large interoperable infrastructures: "research and scholarship are crucially dependent on an information infrastructure that treats all scholarly output, text, data and code, equally and that is based on open standards and open markets."
+
+== Organization of open infrastructures ==
+Most of the landscape reports on Open Infrastructure have been undertaken in Europe and, to a lesser extent, in Latin America. For Europe, the main sources include the SPARC report from 2020, the OPERAS report on social science and humanities infrastructure, as well as the 2019 report of Katherine Skinner (which also extends to a few North American infrastructures). International studies include the European Commission's 2010 report on The Role of E-Infrastructure, which mostly received input from Europe, South America and North America.
+These reports underline that important open science infrastructures may already exist and yet remain invisible to funders and scientific policies: "alternative practices and projects exist inside and outside Europe, but these projects are almost invisible to the eyes of the public authorities".
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-6.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-6.md
@ -0,0 +1,35 @@
+---
+title: "Open Science Infrastructure"
+chunk: 7/8
+source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:40.430585+00:00"
+instance: "kb-cron"
+---
+
+=== Type and roles ===
+Open Access repositories are the most frequent form of Open Science Infrastructure, with 5,791 repositories in existence in December 2021 according to OpenDOAR.
+Yet, there is a significant diversification of the roles and the activities of open science infrastructure, at least among the largest infrastructures. In the survey of European infrastructure conducted by SPARC Europe, 95% of the respondents mention that they provide services in at least three different stages of research production out of six (Creation, Evaluation, Publishing, Hosting, Discovering and Archiving). Aggregation, hosting and indexing are especially central activities, common to most Open Science Infrastructures regardless of their focus.
+Specialization does happen at a higher level. A network analysis identifies "two main clusters of activities":
+
+Publishing-focused infrastructures which are associated with the "publishing and hosting traditional text formats". Among them, "paper submission (41 out of 70) and review (30) were the most commonly reported activities".
+Creation-focused infrastructures which deal preferably with the "processing and storing research outputs, particularly data". These actors provide specific services in the field of "data gathering (47 out of 71), and data analysis (40)". Besides, "computation and machine learning (18) and Experimentation (15) were roughly half as common".
+
+=== Standards and technologies ===
+Standardization is a major function of open science infrastructures, as they aim to ensure that the content they share and support is distributed consistently and is easy to reuse.
+Maintaining open standards is one of the main challenges identified by leading European open infrastructures, as it implies choosing among competing standards in some cases, as well as ensuring that the standards are correctly updated and accessible through APIs or other endpoints. Two thirds of the respondents have undertaken an evaluation of their technological environment during the past year, to ensure that key components have not become obsolete. As a consequence of these sustained efforts, most open infrastructures comply with the newly established standards of open science, such as FAIR data or Plan S.
+Open science infrastructures preferably integrate standards from other open science infrastructures. Among European infrastructures: "The most commonly cited systems – and thus essential infrastructure for many – are ORCID, Crossref, DOAJ, BASE, OpenAIRE, Altmetric, and Datacite, most of which are not-for-profit". Google Scholar is the first mentioned commercial service, while Scopus, the leading proprietary academic search engine developed by Elsevier, is one of the least cited leading services. Open science infrastructures are thus part of an emerging "truly interoperable Open Science commons" that holds the promise of "researcher-centric, low-cost, innovative, and interoperable tools for research, superior to the present, largely closed system."
+Infrastructures are frequently dependent on choices made by external stakeholders, especially scientific publishers: they "do not themselves decide on
+the openness of content since they are dependent on the policies of content providers". This affects not only the content but also the "user data policies [that] are set by publishers which limits what can be made available".
+Open Science Infrastructures have strong ties with the open source movement. 82% of the European infrastructures surveyed by SPARC claim to have partially built open source software and 53% have their entire technological infrastructure in open source.
+
+=== Governance ===
+Governance has been self-identified as a potential weakness by the European infrastructures surveyed by SPARC. Fewer than half of the respondents consider that they are at a "mature" stage in this regard, and "good governance" is cited as the main challenge. Interaction between the communities they aim to support and the other stakeholders and funders is especially complicated: "One specific challenge identified was the tension between serving the needs of the community of users versus prioritising the needs of clients that provide financial support to the OSI".
+The tension between centralization and diversity largely characterizes Open Science Infrastructure. While historically defined as a "centralized [Open Access] project", Redalyc aims to become a "community-based sustainable infrastructure in Latin America" (Berrecil). The leading European open infrastructures have reported "challenges around ensuring sufficient (and sufficiently diverse) representation" as well as around the involvement of some professional communities like researchers and librarians.
+
+=== Audience ===
+Open Science Infrastructures "target and serve a wide range of stakeholders". Researchers remain the primary target, but libraries, teachers and learners are among the expected audience of more than half of the infrastructures surveyed by SPARC Europe.
+A majority of European infrastructures "operate at a global scale", with English being the primary language of 82% of the respondents. These infrastructures are also frequently multilingual and integrate a specific national focus: they "provide access to a range of language content of local and international significance".
+
+Open Science Infrastructures benefit diverse disciplines and scientific communities. In 2020, 72% of the European infrastructures surveyed by SPARC Europe claimed to support all disciplines. The social sciences and the humanities are the most mentioned disciplines, which is partly attributed to the fact that the survey was "distributed widely by the OPERAS network". In 2010, the infrastructures supporting the social sciences and the humanities were much less prevalent and most of the use cases came from "biosciences, High Energy Physics and other fields of physics, earth and environmental sciences, computer science, astronomy and astrophysics".
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-7.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-7.md
@ -0,0 +1,17 @@
+---
+title: "Open Science Infrastructure"
+chunk: 8/8
+source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:40.430585+00:00"
+instance: "kb-cron"
+---
+
+=== Economics ===
+Many Open Science Infrastructures run "at a relatively low cost", as small infrastructures are an important part of the open science ecosystem. In 2020, 21 out of 53 surveyed European infrastructures "report spending less than €50,000". Consequently, more than 75% of surveyed European infrastructures are run by small teams of 5 FTEs or less. The size of an infrastructure and the extent of its funding are far from always proportional to the critical service it offers: "some of the most heavily used services make ends meet with a tiny core team of two to five people." Volunteer contributions are significant as well, which is both "a strength and weakness to an OSI's sustainability". The landscape of open science infrastructures is therefore rather close to the ideals of a "decentralised network of small projects" envisioned by theorists of the scholarly commons. A very large majority of open science infrastructures are non-commercial, and collaborations with or financial support from the private sector remain very limited.
+Overall, European infrastructures were financially sustainable in 2020, which contrasts with the situation ten years prior: in 2010, European infrastructures had much less visibility, usually lacked "a long-term perspective" and struggled "with securing the funding for more than 5 years". In 2020, European infrastructures frequently relied on grants from national funds and from the European Commission. Without these grants, most of these actors "could only remain viable for less than a year". Yet, one quarter of surveyed European infrastructures were not supported by any grants or subsidies and used either alternative sources of income or voluntary contributions. As they can be "difficult to define adequately", open science infrastructures can be overlooked by funding bodies, which "contributes to the challenge of securing funding".
+
+== References ==
+
+== Bibliography ==
--- a/data/en.wikipedia.org/wiki/Open_peer_review-0.md
+++ b/data/en.wikipedia.org/wiki/Open_peer_review-0.md
@ -4,7 +4,7 @@ chunk: 1/3
 source: "https://en.wikipedia.org/wiki/Open_peer_review"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T03:42:39.491508+00:00"
+date_saved: "2026-05-05T03:49:36.870107+00:00"
 instance: "kb-cron"
 ---

--- a/data/en.wikipedia.org/wiki/Open_peer_review-1.md
+++ b/data/en.wikipedia.org/wiki/Open_peer_review-1.md
@ -4,7 +4,7 @@ chunk: 2/3
 source: "https://en.wikipedia.org/wiki/Open_peer_review"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T03:42:39.491508+00:00"
+date_saved: "2026-05-05T03:49:36.870107+00:00"
 instance: "kb-cron"
 ---

--- a/data/en.wikipedia.org/wiki/Open_peer_review-2.md
+++ b/data/en.wikipedia.org/wiki/Open_peer_review-2.md
@ -4,7 +4,7 @@ chunk: 3/3
 source: "https://en.wikipedia.org/wiki/Open_peer_review"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T03:42:39.491508+00:00"
+date_saved: "2026-05-05T03:49:36.870107+00:00"
 instance: "kb-cron"
 ---

--- a/data/en.wikipedia.org/wiki/Open_research-0.md
+++ b/data/en.wikipedia.org/wiki/Open_research-0.md
@ -0,0 +1,39 @@
+---
+title: "Open research"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open_research"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:38.077952+00:00"
+instance: "kb-cron"
+---
+
+Open research is research that is openly accessible by others. Those who publish research in this way are often concerned with making research more transparent, more collaborative, more wide-reaching, and more efficient. Open research aims to make both research methods and the resulting data freely available, often via the internet, in order to support reproducibility and, potentially, massively distributed research collaboration. In this regard, it is related to both open source software and citizen science.
+Especially for research that is scientific in nature, open research may be referred to as open science. However, the term can also cover research done in fields as varied as the social sciences, the humanities, mathematics, engineering and medicine.
+
+
+== Types of open projects ==
+Important distinctions exist between different types of open projects.
+Projects that provide open data but don't offer open collaboration are referred to as "open access" rather than open research. Providing open data is a necessary but not sufficient condition for open research, because although the data may be used by anyone, there is no requirement for subsequent research to take place openly. For example, though there have been many calls for more open collaborative research in drug discovery and the open deposition of large amounts of data, there are very few active, openly collaborative projects in this area.
+Crowdsourcing projects that recruit large numbers of participants to carry out small tasks which are then assembled into a larger project outcome have delivered significant research outcomes, but these projects are distinct from those in which participants are able to influence the overall direction of the research, or in which participants are expected to have creative input into the science behind the project.
+Most open research is conducted within existing research groups. Primary research data are posted which can be added to, or interpreted by, anyone who has the necessary expertise and who can therefore join the collaborative effort. Thus the "end product" of the project (which may still be subject to future expansion or modification) arises from many contributions across multiple research groups, rather than the effort of one group or individual. Open research is therefore distinct from open access in that the output of open research is prone to change with time.
+Unlike open access, true open research must demonstrate live, online collaboration. Project websites that demonstrate this capability have started to become available.
+
+
+== Copyright conventions ==
+Issues with copyright are dealt with by using standard copyright (where applicable), by releasing the content into the public domain, or by releasing the content under licenses such as one of the Creative Commons licenses or one of the GNU General Public Licenses.
+
+
+== Examples ==
+In 2005, several examples arose in the search for new or improved medical treatments for neglected diseases.
+Science and engineering research to support the creation of open-source appropriate technology for sustainable development has long used open research principles. Open source research for sustainable development is now becoming formalized with open access for literature reviews, research methods, data, results and summaries for laypeople.
+Wiki-based examples include: Appropedia, Wikiversity, Citizendium, Scholarpedia.
+While first attempts towards opening research were primarily aimed at opening areas such as scientific data, methodologies, software and publications, now increasingly other artifacts of the scientific workflow are also tackled, such as scientific meta-data and funding ideas.
+In 2013, open research became more mainstream, with web-based platforms such as figshare continuing to grow in terms of users and publicly available outputs.
+The Transparency and Openness Promotion (TOP) Committee met in 2014 to address one key element of the incentive systems: journals' procedures and policies for publication. The committee consisted of disciplinary leaders, journal editors, funding agency representatives, and disciplinary experts largely from the social and behavioral sciences. By developing shared standards for open practices across journals, the committee said it hopes to translate scientific norms and values into concrete actions and change the current incentive structures to drive researchers' behavior toward more openness. The committee said it sought to produce guidelines that (a) focus on the commonalities across disciplines, and that (b) define what aspects of the research process should be made available to the community to evaluate, critique, reuse, and extend. The committee added that the guidelines aim to help improve journal policies in order to help transparency, openness, and reproducibility "become more evident in daily practice and ultimately improve the public trust in science, and science itself."
+
+
+== See also ==
+
+
+== References ==
--- a/data/en.wikipedia.org/wiki/Open_science_monitor-0.md
+++ b/data/en.wikipedia.org/wiki/Open_science_monitor-0.md
@ -0,0 +1,28 @@
+---
+title: "Open science monitor"
+chunk: 1/4
+source: "https://en.wikipedia.org/wiki/Open_science_monitor"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:41.653595+00:00"
+instance: "kb-cron"
+---
+
+An open science monitor or open access monitor is a scientific framework that aims to assess the spread of open practices in a scientific context.
+Open science monitors or dashboards are built at different scales: from institutional to national or international. They require an accurate assessment of the total scientific output and a further breakdown between open and closed content. They rely on a variety of data sources and methodologies to achieve this end. Consequently, open science monitors have also become relevant tools for bibliometric analysis.
+While initially conceived to track publications in academic journals, open science monitors have diversified their scope and indicators. A recent trend has been to map other major outputs of open science research such as datasets, software or clinical trials.
+
+== Definition ==
+Open science monitors are scientific infrastructures that provide a "good knowledge of the state" of scientific outputs and their "open access rate". They are also a policy tool that aims to better assess the discrepancy between actual practices and long-term objectives: they "can inform future strategies at institutional and national levels, provides guidance for policy development and review, helps to assess the effects of funding mechanisms and is crucial to negotiate transformative agreements with traditional subscription publishers."
+Open Access Monitors are a specific variant of open science monitor focused on open access publications. They aim to track the share of open access among journal articles, but also "books, book chapters, proceedings, and other publication types". In contrast, generic open science monitors have a more expansive scope and will in effect include all forms of scientific outputs and activities: "By definition, open science concerns the entire cycle of the scientific process, not only open access to publications".
+Nearly all open science monitors have been created at a national scale, as part of a general policy of enhanced visibility of public costs and investments with regard to scientific publications. Major examples include the Baromètre de la science ouverte in France, the Open Access Monitor in Germany, JUULI in Finland, the Open Access Barometer in Denmark, NARCIS and later openaccess.nl in the Netherlands and the Swiss Open Access Monitor. A prototype of an open science monitor was also conceived in the United Kingdom in 2017 but "apparently not realized."
+International initiatives include the Australian-based Curtin Open Knowledge Initiative (COKI), the Open Science Monitor of the European Union and OpenAIRE. Yet, the scope of their data is more limited than that of national monitors, as they do "not offer evaluation options on an institutional level".
+
+== History ==
+
+=== Context ===
+Open science monitors belong to a global ecosystem of open scientific infrastructures. This ecosystem emerged in the first decades of the 21st century as an alternative to the closed infrastructures built by large scientific publishers and analytic companies.
+After the Second World War, scientific publishing faced a "periodical crisis": funders, institutions and journals could not keep up with the rapidly increasing scientific output. New infrastructure and tools also had to be developed to keep track of scientific investment. Due to the limited success of public initiatives like SCITEL or MEDLINE in the United States, large private organizations filled this need. In 1963, Eugene Garfield created the Institute for Scientific Information, which aimed to transform the projects initially envisioned with the Federal administration into a profitable business. The Science Citation Index and, later, the Web of Science had a massive and lasting influence on global scientific publication in the last decades of the 20th century, as its most important metric, the Journal Impact Factor, "ultimately came to provide the metric tool needed to structure a competitive market among journals." Consequently, funders increasingly relied on analytics created by the Science Citation Index and its main competitors to assess the performance of institutions or individual researchers.
+After 1990, leading academic publishers started to diversify their activities beyond publishing and moved "from a content-provision to a data analytics business." By 2019, Elsevier had either acquired or built a large portfolio of platforms, tools, databases and indicators covering all aspects and stages of scientific research: "the largest supplier of academic journals is also in charge of evaluating and validating research quality and impact (e.g., Pure, Plum Analytics, Sci Val), identifying academic experts for potential employers (e.g., Expert Lookup5), managing the research networking platforms through which to collaborate (e.g., SSRN, Hivebench, Mendeley), managing the tools through which to find funding (e.g., Plum X, Mendeley, Sci Val), and controlling the platforms through which to analyze and store researchers' data (e.g., Hivebench, Mendeley)." Metrics and indicators are key components of this vertical integration: "Elsevier's further move to offering metrics-based decision making is simultaneously a move to gain further influence in the entirety of the knowledge production process, as well as to further monetize its disproportionate ownership of content." The new market for scientific publication and scientific data has been compared with the business models of social networks, search engines and other forms of platform capitalism: while content access is free, it is indirectly paid for through data extraction and surveillance.
+
+=== Early developments ===
--- a/data/en.wikipedia.org/wiki/Open_science_monitor-1.md
+++ b/data/en.wikipedia.org/wiki/Open_science_monitor-1.md
@ -0,0 +1,29 @@
+---
+title: "Open science monitor"
+chunk: 2/4
+source: "https://en.wikipedia.org/wiki/Open_science_monitor"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:41.653595+00:00"
+instance: "kb-cron"
+---
+
+The first open science monitors were created in the 2000s and the early 2010s. They were usually conceived as a natural outgrowth of new national and international policy in favor of open access and open science. The Berlin Declaration of 2003 in particular introduced the concept of a global "transition of scientific publishing toward an open access system", which requires "information on publication output and on subscription and publication fees."
+Additionally, the diversification of open science publishing into various publication venues (journals, repositories, overlay journals...) and formats (articles, conferences, datasets...) created unprecedented challenges.
+One of the earliest forms of open science monitor was the Dutch project NARCIS ("National Academic Research and Collaborations Information System"), which started operating in December 2005. NARCIS is primarily a national scientific portal that aims to integrate "all kinds of types of information from scientific institutes in the Netherlands." Yet it also has a special focus on "academic OAI repositories" and publishes global statistics on the rate of open, restricted and embargoed scientific works since 2000.
+By 2013, Finland pioneered the influential Jyväskylä Model through its national portal JUULI. First tested at the Open Science Centre of the University of Jyväskylä, this approach aims "to centralize all aspects of the self-archiving and open access processes lying within the responsibility of the professionals at the university library" in order to ease the process of data collection: "Researchers do as little as possible and, in some cases, nothing at all."
+
+=== From open access to open science ===
+After 2015, the European Union started to implement ambitious programs and goals within its own funding mechanisms, like Horizon 2020. This created an unprecedented impetus for the development of monitoring tools and methodologies at a supranational scale: "there has also been a general push for increased monitoring, aiming for both increased transparency to enable each country to see what others are doing". By 2018, 81% of the scientific organizations from Science Europe stated that they "planned to develop Open Access monitoring mechanisms in the future".
+In their preparatory work for the Open Science Monitor, Smith et al. underlined that "open science is much more than simply open access, despite the fact that open access tends to dominate discussions at present." Beyond research publications, their approach singled out open research data and a wider range of communication activities related to open science that included preprints, evaluations, comments and online discussions on social networks.
+In May 2018, the European Commission unveiled its plan for a European Open Science Monitor through a detailed methodological note. While the core features of the Monitor were in line with previous research, it was also announced that Elsevier would be the leading subcontractor for the creation of the platform, despite the past commitments of the academic publisher against open science, and that the metrics would combine the metadata of Scopus and Unpaywall to assess the rate of open access publications. The proposal was met with significant backlash, with nearly 1000 researchers and open science activists signing a formal complaint to the European Ombudsman. In an op-ed in The Guardian, Jon Tennant stated that "it is a cruel irony that Elsevier are to be paid to monitor the very system that they have historically fought against."
+The European Open Science Monitor was subsequently reworked in a different direction. As of 2023, the website only includes data up to 2018. In 2022, the European Council clearly stated that "data and bibliographic databases used for research assessment should, in principle, be openly accessible and that tools and technical systems should enable transparency".
+The European Open Science Monitor has entailed a significant shift in the objectives and ambitions of similar projects in the member states. In 2018, the French feedback on the Monitor included a detailed plan for the elaboration of open science indicators beyond publications, which would prove to have a direct influence on the Barometer of open science.
+
+== Sources ==
+Open science monitors have to deal with different sources of scientific data, since currently "no database provides an easy and complete answer". Consequently, "for most monitoring exercises, data from multiple sources will need to be gathered, aggregated, and reconciled".
+The most important sources available for open science monitors include international open science infrastructures, local sources and proprietary platforms. The choice of sources is frequently dictated by policy concerns and technical constraints. The United Kingdom and Germany lack a "pool of data" from local sources and consequently decided to rely significantly on proprietary databases like Dimensions, WoS or Scopus. Conversely, the French Open Science Monitor opted for a "constitutive choice" of open sources.
+
+=== International infrastructures ===
+Leading open science infrastructures commonly used in open science monitors include Unpaywall, Crossref and the Directory of Open Access Journals (DOAJ). Crossref is the primary information source of the French Open Science Monitor, as it only considers "publications associated with a Crossref DOI".
+Due to significant developments during the 2010s, international infrastructures have a larger scope of "publications, languages and sources" than proprietary databases. Yet "they offer insufficiently standardized metadata, which complicates their collection and processing" and may lack key information for the creation of open science monitors, such as author affiliations.
--- a/data/en.wikipedia.org/wiki/Open_science_monitor-2.md
+++ b/data/en.wikipedia.org/wiki/Open_science_monitor-2.md
@ -0,0 +1,37 @@
+---
+title: "Open science monitor"
+chunk: 3/4
+source: "https://en.wikipedia.org/wiki/Open_science_monitor"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:41.653595+00:00"
+instance: "kb-cron"
+---
+
+=== Local infrastructures and repositories ===
+Local infrastructures include Current Research Information Systems directly managed by scientific institutions and universities that "help manage, understand, and evaluate research activities". At the institutional level, they can offer the most extensive coverage of scientific output, especially by taking into account locally published journals that would not necessarily be indexed in global scientific infrastructures. Due to their direct connections with scientific communities, local infrastructures can incentivize researchers to "enter their publications into those systems" and implement a more varied range of indicators than what is commonly available in international databases.
+Local infrastructures are managed in a decentralized way, with varying levels of coverage and information depending on the institution. In some cases, local repositories are "fed solely by the large commercial databases" and will not have any added value.
+The integration of diverse local sources of data into a common and standardized scheme is a major challenge for open science monitors. The preexistence of an ambitious funding policy considerably eases this process, as institutions will already be encouraged to adopt specific norms and metadata requirements.
+While local infrastructures are generally thought of as providers of data for an open science monitor, the relationship can go both ways. In France, the University of Lorraine implemented its own open science monitor that worked as a local expansion of the French Open Science Monitor.
+
+=== Proprietary databases ===
+Proprietary databases like the Web of Science or Scopus have long been leading providers of publication metadata and analytics. Yet their integration into open science monitors is not consensual.
+Proprietary databases have long raised issues of data bias, which are especially problematic in the national context of most open science monitors. Their coverage is usually centered on English-language publications and neglects resources with a significant local impact. Moreover, reliance on proprietary platforms creates long-term dependency with added costs and risks of unsustainability: "Commercial providers require licences to access their services, which vary in price and access type".
+The French Open Science Monitor is committed to the exclusive use of "public or open datasources". Conversely, the German Open Access Monitor currently relies on Dimensions, Web of Science and Scopus, especially to recover "corresponding author information", even though it "looks out for emerging new data sources, especially open sources".
+
+== Methodology ==
+Open science monitors generally aim to bring diverse sources of publication metadata and data into a "central interface" that "enables continuous monitoring at a national level and provides a basis for fact-based decisions and actions." Due to "the complexity of the scholarly publishing system", the building of effective open science monitors is "no trivial task and involves a multitude of decisions".
+
+=== Data reconciliation ===
+The combination of various bibliometric sources creates several challenges. Key metadata can be missing. Entries are also frequently duplicated, as articles are indexed both in local and international databases.
+Persistent identifiers (PIDs) are a critical component of open science monitors. In theory they make it possible to "unambiguously identify publications, authors, and associated research institutions". Publications in scientific journals can be associated with internationally recognized standards such as DOIs (for the actual publications) or ORCID (for authors), managed by leading international infrastructures like Crossref.
+Despite the preexistence of international standards, open science monitors usually have to introduce their own standardization schemes and identifiers. Limiting the analysis to these standards would immediately "rule out a certain number of journals that do not adhere to this very general technology of persistent identifiers". Furthermore, other forms of scientific outputs or scientific activities (like funding) do not have the same level of standardization.
+Even when sources already include persistent identifiers, "some manual standardisation is required", as the original metadata is not always consistent or will not have the same focus. Author affiliation is crucial information for most open science monitors, as it makes it possible to single out the scientific production of a given country. Yet it is not always available, nor recorded in a systematic manner.
+
+=== Text & data mining ===
+Open science monitors have recently experimented with a range of text mining methods to reconstruct missing metadata. Even leading databases can miss key information: on Crossref, institutional affiliations are missing for "75% of the indexed content".
+Since 2022, the French Open Science Monitor has successfully experimented with the use of natural language processing methods and models to detect disciplines or institutional affiliations. For discipline classification, this has led to the development of scientific-tagger, a word embedding model based on fastText and trained on two annotated databases, PASCAL and FRANCIS.
+In 2022, Chaignon and Egret published a systematic reproduction and assessment of the methodology of the Monitor in Quantitative Science Studies. Using a mix of proprietary and open databases, they found nearly the same rate of open access publications for the year 2019 (53% vs. 54%). Overall, the open-source strategy used by the BSO proved to be the most efficient approach in comparison with alternative proprietary sources: "The open-source strategy used by the BSO effectively identifies the vast majority of publications with a persistent identifier (DOI) for open science monitoring." Additionally, the BSO makes it possible to provide metadata at a "sufficiently fine level to shed light on the geographical, thematic, linguistic, etc. disparities that affect bibliometric studies".
+Text and data mining methods are especially promising for the indexation of a wider range of open science outputs. Datasets, code, reports or clinical trials have never been systematically cataloged. Since 2022, the national French plan for open science aims to implement indicators beyond publications, and consequently the French Open Science Monitor is working on the extraction of "references to software and research data" from full-text articles with experimental deep learning models.
+
+== Uses and impact ==
--- a/data/en.wikipedia.org/wiki/Open_science_monitor-3.md
+++ b/data/en.wikipedia.org/wiki/Open_science_monitor-3.md
@ -0,0 +1,60 @@
+---
+title: "Open science monitor"
+chunk: 4/4
+source: "https://en.wikipedia.org/wiki/Open_science_monitor"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:41.653595+00:00"
+instance: "kb-cron"
+---
+
+=== Tracking open science adoption ===
+The French Open Science Monitor was conceived from the start to capture "open access dynamic". This has significant implications in terms of design and data flow, as the "OA status of a publication evolves over time" due to embargo policies as well as the retrospective opening of past content.
+Despite significant differences with regard to methodologies or data sources, Pierre Mounier underlined in 2022 that "we observe the same dynamic" in the open access monitors of "three different European countries": the French, German and Dutch monitors all converge to show that slightly more than 60% of research is published in open access.
+
+=== Economic analysis ===
+Open science monitors aim to facilitate the estimation of scientific publishing costs. Without any aggregation of publication data, "information on expenditure for open access publication fees and especially for non-open access publication fees is often not available centrally".
+The monitor can also help to better assess the economic impact of open science across the entire academic ecosystem. While it is generally assumed that the conversion to open access publishing should not be costlier than the existing system, there can still be significant variations, especially with an APC-based model: institutions with a high volume of publications but limited needs for subscriptions can be in a "worse position financially".
+
+== References ==
+
+== Bibliography ==
+
+=== Books & Thesis ===
+Wouters, P. F. (1999). The citation culture (Thesis). Retrieved 2018-09-09.
+Chen, George; Posada, Alejandro; Chan, Leslie (2019-06-02). "Vertical Integration in Academic Publishing : Implications for Knowledge Inequality". In Pierre Mounier (ed.). Connecting the Knowledge Commons – From Projects to Sustainable Infrastructure : The 22nd International Conference on Electronic Publishing – Revised Selected Papers. Laboratoire d'idées. Marseille: OpenEdition Press. ISBN 979-10-365-3802-5. Retrieved 2022-02-26.
+Moore, Samuel (2019). Common Struggles: Policy-based vs. scholar-led approaches to open access in the humanities (thesis deposit) (Thesis). King's College London. Retrieved 2021-12-11.
+
+=== Reports ===
+Johnson, Rob; Chiarelli (2017-08-15). Defining and Prototyping an Open Access Dashboard (Report). JISC.
+Smith, Elta; Parks, Sarah; Chataway, Joanna (2016). A framework to monitor open science trends in the EU (Report). Rand Europe.
+European Commission. Directorate General for Research and Innovation. (2019). Future of scholarly publishing and scholarly communication: report of the Expert Group to the European Commission (Report). LU: Publications Office. doi:10.2777/836532. Retrieved 2021-12-12.
+Aspesi, Claudio; Allen, Nicole Starr; Crow, Raym; Daugherty, Shawn; Joseph, Heather; McArthur, Joseph; Shockey, Nick (2019-04-03). SPARC Landscape Analysis: The Changing Academic Publishing Industry – Implications for Academic Institutions (Report). LIS Scholarship Archive. Retrieved 2022-01-05.
+Borrego, Ángel (2021). Creació d'un indicador d'accés obert a la producció científica de Catalunya (PDF) (Report). Consorci de Serveis Universitaris de Catalunya.
+Philipp, Tobias; Botz, Georg; Kita, Jean-Claude; Sänger, Astrid; Siegert, Olaf; Reumaux, Mathilde (2021-05-10). Open Access Monitoring: Guidelines and Recommendations for Research Organisations and Funders (Report). Retrieved 2023-10-05.
+
+=== Journal articles ===
+Elbæk, Mikael K. (2014). "Danish Open Access Barometer: mapping Open Access to Danish research and creation of an online prototype for automated open access monitoring". ScieCom Info. 10 (1). ISSN 1652-3202. Retrieved 2023-09-13.
+Smith, Elta; Parks, Sarah; Chataway, Joanna (2016). "A framework to monitor open science trends in the EU". Informing science and innovation policies: Towards the next generation of data and indicators. OECD Blue Sky III Forum.
+Schopfel, Joachim; Prost, Hélène (2019). "The scope of open science monitoring and grey literature". 12th Conference on Grey Literature and Repositories. hdl:20.500.12210/70680.
+Wainwright, Joel; Bervejillo, Guillermo (January 2021). "Leveraging monopoly power up the value chain: Academic publishing in an era of surveillance capitalism". Geoforum. 118: 210–212. doi:10.1016/j.geoforum.2020.04.012. ISSN 0016-7185. S2CID 234328559.
+Bracco, Laetitia (2022-10-03). "Promoting Open Science through bibliometrics: a practical guide to build an open access monitor". LIBER Quarterly: The Journal of the Association of European Research Libraries. 32 (1): 1–18. doi:10.53377/lq.11545. ISSN 2213-056X. Retrieved 2023-09-08.
+Bracco, Laetitia; L'Hôte, Anne; Jeangirard, Eric; Torny, Didier (2022). "Extending the open monitoring of open science". HAL.
+Chaignon, Lauranne; Egret, Daniel (2022-04-12). "Identifying scientific publications countrywide and measuring their open access: The case of the French Open Science Barometer (BSO)". Quantitative Science Studies. 3 (1): 18–36. doi:10.1162/qss_a_00179. ISSN 2641-3337. Retrieved 2023-10-05.
+Jeangirard, Éric (2022). "L'utilisation de l'apprentissage automatique dans le Baromètre de la science ouverte : une façon de réconcilier bibliométrie et science ouverte ?". Arabesques (107): 10–11. doi:10.35562/arabesques.3084. Retrieved 2023-10-05.
+Achenbach, Kelly; Błaszczyńska, Marta; De Paoli, Stefano; Di Donato, Francesca; Dumouchel, Suzanne; Forbes, Paula; Kraker, Peter; Vignoli, Michela (2022). "Defining discovery: Is Google Scholar a discovery platform? An essay on the need for a new approach to scholarly discovery". Open Research Europe. 2: 28. doi:10.12688/openreseurope.14318.2. ISSN 2732-5121. PMC 10445934. PMID 37645282.
+Barbers, Irene; Stanzel, Franziska; Mittermaier, Bernhard (2022-04-03). "Open Access Monitor Germany: Best Practice in Providing Metrics for Analysis and Decision-Making". Serials Review. 48 (1–2): 49–62. doi:10.1080/00987913.2022.2066968. ISSN 0098-7913. S2CID 249260262. Retrieved 2023-09-08.
+
+=== Conferences ===
+Dijk, E.M.S.; Baars, C.; Hogenaar, A.Th.; van Meel, M. (2006). "NARCIS: The Gateway to Dutch Scientific Information. ELPUB 2006". Digital Spectrum: Integrating Technology and Culture. Bansko, Bulgaria: ELPUB. pp. 49–58.
+Jeangirard, Eric (2019-06-07). Monitoring Open Access at a national level: French case study. ELPUB 2019 23d International Conference on Electronic Publishing. arXiv:2104.06844. doi:10.4000/proceedings.elpub.2019.20. Retrieved 2023-09-13.
+Papastefanatos, George; Papadopoulou, Elli; Meimaris, Marios; Lempesis, Antonis; Martziou, Stefania; Manghi, Paolo; Manola, Natalia (2020). "Open Science Observatory: Monitoring Open Science in Europe". In Ladjel Bellatreche; Mária Bieliková; Omar Boussaïd; Barbara Catania; Jérôme Darmont; Elena Demidova; Fabien Duchateau; Mark Hall; et al. (eds.). ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium. Communications in Computer and Information Science. Vol. 1260. Cham: Springer International Publishing. pp. 341–346. doi:10.1007/978-3-030-55814-7_29. ISBN 978-3-030-55814-7.
+Mounier, Pierre (2022-10-13). "Academic Publishing and Open Science – Where do we stand?". Proceedings of the Paris Open Science European Conference : OSEC 2022. Laboratoire d'idées. Marseille: OpenEdition Press. pp. 69–78. ISBN 979-10-365-4562-7. Retrieved 2023-09-14.
+
+=== Other publications ===
+Olsbo, Pekka (2017). "Measurement of Open Access as an Infrastructural Challenge: The Case of Finland". Expanding Perspectives on Open Science: Communities, Cultures and Diversity in Concepts and Practices. IOS Press. pp. 217–226. doi:10.3233/978-1-61499-769-6-217. Retrieved 2023-11-25.
+Open Science Monitor Methodological Note v2 (Report). European Commission. 2018-04-30. Retrieved 2023-10-09.
+Hameau, Thérèse (2018-08-31). "Feedback on EC Open Science Monitor Methodological note". Ouvrir la Science. Retrieved 2023-10-08.
+Knecht, Sicco de (2018-07-12). "Elsevier is trying to co-opt the open science space, and we shouldn't let them". ScienceGuide. Retrieved 2023-10-11.
+Tennant, Jon (2018-06-29). "Elsevier are corrupting open science in Europe". The Guardian. ISSN 0261-3077. Retrieved 2023-10-08.
+Research assessment and implementation of Open Science (PDF) (Report). Council of the European Union. 2022.
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-0.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-0.md
@ -0,0 +1,30 @@
+---
+title: "Open scientific data"
+chunk: 1/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+Open scientific data or open research data is a type of open data focused on making observations and results of scientific activities available for anyone to analyze and reuse. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to look at the reproducibility of results, and to allow data from many sources to be integrated to give new knowledge.
+The modern concept of scientific data emerged in the second half of the 20th century, with the development of large knowledge infrastructures to compute scientific information and observations. The sharing and distribution of data was identified early on as an important issue but was impeded by the technical limitations of the infrastructure and the lack of common standards for data communication. The World Wide Web was immediately conceived as a universal protocol for the sharing of scientific data, especially data coming from high-energy physics.
+
+== Definition ==
+
+=== Scientific data ===
+The concept of open scientific data has developed in parallel with the concept of scientific data.
+Scientific data was not formally defined until the late 20th century. Before the generalization of computational analysis, data was mostly an informal term, frequently used interchangeably with knowledge or information. Institutional and epistemological discourses favored alternative concepts and outlooks on scientific activities: "Even histories of science and epistemology comments, mention data only in passing. Other foundational works on the making of meaning in science discuss facts, representations, inscriptions, and publications, with little attention to data per se."
+The first influential policy definition of scientific data appeared as late as 1999, when the National Academies of Science described data as "facts, letters, numbers or symbols that describe an object, condition, situation or other factors". Terminologies have continued to evolve: in 2011, the National Academies updated the definition to include a large variety of dataified objects such as "spectrographic, genomic sequencing, and electron microscopy data; observational data, such as remote sensing, geospatial, and socioeconomic data; and other forms of data either generated or compiled, by humans or machines" as well as "digital representation of literature".
+While the forms and shapes of data remain expansive and unsettled, standard definitions and policies have recently tended to restrict scientific data to computational or digital data. The open data pilot of Horizon 2020 has been voluntarily restricted to digital research: "'Digital research data' is information in digital form (in particular facts or numbers), collected to be examined and used as a basis for reasoning, discussion or calculation; this includes statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images".
+Overall, the status of scientific data remains a flexible point of discussion among individual researchers, communities and policy-makers: "In broader terms, whatever 'data' is of interest to researchers should be treated as 'research data'". Important policy reports, like the 2012 collective synthesis of the National Academies of Science on data citation, have intentionally adopted a relative and nominalist definition of data: "we will devote little time to definitional issues (e.g., what are data?), except to acknowledge that data often exist in the eyes of the beholder." For Christine Borgman, the main issue is not to define scientific data ("what are data") but to contextualize the point where data became a focal point of discussion within a discipline, an institution or a national research program ("when are data"). In the 2010s, the expansion of available data sources and the sophistication of data analysis methods have expanded the range of disciplines primarily affected by data management issues to "computational social science, digital humanities, social media data, citizen science research projects, and political science."
+
+=== Open scientific data ===
+Opening and sharing have both been major topics of discussion in regard to scientific data management, but also a motivation to make data emerge as a relevant issue within an institution, a discipline or a policy framework.
+For Paul Edwards, whether or not to share the data, to what extent it should be shared and with whom have been major causes of data friction that revealed the otherwise hidden infrastructures of science: "Edwards' metaphor of data friction describes what happens at the interfaces between data 'surfaces': the points where data move between people, substrates, organizations, or machines (...) Every movement of data across an interface comes at some cost in time, energy, and human attention. Every interface between groups and organizations, as well as between machines, represents a point of resistance where data can be garbled, misinterpreted, or lost. In social systems, data friction consumes energy and produces turbulence and heat – that is, conflicts, disagreements, and inexact, unruly processes." The opening of scientific data is both a data friction in itself and a way to collectively manage data frictions by weakening complex issues of data ownership. Scientific or epistemic cultures have been acknowledged as primary factors in the adoption of open data policies: "data sharing practices would be expected to be community-bound and largely determined by epistemic culture."
+In the 2010s, new concepts were introduced by scientists and policy-makers to more accurately define what open scientific data is. Since its introduction in 2016, FAIR data has become a major focus of open research policies. The acronym describes an ideal type of Findable, Accessible, Interoperable, and Reusable data. Open scientific data has been categorized as a commons or a public good, which is primarily maintained, enriched and preserved by collective rather than individual action: "What makes collective action useful in understanding scientific data sharing is its focus on how the appropriation of individual gains is determined by adjusting the costs and benefits that accrue with contributions to a common resource".
+
+== History ==
+
+=== Development of knowledge infrastructures (1945-1960) ===
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-1.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-1.md
@ -0,0 +1,20 @@
+---
+title: "Open scientific data"
+chunk: 2/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+The emergence of scientific data is associated with a semantic shift in the way core scientific concepts like data, information and knowledge are commonly understood. Following the development of computing technologies, data and information are increasingly described as "things": "Like computation, data always have a material aspect. Data are things. They are not just numbers but also numerals, with dimensionality, weight, and texture".
+After the Second World War, large scientific projects increasingly relied on knowledge infrastructures to collect, process and analyze large amounts of data. Punch-card systems were first used experimentally on climate data in the 1920s and were applied on a large scale in the following decade: "In one of the first Depression-era government make-work projects, Civil Works Administration workers punched some 2 million ship log observations for the period 1880–1933." By 1960, the meteorological data collections of the US National Weather Records Center had expanded to 400 million cards and had a global reach. The physicality of scientific data was by then fully apparent and threatened the stability of entire buildings: "By 1966 the cards occupied so much space that the Center began to fill its main entrance hall with card storage cabinets (figure 5.4). Officials became seriously concerned that the building might collapse under their weight".
+By the end of the 1960s, knowledge infrastructures had been embedded in a varied set of disciplines and communities. The first initiative to create a database of electronic bibliography of open access data was the Educational Resources Information Center (ERIC) in 1966. In the same year, MEDLINE was created – a free access online database managed by the National Library of Medicine and the National Institutes of Health (USA) with bibliographical citations from journals in the biomedical area, which later would be called PubMed, currently with over 14 million complete articles. Knowledge infrastructures were also set up in space engineering (with NASA/RECON), library search (with OCLC Worldcat) or the social sciences: "The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection".
+
+=== Opening and sharing data: early attempts (1960-1990) ===
+Early discourses and policy frameworks on open scientific data emerged immediately in the wake of the creation of the first large knowledge infrastructures. The World Data Center system (now the World Data System) aimed to make observation data more readily available in preparation for the International Geophysical Year of 1957–1958. The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form. In 1966, the International Council for Science created CODATA, an initiative to "promote cooperation in data management and use".
+These early forms of open scientific data did not develop much further. There were too many data frictions and too much technical resistance to the integration of external data to implement a durable ecosystem of data sharing. Data infrastructures were mostly invisible to researchers, as most of the searching was done by professional librarians. Not only were the search operating systems complicated to use, but the search had to be performed very efficiently given the prohibitive cost of long-distance telecommunication. While their designers had originally anticipated direct use by researchers, this could not really emerge due to technical and economic impediments:
+
+The designers of the first online systems had presumed that searching would be done by end users; that assumption undergirded system design. MEDLINE was intended to be used by medical researchers and clinicians, NASA/RECON was designed for aerospace engineers and scientists. For many reasons, however, most users through the seventies were librarians and trained intermediaries working on behalf of end users. In fact, some professional searchers worried that even allowing eager end users to get at the terminals was a bad idea.
+Christine Borgman does not recall any significant policy debates over the meaning, the production and the circulation of scientific data, save for a few specific fields (like climatology), after 1966. The insulated scientific infrastructures could hardly be connected before the advent of the web. Projects and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols". Communication between scientific infrastructures was challenging not only across space but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated were likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-10.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-10.md
@ -0,0 +1,31 @@
+---
+title: "Open scientific data"
+chunk: 11/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+The UNESCO Recommendation on Open Science, approved in November 2021, defines open science infrastructures as "shared research infrastructures that are needed to support open science and serve the needs of different communities". Open science infrastructures have been recognized as a major factor in the implementation and the development of data sharing policies.
+Leading infrastructures for open scientific data include data repositories, data analysis platforms, indexes, digitized libraries, and digitized archives. Infrastructures ensure that individual researchers and institutions do not bear the entire cost of publishing, maintaining, and indexing datasets. They are also critical stakeholders in the definition and adoption of open data standards, especially regarding licensing or documentation.
+By the end of the 1990s, the creation of public scientific computing infrastructure became a major policy issue: "The lack of infrastructure to support release and reuse was acknowledged in some of the earliest policy reports on data sharing." The first wave of web-based scientific projects in the 1990s and the early 2000s revealed critical issues of sustainability. As funding was allocated for a specific period, critical databases, online tools or publishing platforms could hardly be maintained. Project managers were faced with a valley of death "between grant funding and ongoing operational funding". After 2010, the consolidation and expansion of commercial scientific infrastructure, such as the acquisition of the open repositories Digital Commons and SSRN by Elsevier, further prompted calls to secure "community-controlled infrastructure". In 2015, Cameron Neylon, Geoffrey Bilder and Jennifer Lin defined an influential series of Principles for Open Scholarly Infrastructure that has been endorsed by leading infrastructures such as Crossref, OpenCitations or Data Dryad. By 2021, public services and infrastructures for research had largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer." According to the 2021 Roadmap of the European Strategy Forum on Research Infrastructures (ESFRI), major legacy infrastructures in Europe have embraced open science principles: "Most of the Research Infrastructures on the ESFRI Roadmap are at the forefront of Open Science movement and make important contributions to the digital transformation by transforming the whole research process according to the Open Science paradigm."
+Open science infrastructures represent a higher level of commitment to data sharing. They rely on significant and recurrent investments to ensure that data is effectively maintained and documented and "add value to data through metadata, provenance, classification, standards for data structures, and migration". Furthermore, infrastructures need to be integrated into the norms and expected uses of the scientific communities they mean to serve: "The most successful become reference collections that attract longer-term funding and can set standards for their communities". Maintaining open standards is one of the main challenges identified by leading European open infrastructures, as it implies choosing among competing standards in some cases, as well as ensuring that the standards are correctly updated and accessible through APIs or other endpoints.
+The conceptual definition of open science infrastructures has been influenced mainly by the analysis of Elinor Ostrom on the commons and, more specifically, on the knowledge commons. In accordance with Ostrom, Cameron Neylon underlines that open infrastructures are characterized not only by the management of a pool of shared resources but also by the elaboration of joint governance and norms. The diffusion of open scientific data also raises stringent issues of governance. With regard to the determination of the ownership of the data, the adoption of free licenses and the enforcement of privacy regulations, "continual negotiation is necessary" and involves a wide range of stakeholders.
+Beyond their integration in specific scientific communities, open science infrastructures have strong ties with the open source and the open data movements. 82% of the European infrastructures surveyed by SPARC claim to have partially built open source software and 53% have their entire technological infrastructure in open source. Open science infrastructures preferably integrate standards from other open science infrastructures. Among European infrastructures: "The most commonly cited systems – and thus essential infrastructure for many – are ORCID, Crossref, DOAJ, BASE, OpenAIRE, Altmetric, and Datacite, most of which are not-for-profit". Open science infrastructures are thus part of an emerging "truly interoperable Open Science commons" that holds the promise of "researcher-centric, low-cost, innovative, and interoperable tools for research, superior to the present, largely closed system."
+
+== See also ==
+
+== References ==
+
+== Bibliography ==
+
+== External links ==
+Research Data Canada Archived 2024-02-10 at the Wayback Machine
+Open Data In Science article (P Murray-Rust)
+Open Data about monitoring of deforestation in the Brazilian Amazon Rainforest
+OpenWetWare
+Open Connectome Project
+LinkedScience.org
+Collective Mind Repository for computer engineering
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-2.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-2.md
@ -0,0 +1,21 @@
+---
+title: "Open scientific data"
+chunk: 3/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+=== Sharing scientific data on the web (1990-1995) ===
+The World Wide Web was originally conceived as an infrastructure for open scientific data. Sharing of data and data documentation was a major focus in the initial communication of the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data".
+The project stemmed from a closed knowledge infrastructure, ENQUIRE. It was an information management software commissioned from Tim Berners-Lee by CERN for the specific needs of high energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected "nodes" that "could refer to a person, a software module, etc. and that could be interlined with various relations such as made, include, describes and so forth". While it "facilitated some random linkage between information", ENQUIRE was not able to "facilitate the collaboration that was desired for in the international high-energy physics research community". Like any significant scientific computing infrastructure before the 1990s, the development of ENQUIRE was ultimately impeded by the lack of interoperability and the complexity of managing network communications: "although Enquire provided a way to link documents and databases, and hypertext provided a common format in which to display them, there was still the problem of getting different computers with different operating systems to communicate with each other".
+The web rapidly superseded pre-existing closed infrastructures for scientific data, even when they included more advanced computing features. From 1991 to 1994, users of the Worm Community System, a major biology database on worms, switched to the Web and Gopher. While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the Worm Community System could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources (…) The World-Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services."
+Publication on the web completely changed the economics of data publishing. While in print "the cost of reproducing large datasets is prohibitive", the storage cost of most datasets is low. In this new editorial environment, the main limiting factors for data sharing are no longer technical or economic but social and cultural.
+
+=== Defining open scientific data (1995-2010) ===
+The development and the generalization of the World Wide Web lifted numerous technical barriers and frictions that had constrained the free circulation of data. Yet scientific data still had to be defined, and new research policies had to be implemented, to realize the original vision of a web of data laid out by Tim Berners-Lee. At this point, scientific data was largely defined through the process of opening scientific data, as the implementation of open policies created new incentives for setting up actionable guidelines, principles and terminologies.
+Climate research has been a pioneering field in the conceptual definition of open scientific data, as it was in the construction of the first large knowledge infrastructures in the 1950s and the 1960s. In 1995 the GCDIS articulated a clear commitment On the Full and Open Exchange of Scientific Data: "International programs for global change research and environmental monitoring crucially depend on the principle of full and open data exchange (i.e., data and information are made available without restriction, on a non-discriminatory basis, for no more than the cost of reproduction and distribution)." The expansion of the scope and the management of knowledge infrastructures also created incentives to share data, as the "allocation of data ownership" between a large number of individual and institutional stakeholders has become increasingly complex. Open data creates a simplified framework to ensure that all contributors and users of the data have access to it.
+Open data was rapidly identified as a key objective of the emerging open science movement. While initially focused on publications and scholarly articles, the international initiatives in favor of open access expanded their scope to all the main scientific productions. In 2003 the Berlin Declaration supported the diffusion of "original scientific research results, raw data and metadata, source materials and digital representations of pictorial and graphical and scholarly multimedia materials".
+After 2000, international organizations, like the OECD (Organisation for Economic Co-operation and Development), have played an instrumental role in devising generic and transdisciplinary definitions of scientific data, as open data policies have to be implemented beyond the specific scale of a discipline or a country. One of the first influential definitions of scientific data was coined in 1999 by a report of the National Academies of Science: "Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors". In 2004, the Science Ministers of all nations of the OECD signed a declaration which essentially states that all publicly funded archive data should be made publicly available. In 2007 the OECD "codified the principles for access to research data from public funding" through the Principles and Guidelines for Access to Research Data from Public Funding, which defined scientific data as "factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings." The Principles acted as a soft-law recommendation and affirmed that "access to research data increases the returns from public investment in this area; reinforces open scientific inquiry; encourages diversity of studies and opinion; promotes new areas of work and enables the exploration of topics not envisioned by the initial investigators."
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-3.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-3.md
@ -0,0 +1,29 @@
+---
+title: "Open scientific data"
+chunk: 4/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+=== Policy implementations (2010-…) ===
+After 2010, national and supra-national institutions took a more interventionist stance. New policies have been implemented to ensure and incentivize the opening of scientific data, usually in continuation of existing open data programs. In Europe, the "European Union Commissioner for Research, Science, and Innovation, Carlos Moedas made open research data one of the EU's priorities in 2015."
+First published in 2016, the FAIR Guiding Principles have become an influential framework for opening scientific data. The principles were originally designed two years earlier during a policy and research workshop at Lorentz, Jointly Designing a Data FAIRport. During the deliberations of the workshop, "the notion emerged that, through the definition of, and widespread support for, a minimal set of community-agreed guiding principles and practice"
+The principles do not attempt to define scientific data, which remains a relatively plastic concept, but strive to describe "what constitutes 'good data management'". They cover four foundational principles "that serve to guide data producer": Findability, Accessibility, Interoperability, and Reusability. They also aim to provide a step toward machine-actionability by making the underlying semantics of data explicit. As they fully acknowledge the complexity of data management, the principles do not claim to introduce a set of rigid recommendations but rather "degrees of FAIRness" that can be adjusted depending on organizational costs as well as external restrictions with regard to copyright or privacy.
+The FAIR principles were immediately co-opted by major international organizations: "FAIR experienced rapid development, gaining recognition from the European Union, G7, G20 and US-based Big Data to Knowledge (BD2K)". In August 2016, the European Commission set up an expert group to turn "FAIR Data into reality". As of 2020, the FAIR principles remain "the most advanced technical standards for open scientific data to date".
+In 2022, the French Open Science Monitor started to publish an experimental survey of research data publications based on text mining tools. Retrospective analysis showed that the rate of publications mentioning sharing of their associated data has nearly doubled in 10 years, from 13% (in 2013) to 22% (in 2021).
+By the end of the 2010s, open data policies were well supported by scientific communities. Two large surveys commissioned by the European Commission in 2016 and 2018 found a commonly perceived benefit: "74% of researchers say that having access to other data would benefit them". Yet more qualitative observations gathered in the same investigation also showed that "what scientists proclaim ideally, versus what they actually practice, reveals a more ambiguous situation."
+
+== Diffusion of scientific data ==
+
+=== Publication and edition ===
+
+Until the 2010s, the publication of scientific data referred mostly to "the release of datasets associated with an individual journal article". This release is documented by a Data Availability Statement or DAS. Several typologies of data availability statements have been proposed. In 2021, Colavizza et al. identified three categories or levels of access:
+
+DAS 1: "Data available on request or similar"
+DAS 2: "Data available with the paper and its supplementary files"
+DAS 3: "Data available in a repository"
+Supplementary data files appeared in the early phase of the transition to scientific digital publishing. While the format of publications has largely kept the constraints of the printing format, additional materials could be included in "supplementary information". As a publication, supplementary data files have an ambiguous status. In theory they are meant to be raw documents, giving access to the background of research. In practice, the released datasets often have to be specially curated for publication. They will usually focus on the primary data sources, not on the entire range of observations or measurements done for the purpose of the research: "Identifying what are "the data" associated with any individual article, conference paper, book, or other publication is often difficult [as] investigators collect data continually." The selection of the data is also further influenced by the publisher. The editorial policy of the journal largely determines what "goes in the main text, what in the supplemental information", and editors are especially wary of including large datasets which may be difficult to maintain in the long run.
+Scientific datasets have been increasingly acknowledged as an autonomous scientific publication. The assimilation of data to academic articles aimed to increase the prestige and recognition of published datasets: "implicit in this argument is that familiarity will encourage data release". This approach has been favored by several publishers and repositories as it made it possible to easily integrate data in existing publishing infrastructure and to extensively reuse editorial concepts initially created around articles. Data papers were explicitly introduced as "a mechanism to incentivize data publishing in biodiversity science".
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-4.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-4.md
@ -0,0 +1,30 @@
+---
+title: "Open scientific data"
+chunk: 5/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+=== Citation and indexation ===
+The first digital databases of the 1950s and the 1960s immediately raised issues of citability and bibliographic description. The mutability of computer memory was especially challenging: in contrast with printed publications, digital data could not be expected to remain stable in the long run. In 1965, Ralph Bisco underlined that this uncertainty affected all the associated documents like code notebooks, which may become increasingly out of date. Data management has to find a middle ground between continuous enhancements and some form of generic stability: "the concept of a fluid, changeable, continually improving data archive means that study cleaning and other processing must be carried to such a point that changes will not significantly affect prior analyses".
+Structured bibliographic metadata for databases has been a debated topic since the 1960s. In 1977, the American Standard for Bibliographic Reference adopted a definition of "data file" with a strong focus on the materiality and the mutability of the dataset: neither dates nor authors were indicated but the medium or "Packaging Method" had to be specified. Two years later, Sue Dodd introduced an alternative convention that brought the citation of data closer to the standard of references of other scientific publications: Dodd's recommendation included the use of titles, authors, editions and dates, as well as alternative mentions for sub-documentation like code notebooks.
+The indexation of datasets has been radically transformed by the development of the web, as barriers to data sharing were substantially reduced. In this process, data archiving, sustainability and persistence have become critical issues. Permanent digital object identifiers (or DOIs) were introduced for scientific articles to avoid broken links, as website structures continuously evolved. In the early 2000s, pilot programs started to allocate DOIs to datasets as well. While it solves concrete issues of link sustainability, the creation of data DOIs and norms of data citation is also part of a legitimization process that assimilates datasets to standard scientific publications and can draw from similar sources of motivation (like the bibliometric indexes).
+Accessible and findable datasets yield a significant citation advantage. A 2021 study of 531,889 articles published by PLOS estimated that there is a "25.36% relative gain in citation counts in general" for a journal article with "a link to archived data in a public repository". Diffusion of data as supplementary material does not yield a significant citation advantage, which suggests that "the citation advantage of DAS [Data Availability Statement] is not as much related to their mere presence, but to their contents".
+As of 2022, the recognition of open scientific data is still an ongoing process. The leading reference software Zotero does not yet have a specific item type for datasets.
+
+=== Reuse and economic impact ===
+Within academic research, storage and redundancy have proven to be a significant benefit of open scientific data. In contrast, non-open scientific data is weakly preserved and can "be retrieved only with considerable effort by the authors", if not completely lost.
+Analyses of the uses of open scientific data run into the same issues as for any open content: while free, universal and indiscriminate access has demonstrably expanded the scope, range and intensity of the reception, it has also made that reception harder to track, due to the lack of a transaction process.
+These issues are further complicated by the novelty of data as a scientific publication: "In practice, it can be difficult to monitor data reuse, mainly because researchers rarely cite the repository".
+In 2018, a report of the European Commission estimated the cost of not opening scientific data in accordance with the FAIR principles: it amounted to 10.2 billion annually in direct impact and 16 billion in indirect impact over the entire innovation economy. Implementing open scientific data at a global scale "would have a considerable impact on the time we spent manipulating data and the way we store data."
+
+== Practices and data culture ==
+The sharing of scientific data is rooted in scientific cultures or communities of practice. As digital tools have become widespread, the infrastructures, the practices and the common representations of research communities have increasingly relied on shared meanings of what data is and what can be done with it.
+Pre-existing epistemic machineries can be more or less predisposed to data sharing. Important factors may include shared values (individualistic or collective), data ownership allocation and frequent collaborations with external actors who may be reluctant to share data.
+
+=== The emergence of an open data culture ===
+The development of scientific open data is not limited to scientific research. It involves a diverse set of stakeholders: "Arguments for sharing data come from many quarters: funding agencies—both public and private—policy bodies such as national academies and funding councils, journal publishers, educators, the public at large, and from researchers themselves." As such, the movement for scientific open data largely intersects with more global movements for open data. Standard definitions of open data used by a wide range of public and private actors have been partly elaborated by researchers around concrete scientific issues. The concept of transparency has especially contributed to creating convergences between open science, open data and open government. In 2015, the OECD described transparency as a common "rationale for open science and open data".
+Christine Borgman has identified four major rationales for sharing data commonly used across the entire regulatory and public debate over scientific open data:
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-5.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-5.md
@ -0,0 +1,28 @@
+---
+title: "Open scientific data"
+chunk: 6/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+Research reproducibility: lack of reproducibility is frequently attributed to deficiencies in research transparency and in data analysis processes. Consequently, as "a rationale for sharing research data, [research reproducibility] is powerful yet problematic". Reproducibility only applies to "certain kinds of research", mostly in the experimental sciences.
+Public accessibility: the rationale that "products of public funding should be available to the public" is "found in arguments for open government". While directly inspired by similar arguments made in favor of open access to publications, its range is more limited as scientific open data "has direct benefits to far fewer people, and those benefits vary by stakeholder".
+Research valorization: open scientific data may bring substantial value to the private sector. This argument is especially used to support "the need for more repositories that can accept and curate research data, for better tools and services to exploit data, and for other investments in knowledge infrastructure".
+Increased research and innovation: open scientific data may significantly enhance the quality of private and public research. This argument aims for "investing in knowledge infrastructure to sustain research data, curated to high standards of professional practices".
+Yet collaboration between the different actors and stakeholders of the data lifecycle is partial. Even within academic institutions, cooperation remains limited: "most researchers are making [data related search] without consulting a data manager or librarian."
+The global open data movement partly lost its cohesiveness and identity during the 2010s, as debates over data availability and licensing were overtaken by domain-specific issues: "When the focus shifts from calling for access to data to creating data infrastructure and putting data to work, the divergent goals of those who formed an initial open data movement come clearly into view and managing the tensions that emerge can be complex." The very generic scope of the open data definition, which aims to embrace a very wide set of preexisting data cultures, does not adequately take into account the higher threshold of accessibility and contextualization required by scientific research: "open data in the sense of being free for reuse is a necessary but not sufficient condition for research purposes."
+
+=== Ideal and implementation: the paradox of data sharing ===
+Since the 2000s, surveys of scientific communities have underlined a consistent discrepancy between the ideals of data sharing and their implementation in practice: "When present-day researchers are asked whether they are willing to share their data, most say yes, they are willing to do so. When the same researchers are asked if they do release their data, they typically acknowledge that they have not done so." Open data culture does not emerge in a vacuum and has to contend with preexisting cultures of scientific data and a range of systemic factors that can discourage data sharing: "In some fields, scholars are actively discouraged from reusing data. (…) Careers are made by charting territory that was previously uncharted."
+In 2011, 67% of 1329 scientists agreed that lack of data sharing is a "major impediment to progress in science", and yet "only about a third (36%) of the respondents agree that others can access their data easily". In 2016, a survey of researchers in the environmental sciences found overwhelming support for easily accessible open data (99% as at least somewhat important) and funder policies for open data (88%). Yet, "even with willingness to share data there are discrepancies with common practices, e.g. willingness to spend time and resources preparing and up-loading data". A 2022 study of 1792 data sharing statements from BioMed Central found that less than 7% of the authors (123) actually provided the data upon request.
+The prevalence of accessible and findable data is even lower: "Despite several decades of policy moves toward open access to data, the few statistics available reflect low rates of data release or deposit." In a 2011 poll for Science, only 7.6% of researchers shared their data on community repositories, with local websites hosted by universities or laboratories being favored instead. Consequently, "many bemoaned the lack of common metadata and archives as a main impediment to using and storing data".
+According to Borgman, the paradox of data sharing is partly due to the limitations of open data policies, which tend to focus on "mandating or encouraging investigators to release their data" without meeting the "expected demand for data or the infrastructure necessary to support release and reuse."
+
+=== Incentives and barriers to scientific open data ===
+In 2022, Pujol Priego, Wareham and Romasanta stressed that incentives for the sharing of scientific data were primarily collective and include reproducibility, scientific efficiency and scientific quality, along with more individual rewards such as personal credit. Individual benefits include increased visibility: open datasets yield a significant citation advantage, but only when they have been shared on an open repository.
+Important barriers include the need to publish first, legal constraints and concerns about loss of credit or recognition. For individual researchers, datasets may be major assets to barter for "new jobs or new collaborations" and their publication may be difficult to justify unless they "get something of value in return".
+Lack of familiarity with data sharing, rather than an outright rejection of the principles of open science, is also ultimately a leading obstacle. Several surveys in the early 2010s showed that researchers "rarely seek data from other investigators and (…) they rarely are asked for their own data." This creates a negative feedback loop, as researchers make little effort to ensure data sharing, which in turn discourages effective use, whereas "the heaviest demand for reusing data exists in fields with high mutual dependence." The reality of data reuse may also be underestimated, as data is not considered to be a prestigious publication and the original sources are not cited.
+A 2021 empirical study of 531,889 articles published by PLOS shows that soft incentives and encouragements have a limited impact on data sharing: "journal policies that encourage rather than require or mandate DAS [Data Availability Statement] have only a small effect".
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-6.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-6.md
@ -0,0 +1,23 @@
+---
+title: "Open scientific data"
+chunk: 7/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+== Legal status ==
+The opening of scientific data has raised a variety of legal issues with regard to ownership rights, copyrights, privacy and ethics. While it is commonly considered that researchers "own the data they collect in the course of their research", this "view is incorrect": the creation of a dataset potentially involves the rights of numerous additional actors such as institutions (research agencies, funders, public bodies), associated data producers, and private citizens whose personal data is collected. The legal situation of digital data has consequently been described as a "bundle of rights" because the "legal category of "property" (...) is not a suitable model for dealing with the complexity of data governance problems".
+
+=== Copyright ===
+Copyright was the primary focus of the legal literature on open scientific data until the 2010s. The legality of data sharing was identified early on as a crucial issue. In contrast with the sharing of scientific publications, the main impediment was not copyright but uncertainty: "the concept of 'data' [was] a new concept, created in the computer age, while copyright law emerged at the time of printed publications." In theory, copyright and author rights provisions do not apply to simple collections of facts and figures. In practice, the notion of data is much more expansive and could include protected content or creative arrangement of non-copyrightable contents.
+The status of data in international conventions on intellectual property is ambiguous. According to Article 2 of the Berne Convention, "every production in the literary, scientific and artistic domain" is protected. Yet, research data is often not an original creation entirely produced by one or several authors, but rather a "collection of facts, typically collated using automated or semiautomated instruments or scientific equipment." Consequently, there is no universal convention on data copyright and debates over "the extent to which copyright applies" are still prevalent, with different outcomes depending on the jurisdiction or the specifics of the dataset. This lack of harmonization stems logically from the novelty of "research data" as a key concept of scientific research: "the concept of 'data' is a new concept, created in the computer age, while copyright law emerged at the time of printed publications."
+In the United States, the European Union and several other jurisdictions, copyright laws have acknowledged a distinction between data itself (which can be an unprotected "fact") and the compilation of the data (which can be a creative arrangement). This principle largely predates the contemporary policy debate over scientific data, as the earliest court cases ruling in favor of compilation rights go back to the 19th century.
+In the United States, compilation rights were defined in the Copyright Act of 1976 with an explicit mention of datasets: "a work formed by the collection and assembling of pre-existing materials or of data" (Par 101). In its 1991 decision Feist Publications, Inc., v. Rural Telephone Service Co., the Supreme Court clarified the extent and limitations of database copyrights: the "assembling" should be demonstrably original, and the "raw facts" contained in the compilation remain unprotected.
+Even in jurisdictions where the application of copyright to data outputs remains unsettled and partly theoretical, it has nevertheless created significant legal uncertainties. The frontier between a set of raw facts and an original compilation is not clearly delineated. Although scientific organizations are usually well aware of copyright laws, the complexity of data rights creates unprecedented challenges. After 2010, national and supra-national jurisdictions partly changed their stance on the copyright protection of research data. As sharing is encouraged, scientific data has also been acknowledged as an informal public good: "policymakers, funders, and academic institutions are working to increase awareness that, while the publications and knowledge derived from research data pertain to the authors, research data needs to be considered a public good so that its potential social and scientific value can be realised".
+
+=== Database rights ===
+The European Union provides one of the strongest intellectual property frameworks for data, with a double layer of rights: copyrights for original compilations (similarly to the United States) and sui generis database rights. Criteria for the originality of compilations have been harmonized across the member states by the 1996 Database Directive and by several major cases settled by the European Court of Justice, such as Infopaq International A/S v Danske Dagblades Forening or Football Dataco Ltd et al. v Yahoo! UK Ltd. Overall, it has been acknowledged that significant efforts in the making of the dataset are not sufficient to claim compilation rights, as the structure has to "express his creativity in an original manner". The Database Directive also introduced an original framework of protection for datasets, the sui generis rights that are conferred on any dataset that required a "substantial investment". While they last 15 years, sui generis rights have the potential to become permanent, as they can be renewed for every update of the dataset.
+Due to their large scope in length and protection, sui generis rights were initially not widely acknowledged by European jurisprudence, which set a high bar for their enforcement. This cautious approach was reversed in the 2010s, as the 2013 decision Innoweb BV v Wegener ICT Media BV and Wegener Mediaventions strengthened the positions of database owners and condemned the reuse of non-protected data in web search engines. The consolidation and expansion of database rights remain a controversial topic in European regulations, as it is partly at odds with the commitment of the European Union in favor of a data-driven economy and open science. While a few exceptions exist for scientific and pedagogic uses, they are limited in scope (no rights for further reutilization) and they have not been activated in all member states.
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-7.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-7.md
@ -0,0 +1,28 @@
+---
+title: "Open scientific data"
+chunk: 8/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+=== Ownership ===
+Copyright issues with scientific datasets have been further complicated by uncertainties regarding ownership. Research is largely a collaborative activity that involves a wide range of contributions. Initiatives like CRediT (Contributor Roles Taxonomy) have identified 14 different roles, of which 4 are explicitly related to data management (Formal Analysis, Investigation, Data curation and Visualization).
+In the United States, ownership of research data is usually "determined by the employer of the researcher", with the principal investigator acting as the caretaker of the data rather than the owner. Until the development of open research data, US institutions were usually more reluctant to waive copyrights on data than on publications, as data are considered strategic assets. In the European Union, there is no widely agreed framework on the ownership of data.
+The additional rights of external stakeholders have also been raised, especially in the context of medical research. Since the 1970s, patients have claimed some form of ownership of the data produced in the context of clinical trials, notably with important controversies concerning "whether research subjects and patients actually own their own tissue or DNA."
+
+=== Privacy ===
+Numerous scientific projects rely on the collection of data about persons, notably in medical research and the social sciences. In such cases, any policy of data sharing has to be balanced against the preservation and protection of personal data.
+Researchers and, more specifically, principal investigators are subject to obligations of confidentiality in several jurisdictions. Health data has been increasingly regulated since the late 20th century, either by law or by sectoral agreements. In 2014, the European Medicines Agency introduced important changes to the sharing of clinical trial data, in order to prevent the release of all personal details and all commercially relevant information. Such changes in European regulation "are likely to influence the global practice of sharing clinical trial data as open data".
+Research management plans and practices have to be open, transparent and confidential by design.
+
+=== Free licenses ===
+Open licenses have been the preferred legal framework to clear the restrictions and ambiguities in the legal definition of scientific data. In 2003, the Berlin Declaration called for a universal waiver of reuse rights on scientific contributions that explicitly included "raw data and metadata".
+In contrast with the development of open licenses for publications, which occurred over a short time frame, the creation of licenses for open scientific data has been a complicated process. Specific rights, like the sui generis database rights in the European Union, or specific legal principles, like the distinction between simple facts and original compilations, were not initially anticipated. Until the 2010s, free licenses could paradoxically add more restrictions to the reuse of datasets, especially with regard to attribution (which is not required for non-copyrighted objects like raw facts): "in such cases, when no rights are attached to research data, then there is no ground for licencing the data".
+To circumvent the issue, several institutions like the Harvard-MIT Data Center started to share their data in the public domain. This approach ensures that no right is applied to non-copyrighted items. Yet the public domain and some associated tools like the Public Domain Mark are not properly defined legal contracts and vary significantly from one jurisdiction to another. First introduced in 2009, the Creative Commons Zero (or CC0) license was immediately considered for data licensing. It has since become "the recommended tool for releasing research data into the public domain". In accordance with the principles of the Berlin Declaration it is not a license but a waiver, as the producer of the data "overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights".
+Alternative approaches have included the design of new free licenses to disentangle the attribution stacking specific to database rights. In 2009, the Open Knowledge Foundation published the Open Database License, which has been adopted by major online projects like OpenStreetMap. Since 2015, all the different Creative Commons licenses have been updated to become fully effective on datasets, as database rights have been explicitly anticipated in the 4.0 version.
+
+== Open scientific data management ==
+Data management has recently become a primary focus of the policy and research debate on open scientific data. The influential FAIR principles are deliberately centered on the key features of "good data management" in a scientific context. In a research context, data management is frequently associated with data lifecycles. Various lifecycle models with different stages have been theorized by institutions, infrastructures and scientific communities. However, "such lifecycles are a simplification of real life, which is far less linear and more iterative in practice."
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-8.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-8.md
@ -0,0 +1,23 @@
+---
+title: "Open scientific data"
+chunk: 9/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+=== Integration to the research workflow ===
+In contrast with the broad encouragement of data sharing included in the early policies in favor of open scientific data, the complexity and the underlying costs and requirements of scientific data management are increasingly acknowledged: "Data sharing is difficult to do and to justify by the return on investment." Open data is not simply a supplementary task but has to be envisioned throughout the entire research process as it "requires changes in methods and practices of research."
+The opening of research data creates a new settlement of costs and benefits. Public data sharing introduces a new communication setting that primarily contrasts with private data exchange with research collaborators or partners. The collection, the purpose, and the limitation of data have to be explicit as it is impossible to rely on pre-existing informal knowledge: "the documentation and representations are the only means of communicating between data creator and user." Lack of proper documentation means that the burden of recontextualization falls on the potential users and may render the dataset useless.
+Publication requires further verification regarding the ownership of the data and the potential legal liability if the data is misused. This clarification phase becomes even more complex in international research projects that may overlap several jurisdictions. Data sharing and applying open science principles also bring significant long-term advantages that may not be immediately visible. Documentation of the dataset helps clarify the chain of provenance and ensure that the original data has not been significantly altered or that all the further treatments are fully documented if this is the case. Publication under a free license also allows delegating tasks such as long-term preservation to external actors.
+By the end of the 2010s, a new specialized literature on data management for research had emerged to codify existing practices and regulatory principles.
+
+=== Storage and preservation ===
+The availability of non-open scientific data decays rapidly: in 2014, a retrospective study of biological datasets showed that "the odds of a data set being reported as extant fell by 17% per year". Consequently, the "proportion of data sets that still existed dropped from 100% in 2011 to 33% in 1991". Data loss has also been singled out as a significant issue in major journals like Nature or Science.
+Surveys of research practices have consistently shown that storage norms, infrastructures, and workflows remain unsatisfactory in most disciplines. The storage and preservation of scientific data were identified early on as critical issues, especially concerning observational data, which are considered essential to preserve because they are the most difficult to replicate. A 2017-2018 survey of 1372 researchers contacted through the American Geophysical Union showed that only "a quarter and a fifth of the respondents" report good data storage practices. Short-term and unsustainable storage remains widespread, with 61% of the respondents storing most or all of their data on personal computers. Due to their ease of use at an individual scale, unsustainable storage solutions are viewed favorably in most disciplines: "This mismatch between good practices and satisfaction may show that data storage is less important to them than data collection and analysis".
+First published in 2012, the reference model of the Open Archival Information System states that scientific infrastructure should seek long-term preservation, that is, "long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community". Consequently, good practices of data management involve both storage (to materially preserve the data) and, even more crucially, curation, "to preserve knowledge about the data to facilitate reuse".
+Data sharing on public repositories has contributed to mitigating preservation risks due to the long-term commitment of data infrastructures and the potential redundancy of open data. A 2021 study of 50,000 data availability statements published in PLOS One showed that 80% of the datasets could be retrieved automatically, and 98% of the datasets with a data DOI could be retrieved automatically or manually. Moreover, accessibility did not decay significantly for older publications: "URLs and DOIs make the data and code associated with papers more likely to be available over time". Significant benefits have not been found when the open data was not correctly linked or documented: "Simply requiring that data be shared in some form may not have the desired impact of making scientific data FAIR, as studies have repeatedly demonstrated that many datasets that are ostensibly shared may not actually be accessible."
+
+=== Plan and governance ===
--- a/data/en.wikipedia.org/wiki/Open_scientific_data-9.md
+++ b/data/en.wikipedia.org/wiki/Open_scientific_data-9.md
@ -0,0 +1,17 @@
+---
+title: "Open scientific data"
+chunk: 10/11
+source: "https://en.wikipedia.org/wiki/Open_scientific_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:42.862927+00:00"
+instance: "kb-cron"
+---
+
+Research data management can be laid out in a data management plan or DMP.
+Data management plans were introduced in 1966 for the specific needs of aeronautic and engineering research, which already faced increasingly complex data frictions. These first examples were focused on material issues associated with the access, transfer, and storage of the data: "Until the early 2000s, DMPs were utilised in this manner: in limited fields, for projects of great technical complexity, and for limited mid-study data collection and processing purposes".
+After 2000, the implementation of extensive research infrastructure and the development of open science changed the scope and the purpose of data management plans. Policy-makers, rather than scientists, have been instrumental in this development: "The first publications to provide general advice and guidance to researchers around the creation of DMPs were published from 2009 following the publications from JISC and the OECD (…) DMP use, we infer, has been imposed onto the research community through external forces".
+Empirical studies of data practices in research have "highlighted the need for organizations to offer more formal training and assistance in data management to scientists". In a 2017-2018 international survey of 1372 scientists, most requests for help and formalization were associated with data management plans: "creating data management plans (33.3%); training on best practices in data management (31.3%); assistance on creating metadata to describe data or datasets (27.6%)". The expansion of data collection and data analysis processes has increasingly strained a large range of informal and non-codified data practices.
+The involvement of external stakeholders in research projects creates significant potential tensions with the principles of sharing open data. Contributions from commercial actors can especially rely on some form of exclusivity and appropriation of the final research results. In 2022, Pujol Priego, Wareham, and Romasanta created several accommodation strategies to overcome these issues, such as data modularity (with sharing limited to some part of the data) and time delay (with year-long embargoes before the final release of the data).
+
+=== Open science infrastructures ===
--- a/data/en.wikipedia.org/wiki/Open_source-0.md
+++ b/data/en.wikipedia.org/wiki/Open_source-0.md
@ -0,0 +1,27 @@
+---
+title: "Open source"
+chunk: 1/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+Open source is software that is made freely available for possible modification and redistribution, including in the form of source code. The licensing conditions include permission to use and view the source code, design documents, or content of the product. The open source model is a decentralized software development model that encourages open collaboration. A main principle of open source software development is peer production, with products such as source code, blueprints, and documentation freely available to the public. The open source movement in software began as a response to the limitations of proprietary code. The model is used in areas such as open source eCommerce, open source appropriate technology, and open source drug discovery.
+Open source promotes universal access via an open-source or free license to a product's design or blueprint, and universal redistribution of that design or blueprint. Before the phrase open source became widely adopted, developers and producers used a variety of other terms, such as free software, shareware, and public domain software. The term open source was introduced in 1998 and gained hold with the rise of the Internet. The open-source software movement arose to clarify copyright, licensing, domain, and consumer issues.
+Generally, open source refers to a computer program in which the source code is available to the general public for usage, modification from its original design, and publication of their version (fork) back to the community. Many large formal institutions have sprung up to support the development of the open-source movement, including the Apache Software Foundation, which supports community projects such as the open-source framework  and the open-source HTTP server Apache HTTP.
+
+== History ==
+
+The sharing of technical information predates the Internet and the personal computer considerably. For instance, in the early years of automobile development a group of capital monopolists owned the rights to a 2-cycle gasoline-engine patent originally filed by George B. Selden. By controlling this patent, they were able to monopolize the industry and force car manufacturers to adhere to their demands, or risk a lawsuit.
+In 1911, independent automaker Henry Ford won a challenge to the Selden patent. The result was that the Selden patent became virtually worthless and a new association (which would eventually become the Motor Vehicle Manufacturers Association) was formed. The new association instituted a cross-licensing agreement among all US automotive manufacturers: although each company would develop technology and file patents, these patents were shared openly and with no exchange of money among all the firms. By the time the US entered World War II, 92 Ford patents and 515 patents from other companies were being shared among these manufacturers, with no exchange of money, or lawsuits.
+Early instances of the free sharing of source code include IBM's source releases of its operating systems and other programs in the 1950s and 1960s, and the SHARE user group that formed to facilitate the exchange of software. Beginning in the 1960s, ARPANET researchers used an open "Request for Comments" (RFC) process to encourage feedback in early telecommunication network protocols. This led to the birth of the early Internet in 1969.
+The sharing of source code on the Internet began when the Internet was relatively primitive, with software distributed via UUCP, Usenet, IRC, and Gopher. BSD, for example, was first widely distributed by posts to comp.os.linux on the Usenet, which is also where its development was discussed. Linux followed in this model.
+
+=== Open source as a term ===
+Open source as a term emerged in the late 1990s from a group of people in the free software movement who were critical of the political agenda and moral philosophy implied in the term "free software" and sought to reframe the discourse to reflect a more commercially minded position. In addition, the ambiguity of the term "free software" was seen as discouraging business adoption. However, the ambiguity of the word "free" exists primarily in English, as it can refer to cost. The group included Christine Peterson, Todd Anderson, Larry Augustin, Jon Hall, Sam Ockman, Michael Tiemann and Eric S. Raymond. Peterson suggested "open source" at a meeting held in Palo Alto, California, in reaction to Netscape's announcement in January 1998 of a source code release for Navigator. Linus Torvalds gave his support the following day, and Phil Hughes backed the term in Linux Journal. Richard Stallman, who founded the Free Software Foundation (FSF) in 1985, quickly decided against endorsing the term. The FSF's goal was to promote the development and use of free software, which they defined as software that grants users the freedom to run, study, share, and modify the code. This concept is similar to open source but places a greater emphasis on the ethical and political aspects of software freedom. Netscape released its source code under the Netscape Public License and later under the Mozilla Public License.
+Raymond was especially active in the effort to popularize the new term. He made the first public call to the free software community to adopt it in February 1998. Shortly after, he founded The Open Source Initiative in collaboration with Bruce Perens.
+The term gained further visibility through an event organized in April 1998 by technology publisher O'Reilly Media. Originally titled the "Freeware Summit" and later known as the "Open Source Summit", the event was attended by the leaders of many of the most important free and open-source projects, including Linus Torvalds, Larry Wall, Brian Behlendorf, Eric Allman, Guido van Rossum, Michael Tiemann, Paul Vixie, Jamie Zawinski, and Eric Raymond. At that meeting, alternatives to the term "free software" were discussed. Tiemann argued for "sourceware" as a new term, while Raymond argued for "open source." The assembled developers took a vote, and the winner was announced at a press conference the same evening.
+
+== Economics ==
--- a/data/en.wikipedia.org/wiki/Open_source-1.md
+++ b/data/en.wikipedia.org/wiki/Open_source-1.md
@ -0,0 +1,32 @@
+---
+title: "Open source"
+chunk: 2/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+Some economists agree that open-source is an information good or "knowledge good" with original work involving a significant amount of time, money, and effort. The cost of reproducing the work is low enough that additional users may be added at zero or near zero cost – this is referred to as the marginal cost of a product. Copyright creates a monopoly so that the price charged to consumers can be significantly higher than the marginal cost of production. This allows the author to recoup the cost of making the original work. Copyright thus creates access costs for consumers who value the work more than the marginal cost but less than the initial production cost. Access costs also pose problems for authors who wish to create a derivative work—such as a copy of a software program modified to fix a bug or add a feature, or a remix of a song—but are unable or unwilling to pay the copyright holder for the right to do so.
+Open source eliminates some of the access costs of consumers and creators of derivative works by reducing the restrictions of copyright. Basic economic theory predicts that lower costs would lead to higher consumption and also more frequent creation of derivative works. Organizations such as Creative Commons host websites where individuals can file for alternative "licenses", or levels of restriction, for their works.
+These self-made protections free the general society of the costs of policing copyright infringement.
+Others argue that since consumers do not pay for their copies, creators are unable to recoup the initial cost of production and thus have little economic incentive to create in the first place. By this argument, consumers would lose out because some of the goods they would otherwise purchase would not be available. In practice, content producers can choose whether to adopt a proprietary license and charge for copies, or an open license. Some goods which require large amounts of professional research and development, such as pharmaceuticals (an industry that depends largely on patents, not copyright, for intellectual property protection), are almost exclusively proprietary, although increasingly sophisticated technologies are being developed on open-source principles.
+There is evidence that open-source development creates enormous value. For example, in the context of open-source hardware design, digital designs are shared for free and anyone with access to digital manufacturing technologies (e.g. RepRap 3D printers) can replicate the product for the cost of materials. The original sharer may receive feedback and potentially improvements on the original design from the peer production community.
+Many open-source projects have a high economic value. According to the Battery Open Source Software Index (BOSS), the ten economically most important open-source projects for 2017 are:
+
+The rank given is based on the activity regarding projects in online discussions, on GitHub, on search activity in search engines and on the influence on the labour market.
+
+=== Licensing alternatives ===
+
+Alternative arrangements have also been shown to result in the creation of goods outside of the proprietary license model. Examples include:
+
+Creation for its own sake – For example, Wikipedia editors add content for recreation. Artists have a drive to create. Both communities benefit from free starting material.
+Voluntary after-the-fact donations – used by shareware, street performers, and public broadcasting in the United States.
+Patron – For example, open-access publishing relies on institutional and government funding of research faculty, who also have a professional incentive to publish for reputation and career advancement. Works of the US government are automatically released into the public domain.
+Freemium – Give away a limited version for free and charge for a premium version (potentially using a dual license).
+Give away the product and charge something related – charge for support of open-source enterprise software, give away music but charge for concert admission.
+Give away work to gain market share – used by artists, in corporate software to spoil a dominant competitor (for example in the browser wars and the Android operating system).
+For own use – Businesses or individual software developers often create software to solve a problem, bearing the full cost of initial creation. They will then open source the solution, and benefit from the improvements others make for their own needs. Communalizing the maintenance burden distributes the cost across more users; free riders can also benefit without undermining the creation process. Drupal's founder Dries Buytaert has summarized this as the Maker/Taker problem.
+
+== Open collaboration ==
--- a/data/en.wikipedia.org/wiki/Open_source-10.md
+++ b/data/en.wikipedia.org/wiki/Open_source-10.md
@ -0,0 +1,33 @@
+---
+title: "Open source"
+chunk: 11/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+=== Literature on legal and economic aspects ===
+Abramson, Bruce (2005). Digital Phoenix; Why the Information Economy Collapsed and How it Will Rise Again. MIT Press. ISBN 978-0-262-51196-4.
+Benkler, Y. (December 2002). "Coase's Penguin, or, Linux and The Nature of the Firm" (PDF). Yale Law Journal. 112 (3): 369–446. arXiv:cs/0109077. doi:10.2307/1562247. hdl:10535/2974. ISSN 0044-0094. JSTOR 1562247. S2CID 16684329.
+Berry, D.M.; Moss, G. (2008). "Libre Culture: Meditations on Free Culture" (PDF). Canada: Pygmalion Books.
+Bitzer, J.; Schröder, P.J.H. (2005). "The Impact of Entry and Competition by Open Source Software on Innovation Activity" (PDF). Industrial Organization. EconWPA.
+v. Engelhardt, S. (2008). "The Economic Properties of Software" (PDF). Jena Economic Research Papers. 2: 2008–045.
+v. Engelhardt, S. (2008). "Intellectual Property Rights and Ex-Post Transaction Costs: the Case of Open and Closed Source Software" (PDF). Jena Economic Research Papers 2008-047.
+v. Engelhardt, S.; Swaminathan, S. (2008). "Open Source Software, Closed Source Software or Both: Impacts on Industry Growth and the Role of Intellectual Property Rights" (PDF). Discussion Papers of Diw Berlin.
+European Commission. (2006). Economic impact of open source software on innovation and the competitiveness of the Information and Communication Technologies sector in the EU. Brussels.
+v. Hippel, E.; v. Krogh, G. (2003). "Open source software and the "private-collective" innovation model: Issues for organization science" (PDF). Organization Science. 14 (2): 209–223. doi:10.1287/orsc.14.2.209.14992. hdl:1721.1/66145. ISSN 1047-7039. S2CID 11947692.
+Kostakis, V.; Bauwens, M. (2014). Network Society and Future Scenarios for a Collaborative Economy. Palgrave Macmillan. ISBN 978-1-137-41506-6. (wiki)
+Lerner, Josh; Pathak, Parag A.; Tirole, Jean (2006). "The Dynamics of Open Source Contributors". American Economic Review. 96 (2): 114–8. CiteSeerX 10.1.1.510.9948. doi:10.1257/000282806777211874. ISSN 0002-8282. Archived from the original on 4 January 2012. Retrieved 9 July 2021.
+Lerner, Josh; Tirole, Jean (2002). "Some simple economics on open source". Journal of Industrial Economics. 50 (2): 197–234. CiteSeerX 10.1.1.461.3373. doi:10.1111/1467-6451.00174. ISSN 0022-1821. S2CID 219722756. earlier revision (PDF)
+Lerner, J.; Tirole, J. (2005). "The Scope of Open Source Licensing". The Journal of Law, Economics, and Organization. 21: 20–56. CiteSeerX 10.1.1.72.465. doi:10.1093/jleo/ewi002. ISSN 8756-6222.
+Lerner, J.; Tirole, J. (2005). "The Economics of Technology Sharing: Open Source and Beyond" (PDF). Journal of Economic Perspectives. 19 (2): 99–120. doi:10.1257/0895330054048678. ISSN 0895-3309. S2CID 17968894.
+Maurer, S.M. (2008). "Open source biology: Finding a niche (or maybe several)". UMKC Law Review. 76 (2). doi:10.2139/ssrn.1114371. ISSN 1556-5068. S2CID 54046895. SSRN 1114371.
+Osterloh, M.; Rota, S. (2007). "Open source software development — Just another case of collective invention?" (PDF). Research Policy. 36 (2): 157–171. doi:10.1016/j.respol.2006.10.004. hdl:10419/214322. ISSN 0048-7333.
+Riehle, D. (April 2007). "The Economic Motivation of Open Source: Stakeholder Perspectives". IEEE Computer. 40 (4): 25–32. doi:10.1109/MC.2007.147. ISSN 0018-9162. S2CID 168544.
+Rossi, M.A. (2006). "Decoding the free/open source software puzzle: A survey of theoretical and empirical contributions" (PDF). In Bitzer, J.; Schröder, P. (eds.). The Economics of Open Source Software Development. Elsevier. pp. 15–55. ISBN 978-0-444-52769-1.
+Sampathkumar, K.S. Understanding FOSS Version 4.0 revised. ISBN 978-8-184-65469-1.
+Schiff, A. (2002). "The Economics of Open Source Software: A Survey of the Early Literature" (PDF). Review of Network Economics. 1 (1): 66–74. doi:10.2202/1446-9022.1004. ISSN 2194-5993. S2CID 201280221. Archived from the original on 7 May 2003.
+Schwarz, M.; Takhteyev, Y. (2010). "Half a Century of Public Software Institutions: Open Source as a Solution to the Hold-Up Problem". Journal of Public Economic Theory. 12 (4): 609–639. CiteSeerX 10.1.1.625.2368. doi:10.1111/j.1467-9779.2010.01467.x. ISSN 1097-3923. S2CID 154317482. earlier revision
+Spagnoletti, P.; Federici, T. (2011). "Exploring the Interplay Between FLOSS Adoption and Organizational Innovation". Communications of the Association for Information Systems. 29 (15): 279–298. doi:10.17705/1CAIS.02915.
--- a/data/en.wikipedia.org/wiki/Open_source-2.md
+++ b/data/en.wikipedia.org/wiki/Open_source-2.md
@ -0,0 +1,31 @@
+---
+title: "Open source"
+chunk: 3/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+The open-source model is a decentralized software development model that encourages open collaboration, meaning "any system of innovation or production that relies on goal-oriented yet loosely coordinated participants who interact to create a product (or service) of economic value, which they make available to contributors and noncontributors alike." A main principle of open-source software development is peer production, with products such as source code, blueprints, and documentation freely available to the public. The open-source movement in software began as a response to the limitations of proprietary code. The model is used for projects such as open-source appropriate technology and open-source drug discovery.
+The open-source model for software development inspired the use of the term to refer to other forms of open collaboration, such as in Internet forums, mailing lists and online communities. Open collaboration is also thought to be the operating principle underlying a gamut of diverse ventures, including TEDx and Wikipedia.
+Open collaboration is the principle underlying peer production, mass collaboration, and wikinomics. It was observed initially in open-source software, but can also be found in many other instances, such as in Internet forums, mailing lists, Internet communities, and many instances of open content, such as Creative Commons. It also explains some instances of crowdsourcing, collaborative consumption, and open innovation.
+Riehle et al. define open collaboration as collaboration based on three principles of egalitarianism, meritocracy, and self-organization. Levine and Prietula define open collaboration as "any system of innovation or production that relies on goal-oriented yet loosely coordinated participants who interact to create a product (or service) of economic value, which they make available to contributors and noncontributors alike." This definition captures multiple instances, all joined by similar principles. For example, all of the elements – goods of economic value, open access to contribute and consume, interaction and exchange, purposeful yet loosely coordinated work – are present in an open-source software project, in Wikipedia, or in a user forum or community. They can also be present in a commercial website that is based on user-generated content. In all of these instances of open collaboration, anyone can contribute and anyone can freely partake in the fruits of sharing, which are produced by interacting participants who are loosely coordinated.
+An annual conference dedicated to the research and practice of open collaboration is the International Symposium on Wikis and Open Collaboration (OpenSym, formerly WikiSym). As per its website, the group defines open collaboration as "collaboration that is egalitarian (everyone can join, no principled or artificial barriers to participation exist), meritocratic (decisions and status are merit-based rather than imposed) and self-organizing (processes adapt to people rather than people adapt to pre-defined processes)."
+
+== Open-source license ==
+
+Open source promotes universal access via an open-source or free license to a product's design or blueprint, and universal redistribution of that design or blueprint. Before the phrase open source became widely adopted, developers and producers used a variety of other terms. Open source gained hold in part due to the rise of the Internet. The open-source software movement arose to clarify copyright, licensing, domain, and consumer issues.
+An open-source license is a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified or shared (with or without modification) under defined terms and conditions. This allows end users and commercial companies to review and modify the source code, blueprint or design for their own customization, curiosity or troubleshooting needs. Open-source licensed software is mostly available free of charge, though this does not necessarily have to be the case. Licenses that permit only non-commercial redistribution, or modification of the source code for personal use only, are generally not considered open-source licenses. However, open-source licenses may have some restrictions, particularly regarding the expression of respect to the origin of software, such as a requirement to preserve the name of the authors and a copyright statement within the code, or a requirement to redistribute the licensed software only under the same license (as in a copyleft license). One popular set of open-source software licenses is the set approved by the Open Source Initiative (OSI) based on its Open Source Definition (OSD).
+
+== Applications ==
+
+Social and political views have been affected by the growth of the concept of open source. Advocates in one field often support the expansion of open source in other fields. But Eric Raymond and other founders of the open-source movement have sometimes publicly argued against speculation about applications outside software, saying that strong arguments for software openness should not be weakened by overreaching into areas where the story may be less compelling. The broader impact of the open-source movement, and the extent of its role in the development of new information sharing procedures, remain to be seen.
+The open-source movement has inspired increased transparency and liberty in biotechnology research, for example CAMBIA. Even the research methodologies themselves can benefit from the application of open-source principles. It has also given rise to the rapidly expanding open-source hardware movement.
+
+=== Computer software ===
+
+Open-source software is software whose source code is published and made available to the public, enabling anyone to copy, modify and redistribute the source code without paying royalties or fees.
+LibreOffice and the GNU Image Manipulation Program are examples of open source software. As they do with proprietary software, users must accept the terms of a license when they use open source software—but the legal terms of open source licenses differ dramatically from those of proprietary licenses.
+Open-source code can evolve through community cooperation. These communities are composed of individual programmers as well as large companies. Some of the individual programmers who start an open-source project may end up establishing companies offering products or services incorporating open-source programs. Examples of open-source software products are:
--- a/data/en.wikipedia.org/wiki/Open_source-3.md
+++ b/data/en.wikipedia.org/wiki/Open_source-3.md
@ -0,0 +1,61 @@
+---
+title: "Open source"
+chunk: 4/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+Linux (on which much of the world's server parks run)
+MediaWiki (on which Wikipedia is based)
+Many more:
+List of free and open-source software packages
+List of formerly proprietary software
+The Google Summer of Code, often abbreviated to GSoC, is an international annual program in which Google awards stipends to contributors who successfully complete a free and open-source software coding project during the summer. GSoC is a large-scale project, with 202 participating organizations in 2021. There are similar smaller-scale projects, such as the Talawa Project run by the Palisadoes Foundation (a non-profit based in California, originally established to promote the use of information technology in Jamaica, but now also supporting underprivileged communities in the US).
+
+=== Electronics ===
+
+Open-source hardware is hardware whose initial specification, usually in a software format, is published and made available to the public, enabling anyone to copy, modify and redistribute the hardware and source code without paying royalties or fees. Open-source hardware evolves through community cooperation. These communities are composed of individual hardware/software developers, hobbyists, as well as very large companies. Examples of open-source hardware initiatives are:
+
+Openmoko: a family of open-source mobile phones, including the hardware specification and the operating system.
+OpenRISC: an open-source microprocessor family, with architecture specification licensed under GNU GPL and implementation under LGPL.
+Sun Microsystems's OpenSPARC T1 Multicore processor. Sun has released it under GPL.
+Arduino, a microcontroller platform for hobbyists, artists and designers.
+Simputer, an open hardware handheld computer, designed in India for use in environments where computing devices such as personal computers are deemed inappropriate.
+LEON: A family of open-source microprocessors distributed in a library with peripheral IP cores, open SPARC V8 specification, implementation available under GNU GPL.
+Tinkerforge: A system of open-source stackable microcontroller building blocks. It allows motors to be controlled and sensors to be read out using the programming languages C, C++, C#, Object Pascal, Java, PHP, Python and Ruby over a USB or Wi-Fi connection on Windows, Linux and Mac OS X. All of the hardware is licensed under CERN OHL (CERN Open Hardware License).
+Open Compute Project: designs for computer data centers, including power supply, Intel motherboard, AMD motherboard, chassis, racks, battery cabinet, and aspects of electrical and mechanical design.
+
+=== Food and beverages ===
+
+Some publishers of open-access journals have argued that data from food science and gastronomy studies should be freely available to aid reproducibility. A number of people have published Creative Commons-licensed recipe books.
+
+Open-source colas – cola soft drinks, similar to Coca-Cola and Pepsi, whose recipe is open source and developed by volunteers. The taste is said to be comparable to that of the standard beverages. Most corporations producing beverages keep their formulas secret and unknown to the general public.
+Free Beer (originally Vores Øl) – an open-source beer created by students at the IT-University in Copenhagen together with Superflex, an artist collective, to illustrate how open-source concepts might be applied outside the digital world.
+
+=== Digital content ===
+
+Open-content projects organized by the Wikimedia Foundation – Sites such as Wikipedia and Wiktionary have embraced the open-content Creative Commons content licenses. These licenses were designed to adhere to principles similar to various open-source software development licenses. Many of these licenses ensure that content remains free for re-use, that source documents are made readily available to interested parties, and that changes to content are accepted easily back into the system. Important sites embracing open-source-like ideals are Project Gutenberg and Wikisource, both of which post many books on which the copyright has expired and are thus in the public domain, ensuring that anyone has free, unlimited access to that content.
+Open ICEcat is an open catalog for the IT, CE and Lighting sectors with product data-sheets based on an Open Content License agreement. The digital content is distributed in XML and URL formats.
+SketchUp's 3D Warehouse is an open-source design community centered around the use of proprietary software that's distributed free of charge.
+The University of Waterloo Stratford Campus invites students every year to use its three-storey Christie MicroTiles wall as a digital canvas for their creative work.
+
+=== Medicine ===
+Pharmaceuticals – There have been several proposals for open-source pharmaceutical development, which led to the establishment of the Tropical Disease Initiative and the Open Source Drug Discovery for Malaria Consortium.
+Genomics – The term "open-source genomics" refers to the combination of rapid release of sequence data (especially raw reads) and crowdsourced analyses from bioinformaticians around the world that characterized the analysis of the 2011 E. coli O104:H4 outbreak.
+OpenEMR – OpenEMR is an ONC-ATB Ambulatory EHR 2011-2012 certified electronic health records and medical practice management application. It features fully integrated electronic health records, practice management, scheduling, and electronic billing, and is the base for many EHR programs.
+
+=== Science and engineering ===
+
+Research – The Science Commons was created as an alternative to the expensive legal costs of sharing and reusing scientific works in journals etc.
+Research – The Open Solar Outdoors Test Field (OSOTF) is a grid-connected photovoltaic test system, which continuously monitors the output of a number of photovoltaic modules and correlates their performance to a long list of highly accurate meteorological readings. The OSOTF is organized under open-source principles: all data and analysis are to be made freely available to the entire photovoltaic community and the general public.
+Construction – WikiHouse is an open-source project for designing and building houses.
+Energy research – The Open Energy Modelling Initiative promotes open-source models and open data in energy research and policy advice.
+
+==== Robotics ====
+
+An open-source robot is a robot whose blueprints, schematics, or source code are released under an open-source model.
+
+=== Other ===
--- a/data/en.wikipedia.org/wiki/Open_source-4.md
+++ b/data/en.wikipedia.org/wiki/Open_source-4.md
@ -0,0 +1,52 @@
+---
+title: "Open source"
+chunk: 5/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+Open-source principles can be applied to technical areas such as digital communication protocols and data storage formats.
+Open-design – which involves applying open-source methodologies to the design of artifacts and systems in the physical world. It is very nascent but has huge potential.
+Open-source appropriate technology (OSAT) refers to technologies that are designed in the same fashion as free and open-source software. These technologies must be "appropriate technology" (AT) – meaning technology that is designed with special consideration to the environmental, ethical, cultural, social, political, and economic aspects of the community it is intended for. An example of this application is the use of open-source 3D printers like the RepRap to manufacture appropriate technology.
+Teaching – which involves applying the concepts of open source to instruction using a shared web space as a platform to improve upon learning, organizational, and management challenges. An example of open-source courseware is the Java Education & Development Initiative (JEDI). Other examples include Khan Academy and Wikiversity. At the university level, the use of open-source-appropriate technology classroom projects has been shown to be successful in forging the connection between science/engineering and social benefit. This approach has the potential to use university students' access to resources and testing equipment in furthering the development of appropriate technology. Similarly, OSAT has been used as a tool for improving service learning.
+There are few examples of business information (methodologies, advice, guidance, practices) using the open-source model, although this is another case where the potential is enormous. ITIL is close to open source. It uses the Cathedral model (no mechanism exists for user contribution) and the content must be bought for a fee that is small by business consulting standards (hundreds of British pounds). Various checklists are published by government, banks or accounting firms.
+An open-source group emerged in 2012 that is attempting to design a firearm that may be downloaded from the internet and "printed" on a 3D Printer. Calling itself Defense Distributed, the group wants to facilitate "a working plastic gun that could be downloaded and reproduced by anybody with a 3D printer".
+Agrecol, a German NGO, has developed an open-source licence for seeds operating with copyleft and created OpenSourceSeeds as a respective service provider. Breeders who apply the license to their newly developed material protect it from the threat of privatisation and help to establish a commons-based breeding sector as an alternative to the commercial sector.
+Open Source Ecology, farm equipment and global village construction kit.
+
+== "Open" versus "free" versus "free and open" ==
+Free and open-source software (FOSS) or free/libre and open-source software (FLOSS) is openly shared source code that is licensed without any restrictions on usage, modification, or distribution. Confusion persists about this definition because the "free", also known as "libre", refers to the freedom of the product, not the price, expense, cost, or charge. For example, "being free to speak" is not the same as "free beer".
+Conversely, Richard Stallman argues the "obvious meaning" of the term "open source" is that the source code is public/accessible for inspection, without necessarily any other rights granted, although the proponents of the term say the conditions in the Open Source Definition must be fulfilled.
+"Free and open" should not be confused with public ownership (state ownership), deprivatization (nationalization), anti-privatization (anti-corporate activism), or transparent behavior.
+
+GNU
+GNU Manifesto
+Richard Stallman
+Gratis versus libre (no cost vs no restriction)
+
+== Software ==
+
+Generally, open source refers to a computer program in which the source code is available to the general public for use for any (including commercial) purpose, or modification from its original design. Open-source code is meant to be a collaborative effort, where programmers improve upon the source code and share the changes within the community. Code is released under the terms of a software license. Depending on the license terms, others may then download, modify, and publish their version (fork) back to the community.
+
+List of free and open-source software packages
+Open-source license, a copyright license that makes the source code available with a product
+The Open Source Definition, as used by the Open Source Initiative for open source software
+Open-source model, a decentralized software development model that encourages open collaboration
+Open-source software, software which permits the use and modification of its source code
+History of free and open-source software
+Open-source software advocacy
+Open-source software development
+Open-source-software movement
+Open-source video games
+List of open-source video games
+Business models for open-source software
+Comparison of open-source and closed-source software
+Diversity in open-source software
+MapGuide Open Source, a web-based map-making platform to develop and deploy web mapping applications and geospatial web services (not to be confused with OpenStreetMap (OSM), a collaborative project to create a free editable map of the world).
+
+== Hardware ==
+
+RISC-V
--- a/data/en.wikipedia.org/wiki/Open_source-5.md
+++ b/data/en.wikipedia.org/wiki/Open_source-5.md
@ -0,0 +1,73 @@
+---
+title: "Open source"
+chunk: 6/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+== Agriculture, economy, manufacturing and production ==
+Open-source appropriate technology (OSAT), technology designed with consideration for environmental, ethical, cultural, social, political, economic, and community aspects
+Open-design movement, development of physical products, machines and systems via publicly shared design information, including free and open-source software and open-source hardware, among many others:
+Open Architecture Network, improving global living conditions through innovative sustainable design
+OpenCores, a community developing digital electronic open-source hardware
+Open Design Alliance, develops Teigha, a software development platform to create engineering applications including CAD software
+Open Hardware and Design Alliance (OHANDA), sharing open hardware and designs via free online services
+Open Source Ecology (OSE), a network of farmers, engineers, architects and supporters striving to manufacture the Global Village Construction Set (GVCS)
+OpenStructures (OSP), a modular construction model where everyone designs on the basis of one shared geometrical OS grid
+Open manufacturing or "Open Production" or "Design Global, Manufacture Local", a new socioeconomic production model to openly and collaboratively produce and distribute physical objects
+Open-source architecture (OSArc), emerging procedures in imagination and formation of virtual and real spaces within an inclusive universal infrastructure
+Open-source cola, cola soft drinks made to open-sourced recipes
+Open-source hardware, or open hardware, computer hardware, such as microprocessors, that is designed in the same fashion as open source software
+List of open-source hardware projects
+Open-source product development (OSPD), collaborative product and process openness of open-source hardware for any interested participants
+Open-source robotics, physical artifacts of the subject are offered by the open design movement
+Open Source Seed Initiative, open source varieties of crop seeds, as an alternative to patent-protected seeds sold by large agriculture companies.
+
+== Science and medicine ==
+Open science, the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional
+Open science data, a type of open data focused on publishing observations and results of scientific activities available for anyone to analyze and reuse
+Open Science Framework and the Center for Open Science
+Open Source Lab (disambiguation), several laboratories
+Open-Source Lab (book), a 2014 book by Joshua M. Pearce
+Open-notebook science, the practice of making the entire primary record of a research project publicly available online as it is recorded
+Open Source Physics (OSP), a National Science Foundation and Davidson College project to spread the use of open source code libraries that take care of much of the heavy lifting for physics
+Open Source Geospatial Foundation
+NASA Open Source Agreement (NOSA), an OSI-approved software license
+List of open-source software for mathematics
+List of open-source bioinformatics software
+List of open-source health software
+List of open-source health hardware
+
+== Media ==
+Open-source film, open source movies
+List of open-source films
+Open Source Cinema, a collaborative website to produce a documentary film
+Open-source journalism, commonly describes a spectrum of online publications, forms of innovative publishing of online journalism, and content voting, rather than the sourcing of news stories by "professional" journalists
+Open-source investigation
+See also: Crowdsourcing, crowdsourced journalism, crowdsourced investigation, trutherism, and historical revisionism considered "fringe" by corporate media.
+Open-source record label, open source music
+"Open Source", a 1960s rock song performed by The Magic Mushrooms
+Open Source (radio show), a radio show using open content information gathering methods hosted by Christopher Lydon
+Open textbook, an open copyright licensed textbook made freely available online for students, teachers, and the public
+CAD libraries - such as SketchUp 3D Warehouse and GrabCAD
+
+== Organizations ==
+Open Source Initiative (OSI), an organization dedicated to promote open source
+Open Source Software Institute
+Journal of Open Source Software
+Open Source Day, an international conference for fans of open solutions from Central and Eastern Europe; the date varies from year to year
+Open Source Developers' Conference
+Open Source Development Labs (OSDL), a non-profit corporation that provides space for open-source projects
+Open Source Drug Discovery, a collaborative drug discovery platform for neglected tropical diseases
+Open Source Technology Group (OSTG), news, forums, and other SourceForge resources for IT
+Open source in Kosovo
+Open Source University Meetup
+New Zealand Open Source Awards
+
+== Procedures ==
+Open security, application of open source philosophies to computer security
+Open Source Information System, the former name of an American unclassified network serving the U.S. intelligence community with open-source intelligence; since mid-2006 the content of OSIS has been known as Intelink-U, while the network portion is known as DNI-U
+Open-source intelligence, an intelligence gathering discipline based on information collected from open sources (not to be confused with open-source artificial intelligence such as Mycroft (software)).
--- a/data/en.wikipedia.org/wiki/Open_source-6.md
+++ b/data/en.wikipedia.org/wiki/Open_source-6.md
@ -0,0 +1,32 @@
+---
+title: "Open source"
+chunk: 7/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+== Society ==
+The rise of open-source culture in the 20th century resulted from a growing tension between creative practices that require access to content that is often copyrighted, and restrictive intellectual property laws and policies governing access to copyrighted content. The two main ways in which intellectual property laws became more restrictive in the 20th century were extensions to the term of copyright (particularly in the United States) and penalties, such as those articulated in the Digital Millennium Copyright Act (DMCA), placed on attempts to circumvent anti-piracy technologies.
+Although artistic appropriation is often permitted under fair-use doctrines, the complexity and ambiguity of these doctrines create an atmosphere of uncertainty among cultural practitioners. Also, the protective actions of copyright owners create what some call a "chilling effect" among cultural practitioners.
+The idea of an "open-source" culture runs parallel to "Free Culture", but is substantively different. Free culture is a term derived from the free software movement, and in contrast to that vision of culture, proponents of open-source culture (OSC) maintain that some intellectual property law needs to exist to protect cultural producers. Yet they propose a more nuanced position than corporations have traditionally sought. Instead of seeing intellectual property law as an expression of instrumental rules intended to uphold either natural rights or desirable outcomes, an argument for OSC takes into account diverse goods (as in "the Good life") and ends.
+Sites such as ccMixter offer up free web space for anyone willing to license their work under a Creative Commons license. The resulting cultural product is then available to download free (generally accessible) to anyone with an Internet connection. Older, analog technologies such as the telephone or television have limitations on the kind of interaction users can have.
+Through various technologies such as peer-to-peer networks and blogs, cultural producers can take advantage of vast social networks to distribute their products. As opposed to traditional media distribution, redistributing digital media on the Internet can be virtually costless. Technologies such as BitTorrent and Gnutella take advantage of various characteristics of the Internet protocol suite (TCP/IP) in an attempt to totally decentralize file distribution.
+
+=== Government ===
+Open politics (sometimes known as Open-source politics) is a political process that uses Internet technologies such as blogs, email and polling to provide for a rapid feedback mechanism between political organizations and their supporters. There is also an alternative conception of the term Open-source politics which relates to the development of public policy under a set of rules and processes similar to the open-source software movement.
+Open-source governance is similar to open-source politics, but it applies more to the democratic process and promotes the freedom of information.
+Open-source political campaigns refer specifically to political campaigns.
+The South Korean government wants to increase its use of free and open-source software, to decrease its dependence on proprietary software solutions. It plans to make open standards a requirement, to allow the government to choose between multiple operating systems and web browsers. Korea's Ministry of Science, ICT & Future Planning is also preparing ten pilots on using open-source software distributions.
+
+=== Ethics ===
+Open-source ethics is split into two strands:
+
+Open-source ethics as an ethical school – Charles Ess and David Berry are researching whether ethics can learn anything from an open-source approach. Ess famously even defined the AoIR Research Guidelines as an example of open-source ethics.
+Open-source ethics as a professional body of rules – This is based principally on the computer ethics school, studying the questions of ethics and professionalism in the computer industry in general and software development in particular.
+
+=== Religion ===
+
+Irish philosopher Richard Kearney has used the term "open-source Hinduism" to refer to the way historical figures such as Mohandas Gandhi and Swami Vivekananda worked upon this ancient tradition.
--- a/data/en.wikipedia.org/wiki/Open_source-7.md
+++ b/data/en.wikipedia.org/wiki/Open_source-7.md
@ -0,0 +1,21 @@
+---
+title: "Open source"
+chunk: 8/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+=== Media ===
+Open-source journalism formerly referred to the standard journalistic techniques of news gathering and fact checking, reflecting open-source intelligence, a similar term used in military intelligence circles. Now, open-source journalism commonly refers to forms of innovative publishing of online journalism, rather than the sourcing of news stories by a professional journalist. In the 25 December 2006 issue of TIME magazine, this is referred to as user-created content and listed alongside more traditional open-source projects such as OpenSolaris and Linux.
+Weblogs, or blogs, are another significant platform for open-source culture. Blogs consist of periodic, reverse chronologically ordered posts, using a technology that makes webpages easily updatable with no understanding of design, code, or file transfer required. While corporations, political campaigns and other formal institutions have begun using these tools to distribute information, many blogs are used by individuals for personal expression, political organizing, and socializing. Some, such as LiveJournal or WordPress, use open-source software that is open to the public and can be modified by users to fit their own tastes. Whether the code is open or not, this format represents a nimble tool for people to borrow and re-present culture; whereas traditional websites made the illegal reproduction of culture difficult to regulate, the mutability of blogs makes "open sourcing" even more uncontrollable since it allows a larger portion of the population to replicate material more quickly in the public sphere.
+Messageboards are another platform for open-source culture. Messageboards (also known as discussion boards or forums) are places online where people with similar interests can congregate and post messages for the community to read and respond to. Messageboards sometimes have moderators who enforce community standards of etiquette, such as banning spammers. Other common board features are private messages (where users can send messages to one another) as well as chat (a way to have a real-time conversation online) and image uploading. Some messageboards use phpBB, which is a free open-source package. Where blogs are more about individual expression and tend to revolve around their authors, messageboards are about creating a conversation amongst their users where information can be shared freely and quickly. Messageboards are a way to remove intermediaries from everyday life; for instance, instead of relying on commercials and other forms of advertising, one can ask other users for frank reviews of a product, movie or CD. By removing the cultural middlemen, messageboards help speed the flow of information and exchange of ideas.
+OpenDocument is an open document file format for saving and exchanging editable office documents such as text documents (including memos, reports, and books), spreadsheets, charts, and presentations. Organizations and individuals that store their data in an open format such as OpenDocument avoid being locked into a single software vendor, leaving them free to switch software if their current vendor goes out of business, raises their prices, changes their software, or changes their licensing terms to something less favorable.
+Open-source movie production is an open-call system in which a changing crew and cast collaborate in movie production, a system in which the result is made available for re-use by others, or one in which exclusively open-source products are used in the production. The 2006 movie Elephants Dream is said to be the "world's first open movie", created entirely using open-source technology.
+An open-source documentary film has a production process allowing the open contribution of archival material, footage, and other filmic elements, both in unedited and edited form, similar to crowdsourcing. By doing so, online contributors become part of the process of creating the film, helping to influence the editorial and visual material to be used in the documentary, as well as its thematic development. The first open-source documentary film is the non-profit WBCN and the American Revolution, which went into development in 2006, and will examine the role media played in the cultural, social and political changes from 1968 to 1974 through the story of radio station WBCN-FM in Boston. The film is being produced by Lichtenstein Creative Media and the non-profit Center for Independent Documentary. Open Source Cinema is a website to create Basement Tapes, a feature documentary about copyright in the digital age, co-produced by the National Film Board of Canada.
+Open-source film-making refers to a form of film-making that takes a method of idea formation from open-source software, but in this case the 'source' for a filmmaker is raw unedited footage rather than programming code. It can also refer to a method of film-making where the process of creation is 'open', i.e. a disparate group of contributors contributes to the final piece at different times.
+Open-IPTV is IPTV that is not limited to one recording studio, production studio, or cast. Open-IPTV uses the Internet or other means to pool efforts and resources together to create an online community whose members all contribute to a show.
+
+=== Education ===
--- a/data/en.wikipedia.org/wiki/Open_source-8.md
+++ b/data/en.wikipedia.org/wiki/Open_source-8.md
@ -0,0 +1,28 @@
+---
+title: "Open source"
+chunk: 9/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+Within the academic community, there is discussion about expanding what could be called the "intellectual commons" (analogous to the Creative Commons). Proponents of this view have hailed the Connexions Project at Rice University, OpenCourseWare project at MIT, Eugene Thacker's article on "open-source DNA", the "Open Source Cultural Database", Salman Khan's Khan Academy and Wikipedia as examples of applying open source outside the realm of computer software.
+Open-source curricula are instructional resources whose digital source can be freely used, distributed and modified. Another strand to the academic community is in the area of research. Many funded research projects produce software as part of their work. Due to the benefits of sharing software openly in scientific endeavours, there is an increasing interest in making the outputs of research projects available under an open-source license. In the UK the Joint Information Systems Committee (JISC) has developed a policy on open-source software. JISC also funds a development service called OSS Watch which acts as an advisory service for higher and further education institutions wishing to use, contribute to and develop open-source software.
+On 30 March 2010, President Barack Obama signed the Health Care and Education Reconciliation Act, which included $2 billion over four years to fund the TAACCCT program, which is described as "the largest OER (open education resources) initiative in the world and uniquely focused on creating curricula in partnership with industry for credentials in vocational industry sectors like manufacturing, health, energy, transportation, and IT".
+
+=== Innovation communities ===
+The principle of sharing pre-dates the open-source movement; for example, the free sharing of information has been institutionalized in the scientific enterprise since at least the 19th century. Open-source principles have always been part of the scientific community. The sociologist Robert K. Merton described the four basic elements of the (idealised) scientific community: universalism (an international perspective), communalism (sharing information), objectivity (removing one's personal views from the scientific inquiry) and organized skepticism (requirements of proof and review).
+These principles are, in part, complemented by US law's focus on protecting expression and method but not the ideas themselves. There is also a tradition of publishing research results to the scientific community instead of keeping all such knowledge proprietary. One of the recent initiatives in scientific publishing has been open access: the idea that research should be published in such a way that it is free and available to the public. There are currently many open access journals where the information is available free online; however, most journals do charge a fee for access (either to users or to libraries). The Budapest Open Access Initiative is an international effort with the goal of making all research articles available free on the Internet.
+The National Institutes of Health has recently proposed a policy on "Enhanced Public Access to NIH Research Information". This policy would provide a free, searchable resource of NIH-funded results to the public and to other international repositories six months after initial publication. The NIH's move is an important one because there is a significant amount of public funding in scientific research. Many of the questions have yet to be answered: the balancing of profit vs. public access, and ensuring that desirable standards and incentives do not diminish with a shift to open access.
+Benjamin Franklin was an early contributor, eventually donating all his inventions, including the Franklin stove, bifocals, and the lightning rod, to the public domain. New NGO communities are starting to use open-source technology as a tool. One example is the Open Source Youth Network, started in 2007 in Lisbon by ISCA members. Open innovation is also an emerging concept which advocates putting R&D in a common pool. The Eclipse platform is openly presenting itself as an open innovation network.
+
+=== Arts and recreation ===
+Copyright protection is used in the performing arts and even in athletic activities. Some groups have attempted to remove copyright from such practices.
+In 2012, Russian music composer, scientist and Russian Pirate Party member Victor Argonov presented detailed raw files of his electronic opera "2032" under the free license CC BY-NC 3.0 (later relicensed under CC BY-SA 4.0). This opera was originally composed and published in 2007 by the Russian label MC Entertainment as a commercial product, but the author later changed its status to free. In his blog he said that he decided to open the raw files (including wav, midi and other used formats) to the public to support worldwide pirate actions against SOPA and PIPA. Several Internet resources called "2032" the first open-source musical opera in history.
+
+=== Other related movements ===
+
+Notable events and applications that have been developed via the open source community, and echo the ideologies of the open source movement, include the Open Education Consortium, Project Gutenberg, synthetic biology, and Wikipedia. The Open Education Consortium is an organization composed of various colleges that support open source and share some of their material online. This organization, headed by the Massachusetts Institute of Technology, was established to aid in the exchange of open source educational materials. Wikipedia is a user-generated online encyclopedia with sister projects in academic areas, such as Wikiversity, a community dedicated to the creation and exchange of learning materials.
+Prior to the existence of Google Scholar Beta, Project Gutenberg was the first supplier of electronic books and the first free library project.
--- a/data/en.wikipedia.org/wiki/Open_source-9.md
+++ b/data/en.wikipedia.org/wiki/Open_source-9.md
@ -0,0 +1,62 @@
+---
+title: "Open source"
+chunk: 10/11
+source: "https://en.wikipedia.org/wiki/Open_source"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:44.361173+00:00"
+instance: "kb-cron"
+---
+
+=== Ideologically-related movements ===
+The open-access movement is a movement that is similar in ideology to the open source movement. Members of this movement maintain that academic material should be readily available to provide help with "future research, assist in teaching and aid in academic purposes." The open-access movement aims to eliminate subscription fees and licensing restrictions on academic materials. The free-culture movement is a movement that seeks to achieve a culture that engages in collective freedom via freedom of expression, free public access to knowledge and information, full demonstration of creativity and innovation in various arenas, and promotion of citizen liberties. Creative Commons is an organization that "develops, supports, and stewards legal and technical infrastructure that maximizes digital creativity, sharing, and innovation." It encourages the use of protected properties online for research, education, and creative purposes in pursuit of universal access. Creative Commons provides an infrastructure through a set of copyright licenses and tools that create a better balance within the realm of "all rights reserved" properties. The Creative Commons license offers a slightly more lenient alternative to "all rights reserved" copyrights for those who do not wish to exclude the use of their material.
+The Zeitgeist Movement (TZM) is an international social movement that advocates a transition into a sustainable "resource-based economy" based on collaboration in which monetary incentives are replaced by commons-based ones with everyone having access to everything (from code to products) as in "open source everything". While its activism and events are typically focused on media and education, TZM is a major supporter of open source projects worldwide since they allow for uninhibited advancement of science and technology, independent of constraints posed by institutions of patenting and capitalist investment. P2P Foundation is an "international organization focused on studying, researching, documenting and promoting peer to peer practices in a very broad sense." Its objectives incorporate those of the open source movement, whose principles are integrated in a larger socio-economic model.
+
+=== Open-weight ===
+Open-weight refers to the release of an artificial intelligence model's trained parameters, or weights, for public use. Unlike fully open-source models, open-weight releases may not include the underlying source code, training data, or full documentation. The availability of weights allows researchers and developers to run, evaluate, or fine-tune the model, though license terms may restrict redistribution or commercial use. The term is commonly used in reference to large language models such as LLaMA and Mistral, whose weights have been released under research or custom licenses.
+
+== See also ==
+
+=== Terms based on open source ===
+Open implementation
+Open security
+Open-source record label
+Open standard
+Shared Source
+Source-available software
+Software maintainer
+
+=== Other ===
+Open Sources: Voices from the Open Source Revolution (book)
+Commons-based peer production
+Digital rights
+Diseconomies of scale
+Free content
+Gift economy
+Glossary of legal terms in technology
+Mass collaboration
+Network effect
+Open Source Initiative
+Openness
+Proprietary software
+Digital public goods
+
+== Notes ==
+
+== References ==
+
+== Further reading ==
+
+Benkler, Yochai (2006). The Wealth of Networks: How Social Production Transforms Markets and Freedom (PDF). Yale University Press.
+Berry, David M. (2008). Copy, Rip, Burn: The Politics of Copyleft and Open Source. London: Pluto Press. ISBN 978-0745324142. OCLC 298460562. OL 9409091M.
+Dunlap, Isaac Hunter (2006). Open Source Database Driven Web Development: A Guide for Information Professionals. Oxford: Chandos. ISBN 978-1-84334-161-1. OCLC 679959533. OL 8930417M. Archived from the original on 4 February 2020. Retrieved 9 July 2021.
+Fogel, Karl (14 August 2020). Producing Open Source Software: How to Run a Successful Free Software Project. CreateSpace Independent Publishing Platform. ISBN 9781519343987. OCLC 609841129. OL 55306282M.
+Goldman, Ron; Gabriel, Richard P. (2005). Innovation Happens Elsewhere: Open Source as Business Strategy. Richard P. Gabriel. ISBN 978-1-55860-889-4.
+Kostakis, V.; Bauwens, M. (2014). Network Society and Future Scenarios for a Collaborative Economy. Palgrave Macmillan. ISBN 978-1-137-41506-6. (wiki)
+Nettingsmeier, Jörn. "So What? I Don't Hack!" eContact! 11.3 – Logiciels audio " open source " / Open Source for Audio Application (September 2009). Montréal: CEC.
+Ray, Partha Pratim; Rai, Rebika (2013). Open Source Hardware: An Introductory Approach. Lap Lambert Publishing House. ISBN 978-3-659-46591-8.
+Schrape, Jan-Felix (2019). "Open-source projects as incubators of innovation. From niche phenomenon to integral part of the industry". Convergence. 25 (3): 409–427. doi:10.1177/1354856517735795. ISSN 1354-8565. S2CID 149165772.
+Stallman, Richard M. Free Software Free Society: Selected essays of Richard M. Stallman. Archived from the original on 21 November 2010. Retrieved 9 July 2021.
+Various authors. eContact! 11.3 – Logiciels audio " open source " / Open Source for Audio Application (September 2009). Montréal: CEC.
+Various authors. "Open Source Travel Guide [wiki]". eContact! 11.3 – Logiciels audio " open source " / Open Source for Audio Application (September 2009). Montréal: CEC.
+Weber, Steve (2004). The Success of Open Source. Harvard University Press. ISBN 978-0-674-01292-9.
--- a/data/en.wikipedia.org/wiki/Open_synthetic_biology-0.md
+++ b/data/en.wikipedia.org/wiki/Open_synthetic_biology-0.md
@ -0,0 +1,28 @@
+---
+title: "Open synthetic biology"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open_synthetic_biology"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T03:49:45.513393+00:00"
+instance: "kb-cron"
+---
+
+Open synthetic biology is the idea that scientific knowledge and data should be openly accessible through common rights licensing to enable the rapid development of safe, effective and commercially viable synthetic biology applications.
+
+
+== Concepts ==
+Its foundational concepts are open science and the Bermuda Principles.
+Open science is the idea that scientific research should be openly shared to enable massive collaboration (e.g., the Polymath Project). The Bermuda Principles is a private accord declaring that all DNA sequence data should be released in publicly accessible databases within 24 hours after generation.
+Open synthetic biology is a theoretical framework supporting a global ecosystem of responsible and capable research scientists working collaboratively on synthetic biology application development projects to reduce cost, time, and risks of developing new synthetic biology applications (including open synthetic biology therapeutics) from the inception of primary science to applications reaching market readiness and commercial viability.
+Its general principle is that participating research scientists agree to share research, data, findings and results with the open synthetic biology community and the public generally. The Open SynBio community will set standards and expectations of the participants and their "science to market" process and the community will work collaboratively with downstream stakeholders (e.g., investors and business advisors) to ensure public safety and general availability of new synthetic biology applications.
+
+
+== Examples ==
+One example of open synthetic biology is DNA2.0's donation of several artificial gene sequences to an open-access repository run by the BioBricks Foundation.
+
+
+== References ==
+
+
+== Further reading ==