kb/data/en.wikipedia.org/wiki/Open_scientific_data-4.md

| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Open scientific data | 5/11 | https://en.wikipedia.org/wiki/Open_scientific_data | reference | science, encyclopedia | 2026-05-05T06:32:28.051815+00:00 | kb-cron |

=== Citation and indexation ===

The first digital databases of the 1950s and 1960s immediately raised issues of citability and bibliographic description. The mutability of computer memory was especially challenging: in contrast with printed publications, digital data could not be expected to remain stable in the long run. In 1965, Ralph Bisco underlined that this uncertainty affected all the associated documents, such as code notebooks, which may become increasingly out of date. Data management has to find a middle ground between continuous enhancement and some form of generic stability: "the concept of a fluid, changeable, continually improving data archive means that study cleaning and other processing must be carried to such a point that changes will not significantly affect prior analyses".

Structured bibliographic metadata for databases has been a debated topic since the 1960s. In 1977, the American Standard for Bibliographic Reference adopted a definition of "data file" with a strong focus on the materiality and the mutability of the dataset: neither dates nor authors were indicated, but the medium or "Packaging Method" had to be specified. Two years later, Sue Dodd introduced an alternative convention that brought the citation of data closer to the standard references used for other scientific publications: Dodd's recommendation included the use of titles, authors, editions and dates, as well as alternative mentions for sub-documentation such as code notebooks.

The indexation of datasets has been radically transformed by the development of the web, as barriers to data sharing were substantially reduced. In this process, data archiving, sustainability and persistence have become critical issues. Permanent digital object identifiers (DOIs) were introduced for scientific articles to avoid broken links as website structures continuously evolved. In the early 2000s, pilot programs started to allocate DOIs to datasets as well.

While it solves concrete issues of link sustainability, the creation of data DOIs and of norms for data citation is also part of a legitimization process that assimilates datasets to standard scientific publications and can draw on similar sources of motivation (such as bibliometric indexes).

Accessible and findable datasets yield a significant citation advantage. A 2021 study of 531,889 articles published by PLOS estimated that there is a "25.36% relative gain in citation counts in general" for a journal article with "a link to archived data in a public repository". Distributing data as supplementary material does not yield a significant citation advantage, which suggests that "the citation advantage of DAS [Data Availability Statement] is not as much related to their mere presence, but to their contents".

As of 2022, the recognition of open scientific data is still an ongoing process. The leading reference software Zotero does not yet have a specific item type for datasets.
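The citation elements that Dodd recommended (title, author, edition, date) can be combined with a DOI in a small record type. The following is a minimal sketch, not an official citation style: the class name `DatasetCitation`, its field names, the output layout and the identifier `10.1234/example` are all hypothetical and chosen for illustration. The only external assumption is that DOIs resolve through the `https://doi.org/` proxy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical record: Dodd-style elements plus an optional DOI."""
    authors: List[str]
    title: str
    year: int
    edition: Optional[str] = None
    doi: Optional[str] = None  # made-up example value: "10.1234/example"

    def doi_url(self) -> Optional[str]:
        # The link goes through the doi.org proxy, so it survives website reorganizations.
        return f"https://doi.org/{self.doi}" if self.doi else None

    def format(self) -> str:
        # Illustrative layout only; real citation styles differ in order and punctuation.
        parts = [
            f"{'; '.join(self.authors)} ({self.year}).",
            f"{self.title} [Data set].",
        ]
        if self.edition:
            parts.append(f"{self.edition}.")
        url = self.doi_url()
        if url:
            parts.append(url)
        return " ".join(parts)


citation = DatasetCitation(
    authors=["Doe, J.", "Roe, R."],
    title="Example survey responses",
    year=2020,
    edition="Version 2",
    doi="10.1234/example",
)
print(citation.format())
```

Keeping the edition as a separate field reflects the tension described above: the dataset may keep changing, while the cited version stays identifiable.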

=== Reuse and economic impact ===

Within academic research, storage and redundancy have proven to be significant benefits of open scientific data. In contrast, non-open scientific data is weakly preserved and can "be retrieved only with considerable effort by the authors", if it is not completely lost.

Analysis of the uses of open scientific data runs into the same issues as for any open content: while free, universal and indiscriminate access has demonstrably expanded the scope, range and intensity of reception, it has also made that reception harder to track, because no transaction takes place. These issues are further complicated by the novelty of data as a scientific publication: "In practice, it can be difficult to monitor data reuse, mainly because researchers rarely cite the repository."

In 2018, a report of the European Commission estimated the cost of not opening scientific data in accordance with the FAIR principles: it amounted to €10.2 billion annually in direct impact and €16 billion in indirect impact over the entire innovation economy. Implementing open scientific data at a global scale "would have a considerable impact on the time we spent manipulating data and the way we store data."

== Practices and data culture ==

The sharing of scientific data is rooted in scientific cultures or communities of practice. As digital tools have become widespread, the infrastructures, practices and common representations of research communities have increasingly relied on shared meanings of what data is and what can be done with it. Pre-existing epistemic machineries can be more or less predisposed to data sharing. Important factors may include shared values (individualistic or collective), the allocation of data ownership, and frequent collaborations with external actors who may be reluctant to share data.

=== The emergence of an open data culture ===

The development of scientific open data is not limited to scientific research. It involves a diverse set of stakeholders: "Arguments for sharing data come from many quarters: funding agencies—both public and private—policy bodies such as national academies and funding councils, journal publishers, educators, the public at large, and from researchers themselves." As such, the movement for scientific open data largely intersects with more global movements for open data. Standard definitions of open data used by a wide range of public and private actors have been partly elaborated by researchers around concrete scientific issues. The concept of transparency has especially contributed to creating convergences between open science, open data and open government. In 2015, the OECD described transparency as a common "rationale for open science and open data".

Christine Borgman has identified four major rationales for sharing data commonly used across the entire regulatory and public debate over scientific open data: