| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Open scientific data | 4/11 | https://en.wikipedia.org/wiki/Open_scientific_data | reference | science, encyclopedia | 2026-05-05T03:49:42.862927+00:00 | kb-cron |
=== Policy implementations (2010-…) ===
After 2010, national and supra-national institutions took a more interventionist stance. New policies have been implemented to ensure and incentivize the opening of scientific data, usually in continuation of existing open data programs. In Europe, the "European Union Commissioner for Research, Science, and Innovation, Carlos Moedas made open research data one of the EU's priorities in 2015."
First published in 2016, the FAIR Guiding Principles have become an influential framework for opening scientific data. The principles were originally designed two years earlier during a policy and research workshop at Lorentz, Jointly Designing a Data FAIRport. During the deliberations of the workshop, "the notion emerged that, through the definition of, and widespread support for, a minimal set of community-agreed guiding principles and practice [...]". The principles do not attempt to define scientific data, which remains a relatively plastic concept, but strive to describe "what constitutes 'good data management'". They cover four foundational principles "that serve to guide data producer": Findability, Accessibility, Interoperability, and Reusability. They also aim to provide a step toward machine-actionability by making the underlying semantics of data explicit. Because they fully acknowledge the complexity of data management, the principles do not claim to introduce a set of rigid recommendations but rather "degrees of FAIRness" that can be adjusted depending on organizational costs and on external restrictions related to copyright or privacy.
The FAIR principles were immediately co-opted by major international organizations: "FAIR experienced rapid development, gaining recognition from the European Union, G7, G20 and US-based Big Data to Knowledge (BD2K)". In August 2016, the European Commission set up an expert group to turn "FAIR Data into reality".
As of 2020, the FAIR principles remain "the most advanced technical standards for open scientific data to date". In 2022, the French Open Science Monitor started to publish an experimental survey of research data publications based on text mining tools. Retrospective analysis showed that the rate of publications mentioning the sharing of their associated data nearly doubled in 10 years, from 13% (in 2013) to 22% (in 2021). By the end of the 2010s, open data policies were well supported by scientific communities. Two large surveys commissioned by the European Commission in 2016 and 2018 found a commonly perceived benefit: "74% of researchers say that having access to other data would benefit them". Yet more qualitative observations gathered in the same investigation also showed that "what scientists proclaim ideally, versus what they actually practice, reveals a more ambiguous situation."
== Diffusion of scientific data ==
=== Publication and edition ===
Until the 2010s, the publication of scientific data referred mostly to "the release of datasets associated with an individual journal article". This release is documented by a Data Accessibility Statement or DAS. Several typologies of data accessibility statements have been proposed. In 2021, Colavizza et al. identified three categories or levels of access:
DAS 1: "Data available on request or similar"
DAS 2: "Data available with the paper and its supplementary files"
DAS 3: "Data available in a repository"
Supplementary data files appeared in the early phase of the transition to digital scientific publishing. While the format of publications has largely kept the constraints of the printing format, additional materials could be included in "supplementary information". As a publication, supplementary data files have an ambiguous status. In theory they are meant to be raw documents, giving access to the background of research. In practice, the released datasets often have to be specially curated for publication. They will usually focus on the primary data sources, not on the entire range of observations or measurements done for the purpose of the research: "Identifying what are "the data" associated with any individual article, conference paper, book, or other publication is often difficult [as] investigators collect data continually." The selection of the data is further influenced by the publisher. The editorial policy of the journal largely determines what "goes in the main text, what in the supplemental information", and editors are especially wary of including large datasets, which may be difficult to maintain in the long run.
Scientific datasets have been increasingly acknowledged as an autonomous scientific publication. The assimilation of data to academic articles aimed to increase the prestige and recognition of published datasets: "implicit in this argument is that familiarity will encourage data release". This approach has been favored by several publishers and repositories, as it made it possible to easily integrate data into existing publishing infrastructure and to extensively reuse editorial concepts initially created around articles. Data papers were explicitly introduced as "a mechanism to incentivize data publishing in biodiversity science".