---
title: "Open scientific data"
chunk: 6/11
source: "https://en.wikipedia.org/wiki/Open_scientific_data"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T03:49:42.862927+00:00"
instance: "kb-cron"
---
Research reproducibility: lack of reproducibility is frequently attributed to deficiencies in research transparency and in the data analysis process. As "a rationale for sharing research data, [research reproducibility] is powerful yet problematic": reproducibility only applies to "certain kinds of research", mostly in the experimental sciences.

Public accessibility: the rationale that "products of public funding should be available to the public" is "found in arguments for open government". Although it is directly inspired by similar arguments in favor of open access to publications, its scope is more limited, as scientific open data "has direct benefits to far fewer people, and those benefits vary by stakeholder".

Research valorization: open scientific data may bring substantial value to the private sector. This argument is used especially to support "the need for more repositories that can accept and curate research data, for better tools and services to exploit data, and for other investments in knowledge infrastructure".

Increased research and innovation: open scientific data may significantly enhance the quality of private and public research. This argument supports "investing in knowledge infrastructure to sustain research data, curated to high standards of professional practices".

Yet collaboration between the different actors and stakeholders of the data lifecycle remains partial. Even within academic institutions, cooperation is limited: "most researchers are making [data related search] without consulting a data manager or librarian."

The global open data movement partly lost its cohesiveness and identity during the 2010s, as debates over data availability and licensing were overtaken by domain-specific issues: "When the focus shifts from calling for access to data to creating data infrastructure and putting data to work, the divergent goals of those who formed an initial open data movement come clearly into view and managing the tensions that emerge can be complex." The very generic scope of the open data definition aims to embrace a wide range of preexisting data cultures. As a result, it does not fully account for the higher threshold of accessibility and contextualization that scientific research requires: "open data in the sense of being free for reuse is a necessary but not sufficient condition for research purposes."

=== Ideal and implementation: the paradox of data sharing ===

Since the 2000s, surveys of scientific communities have underlined a consistent discrepancy between the ideals of data sharing and their implementation in practice: "When present-day researchers are asked whether they are willing to share their data, most say yes, they are willing to do so. When the same researchers are asked if they do release their data, they typically acknowledge that they have not done so." Open data culture does not emerge in a vacuum. It has to contend with preexisting cultures of scientific data and with a range of systemic factors that can discourage data sharing: "In some fields, scholars are actively discouraged from reusing data. (…) Careers are made by charting territory that was previously uncharted."

In 2011, 67% of 1,329 scientists agreed that lack of data sharing is a "major impediment to progress in science", yet "only about a third (36%) of the respondents agree that others can access their data easily". In 2016, a survey of researchers in the environmental sciences found overwhelming support for easily accessible open data (99% rated it at least somewhat important) and for funder policies for open data (88%). Yet, "even with willingness to share data there are discrepancies with common practices, e.g. willingness to spend time and resources preparing and up-loading data". A 2022 study of 1,792 data sharing statements from BioMed Central found that fewer than 7% of the authors (123) actually provided the data upon request.

The prevalence of accessible and findable data is even lower: "Despite several decades of policy moves toward open access to data, the few statistics available reflect low rates of data release or deposit." In a 2011 poll for Science, only 7.6% of researchers shared their data in community repositories. Most favored local websites hosted by universities or laboratories instead. Consequently, "many bemoaned the lack of common metadata and archives as a main impediment to using and storing data".

According to Borgman, the paradox of data sharing is partly due to the limitations of open data policies. These tend to focus on "mandating or encouraging investigators to release their data" without meeting the "expected demand for data or the infrastructure necessary to support release and reuse."

=== Incentives and barriers to scientific open data ===

In 2022, Pujol Priego, Wareham and Romasanta stressed that incentives for sharing scientific data were primarily collective: reproducibility, scientific efficiency and scientific quality. More individual rewards, such as personal credit, also play a role. Individual benefits include increased visibility: open datasets yield a significant citation advantage, but only when they have been shared in an open repository.

Important barriers include the need to publish first, legal constraints, and concerns about loss of credit or recognition. For individual researchers, datasets may be major assets to barter for "new jobs or new collaborations". Their publication may therefore be difficult to justify unless researchers "get something of value in return".

Lack of familiarity with data sharing, rather than outright rejection of the principles of open science, is also a leading obstacle. Several surveys in the early 2010s showed that researchers "rarely seek data from other investigators and (…) they rarely are asked for their own data." This creates a negative feedback loop. Researchers make little effort to ensure data sharing, which in turn discourages effective reuse, whereas "the heaviest demand for reusing data exists in fields with high mutual dependence." The extent of data reuse may also be underestimated, because data is not considered a prestigious form of publication and the original sources are not cited.

A 2021 empirical study of 531,889 articles published by PLOS shows that soft incentives and encouragement have a limited impact on data sharing: "journal policies that encourage rather than require or mandate DAS [Data Availability Statement] have only a small effect".