title chunk source category tags date_saved instance
Open scientific data 8/11 https://en.wikipedia.org/wiki/Open_scientific_data reference science, encyclopedia 2026-05-05T03:49:42.862927+00:00 kb-cron

=== Ownership === Copyright issues with scientific datasets have been further complicated by uncertainties regarding ownership. Research is largely a collaborative activity that involves a wide range of contributions. Initiatives like CRediT (Contributor Roles Taxonomy) have identified 14 different roles, of which four are explicitly related to data management (Formal Analysis, Investigation, Data curation and Visualization). In the United States, ownership of research data is usually "determined by the employer of the researcher", with the principal investigator acting as the caretaker of the data rather than the owner. Until the development of research open data, US institutions were usually more reluctant to waive copyrights on data than on publications, as data are considered strategic assets. In the European Union, there is no widely agreed framework on the ownership of data. The question of additional rights for external stakeholders has also been raised, especially in the context of medical research. Since the 1970s, patients have claimed some form of ownership of the data produced in the context of clinical trials, notably with important controversies concerning "whether research subjects and patients actually own their own tissue or DNA."

=== Privacy === Numerous scientific projects rely on collecting data about persons, notably in medical research and the social sciences. In such cases, any data sharing policy necessarily has to be balanced with the preservation and protection of personal data. Researchers and, more specifically, principal investigators are subject to obligations of confidentiality in several jurisdictions. Health data has been increasingly regulated since the late 20th century, either by law or by sectoral agreements. In 2014, the European Medicines Agency introduced important changes to the sharing of clinical trial data, in order to prevent the release of all personal details and all commercially relevant information. Such developments in European regulation "are likely to influence the global practice of sharing clinical trial data as open data". Research management plans and practices have to be open, transparent and confidential by design.

=== Free licenses === Open licenses have been the preferred legal framework to clear the restrictions and ambiguities in the legal definition of scientific data. In 2003, the Berlin Declaration called for a universal waiver of reuse rights on scientific contributions that explicitly included "raw data and metadata". In contrast with the development of open licenses for publications, which occurred over a short time frame, the creation of licenses for open scientific data has been a complicated process. Specific rights, like the sui generis database rights in the European Union, and specific legal principles, like the distinction between simple facts and original compilations, were not initially anticipated. Until the 2010s, free licenses could paradoxically add more restrictions to the reuse of datasets, especially with regard to attribution (which is not required for non-copyrighted objects like raw facts): "in such cases, when no rights are attached to research data, then there is no ground for licencing the data". To circumvent the issue, several institutions like the Harvard-MIT Data Center started to share their data in the public domain. This approach ensures that no right is applied to non-copyrighted items. Yet the public domain and some associated tools like the Public Domain Mark are not properly defined legal contracts and vary significantly from one jurisdiction to another. First introduced in 2009, the Creative Commons Zero (or CC0) license was immediately considered for data licensing. It has since become "the recommended tool for releasing research data into the public domain". In accordance with the principles of the Berlin Declaration, it is not a license but a waiver, as the producer of the data "overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights".
Alternative approaches have included the design of new free licenses to disentangle the attribution stacking specific to database rights. In 2009, the Open Knowledge Foundation published the Open Database License, which has been adopted by major online projects like OpenStreetMap. Since 2015, all the different Creative Commons licenses have been updated to become fully effective on datasets, as database rights have been explicitly anticipated in the 4.0 version.

== Open scientific data management == Data management has recently become a primary focus of the policy and research debate on open scientific data. The influential FAIR principles are deliberately centered on the key features of "good data management" in a scientific context. In a research context, data management is frequently associated with data lifecycles. Various lifecycle models with different stages have been theorized by institutions, infrastructures and scientific communities. However, "such lifecycles are a simplification of real life, which is far less linear and more iterative in practice."