| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Open scientific data | 3/11 | https://en.wikipedia.org/wiki/Open_scientific_data | reference | science, encyclopedia | 2026-05-05T06:32:28.051815+00:00 | kb-cron |
=== Sharing scientific data on the web (1990-1995) ===

The World Wide Web was originally conceived as an infrastructure for open scientific data. Sharing data and data documentation was a major focus of the initial communication about the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data".

The project stemmed from a closely related knowledge infrastructure, ENQUIRE, an information management program commissioned from Tim Berners-Lee by CERN for the specific needs of high energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected "nodes" that "could refer to a person, a software module, etc. and that could be interlined with various relations such as made, include, describes and so forth". While it "facilitated some random linkage between information" Enquire was not able to "facilitate the collaboration that was desired for in the international high-energy physics research community". Like any significant scientific computing infrastructure before the 1990s, the development of ENQUIRE was ultimately impeded by the lack of interoperability and the complexity of managing network communications: "although Enquire provided a way to link documents and databases, and hypertext provided a common format in which to display them, there was still the problem of getting different computers with different operating systems to communicate with each other".

The web rapidly superseded pre-existing closed infrastructures for scientific data, even when they included more advanced computing features. From 1991 to 1994, users of the Worm Community System, a major biology database on worms, switched to the Web and Gopher.
While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the Worm Community System could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources (…) The World-Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services."

Publication on the web completely changed the economics of data publishing. While in print "the cost of reproducing large datasets is prohibitive", the cost of storing most datasets online is low. In this new editorial environment, the main limiting factors for data sharing are no longer technical or economic but social and cultural.
=== Defining open scientific data (1995-2010) ===

The development and generalization of the World Wide Web lifted numerous technical barriers and frictions that had constrained the free circulation of data. However, scientific data had yet to be defined, and new research policies had to be implemented to realize the original vision of a web of data laid out by Tim Berners-Lee. From this point on, scientific data was largely defined through the process of opening it, as the implementation of open policies created new incentives for setting up actionable guidelines, principles and terminologies.

Climate research has been a pioneering field in the conceptual definition of open scientific data, as it was in the construction of the first large knowledge infrastructures in the 1950s and the 1960s. In 1995 the GCDIS articulated a clear commitment in On the Full and Open Exchange of Scientific Data: "International programs for global change research and environmental monitoring crucially depend on the principle of full and open data exchange (i.e., data and information are made available without restriction, on a non-discriminatory basis, for no more than the cost of reproduction and distribution)."

The expansion of the scope and management of knowledge infrastructures also created incentives to share data, as the "allocation of data ownership" among a large number of individual and institutional stakeholders became increasingly complex. Open data creates a simplified framework to ensure that all contributors and users of the data have access to it.

Open data was rapidly identified as a key objective of the emerging open science movement. While initially focused on publications and scholarly articles, the international initiatives in favor of open access expanded their scope to all the main scientific productions.
In 2003 the Berlin Declaration supported the diffusion of "original scientific research results, raw data and metadata, source materials and digital representations of pictorial and graphical and scholarly multimedia materials".

After 2000, international organizations such as the OECD (Organisation for Economic Co-operation and Development) played an instrumental role in devising generic and transdisciplinary definitions of scientific data, as open data policies have to be implemented beyond the specific scale of a discipline or a country. One of the first influential definitions of scientific data was coined in 1999 by a report of the National Academies of Science: "Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors". In 2004, the Science Ministers of all nations of the OECD signed a declaration which essentially states that all publicly funded archive data should be made publicly available. In 2007 the OECD "codified the principles for access to research data from public funding" through the Principles and Guidelines for Access to Research Data from Public Funding, which defined scientific data as "factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings." The Principles acted as a soft-law recommendation and affirmed that "access to research data increases the returns from public investment in this area; reinforces open scientific inquiry; encourages diversity of studies and opinion; promotes new areas of work and enables the exploration of topics not envisioned by the initial investigators."