--- title: "Open scientific data" chunk: 2/11 source: "https://en.wikipedia.org/wiki/Open_scientific_data" category: "reference" tags: "science, encyclopedia" date_saved: "2026-05-05T03:49:42.862927+00:00" instance: "kb-cron" --- The emergence of scientific data is associated with a semantic shift in the way core scientific concepts like data, information and knowledge are commonly understood. Following the development of computing technologies, data and information are increasingly described as "things": "Like computation, data always have a material aspect. Data are things. They are not just numbers but also numerals, with dimensionality, weight, and texture". After the Second World War large scientific projects have increasingly relied on knowledge infrastructure to collect, process and analyze important amount of data. Punch-cards system were first used experimentally on climate data in the 1920s and were applied on a large scale in the following decade: "In one of the first Depression-era government make-work projects, Civil Works Administration workers punched some 2 million ship log observations for the period 1880–1933." By 1960, the meteorological data collections of the US National Weather Records Center has expanded to 400 millions cards and had a global reach. The physically of scientific data was by then fully apparent and threatened the stability of entire buildings: "By 1966 the cards occupied so much space that the Center began to fill its main entrance hall with card storage cabinets (figure 5.4). Officials became seriously concerned that the building might collapse under their weight". By the end of the 1960s, knowledge infrastructure have been embedded in a various set of disciplines and communities. The first initiative to create a database of electronic bibliography of open access data was the Educational Resources Information Center (ERIC) in 1966. In the same year, MEDLINE was created – a free access online database managed by the National Library of Medicine and the National Institute of Health (USA) with bibliographical citations from journals in the biomedical area, which later would be called PubMed, currently with over 14 million complete articles. Knowledge infrastructures were also set up in space engineering (with NASA/RECON), library search (with OCLC Worldcat) or the social sciences: "The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection". === Opening and sharing data: early attempts (1960-1990) === Early discourses and policy frameworks on open scientific data emerged immediately in the wake of the creation of the first large knowledge infrastructure. The World Data Center system (now the World Data System), aimed to make observation data more readily available in preparation for the International Geophysical Year of 1957–1958. The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form. In 1966, the International Council for Science created CODATA, an initiative to "promote cooperation in data management and use". These early forms of open scientific data did not develop much further. There were too many data frictions and technical resistance to the integration of external data to implement a durable ecosystem of data sharing. Data infrastructures were mostly invisible to researchers, as most of the research was done by professional librarians. Not only were the search operating systems complicated to use, but the search has to be performed very efficiently given the prohibitive cost of long-distance telecommunication. While their conceptors have originally anticipated direct uses by researcher, that could not really emerge due to technical and economic impediment: The designers of the first online systems had presumed that searching would be done by end users; that assumption undergirded system design. MEDLINE was intended to be used by medical researchers and clinicians, NASA/RECON was designed for aerospace engineers and scientists. For many reasons, however, most users through the seventies were librarians and trained intermediaries working on behalf of end users. In fact, some professional searchers worried that even allowing eager end users to get at the terminals was a bad idea. Christine Borgman does not recall any significant policy debates over the meaning, the production and the circulation of scientific data save for a few specific fields (like climatology) after 1966. The insulated scientific infrastructures could hardly be connected before the advent of the web. Projects, and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols". Communication between scientific infrastructures was not only challenging across space, but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated was likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".