kb/Open_Science_Infrastructure-2.md at a513e0480c18a12606fdbb4d71d57ea17efe5836

turtle89431 f1f480b165 Scrape wikipedia-science: 92 new, 848 updated, 968 total (kb-cron)

2026-05-04 20:49:48 -07:00

6.2 KiB

title	chunk	source	category	tags	date_saved	instance
Open Science Infrastructure	3/8	https://en.wikipedia.org/wiki/Open_Science_Infrastructure	reference	science, encyclopedia	2026-05-05T03:49:40.430585+00:00	kb-cron

Scientific projects have been among the earliest use cases for digital infrastructure. The theorization of scientific knowledge infrastructure even predates the development of computing technologies: the knowledge networks envisioned by Paul Otlet or Vannevar Bush already incorporated numerous features of online scientific infrastructures. After the Second World War, the United States faced a "periodical crisis": existing journals could not keep up with the rapidly increasing scientific output. The issue became politically relevant after the successful launch of Sputnik: "The Sputnik crisis turned the librarians' problem of bibliographic control into a national information crisis." Emerging computing technologies were immediately considered as a potential solution for making a larger amount of scientific output readable and searchable. Access to foreign-language publications was also a key issue that machine translation was expected to solve: in the 1950s, a significant share of scientific publications was not available in English, especially those coming from the Soviet bloc. Influential members of the National Science Foundation like Joshua Lederberg advocated for the creation of a "centralized information system", SCITEL, that would at first coexist with printed journals and gradually replace them altogether on account of its efficiency. In the plan laid out by Lederberg to Eugene Garfield in November 1961, the deposit would index as many as 1,000,000 scientific articles per year. Beyond full-text searching, the infrastructure would also ensure the indexing of citations and other metadata, as well as the automated translation of foreign-language articles. Although it anticipated key features of online scientific platforms, the SCITEL plan was technically unrealistic at the time.
The first working prototype of an online retrieval system, developed in 1963 by Doug Engelbart and Charles Bourne at the Stanford Research Institute, was heavily constrained by memory limits: no more than 10,000 words from a few documents could be indexed.

Instead of a general-purpose publishing platform, the early scientific computing infrastructures focused on specific research areas, such as MEDLINE for medicine, NASA/RECON for space engineering or OCLC WorldCat for library search: "most of the earliest online retrieval systems provided access to a bibliographic database and the rest used a file containing another sort of information—encyclopedia articles, inventory data, or chemical compounds." This early development of scientific computing affected a large variety of disciplines and communities, including the social sciences: "The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection". Yet these infrastructures were mostly invisible to researchers, as most searches were carried out by professional librarians. Not only were the search systems complicated to use, but searches also had to be performed very efficiently given the prohibitive cost of long-distance telecommunication. To become technically feasible, scientific infrastructures could not be open, and they became fundamentally hidden from their end users:

"The designers of the first online systems had presumed that searching would be done by end users; that assumption undergirded system design. MEDLINE was intended to be used by medical researchers and clinicians, NASA/RECON was designed for aerospace engineers and scientists. For many reasons, however, most users through the seventies were librarians and trained intermediaries working on behalf of end users. In fact, some professional searchers worried that even allowing eager end users to get at the terminals was a bad idea."

The development of digital infrastructure for scientific publication was largely undertaken by private companies. In 1963, Eugene Garfield created the Institute for Scientific Information, which aimed to transform the projects initially envisioned with Lederberg into a profitable business. The Science Citation Index relied on computational processing of citation data. It had a massive and lasting influence on the structuring of global scientific publication in the last decades of the 20th century, as its most important metric, the Journal Impact Factor, "ultimately came to provide the metric tool needed to structure a competitive market among journals." Garfield also successfully launched Current Contents, a periodic compilation of scientific abstracts that acted as a simplified commercial version of the central deposit envisioned within SCITEL. Rather than being replaced by a centralized information system, leading scientific publishers were able to develop their own information infrastructure, which ultimately reinforced their business position. By the end of the 1960s, the Dutch publisher Elsevier and the German publisher Springer had started to computerize their internal data, as well as the management of journal reviews. Until the advent of the web, the landscape of scientific infrastructures remained fragmented.
Projects and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols". The birthplace of the World Wide Web, CERN, had its own version of the Internet, CERN-Net, and also supported its own protocol for e-mail exchange. The European Space Agency used its own iteration (ESRO/RECON) of the RECON system used by NASA engineers. These isolated scientific infrastructures could hardly be connected before the advent of the web. Communication between scientific infrastructures was challenging not only across space, but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated were likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".
