
title: Uses of open science
chunk: 4/9
source: https://en.wikipedia.org/wiki/Uses_of_open_science
category: reference
tags: science, encyclopedia
date_saved: 2026-05-05T03:50:26.910892+00:00
instance: kb-cron

=== Log Analysis ===
Academic publications have been among the earliest corpora used for log analysis. The first applied studies in the area long predate the web, as interconnected scientific infrastructures were already widely used in North America and Europe by the 1970s and 1980s. In 1983, several studies pioneered by the Online Computer Library Center analyzed the "transaction logs" left by database users. Logs were stored on magnetic tapes at the time, and a large part of the analysis was devoted to reformatting and standardizing the data. These early studies already implemented standard methods of log analysis, such as probabilistic approaches based on Markov chains to identify the most regular patterns of user behavior, or comparisons of log data with user surveys.
The use of logs and other reader metrics to measure the reception of academic work has remained marginal. Large commercial databases, such as the Web of Science and Scopus, had no incentive to disclose reading statistics and mostly used them for internal purposes. Bibliometric indices based on aggregated citation counts, such as the impact factor or the h-index, have been favored as the leading measures of academic impact.
Beyond the restrictions imposed by leading publishers, log analysis has raised significant methodological issues. Data logging processes differ significantly depending on the structure of the interface: "The number of full-text downloads may be artificially inflated when publishers require users to view HTML versions before accessing PDF versions or when linking mechanisms". Automated access, including search engine indexers and other robots, can also substantially distort aggregated visit counts. This uncertainty impedes the comparability of data: "issues such as journal interfaces continue to affect how users interact with content, making even standardized reports difficult, if not impossible, to compare."
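The Markov-chain approach used in the early transaction-log studies can be illustrated with a minimal sketch, assuming each user session has already been reduced to an ordered list of action labels. The function names and the labels `search`, `display`, `browse` and `exit` are hypothetical and not taken from the 1983 studies. The sketch estimates first-order transition probabilities, i.e. how often one action is followed by another, which is one way to surface the most regular patterns of user behavior.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities P(next | current).

    `sessions` is an iterable of ordered action lists, one list per user session.
    """
    counts = defaultdict(Counter)
    for actions in sessions:
        # Count every consecutive pair of actions (current -> next).
        for current, nxt in zip(actions, actions[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, followers in counts.items():
        # Normalize each row so the probabilities of leaving a state sum to 1.
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


def most_likely_next(probs, action):
    """Return the most probable action after `action`, or None if it was never followed."""
    followers = probs.get(action)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical sessions; the action labels are illustrative only.
sessions = [
    ["search", "display", "search", "display", "exit"],
    ["search", "search", "display", "exit"],
    ["browse", "display", "exit"],
]
probs = transition_probabilities(sessions)
print(probs["search"])                    # {'display': 0.75, 'search': 0.25}
print(most_likely_next(probs, "search"))  # display
```

A first-order chain conditions only on the immediately preceding action; capturing longer behavioral sequences would require higher-order states, at the cost of needing much more log data per state.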
Log analysis was revived in the 2010s by technological developments and the emergence of large open science platforms. Standards for the retrieval of academic log data, such as COUNTER, PIRUS or MESUR, had been developed by the early 2010s. These standards were, by design, limited to specialized research use because they were integrated into academic infrastructures. Open-source web analytics software such as Matomo has since established an emerging standard for log collection. During the same period, publicly funded scientific platforms began to share usage data openly as part of a broader commitment to open science. In Latin America, both Redalyc and SciELO "provide such usage statistics to the public", although these data have remained largely underused: "It is surprising that given the availability of these data, nobody has conducted a study analyzing different dimensions of downloads, beyond the overall view counts and 'top 10' lists of articles available from time to time on the respective Web portals." In 2011, Michael J. Kurtz and Johan Bollen called for the development of usage bibliometrics, an emerging field that "provides unique opportunities to address the known shortcomings of citation analysis". Increased access to log data from open science platforms has made it possible to publish extensive case studies on SciELO, Redalyc, Érudit, OpenEdition.org, Journal.fi and The Conversation.
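The methodological problems raised for log analysis (robot traffic, and HTML views that precede a PDF download) are the kind of distortions that usage standards such as COUNTER address with robot exclusion and double-click filtering. The sketch below is only loosely modeled on that idea and is not an implementation of the COUNTER Code of Practice: the `Request` record, the user-agent regular expression and the 30-second window are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: a naive user-agent test; real filters rely on maintained robot lists.
ROBOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)
# Assumption: repeat requests for the same article by the same user within this
# many seconds are treated as a single access.
WINDOW_SECONDS = 30


@dataclass(frozen=True)
class Request:
    timestamp: float  # seconds since an arbitrary origin
    user: str         # hypothetical session or client identifier
    user_agent: str
    article: str
    fmt: str          # e.g. "html" or "pdf"


def count_accesses(requests):
    """Return a Counter of cleaned accesses per article."""
    counts = Counter()
    last_counted = {}  # (user, article) -> timestamp of the last counted request
    for req in sorted(requests, key=lambda r: r.timestamp):
        if ROBOT_PATTERN.search(req.user_agent):
            continue  # drop indexers and other automated clients
        key = (req.user, req.article)
        previous = last_counted.get(key)
        if previous is not None and req.timestamp - previous <= WINDOW_SECONDS:
            continue  # e.g. an HTML view followed quickly by the PDF
        last_counted[key] = req.timestamp
        counts[req.article] += 1
    return counts


log = [
    Request(0, "u1", "Mozilla/5.0", "art-1", "html"),
    Request(12, "u1", "Mozilla/5.0", "art-1", "pdf"),        # within window: merged
    Request(5, "crawler", "Googlebot/2.1", "art-1", "pdf"),  # robot: dropped
    Request(100, "u1", "Mozilla/5.0", "art-1", "pdf"),       # later visit: counted
    Request(40, "u2", "Mozilla/5.0", "art-2", "pdf"),
]
print(count_accesses(log))  # Counter({'art-1': 2, 'art-2': 1})
```

Keying deduplication on `(user, article)` rather than `(user, article, fmt)` is the design choice that merges an HTML view with the PDF request that follows it; an HTML view followed by a PDF download outside the window would still be counted twice, which is one reason counts from differently built interfaces remain hard to compare.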

=== Crosslinking ===
The web itself and some of its key components (such as search engines) were partly a product of bibliometric theory. In its original form, the web was derived from ENQUIRE, a bibliographic scientific infrastructure commissioned from Tim Berners-Lee by CERN for the specific needs of high-energy physics. "The onset of the World Wide Web in the mid-1990s made Garfield's citationist dream more likely to come true. In the world network of hypertexts, not only is the bibliographic reference one of the possible forms taken by a hyperlink inside the electronic version of a scientific article, but the Web itself also exhibits a citation structure, links between web pages being formally similar to bibliographic citations." Consequently, bibliometric concepts have been incorporated into major communication technologies, such as Google's search algorithm: "the citation-driven concept of relevance applied to the network of hyperlinks between web pages would revolutionize the way Web search engines let users quickly pick useful materials out of the anarchical universe of digital information." While the web immediately affected reading practices by creating seamless connections between texts, it did not transform the quantitative analysis of citation data to the same extent; that analysis remained mostly focused on academic connections. Global analysis of hyperlinks and backlinks makes it possible to extend citation analysis beyond scholarly publications and to capture the expanding scope of open science circulation: "We have witnessed a proliferation of means of disseminating scholarly publications via academic blogs, scientific magazines destined to a wider audience." In 2011, a log analysis of the Kyoto University website identified a highly diversified set of links to scientific publications.
In 2019, a study of crosslinks to the French open science platform OpenEdition, supported by Aix-Marseille University, highlighted that "scientific literature from a largely open access hosting platform is re-appropriated and repurposed for various uses in the public arena."
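The "citation-driven concept of relevance" applied to hyperlinks in Google's search algorithm is commonly identified with PageRank, which treats a link like a citation whose weight depends on the rank of the citing page. The following is a minimal textbook power-iteration sketch, not Google's production ranking; the damping factor 0.85 is the commonly cited default, and the page names in the example graph are hypothetical.

```python
def pagerank(links, damping=0.85, max_iter=200, tol=1e-12):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to (its "citations").
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical link graph: a blog and a magazine both link to open access articles.
links = {
    "blog": ["article-a", "article-b"],
    "magazine": ["article-a"],
    "article-a": ["article-b"],
    "article-b": ["article-a"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the article linked from both outside sources ends up with the highest score, mirroring how a frequently cited paper gains weight in citation analysis; pages with no inbound links keep only the baseline share `(1 - damping) / n`.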