title chunk source category tags date_saved instance
Open science monitor 3/4 https://en.wikipedia.org/wiki/Open_science_monitor reference science, encyclopedia 2026-05-05T06:32:26.766487+00:00 kb-cron

=== Local infrastructures and repositories ===

Local infrastructures include Current Research Information Systems, directly managed by scientific institutions and universities, that "help manage, understand, and evaluate research activities". At the institutional level they can offer the most extensive coverage of scientific output, especially by taking into account locally published journals that would not necessarily be indexed in global scientific infrastructures. Due to their direct connections with scientific communities, local infrastructures can incentivize researchers to "enter their publications into those systems" and implement a more varied range of indicators than what is commonly available in international databases.

Local infrastructures are managed in a decentralized way, with levels of coverage and information that vary between institutions. In some cases, local repositories are "fed solely by the large commercial databases" and will not have any added value. The integration of diverse local sources of data into a common and standardized scheme is a major challenge for open science monitors. The preexistence of an ambitious funding policy considerably eases this process, as institutions will already have been encouraged to adopt specific norms and metadata requirements.

While local infrastructures are generally thought of as providers of data for an open science monitor, the relationship can go both ways. In France, the University of Lorraine implemented its own open science monitor that worked as a local expansion of the French Open Science Monitor.

=== Proprietary databases ===

Proprietary databases like the Web of Science or Scopus have long been leading providers of publication metadata and analytics. Yet there is no consensus on their integration into open science monitors.

Proprietary databases have long raised issues of data bias that are especially problematic in the national context of most open science monitors. Their coverage is usually centered on English-language publications and neglects resources with a significant local impact. Moreover, reliance on proprietary platforms creates long-term dependency, with added costs and risks of unsustainability: "Commercial providers require licences to access their services, which vary in price and access type".

The French Open Science Monitor is committed to the exclusive use of "public or open datasources". Conversely, the German Open Access Monitor currently relies on Dimensions, Web of Science and Scopus, especially to recover "corresponding author information", even though it "looks out for emerging new data sources, especially open sources".

== Methodology ==

Open science monitors generally aim to bring diverse sources of publication metadata and data into a "central interface" that "enables continuous monitoring at a national level and provides a basis for fact-based decisions and actions." Due to "the complexity of the scholarly publishing system", the building of effective open science monitors is "no trivial task and involves a multitude of decisions".

=== Data reconciliation ===

The combination of various bibliometric sources creates several challenges. Key metadata can be missing. Entries are also frequently duplicated, as articles are indexed both in local and international databases.

Persistent identifiers (PIDs) are a critical component of open science monitors. In theory they make it possible to "unambiguously identify publications, authors, and associated research institutions". Publications in scientific journals can be associated with internationally recognized standards such as DOIs (for the actual publications) or ORCID (for authors), managed by leading international infrastructures like Crossref.

Despite the preexistence of international standards, open science monitors usually have to introduce their own standardization schemes and identifiers. Limiting the analysis to these standards would immediately "rule out a certain number of journals that do not adhere to this very general technology of persistent identifiers". Furthermore, other forms of scientific outputs or scientific activities (like funding) do not have the same level of standardization. Even when sources already include persistent identifiers, "some manual standardisation is required", as the original metadata is not always consistent or will not have the same focus.

Author affiliation is crucial information for most open science monitors, as it makes it possible to single out the scientific production of a given country. Yet it is not always available, nor is it recorded in a systematic manner.
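The duplicate-entry problem described above can be illustrated with a minimal Python sketch. It is not the method of any particular monitor: the record fields (`source`, `doi`, `title`, `affiliation`), the function names and the sample values are all hypothetical. It assumes that records sharing a normalized DOI describe the same publication, and falls back to a crude title key when no DOI is present.

```python
import re
from typing import Optional

# Common ways a DOI is wrapped in source metadata (URL form or "doi:" prefix).
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)\s*", re.IGNORECASE)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Return a bare lowercase DOI, or None if the value is missing or malformed."""
    if not raw:
        return None
    doi = DOI_PREFIX.sub("", raw.strip()).lower()
    return doi if doi.startswith("10.") and "/" in doi else None


def title_key(title: str) -> str:
    """Crude fallback key: lowercase ASCII letters and digits only."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def reconcile(records: list[dict]) -> list[dict]:
    """Merge records that share a DOI (or, failing that, a title key)."""
    merged: dict[str, dict] = {}
    for index, rec in enumerate(records):
        doi = normalize_doi(rec.get("doi"))
        fallback = title_key(rec.get("title") or "")
        if doi:
            key = f"doi:{doi}"
        elif fallback:
            key = f"title:{fallback}"
        else:
            key = f"row:{index}"  # nothing to match on: keep the record separate
        target = merged.setdefault(key, {"doi": doi, "sources": []})
        target["sources"].append(rec.get("source"))
        # Keep the first non-empty value seen for each descriptive field.
        for field in ("title", "affiliation"):
            if not target.get(field) and rec.get(field):
                target[field] = rec[field]
    return list(merged.values())


# Hypothetical sample: the same article indexed locally and internationally.
records = [
    {"source": "local_cris", "doi": "https://doi.org/10.1234/ABC.1",
     "title": "A study", "affiliation": "Example University"},
    {"source": "international_index", "doi": "10.1234/abc.1", "title": "A Study"},
    {"source": "local_cris", "doi": None, "title": "Local journal article"},
]
result = reconcile(records)
```

In this sample the first two records collapse into one entry that keeps the affiliation supplied by the local source, while the DOI-less article from a local journal is retained rather than dropped. A record with a DOI and a record without one are never merged here, even when their titles match; handling that case would require the kind of manual standardization mentioned above.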

=== Text & data mining ===

Open science monitors have recently experimented with a range of text mining methods to reconstruct missing metadata. Even leading databases can miss key information: on Crossref, institutional affiliations are missing for "75% of the indexed content".

Since 2022, the French Open Science Monitor has successfully experimented with the use of natural language processing methods and models to detect disciplines or institutional affiliations. For discipline classification, this has led to the development of scientific-tagger, a word embedding model based on Fasttext and trained on two annotated databases, PASCAL and FRANCIS.

In 2022, Chaignon and Egret published a systematic reproduction and assessment of the methodology of the Monitor in Quantitative Science Studies. Using a mix of proprietary and open databases, they found nearly the same rate of open access publications for the year 2019 (53% vs. 54%). Overall, the open-source strategy used by the French Open Science Monitor (BSO) proved to be the most efficient approach in comparison with alternative proprietary sources: "The open-source strategy used by the BSO effectively identifies the vast majority of publications with a persistent identifier (DOI) for open science monitoring." Additionally, the BSO makes it possible to provide metadata at a "sufficiently fine level to shed light on the geographical, thematic, linguistic, etc. disparities that affect bibliometric studies".

Text and data mining methods are especially promising for the indexation of a wider range of open science outputs. Datasets, code, reports or clinical trials have never been systematically cataloged. Since 2022, the French national plan for open science has aimed to implement indicators beyond publications, and consequently the French Open Science Monitor is working on the extraction of "references to software and research data" from full-text articles with experimental deep learning models.
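To give a sense of what affiliation detection involves, the following Python sketch flags raw affiliation strings that appear to belong to a given country. It is a deliberately simple keyword heuristic, not the natural language processing models used by the French Open Science Monitor. The marker list and the function names are hypothetical, chosen only for illustration.

```python
import re
import unicodedata

# Hypothetical markers; a real list would be far longer and curated.
FRENCH_MARKERS = ("france", "cnrs", "inserm", "universite de lorraine")


def fold(text: str) -> str:
    """Lowercase, strip accents, and reduce punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", unaccented.lower()).strip()


def looks_french(affiliation: str) -> bool:
    """True if any marker occurs as a whole word or phrase in the affiliation."""
    padded = f" {fold(affiliation)} "
    return any(f" {marker} " in padded for marker in FRENCH_MARKERS)


# Hypothetical raw affiliation strings.
samples = [
    "Université de Lorraine, Nancy",
    "CNRS, Paris, FRANCE",
    "Francesco University, Italy",
]
flags = [looks_french(s) for s in samples]
```

Matching on whole words prevents "Francesco" from being mistaken for "France", but a fixed list like this one misses any institution it does not name and cannot resolve ambiguous strings. Those limits are one reason trained models are attractive for this task.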

== Uses and impact ==