diff --git a/_index.db b/_index.db
index 79a0f8325..889dec46b 100644
Binary files a/_index.db and b/_index.db differ
diff --git a/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-0.md b/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-0.md
new file mode 100644
index 000000000..558fd6810
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-0.md
@@ -0,0 +1,32 @@
+---
+title: "2025 United States government online resource removals"
+chunk: 1/4
+source: "https://en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:22.498115+00:00"
+instance: "kb-cron"
+---
+
+The 2025 United States government online resource removals are a series of web page and dataset deletions and modifications across multiple United States federal agencies beginning in January 2025. Following executive orders from President Donald Trump's administration, government organizations removed or modified over 8,000 web pages and approximately 3,000 datasets. The changes primarily affected content related to diversity, equity, and inclusion (DEI) initiatives, gender identity, public health research, environmental policy, various social programs, and other topics that Trump and the Republican Party have expressed opposition to. Major affected agencies included the Centers for Disease Control and Prevention, which saw over 3,000 pages altered or removed, and the Census Bureau, which removed about 3,000 pages of research materials. While some content was later restored, the modifications represented significant changes to federal government data accessibility and sparked legal challenges from healthcare advocacy groups.
+
+== Background ==
+Agencies of the United States government share open data for many uses. There are many civic technology, research, and business applications which rely on access to government data. Dataset deletion can be useful maintenance or the result of poor archiving practice. There is little government regulation on dataset management, so it can be challenging to determine when content deletions occur. Determining the reasons for removals and their significance is also difficult. All administrations make modifications to public websites, but there is little research on how much change is typical. There has been past speculation that previous government changes would result in removed access to data, but those removals did not happen.
+In 2009, Data.gov was established to improve public access to high value, machine-readable datasets generated by the Executive Branch of the Federal Government. In 2019, the OPEN Government Data Act ordered agencies to share data that could be used to evaluate the effectiveness of their programs and to guide policymaking. Various federal agencies release data on their own websites.
+In 2019, Trump signed into law the Foundation for Evidence-Based Policymaking Act, which established a system for utilizing data to construct evidence-based policy. Trump's second administration showed a dramatic pivot from this law passed during his first administration.
+In late January 2025, organizations under the Department of Health and Human Services (HHS) paused their external communication during a review.
+
+== Removed and modified content ==
+
+On January 29, 2025, the Office of Personnel Management (OPM) ordered agencies to comply with President Trump's executive order, "Defending Women," which requires federal agencies to "recognize women are biologically female, and men are biologically male". The organizations were told to terminate any programs and remove any outward facing media, documents, materials, communications, and statements that promote "gender ideology" by January 31.
+Agencies also moved quickly to comply with the executive order "Ending Radical Government DEI Programs" by removing forbidden terms from their websites. Census.gov went offline as it attempted to comply with the executive orders "Reevaluating Foreign Aid" and "Defending Women". The information removals and modifications reflected policy changes championed by the Donald Trump 2024 presidential campaign.
+Data removal included topics related to DEI (diversity, equity, and inclusion), long COVID, HIV/AIDS, vaccines, transgender and gender identity-related topics, foreign aid, environmental justice, emergency management, employment, and the January 6 United States Capitol attack. By February 2, 2025, the content removal included more than 8,000 web pages across more than a dozen government websites. According to The New York Times, the removed pages made up approximately 0.1% of all U.S. government web pages.
+Some web pages and documents remain accessible, but were stripped of terminology relating to the prohibited topics. Terms have been replaced across many government web pages; "climate change" was often replaced by "climate resilience", "LGBTQ" replaced by "LGB", and "pregnant people" replaced by "pregnant women". According to The Washington Post, the most common change to web pages was removing DEI-related terms.
+The website modifications also affected older web pages, such as the description of a 2021 conference and a 2022 letter from cabinet secretaries. The Washington Post reported that some pages seemed to be mistakenly modified; the word "diverse" was removed from a page describing the extent of the Department of the Interior's museum collection.
+
+=== CDC website ===
+As of February 2, more than 3,000 pages from the website for the Centers for Disease Control and Prevention had been altered or removed. This included thousands of research papers relating to chronic medical conditions, sexually transmitted infections, Alzheimer's disease, drug overdose prevention, adolescent health, and reproductive care.
+Vaccine guidelines for pregnant people were also removed from the CDC website, which The New York Times noted may have been due to use of the gender-neutral term "pregnant people". One employee said that since HIV-related webpages commonly referenced gender, they had to "take everything down in order to meet the deadline."
+Some data was restored later, such as the Atlas Tool for tracking infectious diseases such as HIV and STIs and information on the Youth Risk Behavior Surveillance System. As of February 6, the CDC website had the notice, "CDC's website is being modified to comply with President Trump's Executive Orders."
+
+=== Science and research websites ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-1.md b/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-1.md
new file mode 100644
index 000000000..dff2e1a56
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-1.md
@@ -0,0 +1,39 @@
+---
+title: "2025 United States government online resource removals"
+chunk: 2/4
+source: "https://en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:22.498115+00:00"
+instance: "kb-cron"
+---
+
+In January, NASA undertook a comprehensive removal of DEI-related content from its public-facing websites. An internal directive instructed employees to "drop everything" and immediately eliminate references to terms such as "DEIA", "indigenous people", "environmental justice", "underrepresented groups/people", and content specifically targeting women, including content about "women in leadership". This purge resulted in the deletion of various materials, including interviews with Black and female NASA employees, LGBTQ-related content, and two NASA-created comic books about women astronauts.
+More than 3,000 pages from the Census Bureau website were removed as of February 2, primarily including articles filed under research and methodology. Pages relating to data stewardship as well as survey and data set documentation were also removed.
+The Food and Drug Administration (FDA) removed more than 100 pages as of February 2, including dozens of regulatory guidelines on topics such as increasing diversity in clinical trials and the potential for addiction and abuse in drug studies.
+Close to 50 research papers from the Office of Scientific and Technical Information – part of the Department of Energy – were removed as of February 2. The removed papers covered a range of subjects, such as chemistry, optics, and experimental medicine.
+Twenty pages from the National Institute of Standards and Technology (NIST) website were removed as of February 2, including a page documenting the organization's zero-tolerance harassment policy.
+The environmental justice mapping and screening tool, EJScreen, was removed from the Environmental Protection Agency (EPA) website on February 5, along with several related pages. The Public Environmental Data Project (PEDP) published a reconstruction of one of its earlier versions.
+In March 2025, an unspecified executive order signed by President Donald Trump resulted in the NOAA Radar Next Program Overview document being removed from NOAA servers.
+The NOAA maintains a list of resources and products it retires. On May 31, the entire climate.gov team was fired, likely shutting down the site. The National Climate Assessment reports, congressionally mandated under the Global Change Research Act of 1990, were taken offline, and the 400 scientists working on the 2027 assessment were fired.
+
+=== Justice and crime websites ===
+At least 1,000 pages from the Office of Justice Programs, a crime prevention research organization, were removed as of February 2. This included information on violence in teenage dating, and a blog post regarding grants that went toward combating hate crimes.
+The Department of Justice (DOJ) removed over 180 pages as of February 2, including all state-level crime data and seven pages with information on anti-LGBTQ hate crimes.
+The Marshals Service saw two pages removed, relating to correctional facility standards and fitness readiness requirements.
+The National Law Enforcement Accountability Database, which tracked federal police officer misconduct, was removed as of February 20.
+In March, the Department of Justice deleted the page about a study showing that undocumented immigrants commit less crime than citizens.
+In September 2025, a study conducted by the National Institute of Justice showing that white supremacist and far-right violence were the most common forms of terrorism and domestic violent extremism in the United States was deleted.
+The Not One More Report, on missing and murdered Native Americans, disappeared from the Department of Justice's website in February 2025; the administration said that the report, mandated by Congress by the Not Invisible Act, was removed to ensure compliance with one of Trump's executive orders.
+
+=== Healthcare and social services websites ===
+Head Start, a U.S. federal aid program for low-income childcare, had over 200 pages removed as of February 2, including advice on establishing familial routines and guidance to help prevent postpartum depression. The removals followed a freeze of federal funds to the program days earlier.
+As of February 2, nearly 150 pages had been removed from the Substance Abuse and Mental Health Services Administration website. This included more than 50 press releases about using a helpline following shootings or natural disasters.
+The Health Resources and Services Administration deleted 18 pages from its website as of February 2, including information on the Mpox vaccine and opioid addiction among women.
+Three pages from the Department of Veterans Affairs were removed as of February 2, including information on healthcare for minority and LGBTQ veterans, as well as the equity of the Southeast Louisiana Veterans Health Care System.
+ReproductiveRights.gov, an HHS website providing information on reproductive care, was taken offline. The website was launched by the Biden administration following the overruling of Roe v. Wade.
+On February 13, Garey Rice, the principal deputy assistant secretary for operations at HHS, declared that DOGE employees grafted to the agency have "full access to all
+unclassified agency records and software and IT systems" and are tasked, among other things, with the obligation to "destroy or erase copied HHS data or information when no longer needed for official purposes."
+As of April 4, 2025, over 20 National Institutes of Health (NIH) data repositories displayed headers stating "This repository is under review for potential modification in compliance with Administration directives." These repositories contain petabytes of data that are used for public health research in diverse areas, including cancer, brain imaging, sleep studies, Alzheimer's, aging, COVID-19, and HIV. Many of the datasets cannot be archived by outside researchers because they are regulated by Data Use Agreements that must be consistent with the Health Insurance Portability and Accountability Act (HIPAA). In April 2025, the Trump administration removed the online hub for federal COVID-19 resources, including COVID.gov and COVIDtests.gov, replacing it with a landing page promoting the COVID-19 lab leak theory.
+
+=== Other websites ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-2.md b/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-2.md
new file mode 100644
index 000000000..b78c5ae48
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-2.md
@@ -0,0 +1,31 @@
+---
+title: "2025 United States government online resource removals"
+chunk: 3/4
+source: "https://en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:22.498115+00:00"
+instance: "kb-cron"
+---
+
+The Internal Revenue Service removed more than 25 pages as of February 2, including a form that private schools are required to submit annually to certify that they had not engaged in racial discrimination.
+The Consumer Financial Protection Bureau (CFPB) removed 386 videos from its official YouTube channel.
+As of February 2, there were 18 pages removed from the United States Patent and Trademark Office website, including information about veteran inventors and entrepreneurs, and a high school program teaching about intellectual property.
+The Department of the Interior removed eight pages from their website as of February 2, including several with information on environmental policy initiatives. The New York Times speculated that some of the pages may have been removed due to the use of the phrase "environmental justice". The National Park Service removed all references to the existence of transgender people from its webpages covering the Stonewall National Monument, the Stonewall riots, and LGBTQ+ history, changing the acronym from "LGBTQ+" to "LGB".
+As of February 3, four pages from the Nuclear Regulatory Commission had been deleted, including an overview of the commission's equal employment opportunity and diversity initiatives.
+The FDA's Office of Minority Health and Health Equity website was removed, and the NIH's Office of Equity, Diversity and Inclusion website now redirects to an equal employment opportunity web page.
+All Spanish-language content on whitehouse.gov was removed, as it was following President Trump's first inauguration. The Association of Academies of the Spanish Language issued a joint statement criticizing the removal, noting the importance of Spanish as the second most spoken language in the United States, especially in Puerto Rico. Signatories included the North American and Puerto Rican Academies of the Spanish Language.
+International travel advisories on the Department of State website replaced their language on "LGBTQ+ Travelers" with language around "LGB Travelers" and removed reference to safety and other issues faced by transgender Americans in other countries.
+Thousands of images were reportedly flagged for removal by the Defense Department.
+Arlington National Cemetery removed dozens of pages from its website. Some identified gravesites of notable Black, Hispanic and female service members, and others included educational material.
+On March 18, more than 300 posts were removed from the FTC business guidance blogs, including those reporting on lawsuits by Lina Khan against the tech giants.
+On April 6, 2025, The Washington Post reported that the National Park Service had revised a web page about the Underground Railroad to remove a quote and image of Harriet Tubman, and to remove the word "slavery" from the opening paragraph. Following an outcry after widespread reporting of the revisions, the changes were reverted the following day. A spokesperson for the National Park Service stated that "Changes to the Underground Railroad page on the National Park Service's website were made without approval from NPS leadership nor Department leadership".
+In August 2025, the government website for the Constitution of the United States was modified, removing large parts of Section 8 and entirely deleting Sections 9 and 10 from Article 1 of the document. On August 6, the Library of Congress said the deletion of text was "due to a coding error" and that it was working to correct the issue.
+As of November 2025, the USDA had deleted its contingency plan to fund SNAP.
+
+=== Datasets ===
+In January 2025, the government removed about 3,000 datasets from various platforms. Many deleted datasets came from the Department of Energy, the National Oceanic and Atmospheric Administration, and the Environmental Protection Agency.
+
+== Legal responses ==
+Doctors for America sued the U.S. government to restore health information, arguing "The removal of this information deprives researchers of access to information that is necessary for treating patients ... and for developing practices and policies that protect the health of vulnerable populations and the country as a whole." In response, a federal judge issued a restraining order on February 11, 2025, requiring certain websites from the Department of Health and Human Services, the CDC, and the FDA to be restored.
+The American Federation of Teachers, Minority Veterans of America, and Public Citizen Litigation Group also filed lawsuits.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-3.md b/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-3.md
new file mode 100644
index 000000000..b57400672
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals-3.md
@@ -0,0 +1,30 @@
+---
+title: "2025 United States government online resource removals"
+chunk: 4/4
+source: "https://en.wikipedia.org/wiki/2025_United_States_government_online_resource_removals"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:22.498115+00:00"
+instance: "kb-cron"
+---
+
+== Reactions ==
+Representatives from the Population Association of America, the Council of Professional Associations on Federal Statistics, and the Association of Public Data Users (APDU) expressed disapproval of the data deletion. President of the APDU, Amy O'Hara, described a "mad scramble" as researchers searched for copies of the deleted data.
+Former US Department of Labor chief evaluation officer and board member of Data Foundation Molly Irwin warned of the dangers of deleting critical government data. She emphasized that government data is a critical component of how politicians and analysts evaluate whether a particular policy is working as intended.
+The stock market, bond market, and Federal Reserve all continuously make decisions based on labor data. This data is typically stable, but changes to it reduce confidence in data about the economy. Uncertainty also encourages conspiracy theories which view government data as intentionally incorrect for malicious purposes. Furthermore, businesses rely on the essentially free government data as a part of their own operations, planning, products and services. Zillow, for example, utilizes government data to generate their product of real estate information and analysis.
+Scientists reacted by saying that they would restore access to some data, but doing so is not easy. The Internet Archive has been successful in archiving many health datasets. The Internet Archive is also a contributor to the consortium effort of developing the End of Term Web Archive, which attempts to copy every government publication at the end of every presidential term. Organizations like IPUMS, which provides data curation, integration, and harmonization, are serving as important sources for previously deleted data.
+The Harvard Law School Library hosts the Data.gov Archive. The Chan School mirrored public health records. The law library's Innovation Lab said that it had managed to preserve 311,000 datasets copied between 2024 and 2025. A coalition of data organizations launched the Data Rescue Project "as a clearinghouse for data rescue-related efforts".
+George Benjamin, head of the American Public Health Association, said that the removals could make it more difficult to track infectious diseases such as HIV and Mpox. He also expressed concern that even if the data was restored, new data might not be collected, which would impair future research.
+Director of the National Institutes of Health (NIH) Executive Secretariat, Nate Brought, said that Trump's orders were in conflict with extensive research and conclusions by the NIH pertaining to sexuality and gender. In a letter to the NIH director and other senior officials, Brought urged them to refuse to implement the President's directives.
+Groups like Free Government Information have voiced strong opposition to the Trump administration's removal of government data and resources, and have started organizing efforts to collect and preserve federal government data. Similarly, the Preservation of Electronic Government Information (PEGI) group has stressed the urgency of collecting and saving data before it is removed from official government sites. Grassroots collective efforts like the Data Rescue Project have begun coordinating these various data-saving endeavors.
+
+== References ==
+
+== Further reading ==
+Jetelina, Katelyn (February 4, 2025). "Data and communication are gold". Your Local Epidemiologist.
+Faust, Jeremy (February 1, 2025). "Massive censorship escalation at CDC. Trump Administration now choosing the public health data you can see". Inside Medicine.
+
+== External links ==
+ Media related to 2025 United States government online resource removals at Wikimedia Commons
+Books removed from U.S. Naval Academy Nimitz Library
+Trump's war on public data
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Academic_Spring-0.md b/data/en.wikipedia.org/wiki/Academic_Spring-0.md
index d745e52a7..d9d497581 100644
--- a/data/en.wikipedia.org/wiki/Academic_Spring-0.md
+++ b/data/en.wikipedia.org/wiki/Academic_Spring-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Academic_Spring"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:27.019255+00:00"
+date_saved: "2026-05-05T10:14:41.679545+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Academic_Studies_Press-0.md b/data/en.wikipedia.org/wiki/Academic_Studies_Press-0.md
new file mode 100644
index 000000000..2a5cb9cb9
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Academic_Studies_Press-0.md
@@ -0,0 +1,29 @@
+---
+title: "Academic Studies Press"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Academic_Studies_Press"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:42.867762+00:00"
+instance: "kb-cron"
+---
+
+Academic Studies Press (ASP) is an independent scholarly publisher of books and journals, based in Boston, Massachusetts.
+
+
+== History ==
+Founded in 2007, ASP emphasizes Jewish studies and Slavic studies, but also publishes titles in religious studies, comparative literature, and history more broadly.  Authors include Jacob Neusner, Fania Oz-Salzberger, Ellendea Proffer Teasley, Maxim D. Shrayer, Mark Lipovetsky, David Berger, Menachem Kellner, Viktor Zhivov, Jerold Auerbach, and Geoffrey Alderman, while works in translation include those of Maimonides, Ahad Ha'am, Mordecai Kaplan, Eliezer Schweid, and Yury Tynyanov. The press also specializes in Ukrainian translations, many of which received renewed interest during the 2022 Russian invasion of Ukraine. Its titles have won awards from the Jewish Book Council, the Modern Language Association, AATSEEL, and the Koffler Centre of the Arts, and have also appeared on reading lists published by Mosaic and The Washington Post.
+In 2017, ASP received funding from the National Endowment for the Humanities to aid in the "creation of freely accessible e-books for 42 seminal titles in Russian literary and cultural history." In the same year, ASP collaborated with the NEH and the Ukrainian Research Institute at Harvard University to publish Words for War, an anthology of contemporary Ukrainian poetry, featuring poems and commentary from figures such as Serhiy Zhadan and Ilya Kaminsky.
+In 2018, ASP published the "first authorized English-language translation" of Akram Aylisli's controversial novella Stone Dreams.
+
+
+== Academic journals and publishing ==
+In addition to its books list, the press publishes three peer-reviewed academic journals, including Evolutionary Studies in Imaginative Culture, edited by Joseph Carroll, as well as the Journal of Contemporary Antisemitism, edited since 2018 by Lesley Klaff. Citing her appointment as editor-in-chief of the Journal of Contemporary Antisemitism, the Algemeiner Journal named Klaff among their list of the "top 100 people positively influencing Jewish life" in 2018. In 2020, ASP began publication of a fourth peer-reviewed journal, Latin American Jewish Studies, on behalf of the Latin American Jewish Studies Association.
+Since 2012, ASP has also served as the distributor and printer for books published by Touro College Press. In 2019, ASP entered into an e-book distribution partnership with the German academic publisher Walter de Gruyter.
+
+
+== References ==
+
+
+== External links ==
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Academic_Torrents-0.md b/data/en.wikipedia.org/wiki/Academic_Torrents-0.md
index 7853dde1a..fffe17a46 100644
--- a/data/en.wikipedia.org/wiki/Academic_Torrents-0.md
+++ b/data/en.wikipedia.org/wiki/Academic_Torrents-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Academic_Torrents"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:52:40.258053+00:00"
+date_saved: "2026-05-05T10:16:23.948246+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/AllTrials-0.md b/data/en.wikipedia.org/wiki/AllTrials-0.md
index 94013fb06..6c0ea9908 100644
--- a/data/en.wikipedia.org/wiki/AllTrials-0.md
+++ b/data/en.wikipedia.org/wiki/AllTrials-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/AllTrials"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:46:40.642240+00:00"
+date_saved: "2026-05-05T10:16:25.207611+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/AllTrials-1.md b/data/en.wikipedia.org/wiki/AllTrials-1.md
index 19b83f1b8..fcf4c16f4 100644
--- a/data/en.wikipedia.org/wiki/AllTrials-1.md
+++ b/data/en.wikipedia.org/wiki/AllTrials-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/AllTrials"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:46:40.642240+00:00"
+date_saved: "2026-05-05T10:16:25.207611+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/American_Art_Collaborative-0.md b/data/en.wikipedia.org/wiki/American_Art_Collaborative-0.md
new file mode 100644
index 000000000..d2b6cf2fb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/American_Art_Collaborative-0.md
@@ -0,0 +1,38 @@
+---
+title: "American Art Collaborative"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/American_Art_Collaborative"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:26.351822+00:00"
+instance: "kb-cron"
+---
+
+The American Art Collaborative (AAC) is a consortium of 14 art museums in the United States, whose mission is the establishment of "a critical mass of linked open data (LOD) on the semantic web."
+As of 2024, the AAC has converted over 230,000 museum object records to linked open data.
+
+
+== Membership ==
+As of 2024, the 14 members are:
+
+Amon Carter Museum of American Art
+Archives of American Art, Smithsonian Institution
+Autry Museum of the American West
+Colby College Museum of Art
+Crystal Bridges Museum of American Art
+Dallas Museum of Art (DMA)
+Indianapolis Museum of Art (IMA)
+Thomas Gilcrease Institute of American History and Art
+National Portrait Gallery, Smithsonian Institution
+National Museum of Wildlife Art
+Princeton University Art Museum
+Smithsonian American Art Museum
+Walters Art Museum
+Yale Center for British Art
+
+
+== References ==
+
+
+== External links ==
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Art_&_Architecture_Thesaurus-0.md b/data/en.wikipedia.org/wiki/Art_&_Architecture_Thesaurus-0.md
new file mode 100644
index 000000000..c66ce98c4
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Art_&_Architecture_Thesaurus-0.md
@@ -0,0 +1,68 @@
+---
+title: "Art & Architecture Thesaurus"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Art_&_Architecture_Thesaurus"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:27.520520+00:00"
+instance: "kb-cron"
+---
+
+The Art & Architecture Thesaurus (AAT) is a controlled vocabulary used for describing items of art, architecture, and material culture.  The AAT contains generic terms, such as "cathedral", but no proper names, such as "Cathedral of Notre Dame."  The AAT is used by, among others, museums, art libraries, archives, catalogers, and researchers in art and art history.  The AAT is a thesaurus in compliance with ISO and NISO standards including ISO 2788, ISO 25964 and ANSI/NISO Z39.19.
+The AAT is a structured vocabulary of 55,661 concepts (as of January 2020), including 131,000 terms, descriptions, bibliographic citations, and other information relating to fine art, architecture, decorative arts, archival materials, and material culture.
+
+
+== History ==
+The AAT project began in the late 1970s in response to the gradual automation of records by art libraries, art journal indexing services, and catalogers of museum objects and visual resources.  Automation required consistency in cataloging as well as more efficient retrieval of information; a controlled vocabulary was a solution to both these problems.  The project was conceived by library directors and architectural experts Toni Petersen, Dora Crouch, and Pat Molholt and was originally headquartered part-time at Rensselaer Polytechnic Institute in Troy, NY, then at Bennington College in Bennington, VT and later moved to Williamstown, Massachusetts, with the J. Paul Getty Trust providing technical advice and funding.  In 1983 the Getty Trust took over editorial responsibility.  The AAT offices relocated to the Getty's Los Angeles headquarters in order to better coordinate with two other similar Getty projects, the Union List of Artist Names (ULAN) and Getty Thesaurus of Geographic Names (TGN) soon after its publication.
+The AAT was published in 1990 and 1994 in both print and electronic form. By 1997, the size and frequency of updates made hard-copy publication unfeasible, and the decision was made to publish via a searchable online Web interface and in data files available for licensing. The online Web interface is freely accessible from any computer connected to the Internet. Final editorial control of the AAT is maintained by the Getty Vocabulary Program, part of the Getty Research Institute.
+Since 2008, the Taiwan e-Learning and Digital Archives Program (TELDAP) has collaborated with the Getty Research Institute (GRI) to develop the Chinese-language Art & Architecture Thesaurus (AAT-Taiwan). The initial goal of this project is to provide multilingual search and corresponding images in Taiwan's integrated digital archive systems, and to broaden the inclusion of terms related to Asian art, architecture, and material culture in the AAT.
+The AAT can be used in several ways:
+
+at the data entry stage, by catalogers or indexers who are describing works of art, architecture, material culture, archival materials, visual surrogates, or bibliographic materials;
+as knowledge bases, providing information for researchers;
+as search assistants to enhance end-user access to online resources;
+as a target for enriching free-text descriptions of cultural objects;
+as a pivot vocabulary for coreferencing (interlinking) other art vocabularies
+The AAT has been available as Linked Open Data at vocab.getty.edu since February 2014 and is updated bi-weekly.
+
+
+== Terms ==
+The initial core set of terms was derived from authority lists and the literature of art and architectural history; this core set was reviewed, approved, and added to by an advisory team made up of scholars from all relevant disciplines, including art and architectural historians, architects, librarians, visual resource curators, archivists, museum personnel, and specialists in thesaurus construction. Its hierarchy was inspired by the Medical Subject Headings. All eras from antiquity to the present are covered, and it is not limited geographically.
+As of January 2007, the AAT contained approximately 131,000 terms. While the thesaurus contains many variations on a term, such as singular and plural forms, spelling variants, various forms of speech, and synonyms, one is always flagged as the preferred term.  Terms are updated biweekly and regular users are encouraged to propose new terms.
+As of 2015, the AAT contained 354,000 terms. They are available in four major languages (English, Dutch, Spanish, and Chinese), and some terms are available in various native languages.
+
+
+== Design ==
+The AAT is a faceted classification system as well as a hierarchical one.  There are seven facets:
+
+Associated Concepts – abstract concepts, such as beauty, balance, connoisseurship, metaphor, freedom, socialism (Hierarchy: Associated concepts)
+Physical Attributes – perceptible or measurable characteristics such as size, shape, chemical properties, texture and hardness, such as strapwork, borders, round, waterlogged, brittleness. (Hierarchies: Attributes and Properties, Conditions and Effects, Design Elements, Color)
+Styles and Periods – stylistic groupings and distinct chronological periods, such as French, Louis XIV, Tang dynasty, Chippendale (Hierarchy: Styles and Periods)
+Agents – people, groups of people, and organizations such as printmakers, landscape architects, corporations, religious orders. (Hierarchies: People, Organizations)
+Activities – areas of endeavor, physical and mental actions or methods, such as archaeology, engineering, analyzing, contests, exhibitions, running, drawing (image-making), corrosion. (Hierarchies: Disciplines, Functions, Events, Physical and Mental Activities, Processes and Techniques)
+Materials – physical substances, such as iron, clay, adhesive, emulsifier, artificial ivory, millwork, nylon. (Hierarchy: Materials)
+Objects – objects either fabricated or given form by human activity, such as paintings, amphorae, facades, cathedrals, Brewster Chairs, gardens (Hierarchies: Object Groupings and Systems, Object Genres, Components; Built Environment: Settlements and Landscapes, Built Complexes and Districts, Single Built Works, Open Spaces and Site Elements; Furnishings and Equipment: Furnishings, Costume, Tools and Equipment, Weapons and Ammunition, Measuring Devices, Containers, Sound Devices, Recreational Artifacts, Transportation Vehicles; Visual and Verbal Communication: Visual Works, Exchange Media, Information Forms)
+
+
+== Online records of concepts showing the hierarchy in the database ==
+The record for each concept includes its place in the hierarchy (with a link to its parent), as well as links to related terms, related concepts, sources and contributors for the data, and notes.
+
+
+== See also ==
+Categories for the Description of Works of Art (CDWA)
+Cultural Objects Name Authority (CONA)
+Getty Thesaurus of Geographic Names (TGN)
+Getty Vocabulary Program
+Union List of Artist Names (ULAN)
+
+
+== References ==
+
+
+== External links ==
+Art & Architecture Thesaurus Online Search the AAT online for free.
+About the Getty Vocabularies Archived 2010-07-20 at the Wayback Machine
+About AAT
+Getty Vocabulary Editorial Guidelines The editorial guidelines for the AAT, ULAN, and TGN contain rules and guidelines intended for use by the editors of the Getty Vocabulary Program using the in-house editorial system, VCS (Vocabulary Coordination System). Contributors to the Getty Vocabularies and implementers of the licensed vocabulary data may consult these guidelines as well.
+Training materials and presentations created by the Getty Vocabulary Program The documents on this page include presentations and other training materials for the Getty Thesaurus of Geographic Names (TGN), the Union List of Artist Names (ULAN), the Art & Architecture Thesaurus (AAT), Cataloging Cultural Objects (CCO), Categories for the Description of Works of Art (CDWA), and standards in general.
+AAT as Linked Open Data, documentation
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Article_processing_charge-0.md b/data/en.wikipedia.org/wiki/Article_processing_charge-0.md
new file mode 100644
index 000000000..45f6471c4
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Article_processing_charge-0.md
@@ -0,0 +1,32 @@
+---
+title: "Article processing charge"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Article_processing_charge"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:46.180705+00:00"
+instance: "kb-cron"
+---
+
+An article processing charge (APC), also known as a publication fee, is a fee which is sometimes charged to authors. Most commonly, it is involved in making an academic work available as open access (OA), in either a full OA journal or in a hybrid journal. This fee may be paid by the author, the author's institution, or their research funder. 
+Sometimes, publication fees are also involved in traditional journals or for paywalled content.
+Some publishers waive the fee in cases of hardship or geographic location, but this is not a widespread practice. An article processing charge does not guarantee that the author retains copyright to the work, or that it will be made available under a Creative Commons license.
+
+== Background ==
+
+Journals use a variety of ways to generate the income required to cover publishing costs (including editorial costs and any costs of administering the peer review system), such as subsidies from institutions and subscriptions. A majority of open access journals do not charge article processing charges, but a significant and growing number of them do. They are the most common funding method for professionally published open access articles.
+APC fees applied to academic research are usually expensive, effectively limiting open access publishing to wealthier institutions, scholars, and students.
+The APC model of open access, among other controversies, is part of the wider and increasingly global debate over the ethics of open access (OA).
+Most journals do not charge APCs. The global average per-journal APC is US$1,626, its recent increase indicating "that authors choose to publish in more expensive journals".
+A 2019 analysis showed that 75% of European spending on scientific journals goes to the "big five" publishers (Elsevier, Springer Nature, Wiley, Taylor & Francis, and the American Chemical Society (ACS)). Together they accounted for 56% of articles published.
+
+== Other publishing fees ==
+Author fees or page charges have existed since at least the 1930s. Different academic publishers have widely varying levels of fees, from under $100 to over $5000, and sometimes as high as €9500 ($10851) for the journal Nature. Meanwhile, an independent study indicated that the actual costs of efficiently publishing a scholarly article should be in the region of €200–€1000. High fees are sometimes charged by traditional publishers in order to publish in a hybrid open access journal, which makes an individual article in a subscription journal open access. The average APC for hybrid journals has been calculated to be almost twice as high as APCs from full open access publishers. Journals with high impact factors from major publishers tend to have the highest APCs.
+Open access articles often have a surcharge compared to closed-access or paywalled content; for example, the Proceedings of the National Academy of Sciences charges $1590–$4215 per article (depending on length) for closed-access, with a surcharge of $1700–$2200 for open-access (depending on licence). Similarly, AGU's Journal of Geophysical Research charges $1000 for closed-access and $3500 for open-access.
+Even when publishers do not charge standard fees, excess or overlength fees might still apply after a certain number of pages or publication units is exceeded; additional color fees might apply for figures, primarily for print journals that are not online-only.
+While publication charges occur upon article acceptance, article submission fees are charged prior to the start of peer review; they are common among journals in some fields, e.g., finance and economics.
+The term "page charge" may refer to either publication or submission fees.
+
+== Criticism ==
+
+=== Cost of research articles ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Article_processing_charge-1.md b/data/en.wikipedia.org/wiki/Article_processing_charge-1.md
new file mode 100644
index 000000000..c31fa8dde
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Article_processing_charge-1.md
@@ -0,0 +1,41 @@
+---
+title: "Article processing charge"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Article_processing_charge"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:46.180705+00:00"
+instance: "kb-cron"
+---
+
+==== Cost to scientists and funding bodies ====
+Article processing charges shift the burden of payment from readers to authors (or their funders), which creates a new set of concerns. One concern is that if a publisher makes a profit from accepting papers, it has an incentive to accept anything submitted, rather than selecting and rejecting articles based on quality. This could be remedied, however, by charging for the peer-review rather than acceptance. Another concern is that institutional budgets may need to be adjusted in order to provide funding for the article processing charges required to publish in many open access journals (e.g. those published by BioMed Central). It has been argued that this may reduce the ability to publish research results due to lack of sufficient funds, leading to some research not becoming a part of the public record.
+Another concern is the redirection of money by major funding agencies such as the National Institutes of Health and the Wellcome Trust from the direct support of research to the support of open access publication. Robert Terry, Senior Policy Advisor at the Wellcome Trust, has said that he feels that 1–2% of their research budget will change from the creation of knowledge to the dissemination of knowledge.
+Research institutions could cover the cost of open access by converting to an open access journal cost-recovery model, with the institutions' annual toll-access subscription savings being available to cover annual open access publication costs. A 2017 study by the Max Planck Society estimated that the annual turnover of academic publishers amounts to approximately €7.6 billion. It is argued that this money comes predominantly from publicly funded scientific libraries as they purchase subscriptions or licenses in order to provide access to scientific journals for their members. The study was presented by the Max Planck Digital Library and found that subscription budgets would be sufficient to fund the open access publication charges, but it did not address how unaffiliated authors or authors from institutions without funds will contribute to the scholarly record.
+Five large commercial publishers (Elsevier, Sage, Springer Nature, Taylor & Francis, and Wiley) have raised concerns within the research community. These concerns stem primarily from two factors: the publishers' substantial profit margins, which are often derived from works funded by public research grants, and the high costs associated with their open access publishing fees under gold and hybrid journal models. For example, a Guardian article reported that in 2010, Elsevier's scientific publishing arm made profits of £724 million on just over £2 billion in revenue. The margin was 36%, which exceeded the margins reported by Apple, Google, and Amazon that same year.
+
+==== Unequal access to publishing ====
+Unless discounts are available to authors from countries with low incomes, or external funding is provided to cover the cost, article processing charges can exclude authors from developing countries or less-funded research fields from publishing. Publishers often explain this charge by citing the cost of producing print materials, but some digital-only publications continue to charge article processing fees, which has garnered criticism from academics. Under the traditional model, the prohibitive costs of some non-open access journal subscriptions already place a heavy burden on the research community. Many open access publishers do offer discounts or publishing fee waivers to authors from developing countries or those suffering financial hardship.
+For these reasons, some funding bodies simply will not pay the extra fees for open access publishing: the European Union scientific research initiative Horizon Europe does not cover the APCs for articles in hybrid open-access journals.
+
+=== Diamond open access model ===
+Diamond open access is a term used to describe journals that have no article processing charges, and make articles available to read without restrictions. In 2020, diamond OA journals comprised 69% of the journals in the Directory of Open Access Journals, but published only 35% of the articles. In 2021, it was estimated that 17,000 to 29,000 diamond OA journals published 8–9% of all scholarly journal articles and 45% of open access articles. Nearly all Latin American OA journals use the diamond model, whereas a little over half of African and Western European OA journals are diamond OA. However, the percentage of diamond OA articles covered in Scopus and Web of Science for the same year was below 1%, suggesting that "Scopus- or Web of Science-based (data) are skewed towards toll access and article processing charges-based publishing, as Diamond journals are underrepresented in (these databases)". The same study also found that diamond OA articles comprised 81% of all OA articles in Humanities, but only 30% in Medicine and Sciences.
+
+== See also ==
+Copyright transfer agreement
+Royalty-free
+Royalty payment
+Predatory publishing
+Academic journal publishing reform
+Plan S
+
+== References ==
+
+== Further reading ==
+University of California Libraries (2016) Pay It Forward: Investigating a Sustainable Model of Open Access Article Processing Charges for Large North American Research Institutions. Mellon Foundation. Archived 2019-04-08 at the Wayback Machine.
+Robert Kiley (2013). "Colour and page charges: results of a brief survey" (PDF). Archived from the original (PDF) on 2016-03-04. Retrieved 2015-07-17.
+Curb, L. A.; Abramson, C.I. (2012). "An examination of author-paid charges in science journals". Comprehensive Psychology. 1: 4. doi:10.2466/01.17.CP.1.4.
+Guy, M., Holl, A. (2015) Article Processing Charges. Briefing Paper, PASTEUR4OA project
+
+== External links ==
+OpenAPC: open database of APC
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/BASE_(search_engine)-0.md b/data/en.wikipedia.org/wiki/BASE_(search_engine)-0.md
new file mode 100644
index 000000000..a3bced650
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/BASE_(search_engine)-0.md
@@ -0,0 +1,44 @@
+---
+title: "BASE (search engine)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/BASE_(search_engine)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:47.317292+00:00"
+instance: "kb-cron"
+---
+
+BASE (Bielefeld Academic Search Engine) is a multi-disciplinary search engine for scholarly internet resources, created by Bielefeld University Library in Bielefeld, Germany. It is based on free and open-source software such as Apache Solr and VuFind. It harvests OAI metadata from institutional repositories and other academic digital libraries that implement the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and then normalizes and indexes the data for searching. In addition to OAI metadata, the library indexes selected web sites and local data collections, all of which can be searched via a single search interface.
+
+
+== History ==
+BASE was developed at the German university of Bielefeld beginning in 2002. The project's initial goal was to develop a search engine that would provide users access to the university's research resources. Yet as the initiative advanced, the creators came to see the need for a more thorough search engine that might provide users access to academic resources outside of the university.
+The initial iteration of BASE was released as a prototype in 2004 and made accessible to the general public for testing. The search engine was created to index and offer access to scholarly materials such as journals, institutional repositories, and digital collections, as well as scientific publications. The search engine's creators emphasized ensuring open access to scientific knowledge and made sure that its search results only included materials that were publicly available through the web.
+Over the next few years, BASE continued to grow and develop. The search engine was refined and improved, and it began to attract users from all over the world. In 2007, the project received funding from the German Research Foundation (DFG) to further develop and improve the search engine.
+Since then, BASE has become one of the largest and most comprehensive search engines for academic resources. It provides access to scholarly resources in a variety of languages and disciplines, and it has become an important tool for researchers, scholars, and students around the world.
+In addition to providing access to scholarly resources, BASE has also been involved in several projects and initiatives aimed at promoting open access and improving scholarly communication. For example, the search engine has been involved in the development of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which is used to facilitate the exchange of metadata between digital repositories.
+Overall, BASE has played an important role in the development of open access and the democratization of knowledge. Its commitment to providing free and open access to scholarly resources has made it an important resource for researchers and scholars around the world.
+
+
+== Functionality ==
+Users can search bibliographic metadata including abstracts, if available. However, BASE does not currently offer full text search. It contrasts with commercial search engines in multiple ways, including in the types and kinds of resources it searches and the information it offers about the results it finds. Results can be narrowed down using drill down menus (faceted search). Bibliographic data is provided in several formats, and the results may be sorted by multiple fields, such as by author or year of publication.
+Paying customers include EBSCO Information Services, which integrated BASE into its EBSCO Discovery Service (EDS). Non-commercial services can integrate BASE search for free using an API. BASE has become an increasingly important component of open access initiatives concerned with enhancing the visibility of their digital archive collections.
+On 6 October 2016, BASE surpassed the 100 million documents threshold having indexed 100,183,705 documents from 4,695 content sources. As of 2022, it had indexed over 315 million documents from over 10,000 sources.
+
+
+== See also ==
+List of academic databases and search engines
+CORE (research service)
+Open access in Germany
+
+
+== References ==
+
+
+== Literature ==
+Lossau, Norbert. 2004. "Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet," D-Lib Magazine, Volume 10, Number 6, June 2004. doi:10.1045/june2004-lossau
+Summann, Friedrich and Norbert Lossau. 2004. "Search Engine Technology and Digital Libraries: Moving from Theory to Practice," D-Lib Magazine, Volume 10, Number 9, September 2004. doi:10.1045/september2004-lossau
+
+
+== External links ==
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Beall's_List-0.md b/data/en.wikipedia.org/wiki/Beall's_List-0.md
new file mode 100644
index 000000000..91777759e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Beall's_List-0.md
@@ -0,0 +1,40 @@
+---
+title: "Beall's List"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Beall's_List"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:49.561109+00:00"
+instance: "kb-cron"
+---
+
+Beall's List was a list of predatory open-access publishers that was maintained by University of Colorado Denver librarian Jeffrey Beall on his blog Scholarly Open Access. The list aimed to document open-access publishers who did not perform real peer review, effectively publishing any article as long as the authors paid the article processing charge. Originally started as a personal endeavor in 2008, Beall's List became a widely followed piece of work by the mid-2010s. The list was used by scientists to identify exploitative publishers and detect publisher spam.
+The influence of Beall's List led some publishers on the list to threaten defamation lawsuits against Beall, as well as to lodge official complaints against Beall's work to the University of Colorado. In January 2017, Beall removed the list from his blog, scholarlyoa.com. Six months later, he published an article in the journal Biochemia Medica claiming that pressure from his employer led to the blog shutdown, although the university's official statement and a response by Beall's direct supervisor both disputed this account. The closure of Beall's List was cited by some as a loss of an important resource, and successors have set out to continue Beall's work.
+
+== Early history ==
+Beall first became interested in predatory open-access journals (a term he coined) in 2008, when he started to receive numerous requests from dubious journals to serve on their editorial boards. He said that he "immediately became fascinated because most of the e-mails contained numerous grammatical errors." Starting in 2008, he maintained a list of what he stated were "potential, possible, or probable predatory scholarly open-access publishers". 
+In 2011, Beall's list had 18 publishers on it; by December 29, 2016, this number had grown to 923.  Many of the journals listed were not actively publishing or published very few papers each year.
+The original list of 18 publishers published a total of 1,328 separate journals.  Beall originally classified all but one of the publishers he reviewed as being predatory.  A decade later, two of the original 18 had been acquired by reputable publishers, and three appeared to have gone out of business. The remaining 13 publishers had significantly increased the number of journals they were publishing, to a total of 1,650 individual journals (about 10% of the number of journals listed in Cabells' Predatory Reports in 2022), primarily due to the dramatic increase in the number of journals published by OMICS Publishing Group from 63 to 742.
+
+== Criteria for inclusion ==
+Beall considered multiple criteria before including a publisher or journal on his lists. Examples included:
+
+Two or more journals have the same editorial board.
+There is little or no geographical diversity among the editorial board members, especially for journals that claim to be international in scope or coverage.
+The publisher has no policies or practices for digital preservation, meaning that if the journal ceases operations, all of the content disappears from the internet.
+The publisher copy-proofs their PDFs, thus making it harder to check for plagiarism.
+The name of a journal is incongruent with the journal's mission.
+The publisher falsely claims to have its content indexed in legitimate abstracting and indexing services or claims that its content is indexed in resources that are not abstracting and indexing services.
+
+== Reception ==
+
+=== Legal threats ===
+In February 2013, the open-access publisher Canadian Center for Science and Education sent a letter to Beall stating that Beall's inclusion of its company on his list of questionable open-access publishers amounted to defamation. The letter also stated that if Beall did not remove the company from his list, it would subject him to "civil action".
+In 2013, the OMICS Publishing Group threatened to sue Beall for $1 billion for his "ridiculous, baseless, [and] impertinent" inclusion of it on his list, which "smacks of literal unprofessionalism and arrogance". An unedited sentence from the letter read: "Let us at the outset warn you that this is a very perilous journey for you and you will be completely exposing yourself to serious legal implications including criminal cases lunched against you in INDIA and USA." Beall responded that the letter was "poorly written and personally threatening" and expressed his opinion that the letter "is an attempt to detract from the enormity of OMICS's editorial practices". OMICS' lawyers stated that damages were being pursued under section 66A of India's Information Technology Act, 2000, which makes it illegal to use a computer to publish "any information that is grossly offensive or has menacing character" or to publish false information. The letter stated that three years in prison was a possible penalty, although a U.S. lawyer said that the threats seemed to be a "publicity stunt" that was meant to "intimidate".
+
+=== Use in sting operations ===
+
+==== Who's Afraid of Peer Review? ====
+
+In 2013, Science correspondent John Bohannon submitted 304 fake scientific articles to various open access journals, many of which were published by publishers on Beall's List. Among these publishers that completed the review process, 82% accepted the paper. Bohannon stated "the results show that Beall is good at spotting publishers with poor quality control". Beall stated that the results support his claim to be identifying "predatory" publishers. However, the remaining 18% of publishers identified by Beall as predatory rejected the fake paper, leading science communicator Phil Davis to state "That means that Beall is falsely accusing nearly one in five".
+Notable publishing groups to pass this sting operation include PLoS One, Hindawi, and Frontiers Media. Frontiers Media would later be added to Beall's list in 2015, sparking a controversy that is credited as a major reason for Beall eventually retracting his list.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Beall's_List-1.md b/data/en.wikipedia.org/wiki/Beall's_List-1.md
new file mode 100644
index 000000000..8de62ddd9
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Beall's_List-1.md
@@ -0,0 +1,27 @@
+---
+title: "Beall's List"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Beall's_List"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:49.561109+00:00"
+instance: "kb-cron"
+---
+
+==== "Dr Fraud" experiment ====
+In 2015, four researchers created a fictitious sub-par scientist named Anna O. Szust (oszust is Polish for "fraud"), and applied on her behalf for an editor position to 360 scholarly journals. Szust's qualifications were dismal for the role of an editor; she had never published a single article and had no editorial experience. The books and book chapters listed on her CV were made-up, as were the publishing houses that allegedly published the books.
+One-third of the journals to which Szust applied were sampled from Beall's List. Forty of these predatory journals accepted Szust as editor without any background vetting and often within days or even hours. By comparison, she received minimal to no positive response from the "control" journals which "must meet certain standards of quality, including ethical publishing practices." Among journals sampled from the Directory of Open Access Journals (DOAJ), 8 of 120 accepted Szust. The DOAJ has since removed some of the affected journals in a 2016 purge. None of the 120 sampled journals listed in Journal Citation Reports (JCR) offered Szust the position.
+The results of the experiment were published in Nature in March 2017, and widely presented in the press.
+
+=== Criticism ===
+The list's 82% accuracy rate in the Who's Afraid of Peer Review? sting operation led Phil Davis to state that "Beall is falsely accusing nearly one in five as being a 'potential, possible, or probable predatory scholarly open access publisher' on appearances alone." He wrote that Beall "should reconsider listing publishers on his 'predatory' list until he has evidence of wrongdoing. Being mislabeled as a 'potential, possible, or probable predatory publisher' by circumstantial evidence alone is like the sheriff of a Wild West town throwing a cowboy into jail just 'cuz he's a little funny lookin.' Civility requires due process."
+Joseph Esposito wrote in The Scholarly Kitchen that he had been following some of Beall's work with "growing unease", and that Beall's "broader critique (really an assault) of Gold OA and those who advocate it" had "crossed the line".
+City University of New York librarians Monica Berger and Jill Cirasella wrote that his views were biased against open-access journals from less economically developed countries. Berger and Cirasella argued that "imperfect English or a predominantly non-Western editorial board does not make a journal predatory". They stated that "the criteria he uses for his list are an excellent starting point for thinking about the hallmarks of predatory publishers and journals", and suggested that "given the fuzziness between low-quality and predatory publishers, whitelisting, or listing publishers and journals that have been vetted and verified as satisfying certain standards, may be a better solution than blacklisting." However, for researchers in developing countries, the list has also been described as having been particularly important, as a result of lower access to institutional support for guidance on predatory publishers.
+Rick Anderson, associate dean in the J. Willard Marriott Library, University of Utah, challenged the term "predatory open access publishing" itself: "what do we mean when we say 'predatory,' and is that term even still useful?... This question has become relevant because of that common refrain heard among Beall's critics: that he only examines one kind of predation—the kind that naturally crops up in the context of author-pays OA." Anderson suggested that the term "predatory" be retired in the context of scholarly publishing: "It's a nice, attention-grabbing word, but I'm not sure it's helpfully descriptive... it generates more heat than light." In its place, he proposed the term "deceptive publishing".
+Beall's List primarily assessed the predatory journals based on their compliance with procedural standards, even though the quality of a journal can be judged on at least six different dimensions. A 2020 review in BMC Medicine found that only 3% of "predatory checklists" found online met their study's criteria for being "evidence-based"; Beall's List was not amongst them. A 2021 study in The Journal of Academic Librarianship confirmed Beall's bias against OA journals.
+
+== Removal ==
+On January 15, 2017, the entire content of Beall's Scholarly Open Access website was removed, along with Beall's faculty page on the University of Colorado's website. The removal was first noticed on social media, with speculation on whether the removal was due to migration of the list to the stewardship of Cabell's International. The company later denied any relationship, and its vice president of business development declared that Beall "was forced to shut down blog due to threats and politics". The University of Colorado declared that the decision to take down the list was a personal decision from Beall. Beall later wrote that he had taken down his blog because of pressure from the University of Colorado, which threatened his job security. 
+Beall's supervisor, Shea Swauger, wrote that the university had supported Beall's work and had not threatened his academic freedom. A demand by Frontiers Media to open a research misconduct case against Beall, to which the University of Colorado acquiesced, is reported as the immediate reason for Beall to take down the list. The university's investigation was closed with no findings. In an interview in 2018, Beall stated that "my university began to attack me in several ways. They launched a research misconduct investigation against me (after seven months, the result of the investigation was that no misconduct had occurred). They also put an unqualified, mendacious supervisor over me, and he constantly attacked and harassed me. I decided I could no longer safely publish the list with my university threatening me in these ways." Beall has not reactivated the list.
+
+== Successors ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Beall's_List-2.md b/data/en.wikipedia.org/wiki/Beall's_List-2.md
new file mode 100644
index 000000000..052cbf401
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Beall's_List-2.md
@@ -0,0 +1,24 @@
+---
+title: "Beall's List"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Beall's_List"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:49.561109+00:00"
+instance: "kb-cron"
+---
+
+Since Beall's List closed, similar lists have been started by others, including the CSIR-Structural Engineering Research Centre and an anonymous group at Stop Predatory Journals. Cabell's International, a company that offers scholarly publishing analytics and other scholarly services, has also offered both a black list and a white list for subscription on its website. Since 2021, the Norwegian Scientific Index has included the category "level X" for journals suspected of being predatory; its establishment was linked to expressions of concern regarding the publisher MDPI. A site entitled Beall's List of Potential Predatory Journals and Publishers states that it includes the original list as at 15 January 2017, with updates listed separately, maintained by an anonymous European postdoctoral researcher.
+
+== See also ==
+Journalology
+
+== References ==
+
+== Further reading ==
+Buschman, John (2020). "A Political Sociology of the Beall's List Affair". The Library Quarterly. 90 (3): 298–313. doi:10.1086/708959. S2CID 224809316.
+
+== External links ==
+Beall, Jeffrey. "List of Publishers: Potential, possible, or probable predatory scholarly open-access publishers" (last archived ed.). Archived from the original on January 12, 2017.
+Beall, Jeffrey. "List of Standalone Journals: Potential, possible, or probable predatory scholarly open-access journals" (last archived ed.). Archived from the original on January 11, 2017.
+Updated "Beall's List of Predatory Journals and Publishers" – maintained by an anonymous postdoctoral European researcher
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Bibsam_Consortium-0.md b/data/en.wikipedia.org/wiki/Bibsam_Consortium-0.md
new file mode 100644
index 000000000..361cda636
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Bibsam_Consortium-0.md
@@ -0,0 +1,30 @@
+---
+title: "Bibsam Consortium"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Bibsam_Consortium"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:50.757805+00:00"
+instance: "kb-cron"
+---
+
+The Bibsam Consortium is a consortium in which 85 higher education and research institutions in Sweden participate to negotiate license agreements for electronic information resources. The consortium is headed by the National Library of Sweden and both negotiates and administers license agreements for e-resource packages. The participating institutions sign a power of attorney which allows the National Librarian to sign contracts with the e-resource providers.
+
+
+== History and scope ==
+The Bibsam Consortium was formed in 1996 to negotiate license agreements for electronic resources on behalf of Swedish universities, research institutes and government agencies. The total turnover of the agreements was €33 million in 2015 and €35 million in 2017, with 73% of the turnover generated by the ten largest universities in Sweden. A six-member team at the National Library of Sweden negotiates and administers the 100 license agreements for approximately 40 e-resource packages.
+
+
+== Negotiation with Elsevier ==
+In 2018, the Bibsam Consortium terminated its agreement with the publisher Elsevier in order to stop the rising prices of publishing and to support open access publishing. The termination went into effect on 1 July 2018, after which Swedish universities and colleges had no access to items published after that date in 2,100 Elsevier e-journals. All articles published between 1 January 1995 and 30 June 2018 remained available. Astrid Söderbergh Widding, president of Stockholm University, chair of the Bibsam Consortium steering committee and head of the negotiation team, said:
+
+Increasing costs of scientific information are straining university budgets on a global scale while publishers operate on high profit margins. An alternative to the current publishing and pricing model is 'open access', where institutions pay to publish their articles and the articles become open for everyone to read, immediately upon publication. We need to monitor the total cost of publication as we see a tendency towards a rapid increase of costs for both reading and publishing. The current system for scholarly communication must change and our only option is to cancel deals when they don’t meet our demands for a sustainable transition to open access.
+
+The Bibsam Consortium's requirements were:
+
+Immediate open access to all articles published by researchers affiliated to participating organizations in Elsevier journals.
+Reading access for participating organisations to all articles in Elsevier’s journals.
+A sustainable price model that enables a transition to open access.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Bus_Open_Data_Service-0.md b/data/en.wikipedia.org/wiki/Bus_Open_Data_Service-0.md
new file mode 100644
index 000000000..8bb83395b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Bus_Open_Data_Service-0.md
@@ -0,0 +1,52 @@
+---
+title: "Bus Open Data Service"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Bus_Open_Data_Service"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:31.060926+00:00"
+instance: "kb-cron"
+---
+
+The Bus Open Data Service (BODS) is a government-funded service in England, established in 2020 as part of the Bus Services Act 2017. It was created in a partnership between Ito World, the Department for Transport and KPMG.
+The service was described by Ito World as "an international first", as it provides Open Data of bus timetables, fares and Automatic Vehicle Location of buses across England.
+An extension to the Bus Open Data Service, Analyse Bus Open Data Service (ABOD), was introduced in 2021 to provide free-to-access reporting and analytics to operators and authorities nationally. The extended service provides access to on-time performance analytics, vehicle journey replays, and corridor reporting.
+
+
+== Data implementation ==
+As part of the requirements set by the Department for Transport in The Public Service Vehicles (Open Data) (England) Regulations 2020, the Bus Open Data Service set deadlines for operators to provide data.
+The implementation requirements applied only in England:
+
+31 December 2020 — Obligation to provide bus timetable data to the Bus Open Data Service.
+7 January 2021 — Obligation to provide vehicle location and basic fares and tickets data to the Bus Open Data Service.
+7 January 2023 — Obligation to provide complex fares and ticket data to the Bus Open Data Service.
+
+
+== Data provided ==
+The Bus Open Data Service makes available three types of bus service data, in a variety of formats:
+
+Timetable data in TransXChange, an XML-based data format for representing bus route and timetable information, and GTFS, a CSV-based format which represents schedule data, as well as routes, trips, stop times, and stop locations.
+Location data in SIRI-VM, an XML-based data format for representing live vehicle locations, and GTFS-RT (GTFS Realtime), a real-time extension of GTFS provided as Protocol Buffers messages.
+Fares data as NeTEx (Network Timetable Exchange), an XML-based format which "allows for accurate representations of operators’ fares offerings to the market, which can then be accessed and used in journey planning applications".
+
+
+== Uses ==
+Since its introduction, the Bus Open Data Service has been put to a number of uses.
+
+The website bustimes.org uses data from BODS, via an API link, to supply timetable, fares, and vehicle location information, with vehicle locations displayed on a map. This reliance has a drawback, however, when a bus stop is removed or when the route information is inaccurate because outdated route information has been supplied to BODS.
+The Traffic Commissioners for Great Britain, in their 2020/21 annual report, stated that use of the Bus Open Data Service would "make available more data than ever before on an operator’s performance."
+An article in TransportXtra explained how data from BODS can be used to plan an electrified bus fleet.
+
+
+== Criticism ==
+Although the service provides fare, timetable and vehicle location data, the Department for Transport has ruled out including key accessibility information on bus stops, stations and vehicles, even though the Bus Services Act makes specific provision for open data 'for the purpose of facilitating travel by disabled persons'.
+A number of operators have struggled to provide the data required by the deadlines provided by the Bus Open Data Service, requiring providers to implement alternative solutions.
+The Confederation of Passenger Transport, and operators of home-to-school transport, criticised the requirement for operators to provide data about registered home-to-school bus services, and the exemption of Section 22 community bus services.
+Writing in Buses magazine, Centrebus Group owner Julian Peddle called the service "a horrendously bureaucratic and over-engineered system designed by well-meaning but clueless officials in London. It’s running late, does not work properly, and has involved the industry and local authorities in vast amounts of needless work. It’s supposedly been running since January 2021, but has not improved things in the wilds of Shropshire, and never will, because government bureaucrats don’t understand the problem, so have no chance of solving it."
+
+
+== References ==
+
+
+== External links ==
+Bus Open Data Service (dft.gov.uk)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Bustimes.org-0.md b/data/en.wikipedia.org/wiki/Bustimes.org-0.md
new file mode 100644
index 000000000..2fdf72174
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Bustimes.org-0.md
@@ -0,0 +1,28 @@
+---
+title: "Bustimes.org"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Bustimes.org"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:32.220241+00:00"
+instance: "kb-cron"
+---
+
+bustimes.org is a transportation information website created to take advantage of the Bus Services Act 2017 requirement for bus operators in England to provide bus timetables, fares and vehicle locations in an open data format, which can be used by app and website developers. This DfT service is called the Bus Open Data Service.
+The website also provides information on bus services in parts of the UK to which the Bus Services Act 2017 information requirement does not apply, as well as in Ireland.
+Location data for operators partially or completely owned by Transport for Edinburgh is supplied to the site via their Open Data system.
+The site uses data from AVL tracking to determine and transmit the geographic location of a vehicle, such as data from Ticketer machines and the iBus system, in order to display live bus positions on a map.
+The site also uses data from the National Public Transport Gazetteer, and bus stop locations from NaPTAN.
+The live tracking system was added in response to the Department for Transport stating that they wanted "to see more people taking the bus, and those who do take it to have the best possible experience." Fares for companies operating the Passenger MyTrip system were added in 2022. Vehicle details (such as liveries, registration plates and fleet numbers) are all added by individual contributors using the edit vehicle information section.
+
+
+== Criticism ==
+In an article in Buses Magazine about bus timetable information, Centrebus Group owner Julian Peddle criticised the website as lacking authority and not being an "official website", and questioned whether trust can be placed in its information.
+
+
+== References ==
+
+
+== External links ==
+bustimes.org
+bustimes.org on GitHub
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/CKAN-0.md b/data/en.wikipedia.org/wiki/CKAN-0.md
new file mode 100644
index 000000000..b3174068e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/CKAN-0.md
@@ -0,0 +1,40 @@
+---
+title: "CKAN"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/CKAN"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:33.414538+00:00"
+instance: "kb-cron"
+---
+
+The Comprehensive Knowledge Archive Network (CKAN) is an open-source open data portal for the storage and distribution of open data. Initially inspired by the package management capabilities of Debian Linux, CKAN has developed into a powerful data catalogue system that is mainly used by public institutions seeking to share their data with the general public. 
+Since its inception, CKAN has evolved into the leading open data platform software in the world, used by governments, including those of the US and UK, to publish millions of public datasets.
+Rufus Pollock developed its first version in 2005-2006. CKAN's codebase is maintained by the Open Knowledge Foundation.
+The system is used both as a public platform on Datahub and in various government data catalogues, such as the UK's data.gov.uk, the Dutch National Data Register, the United States government's Data.gov and the Australian government's "Gov 2.0". The state government of South Australia also makes government data freely available to the public on the CKAN platform. The Italian government makes available the open data of the Data & Analytics Framework on the CKAN platform.
+
+
+== Internal technology ==
+CKAN's back end, the part running on the Web server, is written mainly in Python. The web pages it offers to users' browsers include JavaScript. CKAN maintains information about the data sets to be offered to users in PostgreSQL databases. Searches are implemented by Solr. CKAN installations can be queried through Web APIs.
+
+
+== Future of the project ==
+The CKAN Stewardship proposal jointly put forward by Link Digital and Datopian received support from the Open Knowledge Foundation Board. In appointing Link Digital and Datopian as joint stewards, the Board felt there was a clear practical path with strong leadership and committed funding to see CKAN grow and prosper in the years to come.
+The Open Knowledge Foundation will remain the ‘purpose trustee’ to ensure the Stewards remain true to the purpose and ethos of the CKAN project.
+
+
+== Similar projects and alternatives ==
+Piveau is the prevailing (meta)data management tool used by the EU. 
+A variant, HDEU-Hub, is specifically used for European health data.
+Dataverse provides similar functions and is widely used for open data.
+DKAN is a Drupal-based open data portal based on CKAN.
+
+
+== References ==
+
+
+== External links ==
+Official website 
+Open Knowledge Foundation
+South Australian Government Data Directory
+Commonly Used Open Data Platforms
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/COPIM-0.md b/data/en.wikipedia.org/wiki/COPIM-0.md
new file mode 100644
index 000000000..5717cb9be
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/COPIM-0.md
@@ -0,0 +1,45 @@
+---
+title: "COPIM"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/COPIM"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:54.247395+00:00"
+instance: "kb-cron"
+---
+
+The Copim community is an international group of researchers, universities, librarians, open access book publishers and infrastructure providers. It is building community-owned, open systems and infrastructures to enable open-access book publishing to flourish. The collaboration is being funded by Research England and Arcadia Fund, via two consecutive projects between November 2019 and April 2026.
+The community's name is derived from the original project acronym, COPIM (Community-led Open Publication Infrastructures for Monographs). During its first project phase (11/2019-04/2023), the community was involved in the foundational project of the same name. Since 05/2023, this has been followed by a second project phase under the title Open Book Futures, through which the Copim community aims to expand and accelerate the uptake of the infrastructures developed during its initial project phase.
+Following the principle of 'Scaling Small', the project has developed a set of proof-of-concepts of non-profit and community-owned, open infrastructures to enable open access book publishing to prosper.
+Copim has been named as a Supporting Action in UKRI's 2020 Open Access Review Consultation.
+
+
+== Work Packages ==
+In seven distinct Work Packages, the COPIM project explored:
+
+how to scope and build support for an integration of open access books in libraries;
+how to build a collective of librarians, publishers and researchers invested in sustainable OA through a not-for-profit, community-governed OA book revenue management and information exchange platform;
+how to establish funding models that enable a transition of legacy publishers' existing business models to non-BPC OA;
+research on, and implementation of robust governance models for not-for-profit, community-owned digital infrastructures such as those being developed in other work packages;
+channels of OA book discovery and dissemination, culminating in the development of an open-source OA book metadata creation and dissemination system and service;
+ways to more closely align existing software, tools and technologies, workflows and infrastructures for experimental publishing with the workflows of OA book publishers;
+how to establish more robust ways to tackle the technical and legal impediments to a more streamlined process of archiving and preservation of OA books.
+At the end of the first project phase (04/2023), the key outputs, activities and proof-of-concepts delivered across the initial project's lifespan included:
+
+publication of 13 major scoping reports, 3 annual project reports, plus a variety of research papers published in peer-reviewed journals, the successful organisation and documentation of 26 workshops, with more than 220 national and international stakeholders representing 25 countries, and the presentation of COPIM work at more than 120 international conferences, workshops, and events.
+set-up and iterative extension of an Outreach and Dissemination network that combines a variety of channels, including social media and open community platforms.
+following the platform's beta launch in 2021, the successful inception of Thoth, COPIM's Open Dissemination System, as a Community Interest Company under the name of Thoth Open Metadata CIC. Thoth now makes open access book metadata available in an open, transparent, and participatory way via its open API, and publishers can use the platform's interface to create rich, open metadata for direct dissemination in a variety of global channels.
+launch of the Open Book Collective platform and community of OA book publishers, infrastructure providers, and libraries that are collaborating to bring about a future for OA book publishing free from inequitable Book Processing Charges. The Open Book Collective has successfully reached its originally-envisioned revenue target, and has also implemented a robust legal, financial, and governance model to ensure longer-term stability of the Open Book Collective legal entity.
+further strengthening of the Opening the Future revenue model via the two publishers, CEU Press and Liverpool University Press, that COPIM has been working with. Through Opening the Future, both presses to date (04/2023) have released 15 new monographs between them, and have accrued enough funding through the programme for approximately 45 titles to be published OA in the coming months and years.
+launch of the Experimental Publishing Compendium, as a comprehensive online resource bringing together tools, practices, and books to promote and support the publication of experimental book publications.
+establishing the Thoth Archiving Network, a community-led collaboration between university repositories and national libraries to facilitate archiving and preservation of OA books via COPIM's Open Dissemination System Thoth, particularly those published by small and medium-sized publishers that might not have the resources to invest in other, more expensive means of archiving.
+As part of the second project phase of Open Book Futures (OBF), the work package structure has been slightly adapted to accommodate the shift in focus towards accelerating the uptake of the proof-of-concepts that have been delivered during the first phase.
+In doing so, Open Book Futures's overall goal is to increase COPIM's long-term impact and ensure that a wide range of voices have the opportunity to shape the future of open access book publishing. In order to amplify bibliodiverse and equitable community-led approaches to OA book publishing, OBF aims not just to strengthen existing networks in the UK and North America, but also to engage further with publishers, universities, and infrastructure providers in a diverse set of national and linguistic contexts, including Africa, Australasia, Continental Europe, and Latin America.
+
+
+== Opening the Future ==
+Opening the Future, a revenue model developed in COPIM's Business Models Work Package, is a collective subscription model through which subscribing libraries can get unlimited access to a selection of a chosen publisher's backlist, with perpetual access after three years. The generated membership revenue is used by the publisher solely to produce new Open access monographs.
+The model is currently being piloted in collaboration with CEU Press and Liverpool University Press under the remit of COPIM.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/COVID_Moonshot-0.md b/data/en.wikipedia.org/wiki/COVID_Moonshot-0.md
index 942936a0b..8bf3deb3c 100644
--- a/data/en.wikipedia.org/wiki/COVID_Moonshot-0.md
+++ b/data/en.wikipedia.org/wiki/COVID_Moonshot-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/COVID_Moonshot"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:49.398528+00:00"
+date_saved: "2026-05-05T10:16:38.156021+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/COVID_Moonshot-1.md b/data/en.wikipedia.org/wiki/COVID_Moonshot-1.md
index 1392fe6a5..6c8f4fc72 100644
--- a/data/en.wikipedia.org/wiki/COVID_Moonshot-1.md
+++ b/data/en.wikipedia.org/wiki/COVID_Moonshot-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/COVID_Moonshot"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:49.398528+00:00"
+date_saved: "2026-05-05T10:16:38.156021+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Centre_pour_l'Édition_Électronique_Ouverte-0.md b/data/en.wikipedia.org/wiki/Centre_pour_l'Édition_Électronique_Ouverte-0.md
new file mode 100644
index 000000000..7be50503d
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Centre_pour_l'Édition_Électronique_Ouverte-0.md
@@ -0,0 +1,44 @@
+---
+title: "Centre pour l'Édition Électronique Ouverte"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Centre_pour_l'Édition_Électronique_Ouverte"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:53.040562+00:00"
+instance: "kb-cron"
+---
+
+The Centre pour l'Édition Électronique Ouverte (CLEO, Cléo; transl. Centre for Open Electronic Publishing), based in Marseille, France, is overseen by Aix-Marseille University, the Centre National de la Recherche Scientifique, School for Advanced Studies in the Social Sciences, and University of Avignon and the Vaucluse. It produces the open access academic publishing portal OpenEdition.org, which includes platforms Calenda, Hypotheses, OpenEdition Books, and OpenEdition Journals. OpenEdition focuses on publications in the academic fields of humanities and social sciences. The centre also issues a blog about open access.
+
+
+== OpenEdition Books ==
+Titles include:
+
+Bak-Geller, Sarah (28 July 2022). "Patrimonio alimentario y ciudadanía indígena. El caso coca de Mezcala, Jalisco (México)". In Rebaï, Nasser; Bilhaut, Anne-Gaël; de Suremain, Charles-Édouard; Katz, Esther; Paredes, Myriam (eds.). Patrimonios alimentarios en América Latina : Recursos locales, actores y globalización (in Spanish). IRD Éditions. pp. 191–214. ISBN 978-2-7099-2943-1. Retrieved 27 April 2023 – via Centre pour l'Édition Électronique Ouverte. Introducción
+
+
+== OpenEdition Journals ==
+
+The following list includes some examples of titles in Journals.openedition.org (prior to December 2017 known as Revues.org):
+
+
+== See also ==
+OpenEdition access via Wikipedia Library
+Open access journal
+Open access in France
+List of academic databases and search engines
+
+
+== References ==
+
+
+== Bibliography ==
+Jean-Christophe Peyssard (2011), OpenEdition Freemium: developing a sustainable library-centered economic model for open access (PDF), International Federation of Library Associations and Institutions
+Open Access Rules in France: Persée, érudit, and revues.org. USA: Villanova University. 2012 – via Falvey Library Blogs: History & Political Science.
+
+
+== External links ==
+Centre pour l'édition électronique ouverte Archived 2017-07-18 at the Wayback Machine official site
+OpenEdition.org official site
+Books.openedition.org (OpenEdition Books) official site
+Journals.openedition.org (OpenEdition Journals) official site
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Concepticon-0.md b/data/en.wikipedia.org/wiki/Concepticon-0.md
index 03bd6bfd9..8815f6a26 100644
--- a/data/en.wikipedia.org/wiki/Concepticon-0.md
+++ b/data/en.wikipedia.org/wiki/Concepticon-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Concepticon"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T08:11:23.469181+00:00"
+date_saved: "2026-05-05T10:16:34.587218+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Coronavirus_Tech_Handbook-0.md b/data/en.wikipedia.org/wiki/Coronavirus_Tech_Handbook-0.md
new file mode 100644
index 000000000..edc03befb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Coronavirus_Tech_Handbook-0.md
@@ -0,0 +1,18 @@
+---
+title: "Coronavirus Tech Handbook"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Coronavirus_Tech_Handbook"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:35.788490+00:00"
+instance: "kb-cron"
+---
+
+The Coronavirus Tech Handbook was a website designed to crowdsource information about the SARS-CoV-2 coronavirus. It was developed at Newspeak House, a hackerspace for politics in London, England.
+The site, which launched in March 2020, was hosted as an interlinked collection of user-editable online documents, which made it effectively a wiki. As of October 2020 it had expanded to provide tools for consumers, businesses, local governments, and developers, amongst others, to help combat the COVID-19 pandemic.
+Its stated aim was to provide:
+
+a space for technologists, civic organisations, public & private institutions, researchers and specialists of all kinds to collaborate on a rapid and sophisticated response to the coronavirus outbreak and subsequent impacts.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Covid_Act_Now-0.md b/data/en.wikipedia.org/wiki/Covid_Act_Now-0.md
new file mode 100644
index 000000000..39f5235b0
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Covid_Act_Now-0.md
@@ -0,0 +1,44 @@
+---
+title: "Covid Act Now"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Covid_Act_Now"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:36.944943+00:00"
+instance: "kb-cron"
+---
+
+Covid Act Now (CAN) is an independent, 501(c)(3) nonprofit that provides local-level disease intelligence and data analysis on the COVID-19 pandemic in the United States, via a website and an API.
+CAN assists partners ranging from local county health departments to multinational corporations in developing COVID response plans. Its API is used by many of the Fortune 500 to make data-driven reopening decisions.
+The organization's first product was a traditional SEIR model for predicting the rate of COVID spread in the U.S. The model was based on open-source code by Alison Hill, an assistant professor at the Johns Hopkins Institute for Computational Medicine. Rebecca Katz and her team have served as critical advisors.
+CAN's modelling and data partners include Grand Rounds, a digital healthcare company, and USA Facts. Its university affiliates are Georgetown University Medical Center, Stanford Medicine, and Harvard Global Health Institute.
+
+
+== History ==
+CAN began as a collaboration between four volunteers — Max Henderson (a former Google employee), Igor Kofman (a former Dropbox engineer), Zachary Rosen, and Jonathan Kreiss-Tomkins — publishing the first version of their model on March 20, 2020. The team was soon joined by public health experts, data scientists, and other professionals. The initial model raised awareness of the critical shortage of hospital capacity that the U.S. would face if the spread of COVID-19 was not mitigated.
+
+
+== Features ==
+The platform provides a range of features, including:
+
+A realtime map to rate the COVID-19 risk level of each U.S. state and county, incorporating both disease prevalence and the quality of local response.
+Data on available intensive care unit beds.
+Information on the rate of positive COVID-19 tests in different regions.
+Metrics on the effectiveness of contact tracing efforts.
+Vaccination eligibility and rates.
+A 22-second animation depicting the initial spread of COVID-19 through the U.S.
+
+
+== Impact ==
+CAN's models and data visualizations were used by officials in multiple states to aid in decision-making related to lockdowns, reopening, and resource allocation. The organization's work was cited in policy discussions and media reports throughout the pandemic. In mid-2021, Business Insider described CAN as a "leading US non-profit". As of October 2023, the organization claims to have served tens of millions of users and to have supported hundreds of federal, state, and county officials as well as numerous multinational corporations and NGOs.
+
+
+== Criticism ==
+Like many models during the early days of the COVID-19 pandemic, CAN's initial projections faced scrutiny for assumptions made and data used. However, the team responded to feedback by refining their models and adding more sources of data over time.
+
+
+== References ==
+
+
+== External links ==
+Covid Act Now (CAN)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/DBpedia-0.md b/data/en.wikipedia.org/wiki/DBpedia-0.md
new file mode 100644
index 000000000..e4c7254cb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/DBpedia-0.md
@@ -0,0 +1,34 @@
+---
+title: "DBpedia"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/DBpedia"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:50.111747+00:00"
+instance: "kb-cron"
+---
+
+DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web using OpenLink Virtuoso. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets.
+The project was heralded as "one of the more famous pieces" of the decentralized linked data effort by Tim Berners-Lee, the inventor of the World Wide Web. As of June 2021, DBpedia contained over 850 million semantic triples.
+
+== Background ==
+The project was started by people at the Free University of Berlin and Leipzig University in collaboration with OpenLink Software, and is now maintained by people at the University of Mannheim, Leipzig University, and the University of Pennsylvania. The first publicly available dataset was published in 2007. The data is made available under free licenses (CC BY-SA), allowing others to reuse the dataset; it does not use an open data license to waive the sui generis database rights.
+Wikipedia articles consist mostly of free text, but also include structured information embedded in the articles, such as "infobox" tables (the pull-out panels that appear in the top right of the default view of many Wikipedia articles, or at the start of the mobile versions), categorization information, images, geo-coordinates and links to external web pages. This structured information is extracted and put in a uniform dataset which can be queried.
+
+== Dataset ==
+The 2016-04 release of the DBpedia data set describes 6.0 million entities, out of which 5.2 million are classified in a consistent ontology, including 1.5 million persons, 810,000 places, 135,000 music albums, 106,000 films, 20,000 video games, 275,000 organizations, 301,000 species and 5,000 diseases. DBpedia uses the Resource Description Framework (RDF) to represent extracted information and consists of 9.5 billion RDF triples, of which 1.3 billion were extracted from the English Wikipedia and 5.0 billion from other language editions.
+From this data set, information spread across multiple pages can be extracted. For example, book authorship can be pieced together from pages about the work or the author.
+One of the challenges in extracting information from Wikipedia is that the same concepts can be expressed using different parameters in infobox and other templates, such as |birthplace= and |placeofbirth=. Because of this, queries about where people were born would have to search for both of these properties in order to get more complete results. To address this, the DBpedia Mapping Language was developed to help map these properties to an ontology while reducing the number of synonyms. Due to the large diversity of infoboxes and properties in use on Wikipedia, the process of developing and improving these mappings has been opened to public contributions.
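+The need to search both synonymous properties can be illustrated with a SPARQL query. This is a sketch only: it assumes the raw infobox parameters are exposed as dbp:birthplace and dbp:placeofbirth in the http://dbpedia.org/property/ namespace, which may not match the properties present in any given release.
+
+```sparql
+# Assumed namespace for raw (unmapped) infobox properties.
+PREFIX dbp: <http://dbpedia.org/property/>
+
+# Without a mapping, both synonymous properties must be checked.
+SELECT ?person ?place WHERE {
+  { ?person dbp:birthplace ?place }
+  UNION
+  { ?person dbp:placeofbirth ?place }
+}
+LIMIT 20
+```
+
+Once such synonyms are mapped to a single ontology property, one triple pattern would replace the UNION.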
+Version 2014 was released in September 2014. A main change since previous versions was the way abstract texts were extracted. Specifically, running a local mirror of Wikipedia and retrieving rendered abstracts from it made extracted texts considerably cleaner. Also, a new data set extracted from Wikimedia Commons was introduced.
+As of June 2021, DBpedia contains over 850 million triples.
+
+== Examples ==
+DBpedia extracts factual information from Wikipedia pages, allowing users to find answers to questions where the information is spread across multiple Wikipedia articles. Data is accessed using an SQL-like query language for RDF called SPARQL. 
+For example, suppose one were interested in the Japanese shōjo manga series Tokyo Mew Mew and wanted to find the genres of other works written by its illustrator, Mia Ikumi. DBpedia combines information from Wikipedia's entries on Tokyo Mew Mew, Mia Ikumi and this author's works, such as Super Doll Licca-chan and Koi Cupid. Since DBpedia normalises information into a single database, a single query can be asked without needing to know exactly which entry carries each fragment of information, and will list related genres.
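+A query of this kind might look like the sketch below. The prefixes and the property names dbo:author and dbo:genre are assumptions about how the data is modelled; the properties actually recorded for these entries may differ (for example, an illustrator-specific property).
+
+```sparql
+# Assumed prefixes for the DBpedia ontology and resource namespaces.
+PREFIX dbo: <http://dbpedia.org/ontology/>
+PREFIX dbr: <http://dbpedia.org/resource/>
+
+SELECT DISTINCT ?creator ?work ?genre WHERE {
+  # Who is recorded as the author of Tokyo Mew Mew? (assumed property)
+  dbr:Tokyo_Mew_Mew dbo:author ?creator .
+  # Other works credited to the same person.
+  ?work dbo:author ?creator .
+  # Genre is optional so that works without one are still listed.
+  OPTIONAL { ?work dbo:genre ?genre }
+}
+```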
+
+== Use cases ==
+DBpedia has a broad scope of entities covering different areas of human knowledge. This makes it a natural hub for connecting datasets, where external datasets could link to its concepts. The DBpedia dataset is interlinked on the RDF level with various other Open Data datasets on the Web. This enables applications to enrich DBpedia data with data from these datasets. As of September 2013, there were more than 45 million interlinks between DBpedia and external datasets, including Freebase, OpenCyc, UMBEL, GeoNames, MusicBrainz, CIA World Factbook, DBLP, Project Gutenberg, DBtune Jamendo, Eurostat, UniProt, Bio2RDF, and US Census data. The Thomson Reuters initiative OpenCalais, the Linked Open Data project of The New York Times, the Zemanta API and DBpedia Spotlight also include links to DBpedia. The BBC uses DBpedia to help organize its content. Faviki uses DBpedia for semantic tagging. Samsung also includes DBpedia in its "Knowledge Sharing Platform".
+Such a rich source of structured cross-domain knowledge is fertile ground for artificial intelligence systems. DBpedia was used as one of the knowledge sources in IBM Watson's Jeopardy! winning system.
+Amazon provides a DBpedia Public Data Set that can be integrated into Amazon Web Services applications.
+Data about creators from DBpedia can be used for enriching artworks' sales observations.
+The crowdsourcing software company Ushahidi built a prototype of its software that leveraged DBpedia to perform semantic annotations on citizen-generated reports. The prototype incorporated the "YODIE" (Yet another Open Data Information Extraction system) service developed by the University of Sheffield, which uses DBpedia to perform the annotations. The goal for Ushahidi was to improve the speed and facility with which incoming reports could be validated and managed.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/DBpedia-1.md b/data/en.wikipedia.org/wiki/DBpedia-1.md
new file mode 100644
index 000000000..8f5009f5f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/DBpedia-1.md
@@ -0,0 +1,31 @@
+---
+title: "DBpedia"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/DBpedia"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:50.111747+00:00"
+instance: "kb-cron"
+---
+
+== DBpedia Spotlight ==
+DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in text. This allows linking unstructured information sources to the linked open data cloud through DBpedia. DBpedia Spotlight performs named entity extraction, including entity detection and name resolution (in other words, disambiguation). It can also be used for named entity recognition, and other information extraction tasks. DBpedia Spotlight aims to be customizable for many use cases. Instead of focusing on a few entity types, the project strives to support the annotation of all 3.5 million entities and concepts from more than 320 classes in DBpedia. The project started in June 2010 at the Web Based Systems Group at the Free University of Berlin.
+DBpedia Spotlight is publicly available as a web service for testing and a Java/Scala API licensed via the Apache License. The DBpedia Spotlight distribution includes a jQuery plugin that allows developers to annotate pages anywhere on the Web by adding one line to their page. Clients are also available in Java or PHP. The tool handles various languages through its demo page and web services. Internationalization is supported for any language that has a Wikipedia edition.
+
+== Archivo ontology database ==
+Since 2020, the DBpedia project has provided a regularly updated database of web-accessible ontologies written in the Web Ontology Language (OWL), known as Archivo. Archivo also provides a four-star rating scheme for the ontologies it scrapes, based on accessibility, quality, and related fitness-for-use criteria. For instance, SHACL compliance for graph-based data is evaluated when appropriate. Ontologies should also contain metadata about their characteristics and specify a public license describing their terms of use. As of June 2021, the Archivo database contained 1,368 entries.
+
+== History ==
+DBpedia was initiated in 2007 by Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak and Zachary Ives.
+
+== See also ==
+BabelNet
+Semantic MediaWiki
+Wikidata
+YAGO (database)
+
+== References ==
+
+== External links ==
+
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Data.gov-0.md b/data/en.wikipedia.org/wiki/Data.gov-0.md
new file mode 100644
index 000000000..840658d53
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Data.gov-0.md
@@ -0,0 +1,58 @@
+---
+title: "Data.gov"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Data.gov"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:44.164134+00:00"
+instance: "kb-cron"
+---
+
+Data.gov is a U.S. government website launched in late May 2009 by the federal chief information officer of the United States, Vivek Kundra. Data.gov aims to improve public access to high-value, machine-readable datasets generated by the Executive Branch of the federal government. The site is a repository for federal, state, local, and tribal government information made available to the public.
+
+
+== History and background ==
+On March 5, 2009, shortly after his appointment as the first federal chief information officer, Vivek Kundra announced the creation of Data.gov. The website is managed and hosted by the U.S. General Services Administration, Technology Transformation Services.
+The site introduced the philosophy of digital open data to the U.S. Federal government, an approach which according to the book Democratizing Data will have benefits for states including "rebuilding confidence in government and business".
+Data.gov has grown from 47 datasets at launch to over 370,000 datasets. Jeanne Holm, Chief Knowledge Architect for the National Aeronautics and Space Administration (NASA), was the Evangelist and knowledge architect for Data.gov. James Hendler, an artificial intelligence researcher at Rensselaer Polytechnic Institute, was at the time named the "Internet Web Expert" and tasked with helping Data.gov exploit advanced Web technologies.
+Data.gov was one of the first efforts to create an open data ecosystem—using data as the basis for connecting government agencies, researchers, businesses, and civil society. Communities of practice were created around key topics such as climate, providing a way for researchers to ask for data and to coordinate work across government agencies. By the end of 2010, most Federal agencies had published data on Data.gov. In November 2010, the Data.gov team hosted the first International Open Government Data Conference with 10 nations participating to expand the principles of open data. This conference grew to become the International Open Data Conference. 
+By 2012, open data from Data.gov was regularly used by civil society and business. Community-led efforts, such as hackathons from Code for America and events like the National Day of Civic Hacking, relied on government data provided by Data.gov. The GovLab created the Open Data 500 to showcase businesses built on open data provided by Data.gov. To ensure open data's sustainability, President Obama issued an executive order on "Making Open and Machine Readable the New Default for Government Information" to formalize Data.gov as the permanent repository for open government data.
+McKinsey & Company published research showing that open data contributed $3 trillion to the U.S. economy. Two of the biggest datasets for economic impact have been global positioning satellite data from the U.S. Space Force and weather data from the National Weather Service. By 2014, all 175 Federal agencies and 77 other organizations had published data on the site, in both human-understandable and machine-readable formats and with open APIs.
+On January 14, 2019, the OPEN Government Data Act, as part of the Foundations for Evidence-Based Policymaking Act, became law. The OPEN Government Data Act makes Data.gov a requirement in statute, rather than a policy. It requires federal agencies to publish their information online as open data, using standardized, machine-readable data formats, with their metadata included in the Data.gov catalog. Data.gov is working with an expanded group of federal agencies to include their datasets in Data.gov as they implement the new law.
+
+
+=== Open Government Directive ===
+The U.S. Open Government Directive of December 8, 2009, required that all agencies post at least three high-value data sets online and register them on Data.gov within 45 days.
+
+
+=== OPEN Government Data Act ===
+The Foundations for Evidence-Based Policymaking Act of 2018 (“Evidence Act”) signed into law on January 14, 2019, emphasizes collaboration and coordination to advance data and evidence-building functions in the Federal Government by statutorily mandating Federal evidence-building activities, open government data, and confidential information protection and statistical efficiency. 
+Title II of the Foundations for Evidence-Based Policymaking Act, the OPEN Government Data Act, requires additional agencies to comply with the statute by providing access to free, open, and machine-readable data.
+Additionally, the Office of Management and Budget is required to collaborate with the Office of Government Information Services and the Administrator of General Services to develop and maintain an online repository of tools, best practices, and schema standards to facilitate the adoption of open data practices across the Federal Government.
+
+
+=== Data removal ===
+In January 2025, following the inauguration of Donald Trump as the 47th President, more than 2,000 datasets were removed from the website.
+
+
+== See also ==
+ClinicalTrials.gov
+Science.gov
+Government 2.0
+Open Government Initiative
+data.gov.uk
+data.gov.in
+CKAN
+Open data in the United States
+
+
+== References ==
+
+
+== External links ==
+data.gov
+Wired How-To Wiki - Open Up Government Data
+A wiki with RDF versions of many of the data.gov datasets hosted at RPI
+Case study description of data.gov development by REI Systems
+datagov.ideascale.com - Official consultation: Evolving data.gov with You
+French governmental Open Data Directory
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Data.gov.in-0.md b/data/en.wikipedia.org/wiki/Data.gov.in-0.md
new file mode 100644
index 000000000..313aa0315
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Data.gov.in-0.md
@@ -0,0 +1,50 @@
+---
+title: "Data.gov.in"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Data.gov.in"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:45.338996+00:00"
+instance: "kb-cron"
+---
+
+Open Government Data (OGD) Platform India, or data.gov.in, is a platform supporting the open data initiative of the Government of India. The portal is a single point of access to datasets, documents, services, tools and applications published by ministries, departments and organisations of the Government of India. It combines and expands the best features of the Indian government's India.gov.in and the U.S. government's data.gov project.
+
+
+== History ==
+The launch of the site was announced in June 2011, and the site went live in October 2012 as part of the Open Government Initiative, in compliance with the National Data Sharing and Accessibility Policy (NDSAP) of India, which was notified in the Gazette in March 2012.
+According to the preamble of NDSAP, there has been an increasing demand by the community that data collected with the deployment of public funds should be made more readily available to all, for enabling rational debate, better decision making and use in meeting civil society needs.
+The policy envisages proactive dissemination of data by Government ministries, departments and organizations.
+
+
+== Overview ==
+The site is based on the Drupal framework and has four major modules:
+
+Data Management System (DMS): This facilitates publishing of datasets/applications by authorised users from Ministries/Departments/Organisations.
+Content Management System (CMS): This module is used to update or create content and functionalities for Data Portal India.
+Visitor Relationship Management (VRM): This module facilitates collation and dissemination of feedback/suggestions received on Data Portal India.
+Communities: People with specific interests can connect through online communities.
+The product is developed based on the Open Government platform and its source code is available on GitHub.
+
+
+== Open Government Data (OGD) Platform India ==
+Open Government Data (OGD) Platform India was developed jointly by the Indian and US governments as a result of an announcement made by President Obama and Prime Minister Manmohan Singh during the Indo-US Open Government Dialogue in 2010.
+In the US, the open data platform is being implemented as data.gov. In India, the platform was further customised by the National Informatics Centre (NIC), in line with the National Data Sharing and Accessibility Policy, to develop the Data Portal India.
+The open data platform is also being offered to other countries; Ghana and Rwanda also use it.
+
+
+== See also ==
+National Data Sharing and Accessibility Policy – Government of India
+data.gov
+data.gov.uk
+India.gov.in
+My Gov
+USAFacts
+India Data Portal
+
+
+== References ==
+
+
+== External links ==
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Data.gov.uk-0.md b/data/en.wikipedia.org/wiki/Data.gov.uk-0.md
new file mode 100644
index 000000000..c5bb3a3f5
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Data.gov.uk-0.md
@@ -0,0 +1,36 @@
+---
+title: "Data.gov.uk"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Data.gov.uk"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:46.481888+00:00"
+instance: "kb-cron"
+---
+
+data.gov.uk is a UK Government project to make available non-personal UK government data as open data.  It was launched as a closed beta on 30 September 2009, and publicly launched in January 2010.  As of February 2015, it contained over 19,343 datasets, rising to over 40,000 in 2017, and more than 47,000 by 2023.  data.gov.uk is listed in the Registry of Research Data Repositories re3data.org.
+
+== Beta version and launch ==
+The beta version of data.gov.uk went online on 30 September 2009, and by January 2010, 2,400 developers had started experimenting with the data.  When the project was officially launched in January 2010, it contained 2,500 data sets.
+
+== Data available ==
+
+data.gov.uk contains over 30,000 data sets from many UK Government departments.  All data is non-personal, and provided in a format that allows it to be reused.  data.gov.uk intends to increase the use of Linked Data standards, to allow people to provide data to data.gov.uk in a way that allows for flexible and easy reuse.  As of April 2010, the following UK Government departments and agencies have provided data sets to data.gov.uk: BusinessLink, the Cabinet Office, the Department for Business, Innovation and Skills, the Department for Children, Schools and Families, the Department for Communities and Local Government, the Department for Culture, Media and Sport, the Department for Environment, Food and Rural Affairs, the Department for International Development, the Department for Transport, the Department for Work and Pensions, the Department of Energy and Climate Change, the Department of Health, the Foreign and Commonwealth Office, the Home Office, His Majesty's Treasury, Lichfield District Council, Runnymede Borough Council, the Ministry of Defence, the Ministry of Justice, the Northern Ireland Office, the Ordnance Survey, and the Society of Information Technology Management.
+
+=== Ordnance Survey data ===
+When data.gov.uk was officially launched in January 2010, Ordnance Survey (OS) data was something that Sir Tim Berners-Lee and Prof Nigel Shadbolt wanted to see opened up as part of the project.  Ordnance Survey data was included in data.gov.uk on 1 April 2010.  It provides information on geographical locations.  According to Shadbolt, it "will make a real difference to the way that people make sense of the information".
+
+=== Combined Online Information System (COINS) data ===
+On 3 June 2010, the Treasury released the Combined Online Information System (COINS) data for the financial years 2008/09 and 2009/10.  The 4.3 GB of COINS data included 3.2 million items for 2009/10, and was released on BitTorrent.  At the time, the UK government stated that data for 2010/11 would be released in June 2011.  On 15 June, the UK Government published the COINS data for the financial years 2007/08, 2006/07, and 2005/06 on data.gov.uk. The data was made for the (now defunct) RA.Pid Gateway run by Rosslyn Analytics.
+In the past, HM Treasury had refused requests for the data.
+
+=== Data and interpretation to be added ===
+data.gov.uk is working with UK Government departments, agencies, and local authorities to release more data.  Shadbolt also wants local government data included in data.gov.uk.  The UK Parliament's Public Accounts Committee noted in 2012 that "more could be done to assist interpretation and to build on emerging interest".
+
+== Data use and licensing ==
+All data included in data.gov.uk is covered either by Crown copyright or the database right, or by copyright that has been licensed to the Crown.  In turn, all data available on data.gov.uk is available under a worldwide, royalty-free, perpetual, non-exclusive licence which permits use of the data under the following conditions: the copyright and the source of the data should be acknowledged by including an attribution statement specified by data.gov.uk, which is 'name of data provider' data © Crown copyright and database right.  The inclusion of the same acknowledgement is required in sub-licensing of the data, and further sub-licences should require the same.  The data should not be used in a way that suggests that the data provider endorses the use of the data.  Nor should the data or its source be misrepresented.
+The Open Government Licence (OGL) applies to Crown copyright data, and permits anyone to copy, distribute, and transmit the data, adapt the data, and exploit the data commercially, whether by sub-licensing it, combining it with other data, or including it in products and applications.  The terms of the licence are aligned with any Creative Commons Attribution 3.0 licence.  Hence data.gov.uk data can be mixed with information licensed under Creative Commons licences to create derivative work, which can be distributed under the Creative Commons Attribution 3.0 licence.  When users submit information to data.gov.uk, they grant the Crown a non-exclusive, irrevocable right to use and pass on all public information submitted, such as descriptions of ideas and screenshots of apps, as well as the right to re-use and allow the re-use of that information.  All content on the site is placed under the same licence terms as the data, though user ideas and applications remain their own.
+The Crown copyright licence does not affect fair dealing or fair use rights, or any other exceptions and limitations to copyright or database rights.  The data are licensed 'as is', and data.gov.uk does not accept liabilities in relation to the data or provide warranties.  Neither does data.gov.uk guarantee the continued supply of the data.
+
+== Government project ==
+The project was authorised by the UK Cabinet Office and aims for the release of public data to become 'business as usual' across public bodies, as set out in Putting the Frontline First: Smarter Government, which established the UK Government's approach to public data and the release of that data.  data.gov.uk, among others, delivers on the commitment made in Putting the Frontline First to integrate data from the Publications Hub for National Statistics and to release more data relating to health.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Data.gov.uk-1.md b/data/en.wikipedia.org/wiki/Data.gov.uk-1.md
new file mode 100644
index 000000000..3a8bd6a8f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Data.gov.uk-1.md
@@ -0,0 +1,45 @@
+---
+title: "Data.gov.uk"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Data.gov.uk"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:46.481888+00:00"
+instance: "kb-cron"
+---
+
+== Current technology infrastructure ==
+The site uses the CKAN platform for data publishing.  There is high variability in the format and presentation of the data; some data files are available as structured raw data in a machine-readable format such as CSV, while others are only available as analysed data in a human-friendly format such as a PDF file containing a pivot table.  data.gov.uk functions as a searchable data catalogue, with links to data that is hosted by the individual UK Government departments, and does not host data itself.
+
+== Previous technology infrastructure ==
+In addition to the data searchable through the data.gov.uk site, from 2016 until 2021, a very small number of datasets were made available as 'registers' through the Registers Service.  Registers were structured raw datasets that were intended to be a canonical, reliable, and always up-to-date source of data.  Registers shared a common API, and could be read by both humans and machines.  They were offered as JSON, CSV, and RDF files, the latter allowing multiple registers to be linked together.  The Registers service was retired on 15 March 2021.
+
+== Similar projects in other countries ==
+
+The European Public Sector Information (PSI) Platform maintains a list of PSI data catalogues that are provided by governments and give direct access to data.
+The European Commission (EC) has created two portals for the European Union (EU): the EU Open Data Portal, which gives access to open data from the EU institutions, agencies, and other bodies, and the PublicData portal that provides datasets from local, regional, and national public bodies across Europe.  In the Netherlands, the DataverseNL Network hosts data deposited by Dutch Universities and Institutes.
+
+== See also ==
+
+Government 2.0
+GOV.UK
+Linked data
+Merton Thesis
+Open access (publishing)
+Open content
+Open data
+Open research
+TheyWorkForYou
+Open.data.gov.sa
+
+== References ==
+
+== Further reading ==
+Davies, Tim (2010). The potential of open government data as a tool in democratic engagement and reform of public services : the case of data.gov.uk (Thesis). University of Oxford, England. OCLC 701462227.
+Nigel Shadbolt; Kieron O'Hara; Tim Berners-Lee; Nicholas Gibbins; Hugh Glaser; Wendy Hall; M. C. Schraefel (May 2012). "Linked Open Government Data: Lessons from data.gov.uk" (PDF). IEEE Intelligent Systems. 27 (3): 16–24. doi:10.1109/MIS.2012.23. ISSN 1541-1672. OCLC 5872705607. S2CID 16865792.
+
+== External links ==
+data.gov.uk — official homepage
+Cabinet Office - Transparency
+10 Downing Street - Transparency Archived 19 September 2015 at the Wayback Machine
+Guardian video: Tim Berners-Lee on the UK national data website launch
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/DataViva-0.md b/data/en.wikipedia.org/wiki/DataViva-0.md
new file mode 100644
index 000000000..471c29c35
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/DataViva-0.md
@@ -0,0 +1,29 @@
+---
+title: "DataViva"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/DataViva"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:48.890666+00:00"
+instance: "kb-cron"
+---
+
+DataViva is an information visualization engine created by the Strategic Priorities Office of the government of Minas Gerais. DataViva makes official data about exports, industries, locations and occupations available for the entirety of Brazil through eight apps and more than 100 million possible visualizations.
+The first dataset (also available at ALICEWEB) is provided by MDIC (Ministry of Development, Industry and Foreign Trade) / SECEX (Secretariat of Foreign Trade), an official institution of the Government of Brazil, and shows foreign trade statistics for all exporting municipalities in the country. The other database, provided by Ministério do Trabalho e Emprego (MTE, Ministry of Labor and Employment), shows information about all the industries and occupations in Brazil (RAIS, Annual Social Information Report).
+The platform consists of eight core applications, each of which allows different ways of visualizing the data available. Some applications are descriptive, that is, showing data aggregated at various levels in a simple and comparative way, such as Treemapping. Others are prescriptive, using calculations that allow an analytic visualization of the data, based on theories such as the Product Space. All the applications are generated using D3plus, an open source JavaScript library built on top of D3.js by Alexander Simoes and Dave Landry.
+Inspired by The Observatory of Economic Complexity, DataViva is an open-data, open-source, and free-to-use tool.
+It was developed in a partnership with Datawheel, co-founded by MIT Media Lab Professor César Hidalgo, and is maintained by the Government of Minas Gerais.
+
+
+== References ==
+
+Press coverage
+
+
+== External links ==
+
+Official website
+dataviva on GitHub
+DataViva Documentation
+The Necessity For Open Data
+D3plus (Visualization Library Powering DataViva)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Data_Commons-0.md b/data/en.wikipedia.org/wiki/Data_Commons-0.md
new file mode 100644
index 000000000..2ac019ab0
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Data_Commons-0.md
@@ -0,0 +1,31 @@
+---
+title: "Data Commons"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Data_Commons"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:41.760415+00:00"
+instance: "kb-cron"
+---
+
+Data Commons is an open-source platform created by Google that provides an open knowledge graph, combining economic, scientific and other public datasets into a unified view. Ramanathan V. Guha, a creator of web standards including RDF, RSS, and Schema.org, founded the project, which is now led by Prem Ramaswami.
+The Data Commons website was launched in May 2018 with an initial dataset consisting of fact-checking data published in Schema.org "ClaimReview" format by several fact checkers from the International Fact-Checking Network. Google has worked with partners such as the United Nations (UN) to populate the repository, which also includes data from the United States Census, the World Bank, the US Bureau of Labor Statistics, Wikipedia, the National Oceanic and Atmospheric Administration and the Federal Bureau of Investigation.
+The service expanded during 2019 to include an RDF-style knowledge graph populated from a number of largely statistical open datasets. The service was announced to a wider audience in 2019. In 2020 the service improved its coverage of non-US datasets, while also increasing its coverage of bioinformatics and coronavirus. In 2023, the service relaunched with a natural-language front end powered by a large language model. It also launched as the back end to the UN data portal with Sustainable Development Goals data.
+
+
+== Features ==
+Data Commons places more emphasis on statistical data than is common for linked data and knowledge graph initiatives. It includes geographical, demographic, weather and real estate data alongside other categories, describing states, Congressional districts, and cities in the United States as well as biological specimens, power plants, and elements of the human genome via the Encyclopedia of DNA Elements (ENCODE) project. It represents data as semantic triples each of which can have its own provenance. It centers on the entity-oriented integration of statistical observations from a variety of public datasets. Although it supports a subset of the W3C SPARQL query language, its APIs also include tools — such as a Pandas dataframe interface — oriented towards data science, statistics and data visualization.
+Data Commons is integrative, meaning that it does not provide a hosting platform for different datasets, but rather attempts to consolidate much of the information provided by the datasets into a single data graph.
+
+
+== Technology ==
+Data Commons is built on a graph data model. The graph can be accessed through a browser interface and several APIs, and is expanded by loading data (typically CSV and MCF-based templates). The graph can also be accessed by natural language queries in Google Search. The data vocabulary used to define the datacommons.org graph is based upon Schema.org. In particular, the Schema.org terms StatisticalPopulation and Observation were proposed to Schema.org to support datacommons-like use cases.
+Software from the project is available on GitHub under the Apache 2 license.
+
+
+== References ==
+
+
+== External links ==
+Official website 
+GitHub repository
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Data_collaboratives-0.md b/data/en.wikipedia.org/wiki/Data_collaboratives-0.md
index 6936d5930..d7cd0ab6c 100644
--- a/data/en.wikipedia.org/wiki/Data_collaboratives-0.md
+++ b/data/en.wikipedia.org/wiki/Data_collaboratives-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/Data_collaboratives"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:53.027907+00:00"
+date_saved: "2026-05-05T10:16:40.563232+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Data_collaboratives-1.md b/data/en.wikipedia.org/wiki/Data_collaboratives-1.md
index 2d00ecdd3..a4916e056 100644
--- a/data/en.wikipedia.org/wiki/Data_collaboratives-1.md
+++ b/data/en.wikipedia.org/wiki/Data_collaboratives-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/Data_collaboratives"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:53.027907+00:00"
+date_saved: "2026-05-05T10:16:40.563232+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Data_publishing-0.md b/data/en.wikipedia.org/wiki/Data_publishing-0.md
index ea37c7dbd..1c9bf1914 100644
--- a/data/en.wikipedia.org/wiki/Data_publishing-0.md
+++ b/data/en.wikipedia.org/wiki/Data_publishing-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Data_publishing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:54.244819+00:00"
+date_saved: "2026-05-05T10:14:59.043159+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Data_sharing-0.md b/data/en.wikipedia.org/wiki/Data_sharing-0.md
index 92ea40dea..86417be73 100644
--- a/data/en.wikipedia.org/wiki/Data_sharing-0.md
+++ b/data/en.wikipedia.org/wiki/Data_sharing-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/Data_sharing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:27.558145+00:00"
+date_saved: "2026-05-05T10:16:42.943003+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Data_sharing-1.md b/data/en.wikipedia.org/wiki/Data_sharing-1.md
index ad6a9ff8d..2ab0d0ec6 100644
--- a/data/en.wikipedia.org/wiki/Data_sharing-1.md
+++ b/data/en.wikipedia.org/wiki/Data_sharing-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/Data_sharing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:27.558145+00:00"
+date_saved: "2026-05-05T10:16:42.943003+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Dataverse-0.md b/data/en.wikipedia.org/wiki/Dataverse-0.md
index 5d0dfcd04..7e819000d 100644
--- a/data/en.wikipedia.org/wiki/Dataverse-0.md
+++ b/data/en.wikipedia.org/wiki/Dataverse-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/Dataverse"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:56.764149+00:00"
+date_saved: "2026-05-05T10:16:47.696642+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Deklarator-0.md b/data/en.wikipedia.org/wiki/Deklarator-0.md
new file mode 100644
index 000000000..49ff915e1
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Deklarator-0.md
@@ -0,0 +1,49 @@
+---
+title: "Deklarator"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Deklarator"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:51.256333+00:00"
+instance: "kb-cron"
+---
+
+The Declarator (Russian: Декларатор) is a Russian online service for processing anti-corruption declarations of officials. As of 2025, the service contains records of more than 1.3 million public officials.
+
+
+== Project background and development ==
+Declarator was created in 2011 with the aim of increasing the transparency and accessibility of information on the income and property of Russian officials. The project was a response to the introduction in Russia in 2008 of mandatory income and property declarations for civil servants. Initially, the project team was engaged in manually collating data from declarations into spreadsheets. However, this approach was not efficient enough to process large volumes of information. This led to the decision to develop a specialized database and a website for publishing the collected information.
+From 2013 to 2014, the project received support from the Civil Initiatives Committee of Russia, thanks to which declarations from a number of federal and regional authorities were processed and the technical base of the Declarator was improved. A significant contribution to the development of the project was made by the HSE Project and Training Laboratory of Anti-Corruption Policy, whose students and interns participated in the creation of the database architecture and information processing.
+In 2018, an API was launched that allows third-party projects to access and use the Declarator database.
+The key figure and spokesperson of the project is Andrey Zhvirblis. He was involved in creating the database from the outset and has coordinated the team's work since its launch.
+
+
+== Project goals and objectives ==
+The Declarator project aims to increase transparency and public control over the income and property of officials in Russia. The main task is to collect disparate information about the income, assets and property liabilities of officials at all levels (from the school principal to the president) and present it in a convenient, machine-readable format.
+Careful examination of financial declarations of public officials allows journalists, activists and citizens to identify potential conflicts of interest, illicit enrichment or discrepancies between declared income and assets.
+
+
+== Practical application ==
+The publication Important Stories has studied how civil servants receive real estate from the state, while many veterans are forced to wait for housing for years. The investigation indicates that some officials have purchased several apartments under preferential programs.
+The Anti-Corruption Foundation (FBK) used the Declarator data to search for declarations of officials, which became the basis for a number of anti-corruption investigations, including into the ownership of elite real estate by government officials.
+Russia Post analyzed in a study how access to data on the incomes of officials in Russia has changed, citing the Declarator service.
+
+
+== Scientific research ==
+An analysis of government employee declarations using the Declarator was used in a study published in the American Journal of Political Science.
+
+
+== Restrictions and legislative changes ==
+Public access to some information is restricted by Russian law. Since its adoption in 2008, the Law on Combating Corruption has undergone multiple revisions, some of which have significantly reduced the volume of open data.
+
+In 2022, a ban was introduced on the publication of declarations of employees of a number of government agencies during the Russian invasion of Ukraine.
+In 2023, a law was passed allowing deputies and senators not to publish income declarations, and also allowing municipal deputies not to submit declarations under certain conditions.
+The Federal Protective Service also restricts access to some data on high-ranking officials. Some agencies publish declarations in formats that are difficult to process automatically or delete archived information, which reduces the completeness and openness of information.
+In June 2025, the website of the Declarator project was blocked by Roskomnadzor following a court ruling in response to a complaint from the prosecutor's office regarding violations of personal data legislation.
+
+
+== References ==
+
+
+== External links ==
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Diamond_open_access-0.md b/data/en.wikipedia.org/wiki/Diamond_open_access-0.md
new file mode 100644
index 000000000..d039c2863
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Diamond_open_access-0.md
@@ -0,0 +1,29 @@
+---
+title: "Diamond open access"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Diamond_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:37.993676+00:00"
+instance: "kb-cron"
+---
+
+Diamond open access refers to academic texts (such as monographs, edited collections, and journal articles) published/distributed/preserved with no fees to either reader or author. Alternative labels include platinum open access, non-commercial open access, cooperative open access or, more recently, open access commons. While these terms were first coined in the 2000s and the 2010s, they have been retroactively applied to a variety of structures and forms of publishing, from subsidized university publishers to volunteer-run cooperatives that existed in prior decades.
+As of 2021, an estimated 17,000 to 29,000 scientific journals relied on a diamond open access model. They make up 73% of the journals registered in the Directory of Open Access Journals and 44% of the articles, as their mean output is smaller than that of commercial journals. The diamond model has been especially successful in Latin America-based journals (95% of OA journals) following the emergence of large publicly supported platforms, such as SciELO and Redalyc. However, Diamond OA journals are under-represented in the major scholarly databases, such as Web of Science and Scopus. Notably, high-income countries "have the highest share of authorship in every domain and type of journal, except for diamond journals in the social sciences and humanities".
+In 2022, new national and international policies, such as the UNESCO recommendation on open science and the Action Plan for Diamond Open Access promoted by cOAlition S, aimed to support the development of non-commercial or community-driven forms of open access publishing.
+
+== Context and definition ==
+
+=== Historical roots of diamond models: knowledge clubs and commons ===
+
+Until the Second World War, academic publishing was mostly characterized by a wide range of community-driven scholarly structures with little concern for profitability. Most journals of the 19th century and the first part of the 20th century were collective initiatives led by a scientific movement or institution that largely relied on informal community norms rather than commercial regulations. These historical practices have been described as a form of knowledge commons, or, more specifically, as a knowledge club that holds an intermediary status between a knowledge commons and a private company: while managed by a community, journals are mostly used to the benefit of a selected set of authors and readers.
+In Western Europe and North America, direct ownership of journals by academic communities and institutions started to wane in the 1950s. The expansion of scientific publishing in the context of big science led to a perceived "crisis" of the historical model of scientific periodicals. Between 1950 and 1980, the new model of large commercial publishers came to dominate numerous fields of scientific publishing in western countries:
+
+The small society presses, struggling to cope with growing scale, were supported and then largely supplanted by the 'Big 5' commercial presses: Elsevier (which acquired Pergamon in 1991), Wiley, Springer, Taylor & Francis and Sage. These newly-empowered players brought an industrial approach to the publication and dissemination process, for the first time realising the benefits that these specialised capital and skills could provide by operating at a scale that was unprecedented to that date.
+This transformation had wide-ranging consequences for the way scientific journals were managed, not only at the economic but also at the editorial level, with an increased standardization of publishing norms, peer-review processes, and copyright. Yet it was neither global nor general, and communal forms of journal ownership and management remained significant in large geographic areas (like Latin America) and in several disciplines, especially in the humanities and the social sciences.
+
+=== Development of "grassroots" open access (1990–2010) ===
+The open access movement emerged both as a consequence of the unprecedented access afforded by online publishing and as a reaction against the large corporate model that has come to dominate scientific publishing since the Second World War and the hyper-inflation of subscription prices. The early pioneers of open access electronic publishing were non-commercial and community-driven initiatives that built on a trend of grassroots publishing innovation in the social sciences and the humanities:
+
+In the late '80s and early '90s, a host of new journal titles launched on listservs and (later) the Web. Journals such as Postmodern Cultures, Surfaces, the Bryn Mawr Classical Review and the Public-Access Computer Systems Review were all managed by scholars and library workers rather than publishing professionals.
+Specialized free software for scientific publishing like Open Journal Systems became available after 2000. This development entailed a significant expansion of non-commercial open access journals by facilitating the creation and the administration of journal websites and the digital conversion of existing journals. Among the journals registered in the Directory of Open Access Journals (DOAJ) without an article processing charge (APC), the number created each year rose from 100 at the end of the 1990s to 800 around 2010, and has not changed significantly since then.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Diamond_open_access-1.md b/data/en.wikipedia.org/wiki/Diamond_open_access-1.md
new file mode 100644
index 000000000..293ec7bc0
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Diamond_open_access-1.md
@@ -0,0 +1,23 @@
+---
+title: "Diamond open access"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Diamond_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:37.993676+00:00"
+instance: "kb-cron"
+---
+
+=== Debates over the identity of the open access commons (2003–2012) ===
+In the early debates over open access, the distinctions between commercial and non-commercial forms of scientific publishing and community-driven or corporate-owned structures seldom appeared, possibly due to the lack of a viable business model for open access. Open access publications were instead increasingly categorized into two different editorial forms: open access articles made immediately available by the publisher and pre-published articles hosted on an online archive (either as a pre-print or post-print). In 2003, the ROMEO project began to devise a color-code system to better identify the policy of scientific publishers with regard to open sharing of scientific articles, from "yellow" (pre-print only) to "green" (no restriction in place): "the 'greenest' publishers are those that allow self-archiving not only of the author's accepted manuscript, but of the fully formatted and paginated publisher PDF". In 2004, Harnad et al. repurposed this classification scheme into a highly influential binary scale: articles directly made available by the publisher belong to "gold" open access (instead of "yellow") and online archives are defined as "green" open access. With this breakdown of open access into "green" and "gold", there is no distinction between commercial and non-commercial publishers. For Peter Suber, the "gold" model embraces journals supported by APCs or by other means of funding, as well as volunteer-run journals: "In the jargon, OA delivered by journals is called gold OA, and OA delivered by repositories is called green OA."
+Tom Wilson introduced the expression "Platinum Open Access" in 2007 following a heated debate with Stevan Harnad and other open access activists on the American Scientist Open Access Forum mailing list. On his blog, Wilson defended the necessity of enlarging the classification of open access publishing forms and stressed the danger of conflating commercial and non-commercial open access journals.
+
+[The "gold" and "green" classification] is not really the whole story and is in danger of perpetuating the myth that the only form of open access publishing is that made available through the commercial publishers, by author charging. This is why I distinguish between open access through author charging, which is what the Gold Route is usually promoted as being (…) and the Platinum Route of open access publishing which is free, open access to the publications and no author charges. In other words the Platinum Route is open at both ends of the process: submission and access, where as the Gold Route is seen as open only at the access end.
+The term "diamond open access" was coined later in 2012 by Marie Farge, a French mathematician and physicist and open access activist. Farge was involved in the Cost of Knowledge campaign led by Timothy Gowers against the excessive cost of scientific publishing. The reference to "diamond" was a hyperbolic pun on the "gold" metaphor that aims to suggest that non-commercial/free models were ultimately the best: "I have proposed to call this third way 'Diamond OA' by outbidding the 'Gold OA' terminology chosen by the publishers". "Free OA" was also contemplated as an alternative name.
+The Forum of Mathematics, an open access journal co-created by Timothy Gowers, was the first publication to explicitly claim to be a diamond journal: "For the first three years of the journal, Cambridge University Press will waive the publication charges. So for three years the journal will be what Marie Farge (who has worked very hard for a more rational publication system) likes to call diamond open access, a quasi-miraculous model where neither author nor reader pays anything".
+
+=== Defining the diamond model (2012–present) ===
+
+In 2013, Fuchs and Sandoval published one of the first systematic definitions of diamond open access: "Diamond open access Model, not-for-profit, non-commercial organizations, associations or networks publish material that is made available online in digital format, is free of charge for readers and authors and does not allow commercial and for-profit re-use." This definition is associated with a controversial stance against the leading definition of gold open access: "We argue for differentiating the concept of Gold Open Access Publishing because Suber and others mesh together qualitatively different models, i.e. for-profit and not-for-profit ones, into the same category, whereas others, especially policy makers, simply forget or exclude not-for-profit models that do not use author fees or reader fees." The debate over the relationship between "diamond" or "platinum" open access publications versus "Gold" open access has never been settled and remains a point of contention, even after the publication of the OA Diamond Study. While valuing this study, Martin Paul Eve still considers diamond open access to be a "category error".
+Since 2013, the theoretical literature on the diamond model has been increasingly influenced by institutional analysis of the commons. Consequently, the "Open access commons" has recently emerged as an alternative label, although the term is used less as a descriptor and more as a programmatic ideal for the future of non-commercial open access. The conclusion of the OA Diamond study calls for the realization of The OA Commons as "a diverse, thriving, innovative and more interconnected and collaborative OA diamond journal ecosystem that supports bibliodiversity and serves many languages, cultures and domains in the future." Similarly, Janneke Adema and Samuel Moore have proposed to "redefine the future of scholarly publishing in communal settings" through a "scaling small" that ensures the preservation and development of diverse editorial models.
+Analysis of the diamond model has been significantly deepened by the commissioning of large-scale empirical studies such as the OA Cooperative Study (2016) by the Public Knowledge Project and the OA Diamond Study (2021) by cOAlition S. Notably, the 2021 study found:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Diamond_open_access-2.md b/data/en.wikipedia.org/wiki/Diamond_open_access-2.md
new file mode 100644
index 000000000..9de8b542e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Diamond_open_access-2.md
@@ -0,0 +1,37 @@
+---
+title: "Diamond open access"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Diamond_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:37.993676+00:00"
+instance: "kb-cron"
+---
+
+The number of Diamond OA journals is very large (>29,000), but only ~a third are registered in DOAJ, and only ~5% are indexed in either Scopus or Web of Science. Over half of these Diamond OA journals publish 25 or fewer articles per year.
+Between 2017 and 2019, paid-access journals published ~80% of all articles, paid-OA journals published ~11%, and Diamond OA journals published ~9%.
+The share of Diamond OA publications among all OA journal articles peaked in 2018 and has been declining since.
+Only 4.3% of Diamond OA journals are fully compliant with all Plan S criteria.
+Only 55% of Diamond OA journals provide DOI numbers for their articles.
+Only 25% of Diamond OA journals provide their content as XML or HTML (in addition to PDF).
+Only about half of Diamond OA journals provide download statistics for their content.
+2/3 of Diamond OA journals use double-blind peer review, higher than subscription journals, which prefer single-blind peer review.
+25% of Diamond OA journals operated at a loss, and just over 40% reported breaking even. The rest did not know their financial status.
+Although all Diamond OA journals rely heavily on volunteer work, they have some revenue sources, such as grants, collectively-organised funding, donations, shared infrastructure, membership fees, freemium services, etc.
+70% of Diamond OA journals declared operating costs below $/€10,000 per year. In contrast, before cancelling its subscription in 2012, Harvard alone paid $40,000 per year for just one (the most expensive) of Elsevier's journals.
+The most challenging area for Diamond OA journals is indexing and content visibility in the main research databases, such as Scopus, Web of Science, and SciFinder.
+
+== Distribution ==
+
+The OA Diamond Study gives an estimate of >29,000 diamond open access journals in 2021, which represent a significant share of the total number of scholarly journals. Diamond journals make up 73% of all open access journals registered on the Directory of Open Access Journals (DOAJ), with 10,194 entries out of 14,020 in September 2020. In 2013, Fuchs and Sandoval already noted that, as far as the number of individual journals is concerned, diamond open access is the main form of open access publishing: "Diamond open access is not just an idea, but rather, as the empirical data provided in this paper shows, the dominant reality of open access."
+While the diamond model is prevalent among open access journals when looking at journal titles, this is not the case when looking at the aggregate number of articles, as they publish fewer articles overall. The OA Diamond Study finds that the 10,194 journals without publication fees registered on the Directory of Open Access Journals published 356,000 articles (8–9% of all scholarly articles) per year from 2017 to 2019, compared to 453,000 articles (10–11%) published by the 3,919 commercial journals with APCs. This discrepancy can mostly be attributed to a consistently lower output from diamond open access journals compared with commercial journals: "In DOAJ we find that the majority of OA diamond journals (54.4%) publish 24 or fewer articles per year; only 33.4% of APC-based journals have a similar size." Diamond journals also have a more diverse editorial production, including other forms of scholarly productions like book reviews or editorials, which may contribute to decreasing their share of the total number of research articles.
+From 2014 to 2019, the output of diamond open access journals continued to grow in absolute terms, but decreased relative to the output of commercial open access journals. The same period showed a significant development of APC-based large publishers as well as an increasing conversion of legacy subscription-based publishers to the commercial open access model.
+Any estimation of the number of diamond journals or articles is challenging as most non-commercial or community-run journals do not identify as diamond journals and this definition has to be deduced or reconstructed from the lack of APCs or any other commercial activity. Additionally, diamond journals more frequently struggle to be registered in academic indexes and remain largely uncharted.
+
+=== Geographic distribution ===
+
+The majority of diamond open access journals are published in Europe (around 45%) and Latin America (around 25%). In relative terms, the diamond model is especially prevalent in Latin America (95% of open access journals registered in DOAJ) and Eastern Europe (81%). In contrast with Western Europe and North America, the open access movement in Latin America was largely structured around publicly supported platforms like Redalyc or SciELO, rather than APC-based publishers:
+
+The Latin American region, as a result, owns an ecosystem characterized by the fact that "publishing" is conceived as acts of "making public", of "sharing", rather than the activity of a profit-driven publishing industry (...) Latin American academic journals are led, owned and financed by academic institutions. It is uncommon to outsource editorial processes.
+The OA Diamond Study attributes these differences to the absence of large, privately owned publishers, stating that "Most major, large commercial publishers are based in Western Europe or US/Canada, which explains some of the relative dominance of the APC-model in these regions. Without these publishers, Western Europe and US/Canada would be more similar to other regions." Additionally, Latin American journals have long been neglected in the main commercial indexes, which may have encouraged the development of local initiatives.
+The diamond model has come to embody an ideal of social justice and cultural diversity in emerging and developing countries. Diamond open access journals are more likely to be multilingual (38%): "while English is the most common language [...] Spanish, Portuguese and French play a much more important role for OA diamond journals than for APC-based ones. Generally, this holds for most languages other than English."
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Diamond_open_access-3.md b/data/en.wikipedia.org/wiki/Diamond_open_access-3.md
new file mode 100644
index 000000000..2067d40bd
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Diamond_open_access-3.md
@@ -0,0 +1,38 @@
+---
+title: "Diamond open access"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Diamond_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:37.993676+00:00"
+instance: "kb-cron"
+---
+
+=== Disciplines ===
+While diamond OA journals are available for most disciplines, they are more prevalent in the humanities and social science. The OA Diamond Study finds that, among the journals registered on the DOAJ, humanities and social science publications make up 60% of diamond open access journals and only 23.9% of APC-based journals. This distribution may be due to the differentiated evolution of scientific publishing during the 20th century, as "small HSS journals are often owned by universities and societies who often prefer OA diamond models, while many big science and medicine journals are owned by commercial publishers, more inclined to use APC models."
+However, the diamond model is still present in many disciplines, with 22.2% of diamond journals in STEM and 17.1% in Medicine. Medical diamond journals are often embedded in local communities, especially in non-western countries: "It becomes apparent that local diamond OA journals are not only important in HSS, but also in medicine."
+An additional survey of 1,619 diamond OA journals, the OA Diamond Survey, highlights a more complex disciplinary distribution: although the social sciences (27.2%) and humanities (19.2%) are well represented, more than a quarter of respondents did not favor one discipline in particular (15.1% for multidisciplinary and 12% for "other").
+
+== Organization and economics ==
+The OA Diamond Study introduced a taxonomy of 6 types of diamond OA journals based largely on their ownership status: institutional journals, learned-society journals, volunteer-run journals, publisher journals, platform journals, and large journals.
+Most diamond open access journals are managed by academic institutions, communities or platforms: "The majority of journals (42%) are owned by universities. The main alternatives are learned societies (14%) and, to a lesser extent, government agencies, university presses and individuals." This integration ensures the autonomy of the journals: they "are inherently independent from commercial publishers as they are not created by them and do not rely on them at the management level."
+The main sources of support for diamond OA journals are non-monetary: in-kind support from research institutions (such as hosting and software maintenance or copy-editing services) and voluntary contributions. Grant funding is mentioned significantly less often in surveys, possibly because it does not always ensure a regular source of support. Since the 1990s, shared platforms have become important intermediary actors for diamond journals, especially in Latin America (Redalyc, AmeliCA, SciELO, Ariadna Ediciones) and some European countries such as France (OpenEdition Journals, via Lodel), or the Netherlands, Finland, Croatia, and Denmark (all via PKP's Open Journal Systems). Since the core definition of the diamond model is focused on the lack of APCs, a few diamond journals (less than 5–10% of respondents in the OA Diamond Survey) maintain commercial activities by charging for services or additional features (freemium).
+
+Operating costs of diamond journals are low: half of the 1,600 journals surveyed by the OA Diamond Study had costs below $/€1,000 per year. The median cost per article is around $200, which is significantly lower than standard APCs for commercial open access journals. These low costs are accounted for by institutional support, limited expenses, and reliance on volunteer work: 60% of the journals surveyed in the OA Diamond Study were at least partly run by volunteers.
+The governance models of diamond journals also have an impact on their economic models. Journals embedded in academic institutions are more likely to benefit from direct funding or support, whereas "journals owned by learned societies rely significantly more on membership fees". Despite this support, a significant number of diamond journals still lack funding for their basic operations. Finally, research funding organizations tend not to support diamond OA journals as they do APC-funded journals, though there are proposals for new direct funding mechanisms.
+
+== Issues and perspectives ==
+
+=== Apparent limitations of focus ===
+Recent discussions of diamond open access have taken an increasingly narrow focus, limiting the definition to mostly refer to journals, instead of the full range of academic texts.
+Others argue that diamond open access should be a format-agnostic concept that can include all research outputs, including long form works like book chapters and monographs, which play an important role in the Humanities and Social Sciences.
+
+=== Preservation ===
+Long-term preservation is essential for all scholarly publications, and this is being studied for diamond open access journals. Results from a survey presented in the OA Diamond Journals Study indicate that 57% of diamond OA journals have no preservation policy. While libraries have an incentive to preserve articles published by subscription-based journals to ensure their investment is not lost, there is no similar motivation for free online content.
+Efforts are underway to solve this issue, such as Project JASPER, an ongoing project of the Directory of Open Access Journals, CLOCKSS, the Internet Archive, the KEEPERS Registry, and PKP-PN; as well as the automated preservation of published articles in LOCKSS when Open Journal Systems (OJS) is used. Of the diamond journals surveyed in the OA Diamond Journals Study, 60 use this open source software application for managing and publishing.
+
+=== Recognition ===
+While diamond open access journals make up a large share of all open access publications, they have long been overlooked by scientific funding mechanisms:
+
+This reality is however not enough acknowledged and taken into account in the open access journal debate. There is a danger that Diamond open access publishers' interests are overlooked and that a corporate model of OA will shape the future of academia. We therefore argue for a shift in the debate and that policy makers should take the Diamond Model serious by providing support for it.
+The launch of the cOAlition-S initiative in 2018 made the recognition of diamond journals more pressing. Support for open access publishing would now be conditioned on adherence to a series of editorial and economic standards which some diamond journals may struggle to conform to, given their limited means. One of the final recommendations of the OA Diamond Study was a call to fully integrate Diamond journals into the Plan S strategy:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Diamond_open_access-4.md b/data/en.wikipedia.org/wiki/Diamond_open_access-4.md
new file mode 100644
index 000000000..8b23d5bbb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Diamond_open_access-4.md
@@ -0,0 +1,16 @@
+---
+title: "Diamond open access"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Diamond_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:37.993676+00:00"
+instance: "kb-cron"
+---
+
+Some journals argue that research funders have the responsibility to support or even favour OA diamond journals since they are often excluded from discussions on funding OA. While, the Plan S Principle 5 states that "the Funders support the diversity of business models for Open Access journals and platforms", perceptions will change once funders focus on OA diamond in addition to Gold OA and legacy publishing. This action has a significant potential to cover existing gaps in OA publishing.
+In 2020 and 2021, institutional recognition of the diamond model progressed significantly, with unprecedented commitments from national and international organizations. The 2021 UNESCO recommendation for Open Science calls for "supporting not-for-profit, academic and scientific community-driven publishing models as a common good". The second French Plan for Open Science encouraged a "diversification of economic models" that especially highlights the diamond model, as it should enable "a transition from subscription towards open access with no publishing fees". In March 2022, an Action Plan for Diamond Open Access was published with the support of cOAlition S, Science Europe, OPERAS, and the French National Research Agency. This plan aims to "expand a sustainable, community-driven Diamond scholarly communication ecosystem." In 2024, the Toluca-Cape Town Declaration on Diamond Open Access declared that "scholarly knowledge is a public good" and that "diamond open access is driven by social justice, equity and inclusivity".
+
+== References ==
+
+== Bibliography ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Directory_of_Open_Access_Journals-0.md b/data/en.wikipedia.org/wiki/Directory_of_Open_Access_Journals-0.md
new file mode 100644
index 000000000..abfb30b46
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Directory_of_Open_Access_Journals-0.md
@@ -0,0 +1,48 @@
+---
+title: "Directory of Open Access Journals"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Directory_of_Open_Access_Journals"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:03.862197+00:00"
+instance: "kb-cron"
+---
+
+The Directory of Open Access Journals (DOAJ) is a website that hosts a community-curated list of open access journals, maintained by Infrastructure Services for Open Access (IS4OA). It was launched in 2003 with 300 open access journals and has since expanded to over 22,886 indexed open access journals and 12,746,462 articles.
+The mission of DOAJ is to "increase the visibility, accessibility, reputation, usage and impact of quality, peer-reviewed, open access scholarly research journals globally, regardless of discipline, geography or language."
+In 2015, DOAJ launched a reapplication process based on updated and expanded inclusion criteria. At the end of the process (December 2017), close to 5,000 journals, out of the 11,600 indexed in May 2016, had been removed from its database, mostly for failure to reapply.
+Notwithstanding the substantial cleanup, the number of journals included in DOAJ has continued to grow, to reach 14,299 as of 3 March 2020. As of April 2025, the independent database contains more than 21,480 open access journals and 11,045,921 articles covering all areas of science, technology, medicine, social sciences and the humanities.
+DOAJ provides a change log on Google Sheets that has been updated since March 2014 and identifies the journals added and the journals removed with the justification for the removal.
+The founder, Lars Bjørnshauge, announced his retirement in 2021; since January 2022, DOAJ's Managing Director has been Joanna Ball.
+
+
+== History ==
+
+The Open Society Institute funded various open access related projects after the Budapest Open Access Initiative; the Directory was one of those projects. The idea for the DOAJ came out of discussions at the first Nordic Conference on Scholarly Communication in 2002. Lund University became the organization to set up and maintain the DOAJ. It continued to do so until January 2013, when Infrastructure Services for Open Access (IS4OA) took over.
+The Infrastructure Services for Open Access (IS4OA) C.I.C. was founded in 2012 in the UK as a community interest company by open access advocates Caroline Sutton and Alma Swan. It runs the DOAJ and, until 2017, the Open Citations Corpus.
+In a 2015 comparison with MEDLINE, PubMed Central, EMBASE and SCOPUS, DOAJ was found to have the highest number of open access journals listed, but fewer than half of them had actively published content on DOAJ.
+DOAJ has had a partnership with OpenAIRE since October 2022.
+
+
+== Criteria for journals ==
+A number of criteria are used for inclusion of open access journals. These include:
+
+Journal can be in any language
+Must be active in publishing scholarly research (at least five articles per year)
+Actively publishing for at least one year or has published at least 10 open access articles
+Journal must have a dedicated website and open access policy
+
+
+== See also ==
+List of academic databases and search engines
+List of open-access journals
+Open Access Scholarly Publishers Association
+Free Journal Network
+Paperity - aggregator of open access journals
+
+
+== References ==
+
+
+== External links ==
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Disciplinary_repository-0.md b/data/en.wikipedia.org/wiki/Disciplinary_repository-0.md
new file mode 100644
index 000000000..d2bc91ed3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Disciplinary_repository-0.md
@@ -0,0 +1,37 @@
+---
+title: "Disciplinary repository"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Disciplinary_repository"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:05.025845+00:00"
+instance: "kb-cron"
+---
+
+A disciplinary repository (or subject repository) is an online archive, often an open-access repository, containing works or data associated with these works of scholars in a particular subject area. Disciplinary repositories can accept work from scholars from any institution. A disciplinary repository shares the roles of collecting, disseminating, and archiving work with other repositories, but is focused on a particular subject area. These collections can include academic and research papers.
+Disciplinary repositories can acquire their content in many ways. Many rely on author or organization submissions, such as SSRN. Others such as CiteSeerX crawl the web for scholar and researcher websites and download publicly available academic papers from those sites. AgEcon, established in 1995, grew as a result of active involvement of academia and societies.
+A disciplinary repository generally covers one broad based discipline, with contributors from many different institutions supported by a variety of funders; the repositories themselves are likely to be funded from one or more sources within the subject community. Deposit of material in a disciplinary repository is sometimes mandated by research funders.
+Disciplinary repositories can also act as stores of data related to a particular subject, allowing documents along with data associated with that work to be stored in the repository.
+What was believed to be the first public Workshop on Disciplinary Repositories was held on June 16 and 17, 2011, at the ACM Joint Conference on Digital Libraries in Ottawa, Ontario, Canada.
+
+
+== Importance ==
+Beyond the core functions of collecting, disseminating, and archiving scholarly works, disciplinary repositories offer significant benefits to the overall academic ecosystem. Here is a closer look at their contributions:
+
+Increased accessibility: Many research articles are published in pay-walled journals, hindering access for new researchers. Disciplinary repositories make these publications more readily available at no cost, fostering wider dissemination of knowledge.
+Enhanced discoverability: By categorizing scholarly works by subject area, disciplinary repositories enable researchers to locate relevant studies quickly and efficiently. This targeted organization streamlines the research process.
+Improved credibility: Several disciplinary repositories implement a pre-print quality control process. This initial vetting enhances the overall credibility of the hosted research materials.
+Preservation: Disciplinary repositories serve as digital archives, safeguarding valuable research from loss or degradation.
+Scholarly impact measurement: Citation data associated with publications within disciplinary repositories can be used to evaluate research impact and the contribution of individual studies to a particular field.
+In conclusion, disciplinary repositories play a vital role in promoting research, scholarship, and knowledge development across academic disciplines.
+
+
+== See also ==
+Institutional repository
+
+
+== References ==
+
+
+== External links ==
+Open Access Directory - Disciplinary repositories
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/EU_Open_Data_Portal-0.md b/data/en.wikipedia.org/wiki/EU_Open_Data_Portal-0.md
new file mode 100644
index 000000000..f38834492
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/EU_Open_Data_Portal-0.md
@@ -0,0 +1,61 @@
+---
+title: "EU Open Data Portal"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/EU_Open_Data_Portal"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:54.811316+00:00"
+instance: "kb-cron"
+---
+
+Before data.europa.eu, the EU Open Data Portal was the point of access to public data published by the EU institutions, agencies and other bodies. On April 21, 2021, its merger with the European Data Portal into a unified data.europa.eu portal was announced.[1]
+Public data can be used and reused for commercial or non‑commercial purposes. The portal was a key instrument of the EU open data strategy. Ensuring easy and free access to data can enhance their innovative use and economic potential. The goal of the portal was also to make the institutions and other EU bodies more transparent and accountable.
+
+
+== Legal basis and launch of the portal ==
+Launched in December 2012, the portal was formally established by Commission Decision of 12 December 2011 (2011/833/EU) on the reuse of Commission documents to promote accessibility and reuse.
+Based on this decision, all the EU institutions were invited - and are still today - to publish information such as open data and to make it accessible to the public whenever possible.
+The operational management of the portal was the task of the Publications Office of the European Union. Implementation of EU open data policy was the responsibility of the Directorate General for Communications Networks, Content and Technology (DG CONNECT) of the European Commission. This is still true today with data.europa.eu.
+
+
+== Features ==
+The portal enabled users to search, explore, link, download and easily re-use data for commercial or non-commercial purposes, through a common metadata catalogue. From the portal, users could access data published on the websites of the various institutions, agencies and other bodies of the EU.
+Semantic technologies offered additional functionalities. The metadata catalogue could be searched via an interactive search engine and through SPARQL queries.
+Users could suggest data they thought was missing from the portal and give feedback on the quality of the data available.
+The interface was in 24 EU official languages, but most metadata was available in a limited number of languages (English, French and German). Some of the metadata (e.g. names of the data providers and geographical coverage) was in 24 languages.
+
+
+== Terms of use ==
+Most of the data accessible via the EU Open Data Portal was covered by the legal notice of the Europa website. Generally, data could be used for free for commercial and non-commercial purposes, provided the source is acknowledged. Specific conditions for reuse, relating mostly to the protection of data privacy and intellectual property, applied to a small amount of data. A link to these conditions could be found for each dataset.
+The terms of use could be found on the site. As of November 2020, most data was covered by the Creative Commons CC‑BY‑4.0 license and the site metadata by the Creative Commons CC0‑1.0 public domain waiver.
+
+
+== Available data ==
+The portal contained a very wide variety of high-value open data across EU policy domains, including the economy, employment, science, environment and education. The importance of these was confirmed by the G8 Open Data Charter.
+At the time it was merged into data.europa.eu, around 70 EU institutions, bodies or departments (e.g. Eurostat, the European Environment Agency, the Joint Research Centre and other European Commission Directorates General and EU Agencies) had made datasets available, making a total of over 13,000.
+The portal also contained a gallery of applications and a visualisations catalogue (launched in March 2018). 
+In the apps gallery users could find applications using EU data and developed by the EU institutions, agencies or other bodies or by third parties. The applications were displayed as much for their information value as for giving examples of what applications can be made using the data.
+The visualisations catalogue offered a collection of visualisation tools, training and re-usable visualisations for all levels of data visualisation expertise, from beginner to expert.
+
+
+== Architecture of the portal ==
+
+The portal was built using open source solutions such as the Drupal content management system and CKAN, the data catalogue software developed by the Open Knowledge Foundation. It used Virtuoso as an RDF database and had a SPARQL endpoint.
+Its metadata catalogue applied international standards such as Dublin Core, the data catalogue vocabulary DCAT-AP, and the Asset Description Metadata Schema (ADMS).
+To promote linked open data, the portal made extensive use of controlled vocabularies, such as EuroVoc.
+
+
+== See also ==
+Open data
+Open Data Directive
+European Union
+European Commission
+Institutions of the European Union
+Agencies of the European Union
+
+
+== References ==
+
+
+== External links ==
+EU Open Data Portal
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_data-0.md b/data/en.wikipedia.org/wiki/Economics_of_open_data-0.md
new file mode 100644
index 000000000..d0026e9f7
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_data-0.md
@@ -0,0 +1,40 @@
+---
+title: "Economics of open data"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Economics_of_open_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:52.452447+00:00"
+instance: "kb-cron"
+---
+
+The economics of open data refers to the production, or loss, of wealth related to the use of open data. The cost of open data is a primary concern that can deter governments and companies from opening up data. While open data may theoretically have a low production cost, creating the original data set and maintaining that data once it is produced can be expensive. Though the creation of data may be expensive, governments around the world, such as those of France, the United States, and Japan, are anticipating substantial economic growth.
+
+
+== Open Data vs. Paid Data ==
+Open data has the capability to increase economic benefit through both individuals' and companies' use of the information. As of March 2016, it was estimated that open data was generating 0.5% more GDP compared to paid data. The creation of open data relies on either funding from government to create and maintain the data or funding in the form of grants and volunteers. Data made open by governments largely relies on the publication of public service research. Because the data has already been created for a purpose, there is no creation cost for it to be made available to the public.
+The opening of data requires current and advanced technologies as well as the employment of users who are skilled enough to complete such work. When data is collected, it often cannot be presented to the public in its raw form: it may be inaccessible because of the program it uses, or unusable because of how it is presented. Those who create the original dataset must reallocate time and funding to make the data more accessible and usable for citizens to understand and engage with. When government is the main source of funding for the production of data, it does not necessarily mean that it is the singular entity creating or managing the data. Governments sometimes contract out the creation or management of data to a third party. In some cases the third party may provide access to the data in exchange for a nominal fee. Citizen-led initiatives face similar issues, such as the requirement of time and funding. For these types of initiatives it can be especially difficult because they do not have access to a guaranteed steady income such as taxpayer money; these organizations largely depend on donations.
+Paying for the use of public data would cover some of the costs associated with creating, maintaining and formatting data, although it would reduce the economic value of what once was opened by 50%. Paying for the data that is now available for free would result in a lack of innovation, decreasing the GDP, as well as an increase in the cost of services created from use of purchased data. The opening of data reduces the licensing costs usually associated with paid data, as it costs more money to license a dataset than to have no license at all, though there are open datasets that use licensing as well. The opening of data itself does not simply create economic prosperity; systematic reforms would have to take place in order for open data innovations to find a place.
+
+
+== Open Data and Economic Opportunity ==
+Open data gains additional economic value when governments support open data initiatives, although increased uptake and citizen engagement are vital to the economic success of open data. Greater economic impact depends on revenue growth, cost reduction, and job creation. Revenue can be increased through the use of open data with the creation of new businesses, new goods or services, or improved goods and services. When businesses profit from the creation of goods or services that rely upon open data, not only do the companies reap the financial benefits but the government does as well, through increased tax revenue. Cost reduction helps to increase revenue for private sector businesses but is also an asset to government. Cost reduction in government, whether through reduction of services required or labor requirements, reduces government spending in some areas, allowing for investment in others. Open data can also increase economic benefit through the creation of jobs. Jobs can be created through innovative entrepreneurship or through the requirement of skilled labourers to use and understand data.
+
+
+== Examples of Open Data Financial Models ==
+There are numerous ways in which businesses currently support the generation, creation and upkeep of their data. In most cases businesses, or data brokers, will sell this information to third parties for a profit. As charging a fee for data would defeat the purpose of open data, governments and businesses must rely on different financial models. Normally a government or business would finance a public sector body to generate the data and profit or cost recovery would be achieved through users paying a licensing fee back to the public sector body. In turn, the profit made by the users could then be taxed and return finances to the government.
+
+
+=== Budget Financing Model ===
+Under budget financing, a specific amount of funds is allocated toward the open data project from general revenue. In this case the funding invested in the project is only expected to cover the minimal costs. Businesses and governments often expect to see a return from the opening up of data, such as increased efficiency within their work environment or more positive citizen perceptions of the company or government.
+
+
+=== Community Model ===
+This model relies on individual citizens to invest their time and skills into generating and maintaining open data. A very successful example of this would be OpenStreetMap, which is continually updated and expanded by everyday users. The community model can also easily incorporate interactive benefits such as user feedback and the improvement of data quality. In this case there is a lot of room for innovation and conversation, though the strong dependence on citizen engagement means individuals must actively engage with the data on a regular basis or risk a decrease in data quantity and quality.
+
+
+=== Advertising Model ===
+The advertising model is a popular method that can be seen across many online publications. In order to cover operating costs, an open data publisher relies on revenue from advertisers. In these cases citizens are exposed to ad banners and pop-ups displayed on the same site as the data they are attempting to access. While this may prove sustainable for individual company websites, some governments have policies against displaying advertisements on government webpages, which could prevent them from adopting such a model.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-0.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-0.md
index 28d3dfb90..25c9ac5eb 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-0.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-0.md
@@ -4,7 +4,7 @@ chunk: 1/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-1.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-1.md
index 5a358b71b..9f88b2308 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-1.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-1.md
@@ -4,7 +4,7 @@ chunk: 2/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-10.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-10.md
index 6f863606a..826ad8d43 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-10.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-10.md
@@ -4,7 +4,7 @@ chunk: 11/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-11.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-11.md
index 1727f95f5..43f1f6016 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-11.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-11.md
@@ -4,7 +4,7 @@ chunk: 12/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-12.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-12.md
index 1c065ba0e..f4e9d4d1e 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-12.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-12.md
@@ -4,7 +4,7 @@ chunk: 13/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-13.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-13.md
index 984a41acd..24a413a5f 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-13.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-13.md
@@ -4,7 +4,7 @@ chunk: 14/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-14.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-14.md
index 085624674..74e37f4a8 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-14.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-14.md
@@ -4,7 +4,7 @@ chunk: 15/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-2.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-2.md
index 2a4db5e79..cefcd31d0 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-2.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-2.md
@@ -4,7 +4,7 @@ chunk: 3/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-3.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-3.md
index 52fcfd5be..fbe73e362 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-3.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-3.md
@@ -4,7 +4,7 @@ chunk: 4/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-4.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-4.md
index 61cacfbcd..b71112e7a 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-4.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-4.md
@@ -4,7 +4,7 @@ chunk: 5/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-5.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-5.md
index 58f650ccf..5cf45b1a3 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-5.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-5.md
@@ -4,7 +4,7 @@ chunk: 6/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-6.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-6.md
index a1682cd84..7bf27b09a 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-6.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-6.md
@@ -4,7 +4,7 @@ chunk: 7/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-7.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-7.md
index 48a0d47d8..9e85bb3a7 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-7.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-7.md
@@ -4,7 +4,7 @@ chunk: 8/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-8.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-8.md
index 32b34184e..af8ba027d 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-8.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-8.md
@@ -4,7 +4,7 @@ chunk: 9/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Economics_of_open_science-9.md b/data/en.wikipedia.org/wiki/Economics_of_open_science-9.md
index d1c502f93..1220ef009 100644
--- a/data/en.wikipedia.org/wiki/Economics_of_open_science-9.md
+++ b/data/en.wikipedia.org/wiki/Economics_of_open_science-9.md
@@ -4,7 +4,7 @@ chunk: 10/15
 source: "https://en.wikipedia.org/wiki/Economics_of_open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:58.091431+00:00"
+date_saved: "2026-05-05T10:15:06.320001+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Embargo_(academic_publishing)-0.md b/data/en.wikipedia.org/wiki/Embargo_(academic_publishing)-0.md
new file mode 100644
index 000000000..5730a3717
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Embargo_(academic_publishing)-0.md
@@ -0,0 +1,48 @@
+---
+title: "Embargo (academic publishing)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Embargo_(academic_publishing)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:07.515279+00:00"
+instance: "kb-cron"
+---
+
+In academic publishing, an embargo is a period during which academic journals are not accessible to users who have not paid for access (or who do not have access through their institution). The purpose of this is to ensure publishers have revenue to support their activities, although the impact of embargoes on publishers is hotly debated, with some studies finding no impact while publisher experience suggests otherwise. A 2012 survey of libraries by the Association of Learned, Professional, and Society Publishers on the likelihood of journal cancellations in cases where most of the content was made freely accessible after six months suggests there would be a major negative impact on subscriptions, but this result has been debated.
+Various types exist: 
+
+A 'moving wall' is a fixed period of months or years.
+A fixed date is a particular time point that does not change.
+A current-year (or other period) embargo sets the cutoff at Jan. 1 of the current year, so that all material earlier than that is available. Although fixed during the year, the cutoff changes each year.
+
+
+== Purpose ==
+There are various purposes:
+
+For delayed open access journals, the embargo separates the most recent period, for which a subscription is needed, from an older period, where a subscription is not needed and anyone may access the article. This can range from a few months to several years.
+For self-archiving, the embargo is a period of time set by the publisher in the copyright transfer agreement where access to the archived version of the article in a digital repository is restricted until the embargo period expires. Typical embargo periods range from 6 to 24 months, though some publishers may require an embargo of up to 48 months.
+In full-text databases, such as those of EBSCO Publishing or ProQuest, it separates the most recent period, where only a title or abstract is available, from an older one, which is openly accessible.
+
+
+== Moving wall ==
+
+In academic publishing, a moving wall is the time period between the last issue of an academic journal available in a given online database and the most recently published print issue of a journal. It is specified by publishers in their license agreements with databases (like JSTOR), and generally ranges from several months to several years.
+
+
+== Sustainability of embargo periods ==
+Currently used embargo times (often 6–12 months in STEM and over 12 months in social sciences and humanities) do not seem to be based on empirical evidence on the effect of embargoes on journal subscriptions. In 2013 the UK House of Commons Select Committee on Business, Innovation and Skills concluded that "there is no available evidence base to indicate that short or even zero embargoes cause cancellation of subscriptions".
+There are some data available on the median "usage half life" (the median time it takes for scholarly articles to reach half of their total downloads) and the difference therein across disciplines, but this in itself does not prove that embargo length will affect subscriptions.
+The argument that immediate self-archiving risks subscription revenue is seen as ironic where archiving of postprints is concerned. If the value publishers add to the publication process beyond peer review (e.g. in typesetting, dissemination and archiving) were worth the price asked, people would still be willing to pay for the journal even if the unformatted postprint is available elsewhere. An embargo can be seen as a statement that the prices levied for individual articles through subscriptions are in fact not commensurate with the value added to a publication beyond organizing the peer review process.
+Publishers have, in the past, lifted embargo periods for specific research topics in times of humanitarian crises, or have been asked to do so (e.g. outbreaks of Zika and Ebola). While considered commendable in itself by scholars, this is seen as an implicit acknowledgement that embargoes stifle the progress of science and the potential application of scientific research, particularly when it comes to life-threatening pandemics. While, arguably, not all research is potentially critical for saving lives, it is hard to imagine a discipline where fellow researchers and societal partners would not benefit from un-embargoed access to research findings.
+Evidence suggests that traditional journals can peacefully coexist with zero-embargo self-archiving policies, and the relative benefits to both publishers and authors via increased dissemination and citations outweigh any putative negative impacts. For publishers, the fact that most preprint repositories encourage authors to link to or upload the published version of record (VOR) is effectively free marketing for the respective journal and publisher.
+Plan S has zero-length embargoes on self-archiving as one of its key principles. Where publishers have already implemented such policies, such as the Royal Society, Sage, and Emerald, there has been no documented impact on their finances so far. In a reaction to Plan S, Highwire suggested that three of their society publishers make all author manuscripts freely available upon submission and state that they do not believe this practice has contributed to subscription decline. Therefore there is little evidence or justification supporting the need for embargo periods.
+
+
+== See also ==
+Copyright policies of academic publishers
+
+
+== Notes ==
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/EmissionML-0.md b/data/en.wikipedia.org/wiki/EmissionML-0.md
new file mode 100644
index 000000000..2e0311763
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/EmissionML-0.md
@@ -0,0 +1,19 @@
+---
+title: "EmissionML"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/EmissionML"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:53.680528+00:00"
+instance: "kb-cron"
+---
+
+EmissionML (Emission Event Modeling Language) is an open and interoperable ontology and data model standard developed by the Open Geospatial Consortium (OGC) to enable consistent representation, sharing, and integration of emission event data across sectors and technologies. It provides a machine-readable, spatio-temporal data model for describing the release of pollutants into the atmosphere, making it easier to trace, audit, and reconcile emission reports with observational data sources.
+
+
+== History ==
+EmissionML was initiated in 2024 through the formal proposal of an OGC Standards Working Group (SWG). The proposal was opened for public comment in August 2024. The EmissionML SWG was officially launched at the 132nd OGC Technical Committee meeting in Mérida, Mexico, in June 2025.
+In June 2025, the EmissionML SWG also welcomed collaboration with the Open Footprint Forum, with both standards seen as complementary: EmissionML focuses on raw emission event modeling, while the Open Footprint Data Model emphasizes corporate footprint accounting.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Eprint-0.md b/data/en.wikipedia.org/wiki/Eprint-0.md
new file mode 100644
index 000000000..dc4e7b61c
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Eprint-0.md
@@ -0,0 +1,30 @@
+---
+title: "Eprint"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Eprint"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:08.666512+00:00"
+instance: "kb-cron"
+---
+
+In academic publishing, an eprint or e-print is a digital version of a research document (usually a journal article, but could also be a thesis, conference paper, book chapter, or a book) that is accessible online, usually as green open access, whether from a local institutional or 
+a central digital repository.
+When applied to journal articles, the term "eprints" covers both preprints (before peer review) and postprints (after peer review).
+Digital versions of materials other than research documents are not usually called e-prints, but some other name, such as e-books.
+
+
+== See also ==
+Electronic article
+Electronic journal
+Electronic publishing
+Open access
+Open science
+
+
+== References ==
+
+
+== External links ==
+What is an eprint? as defined in the FAQ section of eprints.org
+Eprints as defined by Stevan Harnad
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/European_Data_Portal-0.md b/data/en.wikipedia.org/wiki/European_Data_Portal-0.md
new file mode 100644
index 000000000..9ad6738f8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/European_Data_Portal-0.md
@@ -0,0 +1,48 @@
+---
+title: "European Data Portal"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/European_Data_Portal"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:56.026214+00:00"
+instance: "kb-cron"
+---
+
+The European Data Portal is a web portal providing open data published by EU Institutions, national portals of EU Member states and non-member states, as well as international organisations of predominantly European scope, launched on April 21, 2021. The portal consolidates datasets previously available via the EU Open Data Portal and the European Data Portal into a single meta-catalogue. The European Data Portal, launched in its beta version on November 16, 2015, was an initiative of the European Commission, and part of the Digital Single Market.
+Currently, more than 1,600,000 datasets are published on the portal, originating from 178 catalogues. The portal is a metadata catalogue: in it, metadata from other data and geospatial data catalogues are published following a common ontology, namely the DCAT Application Profile for data portals in Europe (DCAT-AP) with the aim of fostering and facilitating re-use of open data, promotion and support for the publication of (meta)data of high quality and use of Linked Open Data.
+The contents of the portal are available in all 24 EU Official Languages and can be freely re-used for any purpose as Open Data, following the specific license terms of datasets.
+
+
+== Legal basis ==
+Directive 2003/98/EC on the re-use of public sector information set the path for both EU and member state portals.
+Decision 2006/291/EC on the reuse of Commission documents provided the rules for the opening of the European Commission's data for re-use and was later amended by Commission Decision 2011/833/EU, which committed to making data available in machine-readable formats and established the creation of an EU Open Data Portal, publishing data from all EU Institutions, agencies and bodies.
+In 2013, Directive 2013/37/EU and later Directive (EU) 2019/1024, revising the 2003 Directive, established that public sector information shall be available to the public for free or at a very low cost by default.
+Alongside these Directives, in 2007 the INSPIRE Directive (Directive 2007/2/EC) defined an Infrastructure for Spatial Information in the European Community. The Directive sets forth, via a series of implementing rules, standards for making geo-spatial data interoperable and re-usable among member states and the geo-spatial data community. Many of the geodata portals harvested by data.europa.eu were first created in keeping with the Directive.
+The portal is funded by the EU and managed by the Publications Office of the European Union. The Directorate-General for Communications Networks, Content and Technology of the European Commission is responsible for the implementation of EU open data policy, in collaboration with the project's management.
+The delivery of the portal is contracted to a consortium of organisations led by Capgemini Invent, including Agiledrop, con terra, Data Excellence, Fraunhofer FOKUS, INTRASOFT International, OMMAX, the Lisbon Council and Timelex.
+
+
+== Features ==
+The portal allows users to access datasets originating from various catalogues, view metadata assessment reports and explore links to similar datasets. Datasets can be viewed as web-pages or as RDF linked data in any of the 24 EU official languages.
+In addition to datasets, the portal contains editorial articles related to open data, such as data-stories, news articles, studies and reports. In this latter category, the Open Data Maturity report, a yearly study assessing the level of open-data maturity of member states and EFTA countries, can be found.
+The data.academy section promotes (open) data literacy by providing free access to courses, videos and learning tools related to themes such as open data licensing, linked open data, data visualisation and more.
+A dedicated section offers links to external sources re-using the data, for example to build dedicated apps.
+An API and SPARQL endpoints foster access to metadata in machine-readable format.
+
+
+== Architecture of the portal ==
+In keeping with EU requirements, the portal is built using open-source solutions as much as possible. For example, it uses Drupal as its editorial content management system. Virtuoso is used as a triplestore for the linked-data database, also offering a SPARQL endpoint. Custom software was written ad hoc when a suitable open-source solution could not be found.
+Because all metadata is stored using DCAT-AP, specific open-source solutions were developed by the portal to map data from portals using different data-models (e.g. INSPIRE-CSW, CKAN).
+
+
+== Terms of use ==
+Most of the data accessible via data.europa.eu is released by the respective data providers using an open licence. For the most part, data can be used for free for commercial and non-commercial purposes, provided the source is acknowledged. Specific conditions for reuse, relating mostly to the protection of data privacy and intellectual property, apply to a small amount of data. A link to these conditions can be found on every dataset page.
+Unless otherwise specified, editorial content published on the portal is released under a Creative Commons 'CC‑BY‑4.0' licence. The portal's copyright notice provides additional information on the terms of use.
+As of September 2021, the most common open licences used for contents of the portal are the Creative Commons 'CC‑BY‑4.0' licence, the 'Data licence Germany – attribution' licence or Etalab's Open Licence (used by the French government).
+
+
+== References ==
+
+
+== External links ==
+data.europa.eu
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/European_Genome-phenome_Archive-0.md b/data/en.wikipedia.org/wiki/European_Genome-phenome_Archive-0.md
new file mode 100644
index 000000000..83dc3d580
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/European_Genome-phenome_Archive-0.md
@@ -0,0 +1,27 @@
+---
+title: "European Genome-phenome Archive"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/European_Genome-phenome_Archive"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:57.213474+00:00"
+instance: "kb-cron"
+---
+
+European Genome-phenome Archive (EGA) is a repository for human biomolecular and phenotypic data in the United Kingdom and Spain. It involves the secure storage of all potentially identifiable genetic data, phenotypic and clinical data generated by biomedical research programs. 
+As of March 2022, it stores and harvests data on over 4,500 research studies from over 1,000 institutions worldwide.
+
+
+== History ==
+EGA was launched in 2008 by the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) to support the voluntary archiving and dissemination of human genomic data requiring secure storage and distribution only to authorized researchers in a manner that "respects the consent agreements signed by the study subjects." Later, the EGA expanded its scope of collaboration with the Centre for Genomic Regulation (CRG) in Barcelona.
+
+
+=== Controlled access ===
+It offers the essential security required to regulate access, safeguard patient confidentiality, and provide access to those researchers and clinicians authorized to view controlled access data. Nevertheless, decisions about data access are not made by the EGA but rather by the appropriate data access-granting organization (DAO).
+
+
+== External links ==
+ega-archive.org
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/FORCE11-0.md b/data/en.wikipedia.org/wiki/FORCE11-0.md
new file mode 100644
index 000000000..ca6048e92
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/FORCE11-0.md
@@ -0,0 +1,34 @@
+---
+title: "FORCE11"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/FORCE11"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:12.231934+00:00"
+instance: "kb-cron"
+---
+
+FORCE11 is an international coalition of researchers, librarians, publishers and research funders working to reform or enhance the research publishing and communication system. Initiated in 2011 as a community of interest on scholarly communication, FORCE11 is a registered 501(c)(3) organization based in the United States but with members and partners around the world. Key activities include an annual conference, the Scholarly Communications Institute and a range of working groups.
+
+
+== History ==
+FORCE11 grew out of the FORC Workshop held in Dagstuhl, Germany in August 2011. This meeting resulted in the collaborative creation of a white paper which summarized the problems of scholarly communication and proposed a vision to address them.
+
+
+== Activities ==
+Through various working groups FORCE11 has undertaken a range of activities to improve the standards, interoperability and functionality of digital research communications and developed various statements on principles and policies for best practice. These include:
+
+FAIR Data Principles: The development of a set of principles based on making data Findable, Accessible, Interoperable, and Reusable (FAIR)
+Research Resource Identification Initiative (RRID): supporting new guidelines and identifiers in biomedical publications
+Joint Declaration of Data Citation Principles (JDDCP): intended to help achieve widespread, uniform human and machine accessibility of deposited data through data citation
+Software citation principles
+
+
+== See also ==
+Australian Open Access Strategy Group (AOASG)
+Coalition for Networked Information (CNI)
+Open Access Scholarly Publishers Association (OASPA)
+Scholarly Publishing and Academic Resources Coalition (SPARC)
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/FactGrid-0.md b/data/en.wikipedia.org/wiki/FactGrid-0.md
new file mode 100644
index 000000000..e9a9752d6
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/FactGrid-0.md
@@ -0,0 +1,40 @@
+---
+title: "FactGrid"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/FactGrid"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:58.410814+00:00"
+instance: "kb-cron"
+---
+
+FactGrid provides a collectively organized Wikibase graph database to projects with historical research interests, allowing them to interconnect research data both on the platform and into external research repositories. Persistent identifiers for previously undocumented objects of research, the use of competing ontologies, the multilingual availability of data sets, and the long-term maintenance of the data sets, which remain collectively editable, are key features of the platform.
+The platform was initiated by Olaf Simons in 2017/2018 in a cooperation between the Gotha Research Centre of the University of Erfurt and Wikimedia Germany. It is currently (as of January 2026) supported by around 700 users in about 50 projects ranging from Assyriology to Contemporary History. The database passed 1 million items on 13 October 2024. All services are free and, since March 2023, financed by the NFDI4Memory consortium of the German National Research Data Infrastructure (NFDI). The Historical Data Center of Saxony-Anhalt and the NFDI4Memory data connectivity team working there under Katrin Moeller have played a key role in project support since April 2023.
+
+
+== Technical basis ==
+FactGrid uses Wikidata's Wikibase software without major modifications of the user interface (Horace-Bénédict de Saussure's cyanometer provides the logo motif). The instance is set up without a Docker image. The MediaWiki platform includes a WordPress blog as well as the FactGrid Viewer, developed by Bruno Belhoste (a tool similar to Magnus Manske's Reasonator), which presents database information in structured compilations by communicating directly with the database. The FactGrid Viewer also offers the special service of fusing transcript pages from MediaWiki text pages into the information stored on the Wikibase items.
+
+
+== Development ==
+The main incentive to create FactGrid as a sister platform to Wikidata, in 2016/2017, was the idea of a platform that would focus on “original research” and work without further “notability criteria”, organised solely by the scientific community. The aim was that Wikimedia projects would be able to cite FactGrid data as “externally published”, together with information about the FactGrid projects and teams that produced these data. The greater freedom granted to users on FactGrid is balanced by the greater transparency under which users act on the platform: the use of registered real-name accounts is mandatory, and all projects are requested to state their research interests in the data they generate on the platform.
+Initial reservations about the risks of data theft and plagiarism on the openly visible platform have lost their importance. FactGrid data are CC0 licensed and open to any download, and they come with research metadata that can easily be quoted in external presentations; the platform is thus an interesting tool for bringing fresh data to public attention. Users have also come to recognise that the software does not incite edit wars: because it allows conflicting information to be displayed with the respective sources, Wikibase is instead an interesting medium for mapping complex data situations.
+The resource grew by approximately 100,000 database objects annually between 2018 and 2023. The growth rate now appears to be rising to 200,000 database objects per year, helped by the ongoing NFDI process in Germany. The database fully supports the four languages of the bigger user groups: German, French, Spanish and English. Most of the properties are also available in Hungarian and Chinese.
+
+
+== Applications ==
+FactGrid Wikibase database: https://database.factgrid.de/wiki/Main_Page
+FactGrid Viewer: https://database.factgrid.de/viewer/
+Project blog: https://blog.factgrid.de/
+Project space: https://database.factgrid.de/wiki/FactGrid:Projects
+Sample queries: https://database.factgrid.de/wiki/FactGrid:Sample_queries
+
+
+== Further reading ==
+Charles B. Faulhaber/ Óscar Perea Rodríguez, PhiloBiblon as a Digital Tool for Historians of Medieval Iberia, UC Berkeley. http://dx.doi.org/10.21001/itma.2023.16.15 Retrieved from https://escholarship.org/uc/item/9t279727
+Olaf Simons: Keine Selbstverständlichkeit: Citizen Science auf der FactGrid Wikibase-Plattform, in: René Smolarski/ Hendrikje Carius/ Martin Prell, Citizen Science in den Geschichtswissenschaften (Göttingen, 2023), pp. 241–264. Google books
+Olaf Simons: Stadtgeschichte im digitalen Zeitalter – Der FactGrid-Gotha-Datens(ch)atz, in: Moderne Stadtgeschichte(n) und ihre Perspektiven, ed. Alexander Krünes (Leipzig: Leipziger Universitätsverlag, 2023), pp. 103–120.
+Patricia García Sánchez-Migallón: FactGrid, una base de datos para datos históricos, y su relación con Philobiblon, in Janus: Estudios sobre el Siglo de Oro, 5 June 2023. https://www.janusdigital.es/articulo.htm?id=244
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Fair_Access_to_Science_and_Technology_Research_Act-0.md b/data/en.wikipedia.org/wiki/Fair_Access_to_Science_and_Technology_Research_Act-0.md
new file mode 100644
index 000000000..fc71b9b9a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Fair_Access_to_Science_and_Technology_Research_Act-0.md
@@ -0,0 +1,50 @@
+---
+title: "Fair Access to Science and Technology Research Act"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Fair_Access_to_Science_and_Technology_Research_Act"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:09.839963+00:00"
+instance: "kb-cron"
+---
+
+The Fair Access to Science and Technology Research Act (FASTR) is a bill in the United States that would mandate earlier public release of taxpayer-funded research. The bill has been introduced in 2013, 2015, and 2017. Sen. Ron Wyden (D-Ore.) and Sen. John Cornyn (R-Texas) introduced the Senate version, while the bill was introduced to the House by Reps. Zoe Lofgren (D-Calif.), Mike Doyle (D-Penn.) and Kevin Yoder (R-Kans.).  The bill is a successor to the Federal Research Public Access Act (FRPAA), which had been introduced in 2006, 2010, and 2012.
+Senator Wyden advocated for the passage of the bill by arguing that "taxpayer funded research should never be hidden behind a paywall."
+FASTR has been described as "The Other Aaron's Law", named for open-access activist Aaron Swartz, who died in January 2013 amid a high-profile legal case connected to his support for open-access research.
+The Senate Committee on Homeland Security and Governmental Affairs unanimously approved the bill on July 29, 2015.  It was the first time that the bill or any of its predecessors had gained committee approval and been forwarded to a full house of Congress.
+The bill is often compared to and discussed in conjunction with the Public Access to Public Science (PAPS) Act, also introduced in 2013.
+As of 2024 the bill has not been enacted, partially due to lobbying by anti-open access publishers and trade groups such as Elsevier and the Association of American Publishers.
+
+
+== Executive action ==
+Days after FASTR was introduced in 2013, the Executive Branch's Office of Science and Technology Policy (OSTP) issued a memorandum that "hereby directs each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government." The change was in part prompted by an online White House petition to "Require free access over the Internet to scientific journal articles arising from taxpayer-funded research."
+
+
+== See also ==
+Open access
+Academic journal publishing reform
+Serials crisis
+Open, Public, Electronic and Necessary Government Data Act (OPEN)
+
+
+== References ==
+
+
+== External links ==
+Senate version of FASTR (2015)
+Congress.gov
+House version H.R. 1477 (2015)
+Congress.gov
+Senate version of FASTR (2013)
+Congress.gov
+GovTrack.us
+OpenCongress
+THOMAS
+House version H.R. 708 (2013)
+Congress.gov
+GovTrack.us
+OpenCongress
+PopVox
+THOMAS
+Notes on the Fair Access to Science and Technology Research Act. From the Harvard Open Access Project.
+FAQ on FASTR from the Scholarly Publishing and Academic Resources Coalition (SPARC)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Federal_Research_Public_Access_Act-0.md b/data/en.wikipedia.org/wiki/Federal_Research_Public_Access_Act-0.md
new file mode 100644
index 000000000..43b149069
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Federal_Research_Public_Access_Act-0.md
@@ -0,0 +1,64 @@
+---
+title: "Federal Research Public Access Act"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Federal_Research_Public_Access_Act"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:11.008884+00:00"
+instance: "kb-cron"
+---
+
+The Federal Research Public Access Act (FRPAA) is a proposal to require open public access to research funded by eleven U.S. federal government agencies. It was originally proposed by Senators John Cornyn and Joe Lieberman in 2006, and proposed again in 2010 and 2012.
+A later version of the bill, the Fair Access to Science and Technology Research Act, was introduced in 2013, 2015, and 2017.
+
+
+== Provisions of bill ==
+The FRPAA would require each of the eleven agencies with research expenditures over $100 million to create an online repository of journal articles on the research completed by that agency and to make those articles publicly available. The articles must be maintained and preserved by the agency or by another repository that permits free and open access. Each article must be available to users without charge within six months of its publication in a peer-reviewed journal.
+The agencies included in this bill are: 
+
+Department of Agriculture
+Department of Commerce
+Department of Defense
+Department of Education
+Department of Energy
+Department of Health and Human Services
+Department of Homeland Security
+Department of Transportation
+Environmental Protection Agency
+National Aeronautics and Space Administration
+National Science Foundation
+
+
+== Legislative history ==
+
+
+== Reaction ==
+
+
+=== Support ===
+In addition to Senator John Cornyn and Senator Joe Lieberman, Representative Michael F. Doyle, along with Frederick Boucher, Michael Capuano, Jerry Costello, Bill Foster, Barney Frank, Gregg Harper, Paul Hodes, Tim Holden, Dennis Kucinich, Rick Larsen, Zoe Lofgren, Stephen Lynch, Dana Rohrabacher, Fortney Stark, Debbie Wasserman Schultz, and Henry Waxman have co-sponsored a similar bill in the House of Representatives (H.R. 5037).
+As of July 19, 2010, 120 higher education leaders support this bill.
+On March 28, 2012, 52 Nobel Laureates signed an open letter to the US Congress expressing their support for this bill.
+
+
+=== Opposition ===
+The Association of American Publishers opposes the bill on behalf of 81 scholarly publishing organizations, alleging that the bill forces the same deadline on disciplines in which that deadline is burdensome, limits the options of government-funded researchers, forces a change in publishers' business models, and will create a cost burden on federal agencies.
+
+
+== See also ==
+Open access mandate
+NIH Public Access Policy
+Fair Copyright in Research Works Act
+Research Works Act
+
+
+== References ==
+
+
+== Further reading ==
+As COMPETES Act Is Signed into Law, 'Wait-and-See' Is the Attitude on Further OA Legislation
+NEW ENGLAND UNIVERSITY PRESIDENTS BACK BILL FOR PUBLIC ACCESS Archived 2011-10-01 at the Wayback Machine
+Open Letter on Open Access | Inside Higher Ed
+Scientists Embrace Openness
+Times Higher Education, "Learning to share"
+White House Signals Interest in Open Access with Public Call for Comments
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Free_Journal_Network-0.md b/data/en.wikipedia.org/wiki/Free_Journal_Network-0.md
new file mode 100644
index 000000000..995f17ce9
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Free_Journal_Network-0.md
@@ -0,0 +1,42 @@
+---
+title: "Free Journal Network"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Free_Journal_Network"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:13.437879+00:00"
+instance: "kb-cron"
+---
+
+The Free Journal Network is an index of open access scholarly journals, specifically for those that do not charge article processing charges.
+
+
+== Criteria ==
+The network was founded in early 2018 to promote free, open access journals,
+a publishing model that is sometimes called diamond or platinum open access.
+Such journals are typically smaller than equivalent commercial journals (often supported by academic societies). The main criteria include adherence to the Fair Open Access Principles, which are publicly supported by many renowned scientists; publication of article titles and abstracts in English; and clear publication ethics and quality assurance policies.
+
+
+== FJN Member Journals ==
+As of November 2024, there are 90 journals that have been accepted into the Free Journal Network. Some notable examples include:
+
+Discrete Analysis
+European Journal of Taxonomy
+Glossa
+Journal of Open Source Software
+Journal of Political Ecology
+Norwegian Journal of Geology
+SciPost Physics
+Volcanica
+
+
+== See also ==
+Directory of Open Access Journals
+Open Access Scholarly Publishers Association
+
+
+== References ==
+
+
+== External links ==
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-0.md b/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-0.md
new file mode 100644
index 000000000..63e957cf8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-0.md
@@ -0,0 +1,76 @@
+---
+title: "Grey Literature Network Service"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Grey_Literature_Network_Service"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:16.890978+00:00"
+instance: "kb-cron"
+---
+
+GreyNet International, the Grey Literature Network Service, is an independent organization founded in 1992. It is dedicated to research, publication, open access, education, and bringing public awareness to grey literature. Grey literature is often defined as "Information produced and distributed on all levels of government, academics, business and industry in electronic and print formats not controlled by commercial publishing i.e. where publishing is not the primary activity of the producing body."
+GreyNet is the corporate author of the proceedings issues from the International Conference Series on Grey Literature and of The Grey Journal, An International Journal on Grey Literature, as well as other types of publications such as reports, program books, and newsletters. GreyNet also maintains a Listserv and a presence on a number of social media including LinkedIn, Netvibes, Twitter, and Facebook.
+GreyNet is a not-for-profit organization fostering the production and dissemination of scientific literature. It is also engaged in the open source movement and was invited to the 10th Libre Software Meeting 2009 in Nantes, France, with a communication on knowledge sharing in the field of grey literature.
+During the 11th International Conference on Grey Literature in December 2009, GreyNet signed a Partnership Agreement with the International Council for Scientific and Technical Information (ICSTI). This newly established partnership lends to GreyNet a multilateral base, elevating it from a bilateral one that it already shares with a number of ICSTI Members. GreyNet seeks to provide ICSTI with an opportunity to further broaden its information activities to the social sciences and humanities.
+
+== International Conference Series on Grey Literature (ISSN 1386-2316) ==
+Source:
+
+1993 GL1 Amsterdam, “GL’93, Weinberg Report 2000” hdl:10068/698053
+1995 GL2 Washington D.C., “GL’95, Grey Exploitations in the 21st Century” hdl:10068/698012
+1997 GL3 Luxembourg, “GL’97, Perspectives on the Design and Transfer of STI” hdl:10068/697932
+1999 GL4 Washington D.C., “GL’99, New Frontiers in Grey Literature” hdl:10068/697891
+2003 GL5 Amsterdam, “Grey Matters in the World of Networked Information” hdl:10068/697754
+2004 GL6 New York, “Work on Grey in Progress” hdl:10068/697756
+2005 GL7 Nancy, France, “Open Access to Grey Resources” hdl:10068/697757
+2006 GL8 New Orleans, “Harnessing the Power of Grey” hdl:10068/697758
+2007 GL9 Antwerp, “Grey Foundations in Information Landscape” hdl:10068/697759
+2008 GL10 Amsterdam, “Designing the Grey Grid for Information Society” hdl:10068/697786
+2009 GL11 Washington D.C., “The Grey Mosaic: Piecing It All Together”
+2010 GL12 Prague, "Transparency in Grey Literature, Grey Tech Approaches to High Tech Issues"
+2011 GL13 Washington D.C., "The Grey Circuit, From Social Networking to Wealth Creation", Library of Congress, December 5–6
+2012 GL14 Rome, Italy, "Tracking Innovation through Grey Literature", National Research Council, CNR, November 29–30
+2013 GL15 Bratislava, Slovak Republic, "The Grey Audit, A Field Assessment in Grey Literature", December 2–3
+2014 GL16 Washington D.C., “Grey Literature Lobby, Engines and Requesters for Change”, December 8–9
+2015 GL17 Amsterdam, “A New Wave of Textual and Non-Textual Grey Literature”, December 1–2
+
+== Other publications ==
+
+The Grey Journal (TGJ), an international journal on grey literature (ISSN print 1574-1796, ISSN e-print 1574-180X), was launched in 2005. It is the only journal on the topic. It appears three times a year in thematic issues, published in print and electronic formats. Individual articles from the electronic version are available via EBSCO’s LISTA-FT Database (EBSCO Publishing). The Grey Journal is indexed by the Scopus scientific database and other indexing and abstracting services.
+The Grey Journal, International Journal on Grey Literature
+
+TGJ Volume 12, Number 1, Spring 2016 Mining Textual and Non-Textual Data Sources
+TGJ Volume 12, Number 2, Summer 2016 Convergence and Change in Grey Literature
+TGJ Volume 11, Number 1, Spring 2015 Raising Awareness to Grey Literature
+TGJ Volume 11, Number 2, Summer 2015 Publishing, Licensing, and Open Access
+TGJ Volume 11, Number 3, Autumn 2015 Topical and Technical Advances in Grey Literature
+TGJ Volume 10, Number 1, Spring 2014 Sustaining Good Practices in Grey Literature
+TGJ Volume 10, Number 2, Summer 2014 Research Communities And Data Sharing
+TGJ Volume 10, Number 3, Autumn 2014 Weighing up Public Access to Grey Literature
+TGJ Volume 9,  Number 1, Spring 2013 Adapting New Technologies for Grey Literature
+TGJ Volume 9,  Number 2, Summer 2013 Tracking Grey Literature Across Disciplines
+TGJ Volume 9,  Number 3, Autumn 2013 Improving Grey Literature through Innovation
+TGJ Volume 8,  Number 1, Spring 2012 Social Networking and Grey Literature
+TGJ Volume 8,  Number 2, Summer 2012 Data Frontiers in Grey Literature
+TGJ Volume 8,  Number 3, Autumn 2012 Managing Change in Grey Literature
+TGJ Volume 7,  Number 1, Spring 2011 Transparency in Grey Literature
+TGJ Volume 7,  Number 2, Summer 2011 System Approaches to Grey Literature
+TGJ Volume 7,  Number 3, Autumn 2011 Research and Education in Grey Literature
+TGJ Volume 6,  Number 1, Spring 2010 Government Alliance to Grey Literature
+TGJ Volume 6,  Number 2, Summer 2010 Shared Strategies for Grey Literature
+TGJ Volume 6,  Number 3, Autumn 2010 Research on Grey Literature in Europe
+TGJ Volume 5,  Number 1, Spring 2009 Paperless Initiatives for Grey Literature
+TGJ Volume 5,  Number 2, Summer 2009 Archaeology and Grey Literature
+TGJ Volume 5,  Number 3, Autumn 2009 Trusted Grey Sources and Resources
+TGJ Volume 4,  Number 1, Spring 2008 Praxis and Theory in Grey Literature
+TGJ Volume 4,  Number 2, Summer 2008 Access to Grey in a Web Environment
+TGJ Volume 4,  Number 3, Autumn 2008 Making Grey more Visible
+TGJ Volume 3,  Number 1, Spring 2007 Grey Standards in Transition and Use
+TGJ Volume 3,  Number 2, Summer 2007 Academic and Scholarly Grey
+TGJ Volume 3,  Number 3, Autumn 2007 Mapping Grey Resources
+TGJ Volume 2,  Number 1, Spring 2006 Grey Matters for OAI
+TGJ Volume 2,  Number 2, Summer 2006 Collections on a Grey Scale
+TGJ Volume 2,  Number 3, Autumn 2006 Using Grey to Sustain Innovation
+TGJ Volume 1,  Number 1, Spring 2005 Publish Grey or Perish
+TGJ Volume 1,  Number 2, Summer 2005 Repositories – Home2Grey
+TGJ Volume 1,  Number 3, Autumn 2005 Grey Areas in Education
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-1.md b/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-1.md
new file mode 100644
index 000000000..b49dfc66d
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-1.md
@@ -0,0 +1,44 @@
+---
+title: "Grey Literature Network Service"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Grey_Literature_Network_Service"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:16.890978+00:00"
+instance: "kb-cron"
+---
+
+== GreyNet and the OpenGrey Repository ==
+For the past 20 years, GreyNet has sought to serve researchers and authors in the field of grey literature. To further this end, GreyNet has signed on to the OpenGrey repository and in so doing seeks to preserve and make openly available research results originating in the International Conference Series on Grey Literature. GreyNet together with INIST-CNRS have designed the format for a metadata record, which encompasses standardized PDF attachments of the full-text conference preprints, PowerPoint presentations, abstracts, biographical notes, and post-publication commentaries. GreyNet's collection of over 270 conference preprints is both current and comprehensive.
+Comment from Peter Suber, Open Access News (Thursday, January 29, 2009): GreyNet started making its conference proceedings OA through its repository in May 2008. I applaud its determination to complete the collection retroactively, even if it means buying permission from a publisher. Note to other conference organizers: This is a reason to self-archive your proceedings as you go, or at least to retain the right to self-archive them without a fee.
+
+== GreyNet and the DANS Data Archive ==
+GreyNet’s research data is cross-linked to the corresponding conference preprint in OpenGrey via the DANS Data Archive. In this way these results can further serve the international grey literature community, where open access to research data has now become a prerequisite.
+
+== Web-based Resources in Grey Literature: GreySource, GreyText, IDGL, WHOIS ==
+GreySource provides examples of grey and malin-grey literature to the average net-user (see for instance the University of Queensland list) and in so doing profiles organizations responsible for its production and/or processing. Only web-based resources that explicitly refer to the term grey literature (or its equivalent in any language) are listed. GreySource identifies the hyperlink directly embedded in a resource, thus allowing immediate and virtual exposure to grey literature.
+The web-based resources appear within categories derived from the COSATI (American) and SIGLE (European) Classification Systems. The few changes that have been introduced into the classification scheme are intended to facilitate the search and retrieval of net-users. New examples are welcome and will be indexed in GreySource.
+GreyText is an in-house archive of documents on grey literature. Over 125 documents are indexed by first author followed by the title, source, date of publication and length in printed pages. Free access to the first page of each document is available for browsing. In most cases, the corresponding PowerPoint is also available online. The full text of every document listed in GreyText is accessible in PDF via email on demand.
+IDGL, the International Directory of Organizations in Grey Literature, provides a list of some 170 organizations in more than 30 countries worldwide that are currently associated with GreyNet either via partnership, membership, sponsorship, or authorship in the field of grey literature. Entries are alphabetical by country and each entry has an embedded link to the corresponding organization's website. GreyNet International is proud to serve the grey literature community and welcomes additions and revisions to this Directory.
+WHOIS in the Field of Grey Literature is a compilation of over 350 biographical notes provided by authors in the International Conference Series on Grey Literature and The Grey Journal. This online resource is maintained by TextRelease, the Program and Conference Bureau. Records in this directory appear in alphabetical order by last name of author and each record contains a current email address.
+
+== Education and Curriculum Development ==
+In 2007 GreyNet conducted an international survey on grey literature among instructors in LIS higher education. In the same year, GreyNet implemented a distance education course on grey literature at the University of New Orleans. hdl:10068/697878
+In 2009, GreyNet began the Annual Summer Workshop Series "GreyWorks", which is now in its 7th year. This series seeks to highlight the state of the art in grey literature by focussing on strategies, benchmarks, good practices, mind maps, and transparency in this field of library and information science.
+In 2013, GreyNet began the GreyForum, a Thematic Series of Onsite and Online Seminars and Workshops in the field of Grey Literature. Topics covered in this series have included information ethics, information rights, digital preservation, and policy development.
+
+== GreyNet management ==
+The GreyNet Service is powered by TextRelease, an independently owned company based in Amsterdam. TextRelease is GreyNet's program and conference bureau. TextRelease was responsible for GreyNet’s relaunch in 2003. Content compiled and edited within the Grey Literature Network Service is published and marketed via TextRelease. Furthermore, all agreements, contracts, and legal matters pertaining to GreyNet are handled through TextRelease.
+
+== TextRelease and GreyNet ==
+TextRelease’s main activities are situated in the domain of grey scientific & technical information. Conference organization, information consultancy, research, publication, as well as education and training in the field of grey literature are among its objectives. In this capacity, TextRelease re-launched the Grey Literature Network Service (GreyNet) in 2003 and is the publishing body for this organization.
+TextRelease provides Non-Exclusive Rights Agreements to nearly 300 authors, who have published in its Conference Proceedings and/or International Journal on Grey Literature. It maintains an up-to-date “WHOIS in the field of Grey Literature”.
+The TextRelease website likewise serves as the conference site for the International Conference Series on Grey Literature. This web resource provides direct access to Conference Announcements, Call-for-Papers, Official Programs, Registrations, and other conference related information.
+
+== See also ==
+European Association for Grey Literature Exploitation (EAGLE)
+Grey Literature International Steering Committee (GLISC)
+OpenSIGLE
+System for Information on Grey Literature in Europe (SIGLE)
+Grey literature
+GreyNet LinkedIn Group
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-2.md b/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-2.md
new file mode 100644
index 000000000..f9be3d999
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Grey_Literature_Network_Service-2.md
@@ -0,0 +1,32 @@
+---
+title: "Grey Literature Network Service"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Grey_Literature_Network_Service"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:16.890978+00:00"
+instance: "kb-cron"
+---
+
+== Bibliography ==
+Farace D. & Schöpfel J. (eds.) (2010). Grey Literature in Library and Information Studies. De Gruyter Saur.
+Schöpfel J., Stock C., Farace D.J., Frantzen J. Citation Analysis and Grey Literature: Stakeholders in the Grey Circuit. The Grey Journal 2005, vol. 1, n° 1, p. 31-40. hdl:10068/697850
+Farace D., Frantzen J., Schöpfel J., Stock C. Knowledge Generation in the Field of Grey Literature: A Review of Conference-based Research Results. GL8 Conference Proceedings. Eighth International Conference on Grey Literature: Harnessing the Power of Grey. New Orleans, 4–5 December 2006. hdl:10068/697768
+Farace D.J., Frantzen J., Schöpfel J., Stock C.  Grey Literature: A Pilot Course constructed and implemented via Distance Education. The Grey Journal 2008, vol. 4, n° 1, p. 41-45. hdl:10068/697878
+Farace D., Frantzen J., Schöpfel J., Stock C., Henrot N. OpenSIGLE, Home to GreyNet’s Research Community and its Grey Literature Collections: Initial Results and a Project Proposal. GL10 Conference Proceedings. Tenth International Conference on Grey Literature: Designing the Grey Grid for Information Society. Amsterdam, 8–9 December 2008.
+Gelfand J. Interview with Dominic Farace, founder of GreyNet. International Journal on Grey Literature. 2000, vol. 1, n° 2, p. 73-76. Covers how Dominic Farace, the GreyNet director, first became involved in the grey literature scene, and explains how and why the Grey Literature Network Service has developed. Discusses the future prospects of GreyNet and grey literature. Highlights many of the issues concerning the GreyNet movement and looks at Farace’s inspiration for his career therein.
+Canadian Dental Hygienists Association Staff: Grey Literature Archived 2013-05-15 at the Wayback Machine. May 2006.  (GreyNet Listserv) is an internationally moderated list that seeks to facilitate communication between organizations involved in the field of grey literature. It also provides an extensive listing of resources by category.
+Matthews B. Gray literature. Association of College and Research Libraries (ACRL) wiki. June 2007.  Citation: (The GreyNet Listserv) international moderated list seeks “to facilitate dialog and communication between persons and organisations in the field of grey literature.” In addition to the electronic lists, the site includes information about the International Conference Series on Grey Literature and provides an extensive categorical listing of resources.
+J. Schöpfel & D. J. Farace (2010). 'Grey Literature'. In M. J. Bates & M. N. Maack (eds.), Encyclopedia of Library and Information Sciences, Third Edition, pp. 2029–2039. CRC Press.
+Grey literature Archived 2011-07-06 at the Wayback Machine. Internet News wiki. April 2007. Citation: GreyNet facilitates the study and collection of grey literature through its source index and text archive. The GreyText Archive has articles about grey literature.
+
+== References ==
+
+== External links ==
+GreyNet
+TextRelease
+INIST-CNRS
+ICSTI
+Victorine van Schaick Prijs NVB 2008 Archived 2021-03-06 at the Wayback Machine
+Golden Candle Award 2000
+New York Academy of Medicine
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Guerilla_Open_Access_Manifesto-0.md b/data/en.wikipedia.org/wiki/Guerilla_Open_Access_Manifesto-0.md
new file mode 100644
index 000000000..4c864b148
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Guerilla_Open_Access_Manifesto-0.md
@@ -0,0 +1,25 @@
+---
+title: "Guerilla Open Access Manifesto"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Guerilla_Open_Access_Manifesto"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:18.127551+00:00"
+instance: "kb-cron"
+---
+
+The Guerilla Open Access Manifesto is a document published by (and widely attributed to) Aaron Swartz in 2008 that argues for transgressive approaches to achieving the goals of the open access movement through civil disobedience, willful violation of copyright and contracts that restrict redistribution of knowledge, and activities that exist in legal grey areas. 
+The goals of the open access movement taken up by the manifesto include the removal of barriers and paywalls that prevent the general public from accessing scientific research publications and other forms of data. While most of the open access movement has focused on standing up new open access publishers, working with traditional publishers to switch to open access, and organizing scholars who produce and edit articles, these focuses primarily affect the accessibility of future publications. The manifesto is largely concerned with the existing proprietary articles and data that are unlikely to be released as open access by the current copyright holders.
+The manifesto appears to have been written in 2008 at a meeting of librarians and was subsequently published on Swartz's personal blog. Although the authorship of the document is widely attributed to Swartz, his role in writing the manifesto and the degree to which the manifesto reflected his views, especially several years later, were a contentious issue in United States v. Swartz, the US government's legal proceedings against him. US government prosecutors sought to use the manifesto to argue that Swartz engaged in the mass downloading of articles from JSTOR for the purpose of releasing those articles freely to the public in ways that mirror the manifesto's penultimate sentence saying, "we need to download scientific journals and upload them to file sharing networks."
+
+== Background and context ==
+Prior to the publication of the Manifesto, Swartz had been active in the open source software, free culture, and open access movements. He was an early contributor to Creative Commons, a web organization devoted to ensuring open access to a variety of different materials that would have otherwise been copyrighted. Other work includes his early programming contributions to Open Library, an organization attempting to create a comprehensive online library containing information on every book. Months before publishing the Manifesto, in 2008, Swartz worked to make thousands of federal court documents from the PACER electronic document systems available to the public for free.
+
+== Analysis of content ==
+The manifesto opens with the statement that "Information is Power", and makes the case that access to knowledge is a human right. It focuses on the availability of scientific and scholarly work online, and argues for the importance of making scholarly work widely available, along with removing existing barriers to access. The Manifesto identifies restrictions to information availability as a serious problem facing both the academic community and the world at large, and criticizes both the copyright laws that have led to paywalls and the corporate influences and perceived greed that have supported the development of such legislation. The Manifesto mentions one publisher by name: Reed Elsevier, a publisher whose articles covering a breadth of topics are hidden behind a paywall, which the author condemns as unethical. The manifesto frames one of the goals of the Open Access movement as ensuring that academics publishing their work can make it available to everyone and not be hindered by these restrictions. Additionally, the manifesto addresses the role of privilege in impacting who does and does not have access to many of these information repositories, calling attention to existing socioeconomic divides that contribute to these inequities in information availability. The Manifesto serves as a call to action, and argues that making scholarly information widely available online is a moral imperative. In order to do so, it advocates for proponents of open access to engage in civil disobedience and condones the violation of copyright law in order to make scholarly work widely available.
+
+== Repercussions and impact ==
+The open access manifesto played an important role in United States v. Swartz. In the case, the US government claimed that Swartz had violated federal laws by downloading a large number of academic articles from the JSTOR academic article storage systems via the open MIT computer network. In 2013, the U.S. Secret Service released a portion of their almost 15,000-page file on Swartz, detailing their investigation of his home and chronicling the questions asked of him about the Manifesto's "human rights" applications. Swartz was facing up to 50 years in prison if found guilty of the charges against him, and remained under investigation until his eventual suicide in 2013.
+
+Some activists claim that Swartz was unsuccessful in achieving the specific goals he outlined in his Manifesto. The JSTOR collection acquired by Swartz was never released to the public domain. Moreover, other open access activists have spoken out against the illegal activities the Manifesto called for as counterproductive to the movement's aims. In general, open access approaches have advocated for the liberation of scholarly information through legal means. Some critics of the GOA movement claim to support civil disobedience, but do not support the specific tactics called for in the manifesto. They believe the responsibility to change belongs to policymakers and scientists.
+However, the symbolic ideas Swartz introduced through his Manifesto were effective in incentivizing others to take up the mantle of the open access (OA) movement. Today, many sites that once used paywalls are freely available thanks to the actions of OA activists following in Swartz's footsteps. One such activist, Alexandra Elbakyan, furthered Swartz's mission by developing an online repository she dubbed "Sci-Hub" that provides free access to over 74 million scientific journal articles. Elbakyan has been identified as a Guerilla Open Access (GOA) activist because of the transgressive and illegal practices she engages in.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Guerilla_Open_Access_Manifesto-1.md b/data/en.wikipedia.org/wiki/Guerilla_Open_Access_Manifesto-1.md
new file mode 100644
index 000000000..1657b0fad
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Guerilla_Open_Access_Manifesto-1.md
@@ -0,0 +1,35 @@
+---
+title: "Guerilla Open Access Manifesto"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Guerilla_Open_Access_Manifesto"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:18.127551+00:00"
+instance: "kb-cron"
+---
+
+== Text of the Manifesto ==
+Information is power. But like all power, there are those who want to keep it for themselves. The world's entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations. Want to read the papers featuring the most famous results of the sciences? You'll need to send enormous amounts to publishers like Reed Elsevier.
+There are those struggling to change this. The Open Access Movement has fought valiantly to ensure that scientists do not sign their copyrights away but instead ensure their work is published on the Internet, under terms that allow anyone to access it. But even under the best scenarios, their work will only apply to things published in the future. Everything up until now will have been lost.
+That is too high a price to pay. Forcing academics to pay money to read the work of their colleagues? Scanning entire libraries but only allowing the folks at Google to read them? Providing scientific articles to those at elite universities in the First World, but not to children in the Global South? It's outrageous and unacceptable.
+"I agree," many say, "but what can we do? The companies hold the copyrights, they make enormous amounts of money by charging for access, and it's perfectly legal — there's nothing we can do to stop them." But there is something we can, something that's already being done: we can fight back.
+Those with access to these resources — students, librarians, scientists — you have been given a privilege. You get to feed at this banquet of knowledge while the rest of the world is locked out. But you need not — indeed, morally, you cannot — keep this privilege for yourselves. You have a duty to share it with the world. And you have: trading passwords with colleagues, filling download requests for friends.
+Meanwhile, those who have been locked out are not standing idly by. You have been sneaking through holes and climbing over fences, liberating the information locked up by the publishers and sharing them with your friends.
+But all of this action goes on in the dark, hidden underground. It's called stealing or piracy, as if sharing a wealth of knowledge were the moral equivalent of plundering a ship and murdering its crew. But sharing isn't immoral — it's a moral imperative. Only those blinded by greed would refuse to let a friend make a copy.
+Large corporations, of course, are blinded by greed. The laws under which they operate require it — their shareholders would revolt at anything less. And the politicians they have bought off back them, passing laws giving them the exclusive power to decide who can make copies.
+There is no justice in following unjust laws. It's time to come into the light and, in the grand tradition of civil disobedience, declare our opposition to this private theft of public culture.
+We need to take information, wherever it is stored, make our copies and share them with the world. We need to take stuff that's out of copyright and add it to the archive. We need to buy secret databases and put them on the Web. We need to download scientific journals and upload them to file sharing networks. We need to fight for Guerilla Open Access.
+With enough of us, around the world, we'll not just send a strong message opposing the privatization of knowledge — we'll make it a thing of the past.
+Will you join us?
+Aaron Swartz
+
+July 2008, Eremo, Italy
+Source:
+
+== See also ==
+Anna's Archive
+
+== External links ==
+The Guerilla Open Access Manifesto on the Internet Archive
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/History_of_open_access-0.md b/data/en.wikipedia.org/wiki/History_of_open_access-0.md
new file mode 100644
index 000000000..69e23d9d4
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/History_of_open_access-0.md
@@ -0,0 +1,17 @@
+---
+title: "History of open access"
+chunk: 1/4
+source: "https://en.wikipedia.org/wiki/History_of_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:35.524738+00:00"
+instance: "kb-cron"
+---
+
+The history of open access can be traced back to at least the 1950s, with an explosion of interest following the advent of the internet. The idea and practice of providing free online access to journal articles began at least a decade before the term "open access" was formally coined. Computer scientists had been self-archiving in anonymous FTP archives since the 1970s and physicists had been self-archiving in arXiv since the 1990s. The Subversive Proposal to generalize the practice was posted in 1994.
+The term "open access" itself was first formulated in three public statements in the 2000s: the Budapest Open Access Initiative in February 2002, the Bethesda Statement on Open Access Publishing in June 2003, and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities in October 2003. The initial concept of open access referred to unrestricted online access to scholarly research, primarily scholarly journal articles.
+
+== Efforts before the Internet ==
+
+One early proponent of the publisher-pays model was the physicist Leó Szilárd. To help stem the flood of low-quality publications, he jokingly suggested in the 1940s that at the beginning of his career each scientist should be issued with 100 vouchers to pay for his papers. Closer to the present, but still ahead of its time, was Common Knowledge. This was an attempt to share information for the good of all, the brainchild of Brower Murphy, formerly of The Library Corporation. Both Brower and Common Knowledge are recognised in the Library Microcomputer Hall of Fame. One of Mahatma Gandhi's earliest publications, Hind Swaraj, published in Gujarati in 1909, is recognised as the intellectual blueprint of India's freedom movement. The book was translated into English the next year, with a copyright legend that read "No Rights Reserved".
+The modern open-access movement (as a social movement) traces its history at least back to the 1950s, with the Letterist International (LI) placing anything in their journal Potlatch in the public domain. As the LI merged to form the Situationist International, Guy Debord wrote to Patrick Straram "All the material published by the Situationist International is, in principle, usable by everyone, even without acknowledgement, without the preoccupations of literary property." This was to facilitate détournement. It became much more prominent in the 1990s with the advent of the Digital Age. With the spread of the Internet and the ability to copy and distribute electronic data at no cost, the arguments for open access gained new importance.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/History_of_open_access-1.md b/data/en.wikipedia.org/wiki/History_of_open_access-1.md
new file mode 100644
index 000000000..533b7e6de
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/History_of_open_access-1.md
@@ -0,0 +1,25 @@
+---
+title: "History of open access"
+chunk: 2/4
+source: "https://en.wikipedia.org/wiki/History_of_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:35.524738+00:00"
+instance: "kb-cron"
+---
+
+== Early years of online open access ==
+An explosion of interest and activity in open-access journals has occurred since the 1990s, largely due to the widespread availability of Internet access. It is now possible to publish a scholarly article and also make it instantly accessible anywhere in the world where there are computers and Internet connections. The fixed cost of producing the article is separable from the minimal marginal cost of the online distribution.
+These new possibilities emerged at a time when the traditional, print-based scholarly journals system was in a crisis. The number of journals and articles produced had been increasing at a steady rate; however, the average cost per journal had been rising at a rate far above inflation for decades, while budgets at academic libraries had remained fairly static. The result was decreased access – ironically, just when technology had made almost unlimited access a very real possibility for the first time. Libraries and librarians have played an important part in the open-access movement, initially by alerting faculty and administrators to the serials crisis. In 1997 the Association of Research Libraries developed the Scholarly Publishing and Academic Resources Coalition (SPARC), an alliance of academic and research libraries and other organizations, to address the crisis and to develop and promote alternatives such as open access.
+The first online-only, free-access journals (eventually to be called "open-access journals") began appearing in the late 1980s and early 1990s. These journals typically used pre-existing infrastructure (such as e-mail or newsgroups) and volunteer labor and were developed without any intent to generate profit. Examples include Bryn Mawr Classical Review, Postmodern Culture, Psycoloquy, and The Public-Access Computer Systems Review.
+Probably the earliest book publisher to provide open access was the National Academies Press, publisher for the National Academy of Sciences, Institute of Medicine, and other arms of the National Academies. They have provided free online full-text editions of their books alongside priced, printed editions since 1994, and assert that the online editions promote sales of the print editions. As of June 2006 they had more than 3,600 books up online for browsing, searching, and reading.
+While Editor-in-Chief of the Journal of Clinical Investigation, Ajit Varki made it the first major biomedical journal to be freely available on the web in 1996. Varki wrote, "The vexing issue of the day is how to appropriately charge users for this electronic access. The nonprofit nature of the JCI allows consideration of a truly novel solution — not to charge anyone at all!" Other pioneers in open-access publishing in the biomedical domain included BMJ, Journal of Medical Internet Research, and Medscape, which were created, or made their content freely accessible, in the late 1990s.
+The first free scientific online archive was arXiv.org, started in 1991, initially a preprint service for physicists, initiated by Paul Ginsparg. Self-archiving has become the norm in physics, with some sub-areas of physics, such as high-energy physics, having a 100% self-archiving rate. The prior existence of a "preprint culture" in high-energy physics is one major reason why arXiv has been successful. arXiv now includes papers from related disciplines including computer science, mathematics, nonlinear sciences, quantitative biology, quantitative finance, and statistics. However, computer scientists mostly self-archive on their own websites and have been doing so for even longer than physicists. arXiv now includes postprints as well as preprints. The two major physics publishers, American Physical Society and Institute of Physics Publishing, have reported that arXiv has had no effect on journal subscriptions in physics; even though the articles are freely available, usually before publication, physicists value their journals and continue to support them.
+Computer scientists had been self-archiving on their own FTP sites and then their websites since even earlier than the physicists, as was revealed when Citeseer began harvesting their papers in the late 1990s. Citeseer is a computer science archive that harvests, Google-style, from distributed computer science websites and institutional repositories, and contains almost twice as many papers as arXiv. The 1994 "Subversive Proposal" was to extend self-archiving to all other disciplines; from it arose CogPrints (1997) and eventually the OAI-compliant generic GNU Eprints.org software in 2000.
+One of the first online journals, GeoLogic, Terra NOVA, was published by Paul Browning and started in 1989. It was not a discrete journal but an electronic section of TerraNova. The journal ceased to be open access in 1997 due to a change in the policy of the editors (EUG) and publishing house (Blackwell).
+In 1997, the U.S. National Library of Medicine (NLM) made Medline, the most comprehensive index to medical literature on the planet, freely available in the form of PubMed. Usage of this database increased tenfold when it became free, strongly suggesting that usage had previously been limited by lack of access. While indexes are not the main focus of the open-access movement, Medline is important in that it opened up a whole new form of use of scientific literature – by the public, not just professionals. The Journal of Medical Internet Research (JMIR), one of the first open-access journals in medicine, was created in 1998, publishing its first issue in 1999.
+In 1998, the American Scientist Open Access Forum was launched (and first called the "September98 Forum"). One of the more unusual models is used by the Journal of Surgical Radiology, which uses the net profits from external revenue to provide compensation to the editors for their continuing efforts.
+In the biological and geological sciences, paleontology came to the forefront in 1998 with Palaeontologia Electronica. Its first issue received 100,000 hits from an estimated 3,000 readers, comparable to the subscription numbers of its peer print journals. One challenge to digital-only biological journals was the lack of protection afforded by the International Code of Zoological Nomenclature to scientific names published in formats other than paper, but this was overcome by revisions to the Code in 1999 (effective 1 January 2000).
+One of the first humanities journals published in open access is CLCWeb: Comparative Literature and Culture, founded at the University of Alberta in 1998, with its first issue published in March 1999, and published since 2000 by Purdue University Press.
+In 1999 Harold Varmus of the NIH proposed a journal called E-biomed, intended as an open-access electronic publishing platform combining a preprint server with peer-reviewed articles. E-biomed later saw light in a revised form as PubMed Central, a postprint archive.
+It was also in 1999 that the Open Archives Initiative and its OAI-PMH protocol for metadata harvesting were launched to make online archives interoperable.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/History_of_open_access-2.md b/data/en.wikipedia.org/wiki/History_of_open_access-2.md
new file mode 100644
index 000000000..8e9002ede
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/History_of_open_access-2.md
@@ -0,0 +1,20 @@
+---
+title: "History of open access"
+chunk: 3/4
+source: "https://en.wikipedia.org/wiki/History_of_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:35.524738+00:00"
+instance: "kb-cron"
+---
+
+== 2000s ==
+The number of open-access journals increased by an estimated 500% during the 2000s. The average number of articles published per open-access journal per year also increased from approximately 20 to 40 during the same period, so that the number of open-access articles increased by 900% during that decade.
+In 2000 BioMed Central, a for-profit open access publisher with now dozens of open-access journals, was launched by what was then the Current Science Group (the founder of the Current Opinion series, and now known as the Science Navigation Group). In some ways, BioMed Central resembles Harold Varmus' original E-biomed proposal more closely than does PubMed Central. As of October 2013 BioMed Central publishes over 250 journals.
+In 2001, 34,000 scholars around the world signed "An Open Letter to Scientific Publishers", calling for "the establishment of an online public library that would provide the full contents of the published record of research and scholarly discourse in medicine and the life sciences in a freely accessible, fully searchable, interlinked form". Scientists signing the letter also pledged not to publish in or peer-review for non-open-access journals. This led to the establishment of the Public Library of Science, an advocacy organization. However, most scientists continued to publish and review for non-open-access journals. PLoS decided to become an open-access publisher aiming to compete at the high-quality end of the scientific spectrum with commercial publishers and other open-access journals, which were beginning to flourish. Critics have argued that, equipped with a $10 million grant, PLoS competes with smaller open-access journals for the best submissions and risks destroying what it originally wanted to foster. PLOS launched its first open-access journal, PLOS Biology, in 2003, with PLOS Medicine following in 2004 and PLOS One in 2006.
+The first major international statement on open access was the Budapest Open Access Initiative in February 2002, launched by the Open Society Institute. Two further statements followed: the Bethesda Statement on Open Access Publishing in June 2003 and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities in October 2003. Also in 2003, the World Summit on the Information Society included open access in its Declaration of Principles and Plan of Action.
+In 2006 a Federal Research Public Access Act was introduced in the US Congress by senators John Cornyn and Joe Lieberman. The act has been brought up every year since then, but has never made it past committee.
+2007 recorded some backlash from non-OA publishers.
+In 2008 Ajit Varki worked with publisher Cold Spring Harbor Laboratory Press and David Lipman to create the first viable model for a major Open Access textbook combining a print version with a freely accessible online edition hosted at NCBI, the 2nd edition of Essentials of Glycobiology.
+Perhaps the first dedicated publisher of open-access monographs in the humanities was re.press, which published its first title in 2006. Two years later, in 2008, Open Humanities Press, another publisher of humanities monographs, was launched. Most recently, the Open Library of Humanities launched in September 2015.
+In 2008 USENIX, the advanced computing systems association, implemented an open access policy for their conference proceedings. In 2011 they added audio and video recordings of paper presentations to the material to which they provide open access.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/History_of_open_access-3.md b/data/en.wikipedia.org/wiki/History_of_open_access-3.md
new file mode 100644
index 000000000..668751ccb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/History_of_open_access-3.md
@@ -0,0 +1,40 @@
+---
+title: "History of open access"
+chunk: 4/4
+source: "https://en.wikipedia.org/wiki/History_of_open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:35.524738+00:00"
+instance: "kb-cron"
+---
+
+== 2010s ==
+In 2013 John Holdren, Barack Obama's director of the Office of Science and Technology Policy, issued a memorandum directing United States federal agencies with more than $100 million in annual R&D expenditures to develop plans within six months to make the published results of federally funded research freely available to the public within one year of publication. As of March 2015, two agencies had made their plans public: the Department of Energy and the National Science Foundation.
+In 2013 the UK Higher Education Funding Council for England (HEFCE) proposed adopting a mandate that to be eligible for submission to the UK Research Excellence Framework (REF) all peer-reviewed journal articles submitted after 2014 must be deposited in the author's institutional repository immediately upon acceptance for publication, regardless of whether the article is published in a subscription journal or in an open access journal. HEFCE expresses no journal preference, places no restriction on authors' choice and requires the deposit itself to be immediate, irrespective of whether the publisher imposes an embargo (for an allowable embargo period that remains to be decided) on the date at which access to the deposit can be made open. The HEFCE/REF mandate proposal complements the recent Research Councils UK (RCUK) mandate that requires all articles resulting from RCUK funding to be made open access by 6 months after publication at the latest (12 months for arts and humanities articles).
+HEFCE also provided grants to universities in England wishing to participate in the Pilot Collection of Knowledge Unlatched, a not-for-profit organisation enabling humanities and social sciences monographs to become open access. The Pilot Collection ran from October 2013 to February 2014 and 297 libraries and institutions worldwide participated in 'unlatching' the collection of 28 titles. 61 of these participating institutions were university libraries in England eligible for the HEFCE grant of 50% towards the $1195 participation fee.
+The Indian Council of Agricultural Research adopted an Open Access policy for its publications on 13 September 2013 and announced that each ICAR institute would set up an open access institutional repository. One such repository is eprints@cmfri, an open access institutional repository of the Central Marine Fisheries Research Institute, which was set up on 25 February 2010, well before the policy was adopted. Since March 2010, the ICAR has made its two flagship journals available under Open Access on its website, and later through an online platform called Indian Agricultural Research Journals, which uses Open Journal Systems. However, not all the journals hosted on the platform are open access.
+In 2014 the Department of Biotechnology and Department of Science and Technology, under Ministry of Science and Technology, Government of India jointly announced their open access policy.
+In May 2016 the European Union announced that "all scientific articles in Europe must be freely accessible as of 2020" and that the Commission will "develop and encourage measures for optimal compliance with the provisions for open access to scientific publications under Horizon 2020".
+Some have asked that such measures include the use of free and open-source software.
+By March 2018, a search of MEDLINE indicated that ~21% of all human/animal articles indexed are available freely through PubMed Central, or directly from the journal. Within veterinary medicine specifically, research indicates the number is higher, at ~27%.
+In September 2018 eleven European funders, organized under cOAlition S, announced Plan S, which requires all research output based on funding from these organizations to be published in full Open Access journals, disallowing publishing in hybrid journals.
+
+== 2020s ==
+On 25 August 2022, the US Office of Science and Technology Policy issued guidance to make all federally funded research in the United States freely available.
+
+== Growth statistics ==
+
+A study published in 2011 on the development of open access journal publishing from 1993 to 2009 suggests that, measured both by the number of journals and by the increase in total article output, direct gold open access journal publishing has seen rapid growth, particularly between 2000 and 2009. It was estimated that around 19,500 articles were published open access in 2000, and that the number had grown to 191,850 articles in 2009. The journal count is estimated to have been 740 for 2000 and 4,769 for 2009; these numbers show considerable growth, albeit at a more moderate pace than the article-level growth. These findings support the notion that open access journals have increased both in number and in average annual output over time.
+The development of the number of active open-access journals and the number of research articles published in them during the period 1993–2009 is shown in the figure above. If these gold open access growth curves are extrapolated to the next two decades, the Laakso et al. (Björk) curve would reach 60% in 2022, and the Springer curve would reach 50% in 2029 as shown in the figure below (the reference provides a more optimistic interpretation which does not match with the values shown in the figure).
+
+== See also ==
+Open data
+Timeline of the open-access movement
+
+== References ==
+
+=== Works cited ===
+Suber, Peter (2012). Open access (The MIT Press Essential Knowledge Series ed.). Cambridge, Mass.: MIT Press. ISBN 978-0-262-51763-8. Retrieved 20 October 2015.
+
+== External links ==
+Peter Suber. "History of open access". Harvard University. Compilation of Peter Suber's contributions to the history of open access, 1992–present.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Hybrid_open-access_journal-0.md b/data/en.wikipedia.org/wiki/Hybrid_open-access_journal-0.md
new file mode 100644
index 000000000..a7e9ba367
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Hybrid_open-access_journal-0.md
@@ -0,0 +1,60 @@
+---
+title: "Hybrid open-access journal"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Hybrid_open-access_journal"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:19.297877+00:00"
+instance: "kb-cron"
+---
+
+A hybrid open-access journal is a subscription journal in which some of the articles are open access. This status typically requires the payment of a publication fee (also called an article processing charge or APC) to the publisher in order to publish an article open access, in addition to the continued payment of subscriptions to access all other content. Strictly speaking, the term "hybrid open-access journal" is incorrect, possibly misleading, as using the same logic such journals could also be called "hybrid subscription journals". Simply using the term "hybrid access journal" is accurate.
+Publishers that offer a hybrid open access option often use different names for it. The SHERPA/RoMEO site provides a list of publishers and the names of their options. The Open Access Directory provides a list of funds that support open access journals, and provides information about which funds will pay fees for hybrid open access journals.
+
+
+== Origins ==
+
+The concept was first proposed in 1998 when Thomas Walker suggested that authors could purchase extra visibility at a price. The first journal recognized as using this model was Walker's own Florida Entomologist; it was later extended to the other publications of the Entomological Society of America. The idea was later refined by David Prosser in 2003 in the journal Learned Publishing. The larger academic publishers began offering hybrid open access journals around the same time, with Springer and Wiley both having started by 2005. Within two years, Elsevier, Taylor & Francis and the Nature Publishing Group had followed suit.
+
+
+== Gradual uptake of hybrid open access ==
+The early uptake of hybrid open access was slow, and differed between countries. A study in 2012 noted that "The number of hybrid journals has doubled in the past couple of years and is now over 4,300", but concluded that there was a "lack of success of this business model", with only 1 to 2% of researchers making use of it. However, the United Kingdom was a notable front runner in using the model: "its use of OA in hybrid journals and of delayed OA journals is more than twice the world average". Growth slowly continued, and a 2018 large-scale survey of open access business models across global scholarly publishing estimated that between 3 and 8% of articles were published via hybrid open access. Research carried out a year later indicated that hybrid open access had actually peaked around 2016.
+
+
+== Criticism ==
+While hybrid Open Access began as an agreed method amongst publishers, scientists and libraries for a gradual transition towards full Open Access, it soon attracted various criticisms for being unfair.
+
+
+=== Allegations of double dipping ===
+Since one source of funds to pay for open access articles is the library subscription budget, it has been proposed that there needs to be a decrease in the subscription cost to the library in order to avoid 'double dipping' where an article is paid for twice – once through subscription fees, and again through an APC. For example, the Open Access Authors Fund of the University of Calgary Library (2009/09) requires that: "To be eligible for funding in this [hybrid open access] category, the publisher must plan to make (in the next subscription year) reductions to the institutional subscription prices based on the number of open-access articles in those journals." On 12 November 2009, Nature Publishing Group issued a news release on how open access affected its subscription prices.
+However, university libraries were unconvinced that the decrease in prices was occurring. A report on work carried out by the University of Nottingham since 2006 to introduce and manage an institutional open access fund has been published by Stephen Pinfield in Learned Publishing. In this article, the author comments that: "As publishers' income has increased from OA [open-access] fees in the hybrid model, there has been little or no let-up in journal subscription inflation, and only a small minority of publishers have yet committed to adjusting their subscription prices as they receive increasing levels of income from OA options." By 2018, this particular problem was considered so extreme in the area of open access book (as opposed to journal) publishing that the Anti Double Dipping Alliance was formed.
+
+
+== Institutional responses ==
+Towards the start of Hybrid Open Access, some universities, research centers, foundations, and government agencies designated funds to pay publication fees (APCs) of fee-based open access journals, including hybrid. However, as criticism of hybrid has grown, a substantial number of such funds (40%) will not reimburse APCs in hybrid journals, including Harvard University, CERN, Deutsche Forschungsgemeinschaft, Columbia University and the Norwegian Research Council. The European Commission has also announced that the ninth framework program (Horizon Europe) will not cover the cost of APCs in hybrid journals. Science Europe has set up a coalition of European research funders (cOAlition S) who have explicitly ruled out reimbursing APCs in hybrid journals from 2020 with the express aim of driving a more rapid transition towards full open access (see transformative journal).
+Publishers have argued against the above criticisms and responses, contending that hybrid successfully "meet[s] market demands and foster[s] growth in open access publishing."
+
+
+== Advantages and disadvantages to the author ==
+An author who wants to publish in an open-access format is not limited to the relatively small number of "full" open-access journals, but can also choose from the available hybrid open-access journals, which include journals published by many of the largest academic publishers.
+However, the author must still find the money. Many funding agencies are ready to let authors use grant funds, or apply for supplementary funds, to pay publication fees at open-access journals. (Only a minority of open-access journals charge such fees, but nearly all hybrid open access journals do so.) So far, the funding agencies that are willing to pay these fees do not distinguish between full and hybrid open-access journals. On 19 October 2009, one such funding agency, the Wellcome Trust, expressed concerns about hybrid open-access fees being paid twice, through subscriptions and through publication fees.
+If an author is unable to pay the fees or chooses not to do so, they often retain the right to share their work online by self-archiving in an open access repository.
+
+
+== Variations ==
+The American Society of Plant Biologists has adopted a policy that articles contributed by society members to its journal, Plant Physiology, will be made open access immediately on publication at no additional charge. Non-member authors can receive OA through payment of $1,000, but since membership is only $115/year, it is expected this initiative will boost membership.
+Partial open access exists when only research articles are open (as in BMJ), while articles in other categories are paywalled.
+
+
+== See also ==
+List of open-access journals
+Scientific journal
+
+
+== References ==
+
+
+== External links ==
+Nine questions for hybrid journal programs Archived 16 January 2013 at the Wayback Machine by Peter Suber, SPARC Open Access Newsletter, issue No. 101, 2 September 2006.
+More on society publishers with OA journals Archived 10 November 2012 at the Wayback Machine by Peter Suber, Open Access News, 3 November 2007.
+When Is Open Access Not Open Access? by Catriona J. MacCallum, PLoS Biology, 2007; 5(10): e285.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Institutional_repository-0.md b/data/en.wikipedia.org/wiki/Institutional_repository-0.md
new file mode 100644
index 000000000..c72e8111b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Institutional_repository-0.md
@@ -0,0 +1,70 @@
+---
+title: "Institutional repository"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Institutional_repository"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:20.491930+00:00"
+instance: "kb-cron"
+---
+
+An institutional repository (IR) is an archive for collecting, preserving, and disseminating digital copies of the intellectual output of an institution, particularly a research institution. Academics also utilize their IRs for archiving published works to increase their visibility and collaboration with other academics. However, most of these outputs produced by universities are not effectively accessed and shared by researchers and other stakeholders. As a result academics should be involved in the implementation and development of an IR project so that they can learn the benefits and purpose of building an IR.
+An institutional repository has been defined as "a set of services that a university offers to members of its community for the management and dissemination of digital materials created by the institution and its community members." For a university, this includes materials such as monographs, eprints of academic journal articles—both before (preprints) and after (postprints) undergoing peer review—as well as electronic theses and dissertations (ETDs). An institutional repository might also include other digital assets generated by academics, such as datasets, administrative documents, course notes, learning objects, academic posters or conference proceedings. Deposit of material in an institutional repository is sometimes mandated by an institution.
+Some of the main objectives for having an institutional repository are to provide open access to institutional research output by self-archiving in an open access repository, to create global visibility for an institution's scholarly research, and to store and preserve other institutional digital assets, including less formally published grey literature such as theses, working papers or technical reports.
+
+
+== Functions ==
+Institutional repositories can be classified as a type of digital library. Institutional repositories perform the main functions of digital libraries by collecting, classifying, cataloging, curating, preserving, and providing access to digital content.
+Institutional repositories enable researchers to self-archive their research output and can improve the visibility, usage and impact of research conducted at an institution. Other functions of an institutional repository include knowledge management, research assessment, and open access to scholarly research.
+In 2003, the functions of an institutional repository were described by Clifford Lynch in relation to universities. He stated that:
+
+"... a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution."
+The content of an institutional repository depends on the focus of the institution. Higher education institutions conduct research across multiple disciplines, so their repositories hold research from a variety of academic subjects. Examples of such institutional repositories include the MIT Institutional Repository. A disciplinary repository, by contrast, is subject specific: it holds and provides access to scholarly research in a particular discipline. While there can be disciplinary repositories for one institution, disciplinary repositories are frequently not tied to a specific institution. The PsyDok disciplinary repository, for example, holds German-language research in psychology, while SSOAR is an international social science full-text server. Content included in an institutional repository can be both digitized and born-digital.
+
+
+== Open-access repositories ==
+
+Institutional repositories that provide access to research to users outside the institutional community are one of the recommended ways to achieve the open access vision described in the Budapest Open Access Initiative definition of open access. This is sometimes referred to as the self-archiving or "green" route to open access.
+
+
+== Developing an institutional repository ==
+Steps in the development of an institutional repository include choosing a platform and defining metadata practices. Designing an IR requires working with faculty to identify the type of content the library needs to support. Marketing and promoting the institutional repository is important to enhance access and increase the visibility of researchers. Libraries will also need to target their marketing efforts to different groups of stakeholders. They may generate faculty interest by describing how an IR can support research or improve the future findability of articles.
+
+
+== Software ==
+Most institutional repository software platforms can use OAI-PMH to harvest metadata. For example, DSpace supports OAI-PMH.
+A 2014 survey commissioned by Duraspace found that 72% of respondents indicated that their institutional repository is hosted by a third party.
+
+
+== Aggregators ==
+The Confederation of Open Access Repositories (COAR) states in its manifesto that "Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, open access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content."
+Interoperability is achieved in the world of institutional repositories by using protocols such as OAI-PMH. This allows search engines and open access aggregators, such as BASE, CORE and Unpaywall, to index repository metadata and content and provide value-added services on top of this content.
+The Digital Commons Network aggregates by discipline some 500 institutional repositories running on the Bepress Digital Commons platform. It includes more than two million full-text objects.
+
+
+== See also ==
+Digital Assets Repository – at Bibliotheca Alexandrina in Egypt
+Current research information system (CRIS)
+Category:Open-access archives
+Library publishing
+Registry of Open Access Repositories
+ResCarta Toolkit
+Scholarly commons
+CORE (research service)
+
+
+== References ==
+
+
+== Further reading ==
+Finlay, Stephen Craig, ed. (2021). The complete guide to institutional repositories. Chicago: ALA Editions. ISBN 9780838948101.
+Callicott, Burton B.; Scherer, David; Wesolek, Andrew, eds. (2015). Making institutional repositories work. West Lafayette: Purdue University Press. doi:10.2307/j.ctt1wf4drg. ISBN 9781557537263.
+Bluh, Pamela; Hepfer, Cindy, eds. (2013). The institutional repository: benefits and challenges. Chicago: Association for Library Collections & Technical Services, American Library Association. ISBN 978-0838986615.
+Buehler, Marianne (2013). Demystifying the institutional repository for success. Oxford: Chandos Publishing. ISBN 9781843346739.
+
+
+== External links ==
+Ranking Web of World Repositories
+List of repository software on Libopedia
+Practical guidelines for starting an institutional repository
+Peter Suber (ed.). "(Institutional repositories)". Open Access Tracking Project. Harvard University. OCLC 1040261573. News and comment from the worldwide movement for open access to research
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Iowa_State_University_Digital_Press-0.md b/data/en.wikipedia.org/wiki/Iowa_State_University_Digital_Press-0.md
new file mode 100644
index 000000000..7e2918aed
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Iowa_State_University_Digital_Press-0.md
@@ -0,0 +1,41 @@
+---
+title: "Iowa State University Digital Press"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Iowa_State_University_Digital_Press"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:21.684131+00:00"
+instance: "kb-cron"
+---
+
+The Iowa State University Digital Press (also known as ISUDP) is a digital university press affiliated with Iowa State University, located in Ames, Iowa. The press, which is a unit of the Iowa State University Library, was organized in 2018 and is dedicated to the creation, publication, and dissemination of open-access books and journal articles.
+Often seen as a successor of sorts to the Iowa State University Press (a now-defunct publisher that had previously been an active member of the Association of American University Presses), the Iowa State University Digital Press was founded in "support of Iowa State University’s land-grant mission." The publisher is currently a member of the Library Publishing Coalition.
+
+
+== Publications ==
+
+
+=== Notable journals ===
+Journal of Librarianship and Scholarly Communication
+Journal of Technology, Management, and Applied Engineering
+Meat and Muscle Biology
+
+
+=== Notable proceedings ===
+International Interactive Symposium on Ultra-High Performance Concrete
+International Textile and Apparel Association Annual Conference Proceedings
+Pronunciation in Second Language Learning and Teaching Proceedings
+
+
+== See also ==
+
+List of English-language book publishing companies
+List of university presses
+
+
+== References ==
+
+
+== External links ==
+Official website
+Pressbooks website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Irina_Bolychevsky-0.md b/data/en.wikipedia.org/wiki/Irina_Bolychevsky-0.md
new file mode 100644
index 000000000..e40b737a7
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Irina_Bolychevsky-0.md
@@ -0,0 +1,41 @@
+---
+title: "Irina Bolychevsky"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Irina_Bolychevsky"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:29.872986+00:00"
+instance: "kb-cron"
+---
+
+Irina Bolychevsky (born 1986) is a British activist and data specialist, focused on Open Data, decentralized technologies, and technical standards. She is currently director of standards and interoperability at the NHSX of the United Kingdom Government. She has been part of large organizations in those fields, including the Open Knowledge Foundation, the World Wide Web Consortium, and the Open Data Institute, and worked for the UK, Dubai and UAE government administrations. She co-founded Redecentralize.org, an advocacy group promoting decentralized technologies.
+
+
+== Work ==
+She was the product owner of the open-source open data portal CKAN from 2011 to 2014, the period in which it was redesigned as version 2.0 and in which she piloted its transition from mostly national use in 2011 to international adoption. During this period, she managed its adoption by, and the relaunch of, data.gov. Her work with CKAN earned her the Open Data Individual Champion Award from the Open Data Institute.
+Bolychevsky was on the W3C staff from July 2015 until December 2016. During this period, she was part of the W3C Social Web working group. She actively participated in EU-funded research projects in which the W3C took part, working on open standards for decentralized technologies developed within the D-Cent project, and on the challenges around open standards within the Big Data Europe project.
+Since then, according to her profile by the British Computer Society, she "developed the personal data infrastructure programme within the UK's Government Digital Service", "developed Smart Dubai's and UAE federal policy, regulatory, commercial and technical frameworks for data exchange" and "ran one of the first UK data trust pilots and researched digital identity for the Open Data Institute".
+
+
+== Activism ==
+The group Redecentralize.org, which she co-founded, claims to be "a movement of people pioneering technologies and governance models to redecentralize the web". According to The New Yorker, it is an "advocacy group that provides support for projects that aim to make the Web less centralized". It has also been defined as a "research policy institute".
+The group maintains a directory of decentralized web projects, which bloggers and several sites in the field appear to treat as the reference list. The group has organized two conferences on the topic, with Bolychevsky as the main organizer: one in 2015, hosted by ThoughtWorks, and one in 2019. These two events hosted speakers such as Open Rights Group's Kevin Marks, Mozilla's Tantek Çelik, Ethereum's Gavin Wood, OAuth's Blaine Cook, Francis Irving, and representatives from Matrix.org, IPFS, Solid and Secure Scuttlebutt. The group has also hosted smaller meetups, one featuring BBC's Bill Thompson, and virtual public meetings, one within the frame of W3C.
+She was a fellow of Newspeak House, the London college for political technologists, and co-founder of the Coffee House Club. She is currently co-organizer of the Citizen Beta civic tech meetups, and director/trustee of the not-for-profit Eco Soul hostel.
+
+
+== International recognition ==
+In 2014, she was awarded the Open Data Individual Champion Award for her work with CKAN, as part of the first Open Data awards by the Open Data Institute. Since 2018, she has sat on the Board of Directors of the Open Knowledge Foundation, where she was previously Commercial Director.
+She has been featured by The New Yorker and twice by the BBC. She was highlighted by the British Computer Society in their "Women in Open Source" series. She has been a keynote speaker at several conferences, including Rest Fest, the II Brazilian National Conference on Open Data, MozFest and EmpoderaLive. She has been a guest author on the sites of the P2P Foundation, the Sunlight Foundation, the Open Data Institute, the UK online newspaper New Socialist, the UK's open data portal data.gov.uk, and the US open data portal data.gov.
+
+
+== See also ==
+CKAN
+Open Data Institute
+
+
+== References ==
+
+
+== External links ==
+Personal website
+Redecentralize website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/JAIRO-0.md b/data/en.wikipedia.org/wiki/JAIRO-0.md
new file mode 100644
index 000000000..57745858e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/JAIRO-0.md
@@ -0,0 +1,23 @@
+---
+title: "JAIRO"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/JAIRO"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:24.024520+00:00"
+instance: "kb-cron"
+---
+
+JAIRO (ジャイロ), which stands for Japanese Institutional Repositories Online, is a web-based search interface that provides aggregated open access to Japanese academic content, including journal articles, theses, research bulletins, and reports. It is administered by Japan's National Institute of Informatics (NII).
+
+
+== History ==
+A beta version of JAIRO was launched on October 22, 2008, and its official opening was on April 1 of the following year. JAIRO began as the JuNii+ service, which operated from May 2007 until March 2009.
+As of September 30, 2015, nearly 1.6 million full-text documents were accessible through JAIRO.
+
+
+== External links ==
+Official website
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Lever_Press-0.md b/data/en.wikipedia.org/wiki/Lever_Press-0.md
new file mode 100644
index 000000000..75a945ad7
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Lever_Press-0.md
@@ -0,0 +1,29 @@
+---
+title: "Lever Press"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Lever_Press"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:26.351094+00:00"
+instance: "kb-cron"
+---
+
+Lever Press is a university press based in Ann Arbor, Michigan. Though founded in 2016 with the help of Michigan Publishing, Amherst College Press, and the Oberlin Group, Lever Press is not affiliated with a single university. Instead, it represents a consortium of universities, each of which helps govern and guide the press. All publications issued by the press are released as open access works, with the cost of production being paid for by the participating universities. The press is a member of the Association of University Presses.
+
+
+== Participating institutions ==
+Participating institutions include the following:
+
+
+== See also ==
+List of English-language book publishing companies
+List of university presses
+University of Michigan Library
+University of Michigan Press
+
+
+== References ==
+
+
+== External links ==
+Lever Press homepage
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Library_publishing-0.md b/data/en.wikipedia.org/wiki/Library_publishing-0.md
new file mode 100644
index 000000000..43c04951f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Library_publishing-0.md
@@ -0,0 +1,46 @@
+---
+title: "Library publishing"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Library_publishing"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:27.537799+00:00"
+instance: "kb-cron"
+---
+
+Library publishing, also known as campus-based publishing, is the practice of an academic or public library providing publishing services.
+
+
+== Concept ==
+A library publishing service usually publishes academic journals and often provides a broader range of publishing services as well. This can include publishing other formats such as scholarly monographs and conference proceedings. It generally has a preference for open access publishing.
+Library publishing often focuses on electronic publishing rather than print, thus complementing the role of traditional academic presses. Sometimes a library and a university press based at the same institution will form a partnership, with each focusing on its own area of expertise. For example, the University of Pittsburgh library publishing service publishes peer-reviewed journals and also collaborates with the university press to publish open access monographs.
+Software is available to manage the journal publication process. The open source Open Journal Systems by the Public Knowledge Project, and Digital Commons' bepress, are both widely used by library publishing services. Some libraries use Open Journal Systems to create overlay journals which present scholarly content that is held in an institutional repository.
+
+
+== History ==
+Library publishing has a long history that predates the Internet.
+In 1990, academic libraries published two of the first scholarly electronic journals on the Internet. The University of Houston Libraries began publishing The Public-Access Computer Systems Review, and the Virginia Tech University Libraries began publishing the Journal of the International Academy of Hospitality Research.
+The Synergies project (2007–2011) was a collaboration between different Canadian universities to create infrastructure to support institutional publishing activities. A survey conducted by Hahn in 2008 found that at that time 65% of research libraries in North America either had a library publishing service or were considering creating one.
+In 2011 in the UK, Jisc funded three library publishing projects: Huddersfield Open Access Publishing (HOAP) at the University of Huddersfield, SAS Open Journals at the University of London, and EPICURE at UCL.
+The Library Publishing Coalition was launched in 2013 to provide a hub for library publishing activities. In October 2013, during Open Access Week, they launched a Library Publishing Directory which contains information about library publishing activities at 115 academic and research libraries.
+
+
+== See also ==
+Category:Academic journals published by university libraries
+Category:Library publishing
+Scholarly commons
+University press
+
+
+== References ==
+
+
+== Further reading ==
+Phil Jones (Dec 1, 2014). "What's Going on in the Library? Part 1: Librarian Publishers May Be More Important Than You Think". The Scholarly Kitchen.
+Phil Jones (Dec 9, 2014). "What's Going on in the Library? Part 2: The Convergence of Data Repositories and Library Publishers". The Scholarly Kitchen.
+
+
+== External links ==
+
+Library Publishing Coalition
+Campus-based Publishing Resource Center Archived 2019-10-23 at the Wayback Machine
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/List_of_academic_publishers_by_preprint_policy-0.md b/data/en.wikipedia.org/wiki/List_of_academic_publishers_by_preprint_policy-0.md
new file mode 100644
index 000000000..e09238d0e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/List_of_academic_publishers_by_preprint_policy-0.md
@@ -0,0 +1,37 @@
+---
+title: "List of academic publishers by preprint policy"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/List_of_academic_publishers_by_preprint_policy"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:28.763243+00:00"
+instance: "kb-cron"
+---
+
+This is a list of publishers of academic journals by their submission policies regarding the use of preprints prior to publication (example list).
+Publishers' policies on self-archiving (including of preprint versions) can also be found at SHERPA/RoMEO.
+
+
+== Policies by publisher ==
+Submission of preprints is accepted by all open access journals. Over the last decade, they have been joined by most subscription journals; however, publisher policies are often vague or ill-defined.
+In general, most publishers that permit preprints require that:
+
+the authors disclose the existence of the preprint at submission (e.g. in the cover letter)
+once an article is published, the preprint should link to the published version (typically via DOI)
+the preprint should not have been formally peer reviewed
+Publishers may place additional restrictions (e.g. specifying non-commercial servers or preferred licenses). Most publishers have a unified policy across all of their journals; however, some journals list exceptions in their own policies.
+
+
+== See also ==
+Copyright policies of academic publishers
+Ingelfinger rule
+List of open-access journals
+List of preprint repositories
+List of research funders by preprint licensing policy
+
+
+== References ==
+
+
+== External links ==
+SHERPA/RoMEO - a list of publisher policies on copyright, preprints, and self-archiving
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/List_of_research_funders_by_preprint_licensing_policy-0.md b/data/en.wikipedia.org/wiki/List_of_research_funders_by_preprint_licensing_policy-0.md
index f1cf1d619..e4d4d6951 100644
--- a/data/en.wikipedia.org/wiki/List_of_research_funders_by_preprint_licensing_policy-0.md
+++ b/data/en.wikipedia.org/wiki/List_of_research_funders_by_preprint_licensing_policy-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/List_of_research_funders_by_preprint_licensing_policy"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:12:53.596810+00:00"
+date_saved: "2026-05-05T10:16:01.063752+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Malamud_decision-0.md b/data/en.wikipedia.org/wiki/Malamud_decision-0.md
new file mode 100644
index 000000000..41098d32c
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Malamud_decision-0.md
@@ -0,0 +1,19 @@
+---
+title: "Malamud decision"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Malamud_decision"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:29.928284+00:00"
+instance: "kb-cron"
+---
+
+In the Malamud decision, the European Court of Justice (ECJ) held on 5 March 2024 in Case C-588/21 P that there may be an overriding public interest in the dissemination of harmonised European standards.
+
+
+== Background ==
+Carl Malamud had requested access to several European standards for his organisation public.resource.org. The European Commission refused to make the requested European standards for toy safety available free of charge, whereupon Malamud filed a lawsuit. The ECJ ruled that those harmonised technical standards (HTN) that are mandatory are part of Union law. As the principle of the rule of law requires free access to Union law, these standards must be accessible free of charge. However, the Court did not generally rule out copyright protection for harmonised standards.
+Free access to some harmonised standards is now possible after registration; see Access to documents.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Mega_journal-0.md b/data/en.wikipedia.org/wiki/Mega_journal-0.md
new file mode 100644
index 000000000..f9403fe29
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Mega_journal-0.md
@@ -0,0 +1,47 @@
+---
+title: "Mega journal"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Mega_journal"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:31.098887+00:00"
+instance: "kb-cron"
+---
+
+A mega journal (also mega-journal and megajournal) is a peer-reviewed academic open access journal designed to be much larger than a traditional journal by exercising low selectivity among accepted articles. It was pioneered by PLOS ONE. This "very lucrative publishing model" was soon emulated by other publishers.
+
+
+== Definition ==
+A mega journal has the following defining characteristics:
+
+broad coverage of different subject areas;
+accepting articles for publication based on whether they are technically sound rather than selecting for perceived importance; and
+using article processing charges to cover the costs of publishing, although it is also possible for a mega journal to function as a non-profit (one example is Open Library of Humanities).
+Other less universal characteristics are
+
+"an accelerated review and publication process", "fast turnaround time";
+"academic editors", even "a large editorial board of academic editors" (instead of professional editors); and
+value-added services such as reusable graphics and data through Creative Commons licenses.
+Mega journals are also online-only, with no printed version, and are fully open access, in contrast to hybrid open access journals. Some "predatory" open access publishers use the mega journal model.
+
+
+== Influence ==
+It has been suggested that the academic journal landscape might become dominated by a few mega journals in the future, at least in terms of total number of articles published.
+Mega journals shift the publishing industry's funding standard from the subscription-based model common to traditional closed access publications to article processing charges.
+Their business model may not motivate reviewers, who donate their time to "influence their field, gain exposure to the most current cutting edge research or list their service to a prestigious journal on their CVs."
+Finally, they may no longer serve as "fora for the exchange ... among colleagues in a particular field or sub-field", as traditionally happened in scholarly journals. To counter that indiscrimination, PLOS ONE, the prototypical megajournal, has started to "package relevant articles into subject-specific collections."
+
+
+== List of mega journals ==
+
+
+== Notes ==
+
+
+== References ==
+
+
+== Further reading ==
+Bill Cope and Angus Phillips, The Future of the Academic Journal, 2nd ed., Chandos Publishing, Jul 1, 2014, 478 pages.
+Peter Binfield, "Open Access MegaJournals -- Have They Changed Everything?", Creative Commons New Zealand Blog
+Sönke Bartling & Sascha Friesike (Editors), Opening Science: The Evolving Guide on How the Web is Changing Research, Collaboration and Scholarly Publishing, Springer, 2014, ISBN 978-3-319-00025-1, 339 pp.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Mercè_Crosas-0.md b/data/en.wikipedia.org/wiki/Mercè_Crosas-0.md
new file mode 100644
index 000000000..cdd95678f
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Mercè_Crosas-0.md
@@ -0,0 +1,23 @@
+---
+title: "Mercè Crosas"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Mercè_Crosas"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:39.344482+00:00"
+instance: "kb-cron"
+---
+
+Mercè Crosas (born 1966, Barcelona) is a researcher and technologist specializing in data science, data management, and open data. Since November 2023 she has been President of CODATA, the Committee on Data of the International Science Council. Crosas is the Director of Computational Social Sciences and Humanities at the Barcelona Supercomputing Center.
+
+
+== Biography ==
+Crosas holds a degree in physics from the University of Barcelona (1989) and a PhD in astrophysics from Rice University (Houston, Texas, 1992), with predoctoral and postdoctoral stays at the Harvard-Smithsonian Center for Astrophysics. She has spent most of her professional life at Harvard University, first as an astrophysicist and research software engineer at the Harvard-Smithsonian Center for Astrophysics, later as Director of Data Science and Technology at the university's Institute for Quantitative Social Sciences, and then as Chief Research Data Management Officer at Harvard University. From 2000 to 2004, she worked outside of Harvard at a pair of biotech startups, leading software development teams to build their research data systems.
+During her time at Harvard University, Crosas worked closely with research, computing services, and libraries to direct the management and publication of research data and provide guidance on University policies, processes, and tools to support the data life cycle. Crosas has extensive experience in data systems architecture and international data standards, with the vision of making them more accessible while ensuring their privacy. From 2006 to 2021, she co-led the Dataverse project and its open source community. The Dataverse software project has been used successfully to share and publish data at universities and research organizations around the world.
+She was also co-principal investigator (co-PI) of the OpenDP project, an open source set of differential privacy tools for analyzing sensitive private data, and of the NIH Data Commons Consortium.
+Crosas has been a member of numerous international committees and working groups focused on open data, data management and analysis, and data sharing. She is co-author of the internationally recognized and used FAIR data principles and has contributed to the recommendations of the Organization for Economic Cooperation and Development (OECD) for access to public data.
+Between 2021 and 2022 she was the Secretary of Open Government of the Catalan Government. Since 2023, she has led the Computational Social Sciences Program at the Barcelona Supercomputing Center.
+In November 2023 she was appointed President of CODATA.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/NARCIS-0.md b/data/en.wikipedia.org/wiki/NARCIS-0.md
new file mode 100644
index 000000000..c15753232
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/NARCIS-0.md
@@ -0,0 +1,28 @@
+---
+title: "NARCIS"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/NARCIS"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:33.491076+00:00"
+instance: "kb-cron"
+---
+
+NARCIS (National Academic Research and Collaboration Information System) of the Netherlands was an online portal for searching Dutch scientific research publications and data. As of July 2018, NARCIS indexed 268,989 data sets and 1,707,486 publications, including a significant proportion of open access works. 
+It started in 2004 as a project of the Koninklijke Nederlandse Akademie van Wetenschappen, Information Centre of the Radboud University of Nijmegen (METIS), Nederlandse Organisatie voor Wetenschappelijk Onderzoek, and Vereniging van Universiteiten. Since 2011 the Data Archiving and Networked Services (DANS) operated NARCIS from headquarters in The Hague. In 2015, it was decided to replace the Digital Author Identifier used until then with the International Standard Name Identifier or ORCID. As of 3 July 2023, the portal has been decommissioned.
+
+
+== See also ==
+Open access in the Netherlands
+
+
+== References ==
+
+
+== Further reading ==
+Elly Dijk; et al. (2006), NARCIS: The Gateway to Dutch Scientific Information (PDF), Proceedings ELPUB2006 Conference on Electronic Publishing, Bansko, Bulgaria
+
+
+== External links ==
+Official site
+"Science and Technology Government Information Sources: International: Netherlands", ACRL Wiki, US: American Library Association's Association of College and Research Libraries (includes NARCIS)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-access_mandate-0.md b/data/en.wikipedia.org/wiki/Open-access_mandate-0.md
new file mode 100644
index 000000000..99cf119d8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-access_mandate-0.md
@@ -0,0 +1,43 @@
+---
+title: "Open-access mandate"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Open-access_mandate"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:38.169520+00:00"
+instance: "kb-cron"
+---
+
+An open-access mandate is a policy adopted by a research institution, research funder, or government which requires or recommends researchers—usually university faculty or research staff and/or research grant recipients—to make their published, peer-reviewed journal articles and conference papers open access (1) by self-archiving their final, peer-reviewed drafts in a freely accessible institutional repository or disciplinary repository ("Green OA") or (2) by publishing them in an open-access journal ("Gold OA") or both.
+
+== Characteristics ==
+Among the universities that have adopted open-access mandates for faculty are Harvard University, Massachusetts Institute of Technology, University College London, Queensland University of Technology, University of Minho (Portugal), University of Liège and ETH Zürich.  Among the funding organizations that have adopted open-access mandates for grant recipients are National Institutes of Health (with the NIH Public Access Policy), Research Councils UK, National Fund for Scientific Research, Wellcome Trust and European Research Council. For a full index of institutional and funder open-access mandates adopted to date, see the Registry of Open Access Mandatory Archiving Policies (ROARMAP).
+Open-access mandates can be classified in many ways: by the type of mandating organization (employing institution or research funder), by the locus (institutional or institution-external) and timing (immediate, delayed) of the deposit itself, by the time (immediate, delayed) at which the deposit is made open access, and by whether or not there is a default copyright-retention contract (and whether it can be waived). Mandate types can also be compared for strength and effectiveness (in terms of the annual volume, proportion and timing of deposits, relative to total annual article output, as well as the time at which access to the deposit is set as open access). Mandates are classified and ranked by some of these properties in MELIBEA.
+
+=== Institutional and funder mandates ===
+
+Universities can adopt open-access mandates for their faculty. All such mandates make allowances for special cases. Tenured faculty cannot be required to publish; nor can they be required to make their publications open access. However, mandates can take the form of administrative procedures, such as designating repository deposit as the official means of submitting publications for institutional research performance review, or for research grant applications or renewal. Many European university mandates have taken the form of administrative requirements, whereas many U.S. university mandates have taken the form of a unanimous or near-unanimous self-imposed faculty consensus consisting of a default rights-retention contract (together with a waiver option for individual special cases).
+Research funders such as government funding agencies or private foundations can adopt open-access mandates as contractual conditions for receiving funding.
+New open-access mandates are often announced during the annual Open Access Week, which takes place globally during the last full week of October. For example, the Royal Society chose Open Access Week 2011 to announce the release of the digitized backfiles of their archives, dating from 1665 to 1941.
+
+=== Principal kinds of open-access mandates ===
+"Mandate" can mean either "authorize" or "oblige". Both senses are important in inducing researchers to provide OA. Open-access advocate Peter Suber has remarked that "'mandate' is not a good word..." for open-access policies, "...but neither is any other English word." Other ways to describe a mandate include "shifting the default publishing practice to open access" in the case of university faculty or "putting an open-access condition" on grant recipients. Mandates are stronger than policies which either request or encourage open access, because they require that authors provide open access. Some mandates allow the author to opt out if they give reasons for doing so.
+
+Encouragement policies - These are not requirements but merely recommendations to provide open access.
+Loophole mandates - These require authors to provide open access if and when their publishers allow it.
+Mandates may include the following clauses:
+
+Mandates with a limited-embargo clause - These require authors to provide open access either immediately or, at the latest, after a maximal permissible embargo period (which may vary from 6 months to 12 months or more).
+Mandates with an immediate-deposit clause - These require authors to deposit their refereed final drafts in their institutional repository immediately upon publication (or upon acceptance for publication) whether or not their publishing contracts allow making the deposit open access immediately: If the publisher embargoes open access, access to the deposit can be left as closed access during any permissible embargo period. (For closed-access deposits repositories have a request-a-copy Button with which users can request and authors can provide a single copy with one click each during the embargo.)
+Mandates with a rights-retention clause - These policies typically extend to the parent institution a non-exclusive license to exercise any and all copyrights in the article. Copyright remains with the author until they transfer copyright to a publisher, at which point the non-exclusive license survives. In so doing, authors are free to publish wherever they prefer, while granting the institution the right to post a version of the article on the open web via an institutional repository. The benefit of the rights-retention clause is that neither the author, nor the institution, need negotiate open access with the publisher; the policy itself allows open access to the article. Upon acceptance or publication, the author or their representative deposits the article into their institutional repository. Waivers are generally available in cases where authors do not desire open access for a given article. Examples include Europe's Plan S and policies of Harvard University and the Wellcome Trust.
+
+=== Locus of deposit ===
+Most institutional open-access mandates require that authors self-archive their papers in their own institutional repository. Some funder mandates specify institutional deposit, some specify institution-external deposit, and some allow either.
+
+=== Timing of deposit ===
+Mandates may require deposit immediately upon publication (or acceptance for publication) or after an allowable embargo.
+
+=== Timing of opening access to deposit ===
+Mandates may require opening access to the deposit immediately upon publication (or acceptance for publication) or after an allowable embargo.
+
+== Instances ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-access_mandate-1.md b/data/en.wikipedia.org/wiki/Open-access_mandate-1.md
new file mode 100644
index 000000000..30a3f56f2
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-access_mandate-1.md
@@ -0,0 +1,30 @@
+---
+title: "Open-access mandate"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Open-access_mandate"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:38.169520+00:00"
+instance: "kb-cron"
+---
+
+=== Canadian funding agencies ===
+The Canadian Institutes of Health Research (CIHR) proposed a mandate in 2006 and adopted it in September 2007, becoming the first North American public research funder to do so. The CIHR Policy on Access to Research Outputs provides two options to researchers: publication in open access journals, and making their manuscripts available in an online central (PubMed Central Canada is recommended) or institutional repository.
+In October 2013, the two other Canadian federal funding agencies, the Natural Sciences and Engineering Research Council (NSERC) and the Social Sciences and Humanities Research Council (SSHRC), jointly proposed the same mandate as CIHR's, and launched a two-month consultation on what would become the Tri-Agency Open Access Policy.
+On 27 February 2015 a Tri-Agency Open Access Policy on Publications was announced. Peer-reviewed journal publications arising from Agency-supported research must be made freely available within 12 months of publication, whether by depositing in an online repository or by publishing in a journal that offers immediate or delayed open access.  The policy is effective for grants awarded from 1 May 2015 onward.
+On 1 May 2015 the International Development Research Centre adopted a new open access policy. Books and journal articles must be made freely available within 12 months of publication, whether by publishing open access and using open access journals, or by uploading to an open access repository.  The policy is effective for proposals received on or after 20 July 2015.
+
+=== United States funding agencies ===
+In May 2006, the US Federal Research Public Access Act (FRPAA) was proposed to improve the NIH Public Access Policy. Besides making open access mandatory, a point with which the NIH complied in 2008, it argued for extending self-archiving to the full spectrum of major US-funded research. In addition, the FRPAA would no longer stipulate that the self-archiving must be central; the deposit could instead be in the author's own institutional repository (IR). The new U.S. National Institutes of Health's Public Access Policy took effect in April 2008 and states that "all articles arising from NIH funds must be submitted to PubMed Central upon acceptance for publication". It stipulates self-archiving in PubMed Central regardless of the use of the author's own institutional repository. In 2012, the NIH announced it would enforce its Public Access Policy by blocking the renewal of grant funds to authors who do not follow the policy.
+In February 2013, the Fair Access to Science and Technology Research bill was introduced into both houses of Congress. It was described as a "strengthened version of FRPAA".
+Also in 2013, the White House issued a directive requiring federal agencies "with over $100 million in annual conduct of research and development expenditures" to develop, within the next 6 months, a plan to make the peer-reviewed publications directly arising from Federal funding "publicly accessible to search, retrieve, and analyze".
+As a result, open-access repositories and multi-annual open access strategies have been developed by federal institutions like the Department of Agriculture and the Department of Energy. DOE also hosts OSTI.gov, a repository with over 3 million records for federal works of which over 700,000 have full text as of 2019.
+In 2019, the GAO issued a report on the implementation of the 2013 directive, with 37 recommendations to 16 agencies.
+On August 25, 2022, the US Office of Science and Technology Policy under the Biden administration issued guidance to make all federally funded research in the United States freely available without delay, making the US the first country to do so and thus ending over 50 years of the serials crisis, albeit only for US contributions.
+
+=== European funding agencies ===
+In April 2006, the European Commission issued EC Recommendation A1: "Research funding agencies... should [e]stablish a European policy mandating published articles arising from EC-funded research to be available after a given time period in open access archives..." This recommendation has since been updated and strengthened by the European Research Advisory Board (EURAB). The project OpenAIRE (Open Access Infrastructure for Research in Europe) has since been launched.
+The global shift towards open access to the results of publicly funded research (publications and data) has been a core strategy of the European Commission to improve knowledge circulation and thus innovation. It is illustrated in particular by the general principle for open access to scientific publications in Horizon 2020 and the pilot for research data. In 2012, via a Recommendation, the European Commission encouraged all EU Member States to put publicly funded research results in the public sphere in order to strengthen science and the knowledge-based economy. In 2017 it emerged that the European Commission was looking to create its own open access publishing platform for papers that emerge from the Horizon 2020 programme. The platform was expected to be similar to the ones used by the Wellcome Trust for Wellcome Open Research and by the Gates Foundation for Gates Open Research.
+To somewhat improve on the European Commission's (and FRPAA's) allowable embargo of up to six months, EURAB has revised the mandate: all articles must be deposited immediately upon acceptance for publication; the allowable delay for complying with publisher embargoes applies only to the time when access to the deposit must be made open access rather than to the time when it must be deposited. Immediate deposit is required so that individual users can then request an immediate individual copy of any deposited eprint during the embargo period by clicking on a "RequestCopy" Button provided by the Institutional Repository software (e.g., DSPACE, EPrints). The Button automatically sends an email message to the author requesting an individual eprint; the author can comply with one click and the software immediately emails the eprint to the requestor. This is not open access, but may cover some immediate research needs during any embargo. A related idea was later put forth as the Open Access Button for papers that have not been deposited in an Institutional Repository.
+
+== Effectiveness ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-access_mandate-2.md b/data/en.wikipedia.org/wiki/Open-access_mandate-2.md
new file mode 100644
index 000000000..a790059fd
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-access_mandate-2.md
@@ -0,0 +1,37 @@
+---
+title: "Open-access mandate"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Open-access_mandate"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:38.169520+00:00"
+instance: "kb-cron"
+---
+
+For the four institutions with the oldest self-archiving mandates, the averaged percentage of green open-access self-archiving has been compared to the percentage for control articles from other institutions published in the same journals (for years 2002–2009, measured in 2011). Open-access mandates triple the percentage of Green OA (see figure below). Respective totals are derived from the Thomson Reuters Web of Science.
+
+=== Tracking mandates ===
+As of May 2015, open-access mandates have been adopted by over 550 universities and research institutions, and over 140 research funders worldwide.
+Examples of universities which have open-access mandates are Harvard University and MIT in the United States, 
+University College London in the UK and ETH Zürich in Europe. Funders which require open access when their funding recipients publish include the NIH in the US and RCUK and ERC in the EU. Mandate policy models and guidance have been provided by the Open Society Institute's EPrints Handbook, EOS, OASIS and Open Access Archivangelism.
+ROARMAP, the searchable Registry of Open Access Repository Mandates and Policies at the University of Southampton, indexes the world's institutional, funder and governmental OA mandates; the Open Access Scholarly Information Sourcebook (OASIS) and EnablingOpenScholarship (EOS) graph the quarterly outcome. SHERPA/JULIET is a SHERPA service which lists funder mandates only.
+In international cross-disciplinary surveys conducted by Swan (2005), the vast majority of researchers responded that they would self-archive willingly if their institutions or funders mandated it. Outcome studies by Sale (2006) have confirmed these survey results. Both mandated and unmandated institutional and disciplinary repositories worldwide are indexed by SHERPA's OpenDOAR, and their rate of growth is monitored and displayed by the University of Southampton's Registry of Open Access Repositories (ROAR).
+Recent studies have tested which mandate conditions are most effective in generating deposit. The three most important conditions identified were: (1) immediate deposit required, (2) deposit required for performance evaluation, and (3) unconditional opt-out allowed for the OA requirement but no opt-out allowed for the deposit requirement.
+
+== Policies adopted by research universities ==
+The information which follows relates more closely to open access policies/mandates covering open publishing of research outputs than to OER specifically. An open-access policy enacted by the Faculty of a research university can empower them in choosing how to distribute their own scholarly work. If a faculty member wishes to grant exclusive rights to a publisher, they would first need to request a waiver from their faculty governance body. Some reasons to implement this kind of policy institution-wide are to:
+
+increase the overall impact of an institution's research contributions to the global knowledge economy,
+give individual faculty their institution's full support in a unified action to work with publishers to simplify procedures and broaden access to their scholarly work (allowing for greater possibilities for citations of their work, which is important for hiring, tenure and promotion decisions),
+take advantage of scholarly interactions with a greater diversity of readers, not just those who can afford to purchase the information from a vendor or attend an academic conference.
+This kind of blanket policy provides support to those whose research is not part of a project that requires open access to the research done. For example, since the February 2013 directive from the United States Office of Science and Technology Policy, U.S. federal agencies have been developing their own policies on making research freely available within a year of publication.
+SPARC, the Scholarly Publishing and Academic Resources Coalition, led the collaborative and open effort to create an "Open Access Spectrum" that demonstrates that a more sophisticated approach is needed in discussions about the concept of openness in research communications. The "HowOpenIsIt?" guide (as well as an FAQ document and slide deck) is available for download on the SPARC website. Another useful guide has been developed by members of the Harvard Office for Scholarly Communication, the Harvard Open Access Project, and the Berkman Center for Internet and Society. This online guide, "Good practices for university open-access policies", is built on a wiki and is designed to evolve over time, according to the co-authors: Emily Kilcer, Stuart Shieber and Peter Suber.
+
+=== United States ===
+
+==== California Institute of Technology ====
+On June 10, 2013, the Faculty Board of the California Institute of Technology (Caltech) created an institution-wide Open Access Policy. The ruling stated that as of January 1, 2014, all Caltech faculty must agree to grant nonexclusive rights to Caltech to disseminate their scholarly papers either via the authors' own sites or via CaltechAUTHORS, an online repository for research papers authored by Caltech faculty and other researchers at Caltech. The goal is to encourage wider distribution of their work and to simplify the copyright process when posting research on faculty or institutional Web sites. The initiative was put in place to prevent journal publishers from threatening legal action or issuing takedown notices to authors who have posted their content on their own sites or to CaltechAUTHORS.
+
+==== Duke University ====
+On March 21, 2010, the Duke University Academic Council voted to support the University Library's new data repository, DukeSpace, with a blanket policy to provide open access to their scholarly writings. The policy allows for faculty members to opt out at any time, and it is regularly reviewed to determine its effectiveness.
+Also in 2010, Duke joined the Compact for Open-Access Publishing Equity (COPE) and established a fund to help Duke faculty members cover any author fees required to publish in open access journals.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-access_mandate-3.md b/data/en.wikipedia.org/wiki/Open-access_mandate-3.md
new file mode 100644
index 000000000..90d0dd240
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-access_mandate-3.md
@@ -0,0 +1,27 @@
+---
+title: "Open-access mandate"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Open-access_mandate"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:38.169520+00:00"
+instance: "kb-cron"
+---
+
+==== Harvard University ====
+On February 12, 2008, the Faculty of Arts and Sciences of Harvard University approved their Open Access Policy, granting to the President and Fellows of Harvard permission to "make available his or her scholarly articles and to exercise the copyright in those articles ... in a nonexclusive, irrevocable, paid-up, worldwide license..." Since then, several other schools within the University have come to participate in the Open Access Policies supported by the Office for Scholarly Communication: the Graduate School of Design, the School of Education, the Business School, the Law School, the Kennedy School of Government, the Divinity School, and the School of Public Health. The University's open-access repository is called DASH (Digital Access to Scholarship at Harvard), where the faculty upload their scholarly articles for access by all.
+
+==== Massachusetts Institute of Technology ====
+By a unanimous vote on March 18, 2009, the Massachusetts Institute of Technology (MIT) Faculty adopted an open access policy. The policy applies to "all scholarly articles written while the person is a member of the Faculty except for any articles completed before the adoption of this policy and any articles for which the Faculty member entered into an incompatible licensing or assignment agreement before the adoption of this policy." The MIT online repository is called DSpace@MIT and it was designed to work seamlessly with Google Scholar. The Faculty revised and updated the policy in 2010 to take into consideration the various issues associated with the MIT librarians' discussions with publishers.
+
+==== Princeton University ====
+In 2010 the Dean of the Faculty of Princeton University appointed an ad-hoc committee of faculty and the University Librarian to study the question of open access to faculty publications, and in March 2011 the committee recommended several changes to the Faculty rules to allow for a blanket policy for open access to Princeton faculty scholarship. The faculty approved an open access policy on September 19, 2011, which was last revised in January 2012.
+
+==== Stanford University ====
+On June 26, 2008, the faculty of the Stanford University Graduate School of Education (GSE) were the first at Stanford to grant permission to the University to make their scholarly articles publicly accessible and to exercise the copyright in a "nonexclusive, irrevocable, worldwide license ... provided that the articles are properly attributed to the authors [and] not sold for a profit." The GSE Open Archive houses and makes publicly available the GSE authors' working papers as well as published articles. Between May 21 and 24, 2013, the Stanford GSE doctoral students voted in favor of a motion to enact an Open Access policy. At this time, however, despite the strong case made by Professors John Willinsky and Juan Pablo Alperin, no other Stanford academic units have stepped forward.
+
+==== University of California ====
+On July 24, 2013, the Academic Senate of the University of California (UC) approved the UC Open Access Policy for all of the more than 8,000 faculty at their ten campuses. Some confusion at the local campuses led to online postings of journal articles whose copyright was already owned by publishers. For example, in December 2013, the academic publishing company Elsevier sent several UC faculty notices to take down certain journal articles posted openly on their campus webpages, e.g., on the department websites or faculty profiles. The UC Open Access Policy protected those faculty who had correctly uploaded their articles to the UC eScholarship repository. In another case of misunderstanding by the faculty about open access, in March 2014 the University received a Digital Millennium Copyright Act (DMCA) takedown notice for nine articles owned by the American Society of Civil Engineers (ASCE). The UC faculty authors had uploaded the publisher-formatted articles to eScholarship between 2004 and 2008, before the UC Open Access Policy had been enacted and in violation of the publisher's agreement with the authors when they gave their copyrights to the ASCE.
+
+==== University of Colorado Boulder ====
+In 2014 the Faculty Assembly of the University of Colorado Boulder approved the CU Boulder Open Access Policy "in order to allow for broad dissemination of their research." They granted to The Regents of the University of Colorado "a nonexclusive, irrevocable, worldwide license to exercise any and all rights under copyright relating to their scholarly work, as long as the works are properly attributed to the authors and not used for commercial purposes", with individual faculty retaining full ownership of the material. Authors at CU Boulder are expected to inform publishers about the University's policy and that they "have granted a pre-existing License." The digital repository, CU Scholar, is maintained by the University Libraries and functions under a set of policies derived from the Open Access Policy. Contributions from the CU Boulder community can include working papers and technical reports, published scholarly research articles, completed manuscripts, digital art or multimedia, conference papers and proceedings, theses and dissertations, Undergraduate Honors theses, journals published on campus, faculty course-related output primarily of scholarly interest, and data sets. The Chancellor's Executive Committee recently approved the new policy, following the lead of the Council of Deans and the Office of the Provost and Executive Vice Chancellor.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-access_mandate-4.md b/data/en.wikipedia.org/wiki/Open-access_mandate-4.md
new file mode 100644
index 000000000..dd2a1de25
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-access_mandate-4.md
@@ -0,0 +1,29 @@
+---
+title: "Open-access mandate"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Open-access_mandate"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:38.169520+00:00"
+instance: "kb-cron"
+---
+
+==== University of Kansas ====
+In 2005 the University of Kansas (KU) created KU ScholarWorks, a digital repository for scholarly work created by KU faculty and staff. Faculty Senate President Lisa Wolf-Wendel, professor of education leadership and policy studies, approved a new policy, "Open Access Policy for University of Kansas Scholarship", on April 30, 2009, in order to provide the broadest possible access to the journal literature authored by KU faculty. In June 2009, under a faculty-initiated policy approved by Chancellor Robert Hemenway, KU became the first U.S. public university to implement an open access policy. Unless a KU author sought a waiver, all articles had to be submitted to KU ScholarWorks. "Processes to Implement the KU Open Access Policy" were endorsed by the Faculty Senate in February 2010. Theses and dissertations at the University of Kansas are also openly available; however, in 2010 KU Graduate Studies established a policy that a student may request permission to embargo publication for six months, one year or two years. Graduates earning the KU Master of Fine Arts in Creative Writing or PhD in English (Literature and Creative Writing track) may request a permanent embargo.
+
+== See also ==
+NIH Public Access Policy
+ROARMAP
+SHERPA/Juliet
+
+== References ==
+
+== Sources ==
+Suber, Peter (2012). Open access (The MIT Press Essential Knowledge Series ed.). Cambridge, Mass.: MIT Press. ISBN 9780262517638. Retrieved 26 January 2016.
+See especially Chapter 4, Policies and Section 4.2, Digression on the word "Mandate".
+
+== External links ==
+
+Open Access Overview by Peter Suber
+Registry of Open Access Repositories Mandatory Archiving Policies (ROARMAP)
+Good practices for university open-access policies, by Stuart Shieber and Peter Suber.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-access_monograph-0.md b/data/en.wikipedia.org/wiki/Open-access_monograph-0.md
new file mode 100644
index 000000000..bceab24ef
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-access_monograph-0.md
@@ -0,0 +1,48 @@
+---
+title: "Open-access monograph"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open-access_monograph"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:32.297651+00:00"
+instance: "kb-cron"
+---
+
+An open-access monograph (open-access book or OA book) is a scholarly publication usually made openly available online with an open license. These books are freely accessible to the public, typically via the internet. They are part of the open access movement.
+
+
+== Concept ==
+Open access is when academic research is made freely available online for anyone to read and re-use. As with open access journals, there are different business models for funding open-access books, including publication charges, institutional support, library publishing, and consortium models. Some publishers, like OECD Publishing, use a freemium model where the ebook version is made available for free, but readers have the option to purchase a print copy. Sales of the print version subsidise the cost of producing the book. There is some evidence that making electronic editions of books open access can increase sales of the print edition.
+
+
+== History ==
+While open access to journal articles has become very common, with 50% of articles published in 2011 available as open access, open access to books has not yet seen as much uptake. However, some dedicated open-access book publishers, such as Open Book Publishers, Punctum Books, and others who publish both books and journals like Open Humanities Press, have been launched.
+Gradually, academic publishers and university presses have also adopted an open-access monograph approach, offering this publishing option alongside journal articles. Major publishers of open-access books include, for example, Taylor & Francis, MDPI, and MIT Press. The OAPEN (Open Access Publishing in European Networks) online library and publication platform provides access to thousands of peer-reviewed academic books, mainly in the humanities and social sciences. The OAPEN Foundation also provides a directory of open access works via Directory of Open Access Books (DOAB). 
+A report released in 2015 by the UK's main funding body for research, the Higher Education Funding Council for England, states the importance of open access monographs: "Monographs are a vitally important and distinctive vehicle for research communication, and must be sustained in any moves to open access." A 2019 survey has shown that a majority of authors agree that all future scholarly books should be made available via open access. A 2023 study found that, out of 396,995 open access books analyzed, only 19% were archived, raising concerns about the longevity and accessibility of many OA books distributed online.
+
+
+== See also ==
+Directory of Open Access Journals
+Open content
+Openness
+Open Educational Resources
+Open Library of Humanities
+COPIM
+
+
+== References ==
+
+
+== Further reading ==
+Fathallah, J. (2022). Open Access Monographs: Myths, Truths and Implications in the Wake of UKRI Open Access Policy. LIBER Quarterly: The Journal of the Association of European Research Libraries, 32(1).
+Gatti, R., & Mierowsky, M. (2016). Funding open access monographs: A coalition of libraries and publishers. College & Research Libraries News, 77(9), 456-459.
+"Monographs", p. 112 in Martin Paul Eve, Open Access and the Humanities, Cambridge University Press, 2014.
+"Open-access monographs", p. 419 in Peggy Johnson, Fundamentals of Collection Development and Management, American Library Association, 2014.
+
+
+== External links ==
+OAPEN Online library and publication platform
+Directory of Open Access Books
+Open Access Directory - List of publishers of OA books. OCLC 757073363
+Open Access Directory - List of OA book business models
+Open Access Scholarly Information Sourcebook - Open Access Monographs
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-access_repository-0.md b/data/en.wikipedia.org/wiki/Open-access_repository-0.md
new file mode 100644
index 000000000..bc6244b02
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-access_repository-0.md
@@ -0,0 +1,50 @@
+---
+title: "Open-access repository"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open-access_repository"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:41.663023+00:00"
+instance: "kb-cron"
+---
+
+An open repository or open-access repository is a digital platform that holds research output and provides free, immediate and permanent access to research results for anyone to use, download and distribute. To facilitate open access, such repositories must be interoperable according to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Search engines harvest the content of open access repositories, building a database of freely available research from around the world. Data repositories are the cornerstone for FAIR (findable, accessible, interoperable and reusable) data practices and are used expeditiously within the scientific community.
+Open-access repositories, such as an institutional repository or disciplinary repository, provide free access to research for users outside the institutional community and are one of the recommended ways to achieve the open access vision described in the Budapest Open Access Initiative definition of open access. This is sometimes referred to as the self-archiving or "green" route to open access.
+
+
+== Benefits ==
+The benefits of open-access repositories are:
+
+Opening up outputs of the institution to a worldwide audience;
+Maximizing the visibility and impact of these outputs as a result;
+Showcasing the institution to interested constituencies – prospective staff, prospective students and other stakeholders;
+Collecting and curating digital output;
+Managing and measuring research and teaching activities;
+Providing a workspace for work-in-progress, and for collaborative or large-scale projects;
+Enabling and encouraging interdisciplinary approaches to research;
+Facilitating the development and sharing of digital teaching materials and aids, and
+Supporting student endeavours, providing access to theses and dissertations and a location for the development of e-portfolios.
+
+
+== Software ==
+According to OpenDOAR, the most frequently used software packages for open repositories are Digital Commons, DSpace and EPrints. Other examples are arXiv, bioRxiv, Dryad, Figshare, Open Science Framework, Samvera, Ubiquity Repositories and Invenio (the solution used by Zenodo).
+
+
+== See also ==
+Current research information system
+Digital library
+Digital asset management
+Library publishing
+Open-access archives
+Open access around the world
+List of academic databases and search engines
+ROARMAP
+
+
+== References ==
+
+
+== External links ==
+
+Peter Suber (ed.). "(Repositories)". Open Access Tracking Project. Harvard University. OCLC 1040261573. News and comment from the worldwide movement for open access to research
+Open repositories
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-source_ventilator-0.md b/data/en.wikipedia.org/wiki/Open-source_ventilator-0.md
new file mode 100644
index 000000000..207b69ade
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-source_ventilator-0.md
@@ -0,0 +1,28 @@
+---
+title: "Open-source ventilator"
+chunk: 1/3
+source: "https://en.wikipedia.org/wiki/Open-source_ventilator"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:49.197945+00:00"
+instance: "kb-cron"
+---
+
+An open source ventilator is a disaster-situation ventilator made using a freely licensed (open-source) design and, ideally, freely available components and parts (open source hardware). Designs, components, and parts may range from completely reverse-engineered to completely new creations; components may be adaptations of various inexpensive existing products, and special hard-to-find or expensive parts may be 3D-printed instead of purchased. As of early 2020, the levels of documentation and testing of open source ventilators were well below scientific and medical-grade standards.
+One small, early prototype effort was the Pandemic Ventilator created in 2008 during the resurgence of H5N1 avian influenza that began in 2003, so named "because it is meant to be used as a ventilator of last resort during a possible avian (bird) flu pandemic."
+
+== Quality assessment ==
+The policy of using both free and open source software (FOSS) and open source hardware theoretically allows community-wide peer-review and correction of bugs and faults in open source ventilators, which is not available in closed source hardware development. In early 2020 during the COVID-19 pandemic, a review of open source ventilators stated that "the tested and peer-reviewed systems lacked complete documentation and the open systems that were documented were either at the very early stages of design ... and were essentially only basically tested ..." The author speculated that the pandemic would motivate development that would significantly improve the open source ventilators, and that much work, policies, regulations, and funding would be needed for the open source ventilators to achieve medical-grade standards.
+
+== Design requirements ==
+
+A number of features are required for an invasive mechanical ventilator to be safely used on a patient:
+
+a way of measuring and controlling the volume pumped and the breath rate to avoid volutrauma;
+monitoring for inspiratory pressure, respiratory rate (bpm), and inspiratory-to-expiratory time (I/E) ratio;
+for non-sedated patients, an "assist" mode that, instead of forcing air in at a fixed frequency, only increases the pressure when the patient inhales;
+for ARDS, support for setting positive end-expiratory pressure (PEEP) to avoid alveoli collapse;
+humidification to avoid drying and cooling the alveoli.
+The requirements for non-invasive ventilation are less strict.
+
+== COVID-19 pandemic ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-source_ventilator-1.md b/data/en.wikipedia.org/wiki/Open-source_ventilator-1.md
new file mode 100644
index 000000000..522b5e71d
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-source_ventilator-1.md
@@ -0,0 +1,20 @@
+---
+title: "Open-source ventilator"
+chunk: 2/3
+source: "https://en.wikipedia.org/wiki/Open-source_ventilator"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:49.197945+00:00"
+instance: "kb-cron"
+---
+
+On March 16, 2020, the Open Source Ventilator Ireland (OSV) group was formed, initially with the goal of building a focus team in Ireland to begin development on what was termed the “Field Emergency Ventilator (FEV)”. The group was inspired by the initial efforts of the Open Source Medical Supplies (OSCMS), which initially focused on developing open ventilators but quickly refocused mainly on the local production of personal protective equipment (PPE). OSV Ireland partnered with the OpenLung team in Canada, who were developing and publishing open source designs via GitLab. The group grew quickly, amassing volunteer engineers, designers and medical professionals with the goal of developing new, low-resource medical interventions to address a perceived global lack of mechanical ventilation equipment. The well-known bag valve mask (BVM) quickly became the core functional component of their design, with the goal of using 3D-printed and traditionally manufactured components for localized assembly of the systems, to maximise potential manufacturing capacity around the globe. The Open Source Ventilator Ireland (OSV) group evolved into TeamOSV, to fully incorporate both ventilators and other COVID-related medical equipment.
+The FOSS Initiative OpenVentilator.io project began on March 19, after two weeks of research. Jeremias Almadas had posted some drafts he made on the Open Source COVID-19 Medical Supplies forum. Marcos Mendez contacted him to join efforts to develop a solution that could be reproduced at very large scale. This project later became the "OpenVentilator Spartan Model".
+The COVID-19 pandemic posed a new challenge. The problem was no longer how to manufacture a ventilator: ventilators have long been manufactured, and by the 1960s models such as the Bird MK VII were already well established, with admirably simple engineering.
+The challenge now was to design a device that solves a problem on a global scale: one that can be manufactured in very large numbers, with parts found in small towns and villages. These were the premises adopted by some projects, such as OpenVentilator.io.
+On March 18, Medtronic opened its code and files for manufacturing its main pulmonary ventilation equipment. Demand, however, was on a scale that Medtronic could not meet globally, or even regionally. The same was already true of Philips, GE and Dräger, world leaders in the manufacture of this type of equipment. It would not make sense to reinvent something that had already been studied for 100 years. The problem was not one of engineering but of logistics and scale: the projects that were to emerge had to be applicable and achievable. Manufacturing should be decentralized and focused on the resources available locally to each individual. Nine out of ten Brazilian cities do not even have an ICU bed, let alone an electronics store or an Ambu factory. The African situation had already been proclaimed a catastrophe.
+Several projects began to emerge in this area, many taking an engineering approach and many others following strict validation against the regulations.
+Few projects include an analysis of complex thinking within the global economic-political stagnation.
+A major worldwide design effort began during the COVID-19 pandemic after a Hackaday project was started in response to expected ventilator shortages, which were predicted to cause higher mortality among severely ill patients. This project aims to build a continuous positive airway pressure device.
+On March 19, the MakAir open source ventilator project was started by a team of software engineers in France, using 3D printing to quickly iterate on a prototype, with the goal of letting an established manufacturer produce the final ventilators for a cost nearing €2,000. The team built a working prototype in one month, at the end of which a successful 12-hour ventilation test on a pig was performed. The project received official support from the French Army's investment branch, Agence Innovation Défense of Direction générale de l'armement, granting the project €426,000 to help fund clinical trials. Groupe SEB agreed to manufacture the MakAir ventilator in their facilities in Vernon, France. As of December 2020, the MakAir ventilator project is still active on the engineering side, with full support for both pressure- and volume-controlled ventilation modes, and on the medical side with ongoing clinical trials at CHU Nantes on human patients.
+On March 20, 2020, Irish Health Services began reviewing the designs from the Open Source Ventilator Ireland project. A prototype is being designed and tested in Colombia.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open-source_ventilator-2.md b/data/en.wikipedia.org/wiki/Open-source_ventilator-2.md
new file mode 100644
index 000000000..d4cd47fcf
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open-source_ventilator-2.md
@@ -0,0 +1,51 @@
+---
+title: "Open-source ventilator"
+chunk: 3/3
+source: "https://en.wikipedia.org/wiki/Open-source_ventilator"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:49.197945+00:00"
+instance: "kb-cron"
+---
+
+The University of Minnesota Bakken Medical Device Center initiated a collaboration with various companies to bring a ventilator alternative to the market that works as a one-armed robot and replaces the need for manual ventilation in emergency situations. The Coventor device was developed in a very short time and approved on April 15, 2020, by the FDA, only 30 days after conception. The mechanical ventilator is designed for use by trained medical professionals in intensive care units and is easy to operate. It has a compact design and is relatively inexpensive to manufacture and distribute. The cost is only about 4% of a normal ventilator. In addition, this device does not require pressurized oxygen or air supply, as is normally the case. A first series is manufactured by Boston Scientific. The plans are to be made freely available online to the general public without royalties.
+The Polish company Urbicum reports successful testing of a 3D-printed, open source prototype device called VentilAid. The makers describe it as a last resort device when professional equipment is missing. The design is publicly available. The first VentilAid prototype requires compressed air to run.
+On March 21, 2020, the New England Complex Systems Institute (NECSI) began maintaining a strategic list of open source designs being worked on. The NECSI project considers manufacturing capability, medical safety and need for treating patients in various conditions, speed in dealing with legal and political issues, logistics and supply. NECSI is staffed with scientists from Harvard, MIT, and others who have an understanding of pandemics, medicine, systems, risk, and data collection.
+Massachusetts Institute of Technology began an emergency project to design a low-cost ventilator that uses a bag valve mask as the main component. Other groups and companies, such as Monolithic Power Systems, also developed designs based on this concept.
+The Oxysphere project develops open blueprints for a positive pressure ventilation hood.
+On April 23, 2020, NASA reported building, in 37 days, a successful COVID-19 ventilator named VITAL ("Ventilator Intervention Technology Accessible Locally"), which is currently undergoing further testing. NASA is seeking fast-track approval by the United States Food and Drug Administration for the new ventilator.
+On May 29, 2020, NASA revealed the "Eight US Manufacturers Selected to Make NASA COVID-19 Ventilator."
+The U.S. companies selected for licenses are:
+
+Vacumed, a division of Vacumetrics, Inc. in Ventura, California
+Stark Industries, LLC in Columbus, Ohio
+MVent, LLC, a division of Minnetronix Medical, in St. Paul, Minnesota
+iButtonLink, LLC in Whitewater, Wisconsin
+Evo Design, LLC in Watertown, Connecticut
+DesignPlex Biomedical, LLC in Fort Worth, Texas
+ATRON Group LLC in Dallas
+Pro-Dex, Inc. in Irvine, California
+Israeli engineers created an open source ventilator.
+
+== Disaster-relief provisions ==
+On March 24, 2020, the U.S. Secretary of Health and Human Services (HHS) enacted Emergency Use Authorizations to allow the use of additional devices, including: "Ventilators, positive pressure breathing devices modified for use as ventilators (collectively referred to as 'ventilators'), ventilator tubing connectors, and ventilator accessories." This was done in accordance with its February 4 declaration for medical countermeasures against the coronavirus disease 2019, and the equipment is subject to the FDA's "criteria for safety, performance and labeling."
+
+== See also ==
+Shortages related to the COVID-19 pandemic § Improvised ventilators
+
+== References ==
+
+== External links ==
+
+An overview of open source ventilator initiatives and open regulatory standards.
+Open Source Ventilator community Archived 2021-11-15 at the Wayback Machine and other COVID supplies, with 2000+ members; 8th design iteration as of March 26.
+The OpenVentilator.Io Spartan Model Archived 2021-02-26 at the Wayback Machine
+Opensource against covid19.
+Development status, concept and features comparison for open source ventilators projects in a single table.
+Open-source ventilator design, Vanderbilt University
+7 open hardware projects working to solve COVID-19.
+Open Source Against COVID-19
+Open Source COVID19 Medical Supplies
+OxyGEN Project (ed.). "Emergency ventilator for COVID-19 crisis approved by the Spanish medicine agency". Retrieved 2020-04-13.
+Automation of Bag-Valve-Mask (BVM) using arms and servo-motors. Archived 2022-03-31 at the Wayback Machine (PDF)
+Garmendia, Onintza; Rodríguez-Lazaro, Miguel A.; Otero, Jorge; Phan, Phuong; Stoyanova, Alexandrina; Dinh-Xuan, Anh Tuan; Gozal, David; Navajas, Daniel; Montserrat, Josep M.; Farré, Ramon (2020-01-01). "Low-cost, easy-to-build non-invasive pressure support ventilator for under-resourced regions: open source hardware description, performance and feasibility testing". European Respiratory Journal. 55 (6): 2000846. doi:10.1183/13993003.00846-2020. ISSN 0903-1936. PMC 7173672. PMID 32312862. Retrieved 2020-04-21.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/OpenCitations-0.md b/data/en.wikipedia.org/wiki/OpenCitations-0.md
index 092a80fbd..b95f16885 100644
--- a/data/en.wikipedia.org/wiki/OpenCitations-0.md
+++ b/data/en.wikipedia.org/wiki/OpenCitations-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/OpenCitations"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:34.338399+00:00"
+date_saved: "2026-05-05T10:15:50.394520+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/OpenDOAR-0.md b/data/en.wikipedia.org/wiki/OpenDOAR-0.md
new file mode 100644
index 000000000..ff1c49c2a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/OpenDOAR-0.md
@@ -0,0 +1,25 @@
+---
+title: "OpenDOAR"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/OpenDOAR"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:51.553114+00:00"
+instance: "kb-cron"
+---
+
+OpenDOAR: Directory of Open Access Repositories is a UK-based website that lists open access repositories (including academic ones). It is searchable by locale, content, and other measures. The service does not require complete repository details and does not search repositories' metadata.
+OpenDOAR is maintained by the University of Nottingham under the SHERPA umbrella of services and was developed in collaboration with Lund University. The project is funded by the Open Science Institute, Jisc, the Consortium of Research Libraries (CURL) and SPARC Europe.
+As of 2015, OpenDOAR and the UK-based Registry of Open Access Repositories (ROAR) "are considered the two leading open access directories worldwide. ROAR is the larger directory and allows direct submissions to the directory. OpenDOAR controls submission of materials and is dependent on the discretion of its staff. OpenDOAR requires open access of scholarly publications; whereas ROAR allows other types of materials to be included. ROAR allows filtering by country, type of repository, and sorting by repository name."
+
+
+== See also ==
+List of academic databases and search engines
+OAIster
+
+
+== References ==
+
+
+== External links ==
+OpenDOAR
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_Access_Button-0.md b/data/en.wikipedia.org/wiki/Open_Access_Button-0.md
new file mode 100644
index 000000000..ab68c6057
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_Access_Button-0.md
@@ -0,0 +1,34 @@
+---
+title: "Open Access Button"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open_Access_Button"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:35.759722+00:00"
+instance: "kb-cron"
+---
+
+The Open Access Button was a browser bookmarklet that registered when people hit a paywall to an academic article and could not access it. It was supported by Medsin UK and the Right to Research Coalition.
+A prototype was built at a BMJ Hack Weekend. All code is openly available online on GitHub.
+A beta version of the Open Access Button was officially launched on 18 November 2013 at the Berlin 11 Satellite Conference for Students & Early Stage Researchers. It recorded instances of hitting a paywall, and also provided options to try to locate an open access version of the article. In April 2014 a crowdfunding campaign was started to build a second version.
+The second version of the button was launched on 21 October 2014 as part of Open Access Week.
+In February 2015 the Open Access Button and its co-founders, David Carroll and Joseph McArthur ("the button boys"), were awarded a SPARC Innovator Award by the Scholarly Publishing and Academic Resources Coalition (SPARC).
+The third version of the button was launched on 28 October 2016, again as part of Open Access Week.
+On September 18, 2025, OA.Works shut down the project.
+
+
+== See also ==
+12ft
+Anna's Archive
+Bypass Paywalls Clean
+EndNote Click
+Eprint button
+Unpaywall
+
+
+== References ==
+
+
+== External links ==
+Official website
+Chrome Webstore Listing
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_Access_Scholarly_Publishing_Association-0.md b/data/en.wikipedia.org/wiki/Open_Access_Scholarly_Publishing_Association-0.md
new file mode 100644
index 000000000..74998d176
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_Access_Scholarly_Publishing_Association-0.md
@@ -0,0 +1,66 @@
+---
+title: "Open Access Scholarly Publishing Association"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open_Access_Scholarly_Publishing_Association"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:39.330669+00:00"
+instance: "kb-cron"
+---
+
+The Open Access Scholarly Publishing Association (OASPA) is a non-profit trade association of open access journal and book publishers. Having started with an exclusive focus on open access journals, it has since expanded its activities to include matters pertaining to open access books and open scholarly infrastructure.
+
+
+== History ==
+The OASPA was launched on October 14, 2008 at an "Open Access Day" celebration in London hosted by the Wellcome Trust.
+
+The following organizations are founding members:
+
+The OASPA faced some criticism for a perceived conflict between its self-declared role as the "stamp of quality for open access publishing" and the application of its own criteria for membership. One member organization, Frontiers Media, was included on Jeffrey Beall's list of predatory open access publishing companies. Two members, Hindawi and MDPI, were initially called predatory by Beall but were later removed from his list after pressure was applied to his employer. There was also concern that, because OASPA had been founded by BioMed Central and other open access publishers, its "seal of approval" involved a conflict of interest. OASPA has also been criticized for promoting gold open access in a way that may be at the expense of green open access.
+
+
+== Activities ==
+OASPA organizes an annual conference on open access scholarly publishing.
+OASPA encourages publishers to use Creative Commons licenses, particularly the Creative Commons Attribution License (CC-BY), which is in line with most definitions of "open", e.g. the Open Definition by the Open Knowledge Foundation.
+
+
+== Members ==
+OASPA members fall into the following groups:
+Professional publishing organisations – Organisations that include at least one full-time professional who manages the publication of OA scholarly journals or books. These organisations may be for-profit or nonprofit, and they may own journals or books or manage the publication on a contract basis for societies or other groups of scientists or scholars. Members of this class may also include organisations such as academic/research libraries, university presses, or other organisations in which the primary focus is other than publishing but still employ full-time professionals who manage the publication of OA scholarly journals and/or books.
+Scholar publishers – Individuals or small groups of scientists/scholars that publish usually a single scholarly journal in their field of study. The publication process is often largely subsidised by volunteer effort.
+Other organisations – Other organisations who provide significant services and/or support for OA publishing.
+To join OASPA as a member organization, a publisher must undergo an assessment process and meet set criteria. These criteria were set in 2013 and revised in August 2018. There are seven categories of OASPA membership:
+
+Professional Publishing Organisation (Small)
+Professional Publishing Organisation (Medium)
+Professional Publishing Organisation (Large)
+Professional Publishing Organisation (Very Large)
+Other Organisation (non-commercial)
+Other Organisation (commercial)
+Scholar Publisher
+As of March 2021, OASPA has 159 members.
+Members of OASPA that publish journals exclusively in OA comprise the Fully Open Access Publishers special interest group.
+
+
+== Response to the Science sting ==
+In response to the Who's Afraid of Peer Review? investigation, OASPA formed a committee to investigate the circumstances that led to the acceptance of the fake paper by three of its members. On 11 November 2013, OASPA terminated the membership of two publishers (Dove Medical Press and Hikari Ltd.) who accepted the fake paper. Sage Press, which also accepted the fake paper, was put "under review" for 6 months. Sage announced in a statement that it was reviewing the journal that accepted the fake paper, but that it would not shut it down. Sage's membership was reinstated at the end of the review period following changes to the journal's editorial processes. Dove Medical Press was also reinstated in September 2015 after making a number of improvements to its editorial processes.
+
+
+== See also ==
+Association for Learned and Professional Society Publishers
+Association of Publishing Agencies
+Directory of Open Access Journals
+International Association of Scientific, Technical, and Medical Publishers
+International Publishers Association
+Category:Open access publishers
+Periodical Publishers Association
+Scholarly Publishing and Academic Resources Coalition
+Society for Scholarly Publishing
+
+
+== References ==
+
+
+== External links ==
+Official website 
+This article incorporates material from the Citizendium article "Open Access Scholarly Publishing Association", which is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License but not under the GFDL.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_Journal_Systems-0.md b/data/en.wikipedia.org/wiki/Open_Journal_Systems-0.md
new file mode 100644
index 000000000..9bd8aed3e
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_Journal_Systems-0.md
@@ -0,0 +1,67 @@
+---
+title: "Open Journal Systems"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open_Journal_Systems"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:43.916399+00:00"
+instance: "kb-cron"
+---
+
+Open Journal Systems, also known as OJS, is free and open-source software for the management of peer-reviewed academic journals, created by the Public Knowledge Project and released under the GNU General Public License.
+
+
+== History ==
+Open Journal Systems (OJS) was conceived to facilitate the development of open access, peer-reviewed publishing, providing the technical infrastructure for the presentation of journal articles along with an editorial-management workflow, including article submission, peer-review, and indexing. OJS relies upon individuals fulfilling different roles, such as journal manager, editor, reviewer, author, and reader. It has a module that supports subscription journals.
+Like other community-based projects such as WordPress, the software has a plugin architecture, which allows new features to be integrated without changing its core codebase. Available plugins facilitate indexing in Google Scholar and PubMed Central, publishing RSS/Atom web syndication feeds, and providing COUNTER statistics about online usage. Several plugins are curated and directly available for download through its plugin gallery interface. OJS is also LOCKSS-compliant, which helps ensure ongoing access to journal contents. Third-party plugins include Reading Tools, which point readers to related studies, media stories, and policy documents in open access databases; the Better Password plugin, which forces users to choose strong passwords; and many others freely available on GitHub. OJS also provides custom themes, which can be added to an installation through its plugin gallery, and a demo installation for experimenting with its features.
+
+
+== Versions ==
+The current version of OJS is 3.4.0-7, released on August 23, 2024. Its first version was released in 2001. The software has an open, well-defined development roadmap and a set of milestones.
+The software is written in PHP, currently supports two databases, MySQL/MariaDB and PostgreSQL, and can be hosted on a Unix-like or Windows web server.
+
+Note: OJS 2 reached its end of life in 2021; its latest release was version 2.4.8-5, released in May 2019. When upgrading from version 2.x to 3.x, some care must be taken because several features have been added and removed, especially if the installation has hand-made customizations.
+
+
+== Translations ==
+As of version 3.3.0, the software has been translated into 50 languages: Arabic, Armenian, Basque, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Finnish, French, Gaelic, Galician, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kazakh, Kurdish, Macedonian, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian, and Vietnamese, with many additional languages (including Uzbek, Urdu, Sinhala, Lithuanian, Korean, and Mongolian) in development. Translations are created and maintained by the user community.
+
+
+== Documentation ==
+PKP maintains an extensive documentation hub where users can find documentation about all of its systems. The documentation covers basic software usage, migration instructions, development practices, accessibility, and video tutorials, and the content has been partially translated into other languages. PKP also provides extensive documentation on governance and policies.
+
+
+== Usage ==
+A user community has developed around the software, with active participants, and enhancements being contributed to the project from the Brazilian Institute for Information in Science and Technology (IBICT), the Journal of Medical Internet Research, and others. A growing body of publications and documentation is available on the project's website.
+As of mid-2021, OJS was being used by at least 25,000 journals worldwide. A daily updated map showing the location of these journals is also available on PKP's website. A survey in 2010 found that about half were in the developing world.
+The Public Knowledge Project is also collaborating with the International Network for the Availability of Scientific Publications (INASP) to develop scholarly research portals in Africa, Bangladesh, Nepal, and Vietnam.
+In Venezuela, at least 32 independent organizations, public and private universities publish 230 journals using this platform.
+OJS, as well as the Érudit publishing system, is being used in the Synergies project, creating a scholarly portal for Canadian social sciences and humanities research. OJS is also being used for research portals in Brazil, Spain, Italy, and Greece.
+
+
+== Hosting ==
+OJS hosting service is offered for a fee by the PKP|Publishing Services (PKP-operated Publishing Services), as well as a variety of third-party commercial and non-commercial service providers not affiliated with PKP.
+PKP has also released a Docker container on GitHub, which may be helpful for spinning up an OJS instance without having to deal with the web server, database, and PHP installation. The container is still in beta, so it should be used only for testing purposes.
+
+
+== See also ==
+
+Open access journal
+DPubS
+OpenACS
+
+
+== Further reading ==
+da Fonseca, R.M.S. (2004, June). Open Journal Systems. Paper presented at the ICCC 8th International Conference on Electronic Publishing, Brasilia.
+Muthayan, S. (2003). Open access research and the public domain in South African universities: The Public Knowledge Project's Open Journal Systems. Paper presented at the International Symposium on Open Access and the Public Domain in Digital Data and Information for Science, UNESCO, Paris.
+Willinsky, J. (2005). Open Journal Systems: An example of open source software for journal management and publishing. Library Hi-Tech 23 (4), 504–519.
+A Survey and Evaluation of Open-Source Electronic Publishing Systems, Mark Cyzyk and Sayeed Choudhury, Johns Hopkins University
+Owen, Brian (1 April 2012). "The Public Knowledge Project and Open Journal Systems: open source options for small publishers". Learned Publishing. 25 (2): 138–144. doi:10.1087/20120208.
+"Open Journal Systems: The Digitization of Academic Journals". IndraStra Global. 2022.
+
+
+== References ==
+
+
+== External links ==
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-0.md b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-0.md
index 60a7bcf56..1e7c79115 100644
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-0.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-0.md
@@ -4,7 +4,7 @@ chunk: 1/8
 source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:25.468404+00:00"
+date_saved: "2026-05-05T10:15:46.620146+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-1.md b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-1.md
index 5eb4b1658..e79aa4057 100644
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-1.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-1.md
@@ -4,7 +4,7 @@ chunk: 2/8
 source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:25.468404+00:00"
+date_saved: "2026-05-05T10:15:46.620146+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-2.md b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-2.md
index 7520e00f0..8d4c0185b 100644
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-2.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-2.md
@@ -4,7 +4,7 @@ chunk: 3/8
 source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:25.468404+00:00"
+date_saved: "2026-05-05T10:15:46.620146+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-3.md b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-3.md
index 336247743..eb5ec0795 100644
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-3.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-3.md
@@ -4,7 +4,7 @@ chunk: 4/8
 source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:25.468404+00:00"
+date_saved: "2026-05-05T10:15:46.620146+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-4.md b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-4.md
index f42c23222..85b07921c 100644
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-4.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-4.md
@@ -4,7 +4,7 @@ chunk: 5/8
 source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:25.468404+00:00"
+date_saved: "2026-05-05T10:15:46.620146+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-5.md b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-5.md
index 722ecbf2f..422ea5e08 100644
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-5.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-5.md
@@ -4,7 +4,7 @@ chunk: 6/8
 source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:25.468404+00:00"
+date_saved: "2026-05-05T10:15:46.620146+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-6.md b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-6.md
index f76226104..4473aeaba 100644
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-6.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-6.md
@@ -4,7 +4,7 @@ chunk: 7/8
 source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:25.468404+00:00"
+date_saved: "2026-05-05T10:15:46.620146+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-7.md b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-7.md
index 960aadf71..242e07b90 100644
--- a/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-7.md
+++ b/data/en.wikipedia.org/wiki/Open_Science_Infrastructure-7.md
@@ -4,7 +4,7 @@ chunk: 8/8
 source: "https://en.wikipedia.org/wiki/Open_Science_Infrastructure"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:25.468404+00:00"
+date_saved: "2026-05-05T10:15:46.620146+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_access-0.md b/data/en.wikipedia.org/wiki/Open_access-0.md
new file mode 100644
index 000000000..123597e0d
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-0.md
@@ -0,0 +1,49 @@
+---
+title: "Open access"
+chunk: 1/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+Open access (OA) is a set of principles and a range of practices through which nominally copyrightable publications are delivered to readers free of access charges or other barriers. With open access strictly defined (according to the 2001 definition), or libre open access, barriers to copying or reuse are also reduced or removed by applying an open license for copyright, which regulates post-publication uses of the work.
+The main focus of the open access movement has been on peer reviewed research literature, and more specifically on academic journals. This is because:
+
+such publications have been subject to the serials crisis, unlike newspapers, magazines and fiction writing. The main difference between these two groups is in demand elasticity: whereas an English literature curriculum can substitute Harry Potter and the Philosopher's Stone with a public domain alternative, such as A Voyage to Lilliput, an emergency room physician treating a patient for a life-threatening urushiol poisoning cannot substitute the most recent but paywalled review article on this topic with a 90-year-old copyright-expired article that was published before the invention of prednisone in 1954.
+the authors of research papers are not paid in any way, so they do not suffer any monetary losses when they switch from paywalled to open access publishing, especially if they use diamond open access media.
+the cost of electronic publishing, which has been the main form of distribution of journal articles since c. 2000, is far smaller than the cost of on-paper publishing and distribution, which is still preferred by many readers of fiction.
+Whereas non-open access journals cover publishing costs through access tolls such as subscriptions, site licenses or pay-per-view charges, open-access journals are characterised by funding models which do not require the reader to pay to read the journal's contents, relying instead on author fees or on public funding, subsidies and sponsorships. Open access can be applied to all forms of published research output, including peer-reviewed and non peer-reviewed academic journal articles, conference papers, theses, book chapters, monographs, research reports and images.
+
+== Definitions ==
+There are different models of open access publishing and publishers may use one or more of these models.
+
+=== Colour naming system ===
+Different open access types are currently commonly described using a colour system. The most commonly recognised names are "green", "gold", and "hybrid" open access; however, several other models and alternative terms are also used.
+
+==== Gold OA ====
+In the gold OA model, the publisher makes all articles and related content available for free immediately on the journal's website. In such publications, articles are licensed for sharing and reuse via Creative Commons licenses or similar.
+Many gold OA publishers charge an article processing charge (APC), which is typically paid through institutional or grant funding. The majority of gold open access journals charging APCs follow an "author-pays" model,
+although this is not an intrinsic property of gold OA.
+
+==== Green OA ====
+Self-archiving by authors is permitted under green OA. Independently from publication by a publisher, the author also posts the work to a website controlled by the author, the research institution that funded or hosted the work, or to an independent central open repository, where people can download the work without paying.
+Green OA is free of charge for the author. Some publishers (less than 5% and decreasing as of 2014) may charge a fee for an additional service such as a free license on the publisher-authored copyrightable portions of the printed version of an article.
+If the author posts the near-final version of their work after peer review by a journal, the archived version is called a "postprint". This can be the accepted manuscript as returned by the journal to the author after successful peer review.
+
+==== Hybrid OA ====
+Hybrid open-access journals contain a mixture of open access articles and closed access articles. A publisher following this model is partially funded by subscriptions, and only provides open access for those individual articles for which the authors (or research sponsor) pay a publication fee. Hybrid OA generally costs more than gold OA and can offer a lower quality of service. A particularly controversial practice in hybrid open access journals is "double dipping", where both authors and subscribers are charged. For these reasons, hybrid open access journals have been called a "Mephistophelian invention", and publishing in hybrid OA journals often does not qualify for funding under open access mandates, as libraries already pay for subscriptions and thus have no financial incentive to fund open access articles in such journals.
+
+==== Bronze OA ====
+Bronze open access articles are free to read only on the publisher page, but lack a clearly identifiable license. Such articles are typically not available for reuse.
+
+==== Diamond/platinum OA ====
+
+Journals that publish open access without charging authors article processing charges are sometimes referred to as diamond or platinum OA. Since they do not charge either readers or authors directly, such publishers often require funding from external sources such as the sale of advertisements, academic institutions, learned societies, philanthropists or government grants. There are now over 350 platinum OA journals with impact factors over a wide variety of academic disciplines, giving most academics options for OA with no APCs. Diamond OA journals are available for most disciplines, and are usually small (<25 articles per year) and more likely to be multilingual (38%); thousands of such journals exist.
+
+==== Black OA ====
+
+The growth of unauthorized digital copying by large-scale copyright infringement has enabled free access to paywalled literature. This has been done via existing social media sites (e.g. the #ICanHazPDF hashtag) as well as dedicated sites (e.g. Sci-Hub). In some ways this is a large-scale technical implementation of pre-existing practice, whereby those with access to paywalled literature would share copies with their contacts. However, the increased ease and scale from 2010 onwards have changed how many people treat subscription publications.
+
+=== Gratis and libre ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-1.md b/data/en.wikipedia.org/wiki/Open_access-1.md
new file mode 100644
index 000000000..66b5a2062
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-1.md
@@ -0,0 +1,33 @@
+---
+title: "Open access"
+chunk: 2/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+Similar to the free content definition, the terms 'gratis' and 'libre' were used in the Budapest Open Access Initiative definition to distinguish between free to read versus free to reuse.
+Gratis open access refers to free online access, to read, free of charge, without re-use rights.
+Libre open access also refers to free online access, to read, free of charge, plus some additional re-use rights, covering the kinds of open access defined in the Budapest Open Access Initiative, the Bethesda Statement on Open Access Publishing and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. The re-use rights of libre OA are often specified by various specific Creative Commons licenses, all of which require as a minimum attribution of authorship to the original authors. In 2012, the number of works under libre open access was considered to have been rapidly increasing for a few years, though most open-access mandates did not enforce any copyright license and it was difficult to publish libre gold OA in legacy journals. However, there are no costs or restrictions for green libre OA as preprints can be freely self-deposited with a free license, and most open-access repositories use Creative Commons licenses to allow reuse. The biggest drawback of many Open Access licenses is a prohibition on data mining. For this reason, many big data studies of various technologies performed by economists (as well as machine learning by computer scientists) are limited to patent analysis, since the patent documents are not subject to copyright at all.
+
+=== FAIR ===
+
+FAIR is an acronym for 'findable, accessible, interoperable and reusable', intended to more clearly define what is meant by the term 'open access' and make the concept easier to discuss. Initially proposed in March 2016, it has subsequently been endorsed by organisations such as the European Commission and the G20. Note, however, that FAIR principles include "A1.2: The protocol allows for an authentication and authorisation procedure where necessary." This means that a FAIR dataset may be either closed (restricted access) or open (no access restrictions). So, only FAIR data without access restrictions are open access.
+
+== Features ==
+The emergence of open science or open research has brought to light a number of controversial and hotly-debated topics.
+Scholarly publishing invokes various positions and passions. For example, authors may spend hours struggling with diverse article submission systems, often converting document formatting between a multitude of journal and conference styles, and sometimes spend months waiting for peer review results. The drawn-out and often contentious societal and technological transition to Open Access and Open Science/Open Research, particularly across North America and Europe (Latin America had already widely adopted "Acceso Abierto" before 2000), has led to increasingly entrenched positions and much debate.
+The area of (open) scholarly practices increasingly sees a role for policy-makers and research funders giving focus to issues such as career incentives, research evaluation and business models for publicly funded research. Plan S and AmeliCA (Open Knowledge for Latin America) caused a wave of debate in scholarly communication in 2019 and 2020.
+
+=== Licenses ===
+
+Subscription-based publishing typically requires transfer of copyright from authors to the publisher so that the latter can monetise the process via dissemination and reproduction of the work. With OA publishing, typically authors retain copyright to their work, and license its reproduction to the publisher. Retention of copyright by authors can support academic freedoms by enabling greater control of the work (e.g. for image re-use) or licensing agreements (e.g. to allow dissemination by others).
+The most common licenses used in open access publishing are Creative Commons. The widely used CC BY license is one of the most permissive, only requiring attribution to be allowed to use the material (and allowing derivations and commercial use). A range of more restrictive Creative Commons licenses are also used. More rarely, some of the smaller academic journals use custom open access licenses. Some publishers (e.g. Elsevier) use "author nominal copyright" for OA articles, where the author retains copyright in name only and all rights are transferred to the publisher.
+
+=== Funding ===
+Since open access publication does not charge readers, there are many financial models used to cover costs by other means. Open access can be provided by commercial publishers, who may publish open access as well as subscription-based journals, or dedicated open-access publishers such as Public Library of Science (PLOS) and BioMed Central. Another source of funding for open access can be institutional subscribers. One example is the Subscribe to Open publishing model introduced by Annual Reviews; if the subscription revenue goal is met, the given journal's volume is published open access. The number of journals implementing this model grew from 192 in 2024 to 378 in 2025.
+Advantages and disadvantages of open access have generated considerable discussion amongst researchers, academics, librarians, university administrators, funding agencies, government officials, commercial publishers, editorial staff and society publishers. Reactions of existing publishers to open access journal publishing have ranged from moving with enthusiasm to a new open access business model, to experiments with providing as much free or open access as possible, to active lobbying against open access proposals. There are many publishers that started up as open access-only publishers, such as PLOS, Hindawi Publishing Corporation, Frontiers in... journals, MDPI and BioMed Central.
+
+==== Article processing charges ====
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-2.md b/data/en.wikipedia.org/wiki/Open_access-2.md
new file mode 100644
index 000000000..31f585270
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-2.md
@@ -0,0 +1,22 @@
+---
+title: "Open access"
+chunk: 3/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+Some open access journals (under the gold and hybrid models) generate revenue by charging publication fees in order to make the work openly available at the time of publication. The money might come from the author but more often comes from the author's research grant or employer. While the payments are typically incurred per article published (e.g. BMC or PLOS journals), some journals apply them per manuscript submitted (e.g. Atmospheric Chemistry and Physics until recently) or per author (e.g. PeerJ).
+Charges typically range from $1,000–$3,000 ($5,380 for Nature Communications) but can be under $10, close to $5,000 or well over $10,000. APCs vary greatly depending on subject and region and are most common in scientific and medical journals (43% and 47% respectively), and lowest in arts and humanities journals (0% and 4% respectively). APCs can also depend on a journal's impact factor. Some publishers (e.g. eLife and Ubiquity Press) have released estimates of their direct and indirect costs that set their APCs. Hybrid OA generally costs more than gold OA and can offer a lower quality of service. A particularly controversial practice in hybrid open access journals is "double dipping", where both authors and subscribers are charged.
+By comparison, journal subscriptions equate to $3,500–$4,000 per article published by an institution, but are highly variable by publisher (and some charge page fees separately). This has led to the assessment that there is enough money "within the system" to enable full transition to OA. However, there is ongoing discussion about whether the change-over offers an opportunity to become more cost-effective or promotes more equitable participation in publication. Concern has been noted that increasing subscription journal prices will be mirrored by rising APCs, creating a barrier to less financially privileged authors.
+The inherent bias of the current APC-based OA publishing perpetuates this inequality through the 'Matthew effect' (the rich get richer, and the poor get poorer). The switch from pay-to-read to pay-to-publish has left essentially the same people behind, with some academics not having enough purchasing power (individually or through their institutions) for either option. Some gold OA publishers will waive all or part of the fee for authors from less developed economies. Steps are normally taken to ensure that peer reviewers do not know whether authors have requested, or been granted, fee waivers, or to ensure that every paper is approved by an independent editor with no financial stake in the journal. The main argument against requiring authors to pay a fee is the risk to the peer review system, diminishing the overall quality of scientific journal publishing.
+
+==== Subsidized or no-fee ====
+No-fee open access journals, also known as "platinum" or "diamond", do not charge either readers or authors. These journals use a variety of business models including subsidies, advertising, membership dues, endowments, or volunteer labour. Subsidising sources range from universities, libraries and museums to foundations, societies or government agencies. Some publishers may cross-subsidise from other publications or auxiliary services and products. For example, most APC-free journals in Latin America are funded by higher education institutions and are not conditional on institutional affiliation for publication. Conversely, Knowledge Unlatched crowdsources funding in order to make monographs available open access.
+Estimates of prevalence vary, but approximately 10,000 journals without APC are listed in DOAJ and the Free Journal Network. APC-free journals tend to be smaller and more local-regional in scope. Some also require submitting authors to have a particular institutional affiliation.
+
+=== Preprint use ===
+
+A "preprint" is typically a version of a research paper that is shared on an online platform prior to, or during, a formal peer review process. Preprint platforms have become popular due to the increasing drive towards open access publishing and can be publisher- or community-led. A range of discipline-specific or cross-domain platforms now exist. The posting of preprints (or authors' manuscript versions) is consistent with the Green Open Access model.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-3.md b/data/en.wikipedia.org/wiki/Open_access-3.md
new file mode 100644
index 000000000..5d7524114
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-3.md
@@ -0,0 +1,32 @@
+---
+title: "Open access"
+chunk: 4/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+==== Effect of preprints on later publication ====
+A persistent concern surrounding preprints is that work may be at risk of being plagiarised or "scooped" – meaning that the same or similar research will be published by others without proper attribution to the original source – if publicly available but not yet associated with a stamp of approval from peer reviewers and traditional journals. These concerns are often amplified as competition increases for academic jobs and funding, and perceived to be particularly problematic for early-career researchers and other higher-risk demographics within academia.
+However, preprints, in fact, protect against scooping. Considering the differences between traditional peer-review based publishing models and deposition of an article on a preprint server, "scooping" is less likely for manuscripts first submitted as preprints. In a traditional publishing scenario, the time from manuscript submission to acceptance and to final publication can range from a few weeks to years, and go through several rounds of revision and resubmission before final publication. During this time, the same work will have been extensively discussed with external collaborators, presented at conferences, and been read by editors and reviewers in related areas of research. Yet, there is no official open record of that process (e.g., peer reviewers are normally anonymous, reports remain largely unpublished), and if an identical or very similar paper were to be published while the original was still under review, it would be impossible to establish provenance.
+Preprints provide a time-stamp at the time of publication, which helps to establish the "priority of discovery" for scientific claims. This means that a preprint can act as proof of provenance for research ideas, data, code, models, and results. The fact that the majority of preprints come with a form of permanent identifier, usually a digital object identifier (DOI), also makes them easy to cite and track. Thus, if one were to be "scooped" without adequate acknowledgement, this would be a case of academic misconduct and plagiarism, and could be pursued as such.
+There is no evidence that "scooping" of research via preprints exists, not even in communities that have broadly adopted the use of the arXiv server for sharing preprints since 1991. If the unlikely case of scooping emerges as the growth of the preprint system continues, it can be dealt with as academic malpractice. ASAPbio includes a series of hypothetical scooping scenarios as part of its preprint FAQ, finding that the overall benefits of using preprints vastly outweigh any potential issues around scooping. Indeed, the benefits of preprints, especially for early-career researchers, seem to outweigh any perceived risk: rapid sharing of academic research, open access without author-facing charges, establishing priority of discoveries, receiving wider feedback in parallel with or before peer review, and facilitating wider collaborations.
+
+=== Archiving ===
+The "green" route to OA refers to author self-archiving, in which a version of the article (often the peer-reviewed version before editorial typesetting, called "postprint") is posted online to an institutional or subject repository. This route is often dependent on journal or publisher policies, which can be more restrictive and complicated than respective "gold" policies regarding deposit location, license, and embargo requirements. Some publishers require an embargo period before deposition in public repositories, arguing that immediate self-archiving risks loss of subscription income.
+
+==== Embargo periods ====
+Between 20 and 40% of journals impose embargoes, during which an article is paywalled before self-archiving (green OA) is permitted or a free-to-read version (bronze OA) is released. Embargo periods typically range from 6–12 months in STEM to more than 12 months in humanities, arts and social sciences. Embargo-free self-archiving has not been shown to affect subscription revenue, and tends to increase readership and citations. Embargoes have been lifted on particular topics, either for limited times or on an ongoing basis (e.g. Zika outbreaks or indigenous health). Plan S includes zero-length embargoes on self-archiving as a key principle.
+
+== Motivations ==
+
+Open access (mostly green and gratis) began to be sought and provided worldwide by researchers when the possibility itself was opened by the advent of the Internet and the World Wide Web. The momentum was further increased by a growing movement for academic journal publishing reform, and with it gold and libre OA.
+The premises behind open access publishing are that there are viable funding models to maintain traditional peer review standards of quality while also making the following changes:
+
+Rather than making journal articles accessible through a subscription business model, all academic publications could be made free to read and published with some other cost-recovery model, such as publication charges, subsidies, or charging subscriptions only for the print edition, with the online edition gratis or "free to read".
+Rather than applying traditional notions of copyright to academic publications, they could be libre or "free to build upon".
+An obvious advantage of open access journals is the free access to scientific papers regardless of affiliation with a subscribing library and improved access for the general public; this is especially true in developing countries. Lower costs for research in academia and industry have been claimed in the Budapest Open Access Initiative, although others have argued that OA may raise the total cost of publication, and further increase economic incentives for exploitation in academic publishing. The open access movement is motivated by the problems of social inequality caused by restricting access to academic research, which favor large and wealthy institutions with the financial means to purchase access to many journals, as well as the economic challenges and perceived unsustainability of academic publishing.
+
+=== Stakeholders and concerned communities ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-4.md b/data/en.wikipedia.org/wiki/Open_access-4.md
new file mode 100644
index 000000000..0b1612279
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-4.md
@@ -0,0 +1,33 @@
+---
+title: "Open access"
+chunk: 5/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+The intended audience of research articles is usually other researchers. Open access helps researchers as readers by opening up access to articles that their libraries do not subscribe to. All researchers benefit from open access as no library can afford to subscribe to every scientific journal and most can only afford a small fraction of them. This is known as the "serials crisis".
+Open access extends the reach of research beyond its immediate academic circle, as an open access article can be read by anyone. A 2008 study revealed that mental health professionals are roughly twice as likely to read a relevant article if it is freely available.
+
+==== Research funders ====
+
+Research funding agencies and universities want to ensure that the research they fund and support in various ways has the greatest possible research impact. As a means of achieving this, research funders are beginning to expect open access to the research they support. Many of them (including all UK Research Councils) have already adopted open-access mandates, and others are on the way to doing so (see ROARMAP).
+
+==== Universities ====
+A growing number of universities are providing institutional repositories in which their researchers can deposit their published articles. Some open access advocates believe that institutional repositories will play a very important role in responding to open-access mandates from funders.
+In May 2005, 16 major Dutch universities cooperatively launched DAREnet, the Digital Academic Repositories, making over 47,000 research papers available. From 2 June 2008, DAREnet has been incorporated into the scholarly portal NARCIS. By 2019, NARCIS provided access to 360,000 open access publications from all Dutch universities, KNAW, NWO and a number of scientific institutes.
+In 2011, a group of universities in North America formed the Coalition of Open Access Policy Institutions (COAPI). Starting with 21 institutions where the faculty had either established an open access policy or were in the process of implementing one, COAPI now has nearly 50 members. These institutions' administrators, faculty and librarians, and staff support the international work of the Coalition's awareness-raising and advocacy for open access.
+In 2012, the Harvard Open Access Project released its guide to good practices for university open-access policies, focusing on rights-retention policies that allow universities to distribute faculty research without seeking permission from publishers. As of November 2023, rights-retention policies are being adopted by an increasing number of UK universities as well.
+In 2013 a group of nine Australian universities formed the Australian Open Access Strategy Group (AOASG) to advocate, collaborate, raise awareness, and lead and build capacity in the open access space in Australia. In 2015, the group expanded to include all eight New Zealand universities and was renamed the Australasian Open Access Support Group. It was then renamed the Australasian Open Access Strategy Group, highlighting its emphasis on strategy. The awareness raising activities of the AOASG include presentations, workshops, blogs, and a webinar series on open access issues.
+
+==== Libraries and librarians ====
+As information professionals, librarians are often vocal and active advocates of open access. These librarians believe that open access promises to remove both the price and permission barriers that undermine library efforts to provide access to scholarship, as well as helping to address the serials crisis. Open access provides a complement to library access services such as interlibrary loan, supporting researchers' needs for immediate access to scholarship. Librarians and library associations also lead education and outreach initiatives to faculty, administrators, the library community, and the public about the benefits of open access.
+Many library associations have either signed major open access declarations or created their own. For example, IFLA have produced a Statement on Open Access. The Association of Research Libraries has documented the need for increased access to scholarly information, and was a leading founder of the Scholarly Publishing and Academic Resources Coalition (SPARC). Librarians and library associations also develop and share informational resources on scholarly publishing and open access to research; the Scholarly Communications Toolkit developed by the Association of College and Research Libraries of the American Library Association is one example of this work.
+At most universities, the library manages the institutional repository, which provides free access to scholarly work by the university's faculty. The Canadian Association of Research Libraries has a program to develop institutional repositories at all Canadian university libraries. An increasing number of libraries provide publishing or hosting services for open access journals, with the Library Publishing Coalition as a membership organisation.
+In 2013, open access activist Aaron Swartz was posthumously awarded the American Library Association's James Madison Award for being an "outspoken advocate for public participation in government and unrestricted access to peer-reviewed scholarly articles". In March 2013, the entire editorial board and the editor-in-chief of the Journal of Library Administration resigned en masse, citing a dispute with the journal's publisher. One board member wrote of a "crisis of conscience about publishing in a journal that was not open access" after the death of Aaron Swartz.
+
+==== Public ====
+The public may benefit from open access to scholarly research for many reasons. Advocacy groups such as SPARC's Alliance for Taxpayer Access in the US argue that most scientific research is paid for through government grants by taxpayers, who have a right to access the results of what they have funded. Examples of people who might wish to read scholarly literature include individuals with medical conditions and their family members, serious hobbyists or "amateur" scholars (e.g. amateur astronomers), and high school and junior college students. Additionally, professionals in many fields, such as those doing research in private companies, start-ups, and hospitals, may not have access to publications behind paywalls, so OA publications are the only type that they can access in practice.
+Even those who do not read scholarly articles benefit indirectly from open access. For example, patients benefit when their doctor and other health care professionals have access to the latest research. Advocates argue that open access speeds research progress, productivity, and knowledge translation.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-5.md b/data/en.wikipedia.org/wiki/Open_access-5.md
new file mode 100644
index 000000000..88859e6a6
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-5.md
@@ -0,0 +1,32 @@
+---
+title: "Open access"
+chunk: 6/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+==== Low-income countries ====
+In developing nations, open access archiving and publishing acquires a unique importance. Scientists, health care professionals, and institutions in developing nations often do not have the capital necessary to access scholarly literature.
+Many open access projects involve international collaboration. For example, SciELO (Scientific Electronic Library Online) is a comprehensive approach to full open access journal publishing, involving a number of Latin American countries. Bioline International, a non-profit organization dedicated to helping publishers in developing countries, is a collaboration of people in the UK, Canada, and Brazil; the Bioline International Software is used around the world. Research Papers in Economics (RePEc) is a collaborative effort of over 100 volunteers in 45 countries. The Public Knowledge Project in Canada developed the open-source publishing software Open Journal Systems (OJS), which is now in use around the world, for example by the African Journals Online group, and one of the most active development groups is Portuguese. This international perspective has resulted in advocacy for the development of open-source appropriate technology and the necessary open access to relevant information for sustainable development.
+
+== History ==
+
+=== Extent ===
+Various studies have investigated the extent of open access. A study published in 2010 showed that roughly 20% of the total number of peer-reviewed articles published in 2008 could be found openly accessible. Another study found that by 2010, 7.9% of all academic journals with impact factors were gold open access journals and showed a broad distribution of gold open access journals throughout academic disciplines. A 2013 study of random journals from the citation indexes AHSCI, SCI and SSCI found that 88% of the journals were closed access and 12% were open access. In August 2013, a study done for the European Commission reported that 50% of a random sample of all articles published in 2011 as indexed by Scopus were freely accessible online by the end of 2012. A 2017 study by the Max Planck Society put the share of gold open access articles in pure open access journals at around 13 percent of total research papers.
+In 2009, there were approximately 4,800 active open access journals, publishing around 190,000 articles. As of February 2019, over 12,500 open access journals are listed in the Directory of Open Access Journals.
+
+A 2013-2018 report by Crawford (GOA4) found that in 2018 over 700,000 articles were published in gold open access worldwide, of which 42% were in journals with no author-paid fees. The figure varies significantly depending on region and kind of publisher: 75% if university-run, over 80% in Latin America, but less than 25% in Western Europe. However, Crawford's study did not count open access articles published in "hybrid" journals (subscription journals that allow authors to make their individual articles open in return for payment of a fee). More comprehensive analyses of the scholarly literature suggest that this resulted in a significant underestimation of the prevalence of author-fee-funded OA publications in the literature. Crawford's study also found that although a minority of open access journals impose charges on authors, a growing majority of open access articles are published under this arrangement, particularly in the science disciplines (thanks to the enormous output of open access "mega journals", each of which may publish tens of thousands of articles in a year and is invariably funded by author-side charges; see Figure 10.1 in GOA4).
+According to the Scopus database in August 2024, 46.2% of works indexed therein and published in 2023 had some form of open access. More than half of the OA publications (27.5% of all indexed works in 2023) were in fully Gold Open Access sources, 16.7% of all were in Green OA sources (i.e. those that allow for self-archiving by authors), 9.2% in Hybrid Gold OA sources (such as journals that have open access and behind-paywall articles in the same issue), and 10.6% were in Bronze OA sources (free-to-read on the publishers' websites).
+
+The adoption of Open Access publishing varies significantly from publisher to publisher, as shown in Fig. OA-Plot, where only the oldest (traditional) publishers are shown, but not the newer publishers that use the Open Access model exclusively. This plot shows that since 2010 the Institute of Physics has had the largest percentage of OA publications, while the American Chemical Society has had the lowest. Both the IOP and the ACS are non-profit publishers. The increase in OA percentage for articles published before ca. 1923 is related to the expiration of a 100-year copyright term. Some publishers (e.g. IOP and ACS) made many such articles available as Open Access, while others (Elsevier in particular) did not.
+The Registry of Open Access Repositories (ROAR) indexes the creation, location and growth of open-access repositories and their contents. As of February 2019, over 4,500 institutional and cross-institutional repositories have been registered in ROAR.
+
+== Effects on scholarly publishing ==
+
+=== Article impact ===
+
+Since published articles report on research that is typically funded by government or university grants, the more the article is used, cited, applied and built upon, the better for research as well as for the researcher's career.
+Some professional organizations have encouraged use of open access: in 2001, the International Mathematical Union communicated to its members that "Open access to the mathematical literature is an important goal" and encouraged them to "[make] available electronically as much of our own work as feasible" to "[enlarge] the reservoir of freely available primary mathematical material, particularly helping scientists working without adequate library access".
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-6.md b/data/en.wikipedia.org/wiki/Open_access-6.md
new file mode 100644
index 000000000..e5e94f66b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-6.md
@@ -0,0 +1,33 @@
+---
+title: "Open access"
+chunk: 7/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+==== Readership ====
+OA articles are generally viewed online and downloaded more often than paywalled articles, and their readership continues for longer. Readership is especially higher in demographics that typically lack access to subscription journals (in addition to the general population, this includes many medical practitioners, patient groups, policymakers, non-profit sector workers, industry researchers, and independent researchers). OA articles are read more on publication management programs such as Mendeley. Open access practices can reduce publication delays, an obstacle which led some research fields such as high-energy physics to adopt widespread preprint access.
+
+==== Citation rate ====
+
+A main reason authors make their articles openly accessible is to maximize their citation impact. Open access articles are typically cited more often than equivalent articles requiring subscriptions. This 'citation advantage' was first reported in 2001. Although two major studies dispute this claim, the consensus of multiple studies supports the effect, with the measured OA citation advantage varying in magnitude from 1.3-fold to 6-fold depending on discipline.
+Citation advantage is most pronounced in OA articles in hybrid journals (compared to the non-OA articles in those same journals), and with articles deposited in green OA repositories. Notably, green OA articles show similar benefits to citation counts as gold OA articles. Articles in gold OA journals are typically cited at a similar frequency to paywalled articles. Citation advantage increases the longer an article has been published. A 2024 article demonstrated that open-access (OA) publications receive more diverse citations than closed-access publications, in terms of institutions, countries, subregions, regions, and research fields. This citation diversity advantage is robust, as observed across 19 million publications and 420 million citations (2010–2019). Analysis of the extant literature indicates that open access through disciplinary or institutional repositories showed a stronger effect than open access via publisher platforms.
+
+==== Altmetrics ====
+In addition to formal academic citation, other forms of research impact (altmetrics) may be affected by OA publishing, constituting a significant "amplifier" effect for science published on such platforms. Initial studies suggest that OA articles are more referenced in blogs, on Twitter, and on English Wikipedia. The OA advantage in altmetrics may be smaller than the advantage in academic citations, although findings are mixed.
+
+=== Journal impact factor ===
+
+Journal impact factor (JIF) measures the average number of citations of articles in a journal over a two-year window. It is commonly used as a proxy for journal quality, for the expected research impact of articles submitted to that journal, and for researcher success. In subscription journals, impact factor correlates with overall citation count; however, this correlation is not observed in gold OA journals.
+Open access initiatives like Plan S typically call for a broader adoption and implementation of the Leiden Manifesto and the San Francisco Declaration on Research Assessment (DORA) alongside fundamental changes in the scholarly communication system.
+
+=== Peer review processes ===
+
+Peer review of research articles prior to publishing has been common since the 18th century. Commonly, reviewer comments are only revealed to the authors and reviewer identities are kept anonymous. The rise of OA publishing has also given rise to experimentation in technologies and processes for peer review. Increasing transparency of peer review and quality control includes posting results to preprint servers, preregistration of studies, open publishing of peer reviews, open publishing of full datasets and analysis code, and other open science practices. It is proposed that increased transparency of academic quality control processes makes audit of the academic record easier. Additionally, the rise of OA megajournals has made it viable for their peer review to focus solely on methodology and results interpretation whilst ignoring novelty. Major criticisms of the influence of OA on peer review have included that if OA journals have incentives to publish as many articles as possible then peer review standards may fall (an aspect of predatory publishing), that increased use of preprints may populate the academic corpus with un-reviewed junk and propaganda, and that reviewers may self-censor if their identity is open. Some advocates propose that readers will have increased skepticism of preprint studies, which is a traditional hallmark of scientific inquiry.
+
+=== Predatory publishing ===
+Predatory publishers present themselves as academic journals but use lax or no peer review processes coupled with aggressive advertising in order to generate revenue from article processing charges from authors. The definitions of 'predatory', 'deceptive', or 'questionable' publishers/journals are often vague, opaque, and confusing, and can also include fully legitimate journals, such as those indexed by PubMed Central. In this sense, Grudniewicz et al. proposed a consensus definition that needs to be shared: "Predatory journals and publishers are entities that prioritize self-interest at the expense of scholarship and are characterized by false or misleading information, deviation from best editorial and publication practices, a lack of transparency, and/or the use of aggressive and indiscriminate solicitation practices."
+In this way, predatory journals exploit the OA model by deceptively removing the main value added by the journal (peer review) and parasitize the OA movement, occasionally hijacking or impersonating other journals. The rise of such journals since 2010 has damaged the reputation of the OA publishing model as a whole, especially via sting operations where fake papers have been successfully published in such journals. Although these problems are commonly associated with OA publishing models, subscription journals are also at risk of similarly lax quality control standards and poor editorial policies. OA publishers therefore aim to ensure quality via auditing by registries such as DOAJ, OASPA and SciELO and comply with a standardised set of conditions. A blacklist of predatory publishers is also maintained by Cabell's (a successor to Beall's List). Increased transparency of the peer review and publication process has been proposed as a way to combat predatory journal practices.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-7.md b/data/en.wikipedia.org/wiki/Open_access-7.md
new file mode 100644
index 000000000..16049fabf
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-7.md
@@ -0,0 +1,33 @@
+---
+title: "Open access"
+chunk: 8/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+=== Open irony ===
+Open irony refers to the situation where a scholarly journal article advocates open access but the article itself is only accessible by paying a fee to the journal publisher to read the article. This has been noted in many fields, with more than 20 examples appearing since around 2010, including in widely-read journals such as The Lancet, Science and Nature. In 2012 Duncan Hull proposed the Open Access Irony award to publicly humiliate journals that publish these kinds of papers. Examples of these have been shared and discussed on social media using the hashtag #openirony. Typically, these discussions are humorous exposures of articles/editorials that are pro-open access, but locked behind paywalls. The main concern that motivates these discussions is that restricted access to public scientific knowledge is slowing scientific progress. The practice has been justified as important for raising awareness of open access.
+
+== Infrastructure ==
+
+=== Databases and repositories ===
+Multiple databases exist for open access articles, journals and datasets. These databases overlap; however, each has different inclusion criteria, which typically include extensive vetting for journal publication practices, editorial boards and ethics statements. The main databases of open access articles and journals are DOAJ and PMC. In the case of DOAJ, only fully gold open access journals are included, whereas PMC also hosts articles from hybrid journals.
+There are also a number of preprint servers which host open access copies of articles that have not yet been peer reviewed. These articles are subsequently submitted for peer review by both open access and subscription journals; however, the preprint always remains openly accessible. A list of preprint servers is maintained at ResearchPreprints.
+For articles that are published in closed access journals, some authors will deposit a postprint copy in an open-access repository, where it can be accessed for free. Most subscription journals place restrictions on which version of the work may be shared or require an embargo period following the original date of publication. What is deposited can therefore vary: either a preprint or the peer-reviewed postprint, either the author's refereed and revised final draft or the publisher's version of record, either immediately deposited or after several years. Repositories may be specific to an institution, a discipline (e.g. arXiv), a scholarly society (e.g. MLA's CORE Repository), or a funder (e.g. PMC). Although the practice was first formally proposed in 1994, self-archiving was already being practiced by some computer scientists in local FTP archives in the 1980s (later harvested by CiteSeer). The SHERPA/RoMEO site maintains a list of the different publisher copyright and self-archiving policies and the ROAR database hosts an index of the repositories themselves.
+
+==== Representativeness in proprietary databases ====
+Uneven coverage of journals in the major commercial citation index databases (such as Web of Science, Scopus, and PubMed) has strong effects on evaluating both researchers and institutions (e.g. the UK Research Excellence Framework or Times Higher Education ranking). While these databases primarily select based on process and content quality, there has been concern that their commercial nature may skew their assessment criteria and representation of journals outside of Europe and North America. According to a 2018 study, there were at that time no comprehensive, open source or non-commercial academic databases. However, in more recent years, The Lens emerged as a suitable outside-paywalls universal academic database.
+
+=== Distribution ===
+
+Like the self-archived green open access articles, most gold open access journal articles are distributed via the World Wide Web, due to low distribution costs, increasing reach, speed, and increasing importance for scholarly communication. Open source software is sometimes used for open-access repositories, open access journal websites, and other aspects of open access provision and open access publishing.
+Access to online content requires Internet access, and this distributional consideration presents physical and sometimes financial barriers to access.
+There are various open access aggregators that list open access journals or articles. ROAD (the Directory of Open Access Scholarly Resources) synthesizes information about open access journals and is a subset of the ISSN register. SHERPA/RoMEO lists international publishers that allow the published version of articles to be deposited in institutional repositories. The Directory of Open Access Journals (DOAJ) contains over 12,500 peer-reviewed open access journals for searching and browsing.
+Open access articles can be found with a web search, using any general search engine or those specialized for the scholarly and scientific literature, such as Google Scholar, OAIster, base-search.net, and CORE. Many open-access repositories offer a programmable interface to query their content. Some of them use a generic protocol, such as OAI-PMH (e.g., base-search.net). In addition, some repositories offer a specific API, such as the arXiv API, the Dissemin API, the Unpaywall/oadoi API, or the base-search API.
+In 1998, several universities founded the Public Knowledge Project to foster open access, and developed the open-source journal publishing system Open Journal Systems, among other scholarly software projects. As of 2010, it was being used by approximately 5,000 journals worldwide.
+Several initiatives provide an alternative to the English language dominance of existing publication indexing systems, including Index Copernicus (Polish), SciELO (Portuguese, Spanish) and Redalyc (Spanish).
+
+== Policies and mandates ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-8.md b/data/en.wikipedia.org/wiki/Open_access-8.md
new file mode 100644
index 000000000..6a58efde8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-8.md
@@ -0,0 +1,40 @@
+---
+title: "Open access"
+chunk: 9/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+Many universities, research institutions and research funders have adopted mandates requiring their researchers to make their research publications open access. For example, Research Councils UK spent nearly £60m on supporting their open access mandate between 2013 and 2016. New mandates are often announced during Open Access Week, which takes place each year during the last full week of October.
+The idea of mandating self-archiving was raised at least as early as 1998. Since 2003 efforts have been focused on open access mandating by the funders of research: governments, research funding agencies, and universities. Some publishers and publisher associations have lobbied against introducing mandates.
+In 2002, the University of Southampton's School of Electronics & Computer Science became one of the first schools to implement a meaningful mandatory open access policy, in which authors had to contribute copies of their articles to the school's repository. More institutions followed suit in the following years. In 2007, Ukraine became the first country to create a national policy on open access, followed by Spain in 2009. Argentina, Brazil, and Poland are currently in the process of developing open access policies. Making master's and doctoral theses open access is an increasingly popular mandate by many educational institutions.
+In the US, the NIH Public Access Policy has required since 2008 that papers describing research funded by the National Institutes of Health be made freely available to the public through PubMed Central (PMC) within 12 months of publication. In 2022, US President Joe Biden's Office of Science and Technology Policy issued a memorandum calling for the removal of the 12-month embargo. By the end of 2025, US federal agencies must require all results (papers, documents and data) produced as a result of US government-funded research to be available to the public immediately upon publication.
+In 2023, the Council of the European Union recommended the implementation of an open-access and not-for-profit model for research publishing by the European Commission and member states. These recommendations are not legally binding and received mixed reactions. While welcomed by some members of the academic community, publishers argued that the suggested model is unrealistic due to the lack of crucial funding details. Furthermore, the council's recommendations raised concerns within the publishing industry regarding the potential implications, and they also emphasized the importance of research integrity and the need for member states to address predatory journals and paper mills.
+In 2024, the Gates Foundation announced a "preprint-centric" open access policy, and their intention to stop paying APCs. In 2024, the government of Japan also announced a Green open access policy, requiring that government-funded research be made freely available on institutional preprint repositories from April 2025.
+
+=== Compliance ===
+As of March 2021, open-access mandates have been registered by over 100 research funders and 800 universities worldwide, compiled in the Registry of Open Access Repository Mandates and Policies. As these sorts of mandates increase in prevalence, collaborating researchers may be affected by several at once. Tools such as SWORD can help authors manage sharing between repositories.
+Compliance rates with voluntary open access policies remain low (as low as 5%). However, it has been demonstrated that more successful outcomes are achieved by policies that are compulsory and more specific, such as specifying maximum permissible embargo times. Compliance with compulsory open-access mandates varies between funders from 27% to 91% (averaging 67%). From March 2021, Google Scholar started tracking and indicating compliance with funders' open-access mandates, although it only checks whether items are free-to-read, rather than openly licensed.
+
+== Inequality and open access ==
+
+=== Gender inequality ===
+Gender inequality favoring men can be found in many disciplines, including political science, economics, neurology, and critical care research. For instance, in critical care research, 30.8% of the 18,483 research articles published between 2008 and 2018 were led by female authors and were more likely to be published in lower-impact journals than those led by male authors. Open access publishing may improve the visibility of female researchers both inside and outside academia, but without deliberate support of female researchers, open access publishing may exacerbate gender inequality.
+
+=== High-income–low-income country inequality ===
+A 2022 study found that "most OA articles were written by authors in high-income countries, and there were no articles in Mirror journals by authors in low-income countries." "One of the great ironies of open access is that you grant authors around the world the ability to finally read the scientific literature that was completely closed off to them, but it ends up excluding them from publishing in the same journals," says Emilio Bruna, a scholar at the University of Florida in Gainesville.
+
+== By country ==
+
+== See also ==
+
+== Notes ==
+
+== References ==
+
+=== Sources ===
+ This article incorporates text by Tennant JP, Crane H, Crick T, Davila J, Enkhbayar A, Havemann J, Kramer B, Martin R, Masuzzo P, Nobes A, Rice C, Rivera-López BS, Ross-Hellauer T, Sattler S, Thacker P, Vanholsbeeck M. available under the CC BY 4.0 license.
+ This article incorporates text from a free content work. Licensed under CC-BY-SA. Text taken from Policy guidelines for the development and promotion of open access, 45-48, Swan, Alma, UNESCO.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access-9.md b/data/en.wikipedia.org/wiki/Open_access-9.md
new file mode 100644
index 000000000..3c866ffc5
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access-9.md
@@ -0,0 +1,31 @@
+---
+title: "Open access"
+chunk: 10/10
+source: "https://en.wikipedia.org/wiki/Open_access"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:39.240543+00:00"
+instance: "kb-cron"
+---
+
+== Further reading ==
+Darnton, Robert, "The Dream of a Universal Library" (review of Peter Baldwin, Athena Unbound: Why and How Scholarly Knowledge Should Be Free for All, MIT Press, 2023, 405 pp.), The New York Review of Books, vol. LXX, no. 20 (21 December 2023), pp. 73–74. Reviewer Darnton writes: "Baldwin warns: journal publishers are gouging their customers, scholarly monographs reach a tiny audience, libraries are floundering under budget pressures, academics are pursuing careers rather than truth, and readers are not getting all the information they deserve." (p. 74.) Writes Darnton: "Most scientific research is subsidized by the federal government." Under a 2022 White House directive, "As of December 31, 2025, all agencies... must require immediate open access... The G7 leaders took a similar stand on May 14, 2023, as did the European Council on May 23. The tide is turning in favor of unrestricted access, but the countervailing forces are so complex that the future remains cloudy." (p. 73.)
+Suber, Peter (2012). Open access (The MIT Press Essential Knowledge Series ed.). Cambridge, Mass.: MIT Press. ISBN 978-0-262-51763-8. Retrieved 20 October 2015.
+Kirsop, Barbara, and Leslie Chan. (2005) Transforming access to research literature for developing countries. Serials Reviews, 31(4): 246–255.
+Laakso, Mikael; Welling, Patrik; Bukvova, Helena; Nyman, Linus; Björk, Bo-Christer; Hedlund, Turid (2011). "The Development of Open Access Journal Publishing from 1993 to 2009". PLOS ONE. 6 (6) e20961. Bibcode:2011PLoSO...620961L. doi:10.1371/journal.pone.0020961. PMC 3113847. PMID 21695139.
+Hajjem, C.; Harnad, S; Gingras, Y. (2005). "Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How It Increases Research Citation Impact". IEEE Data Engineering Bulletin. 28 (4): 39–47. arXiv:cs/0606079. Bibcode:2006cs........6079H.
+Tötösy; de Zepetnek, S.; Joshua, Jia (2014). "Electronic Journals, Prestige, and the Economics of Academic Journal Publishing". CLCWeb: Comparative Literature and Culture. 16 (1): 2014. doi:10.7771/1481-4374.2426.
+"Open and Shut?" Blog on open access by Richard Poynder, a freelance journalist, who has done a series of interviews with a few of the leaders of the open access movement.
+Mietchen, Daniel (15 January 2014). "Wikimedia and Open Access — a rich history of interactions". Wikimedia Blog. Wikimedia Foundation. Retrieved 10 January 2015.
+Okerson, Ann; O'Donnell, James (Eds.) (June 1995). Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing. Washington, DC: Association of Research Libraries. ISBN 978-0-918006-26-4.
+Willinsky, John (2006). The Access Principle: The Case for Open Access to Research and Scholarship (PDF). Cambridge, MA: MIT Press. ISBN 978-0-262-51266-4. Archived from the original (PDF) on 5 November 2013.
+"Accessibility, sustainability, excellence: how to expand access to research publications" (PDF). United Kingdom: Working Group on Expanding Access to Published Research Findings. 2012. Archived from the original (PDF) on 19 June 2012. Retrieved 15 July 2012.
+In Oldenburg's Long Shadow: Librarians, Research Scientists, Publishers, and the Control of Scientific Publishing
+Glyn Moody (17 June 2016). "Open access: All human knowledge is there—so why can't everybody access it?". Ars Technica. Retrieved 20 June 2016.
+
+== External links ==
+
+OAD: Open Access Directory, an "open-access, wiki-based, community-updated encyclopedia of OA factual lists" (started by Peter Suber and Robin Peek). OCLC 757073363. Published by Simmons School of Library and Information Science in US.
+OASPA: Open Access Scholarly Publishing Association, a community of organisations engaged in open scholarship with a mission to encourage and enable open access as the predominant model of communication for scholarly outputs
+OATP: Open Access Tracking Project, a crowd-sourced tagging project providing real-time alerts about new OA developments and organizing knowledge of the field (started by Peter Suber). OCLC 1040261573
+GOAP: UNESCO's Global Open Access Portal, providing "status of open access to scientific information around the world"
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_access_citation_advantage-0.md b/data/en.wikipedia.org/wiki/Open_access_citation_advantage-0.md
new file mode 100644
index 000000000..f0ad221e1
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_access_citation_advantage-0.md
@@ -0,0 +1,27 @@
+---
+title: "Open access citation advantage"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Open_access_citation_advantage"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:36.958647+00:00"
+instance: "kb-cron"
+---
+
+Open access citation advantage (OACA) is a type of bias whereby scholars tend to cite academic journals with open access (OA)—that is, journals that make their full text available on the Internet without charge and not behind a paywall—in preference to toll-access publications. The concept was first introduced under the name FUTON bias ("full text on the net") by UK medical researcher Reinhard Wentz in a letter to The Lancet in 2002.
+Scholars in some fields can more easily discover and access articles whose full text is available online, which increases authors' likelihood of reading and citing these articles, an issue that was first raised and has been mainly studied in connection with medical research. In the context of evidence-based medicine, articles in expensive journals that do not provide open access may be "priced out of evidence", giving a greater weight to open access publications. Open access citation advantage may increase the impact factor of open access journals relative to journals without open access.
+One study concluded that authors in medical fields "concentrate on research published in journals that are available as full text on the internet, and ignore relevant studies that are not available in full text, thus introducing an element of bias into their search result". Authors of another study conclude that "the OA advantage is a quality advantage, rather than a quality bias", that authors make a "self-selection toward using and citing the more citable articles—once OA self-archiving has made them accessible", and that open access "itself will not make an unusable (hence uncitable) paper more used and cited".
+A similar phenomenon, termed the "no abstract available bias" or NAA bias, is a scholar's tendency to cite journal articles that have an abstract available online more readily than articles that do not—this affects articles' citation count similarly to open access citation advantage.
+
+
+== See also ==
+Digital divide
+#IcanHazPDF
+Sci-Hub
+
+
+== References ==
+
+
+== Further reading ==
+Goldsmith, K. (September 27, 2005). "If It Doesn't Exist on the Internet, It Doesn't Exist". Elective Affinities Conference. State University of New York, Buffalo. Archived from the original on February 24, 2019. Retrieved May 15, 2021.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_data-0.md b/data/en.wikipedia.org/wiki/Open_data-0.md
new file mode 100644
index 000000000..96d1e9b05
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_data-0.md
@@ -0,0 +1,40 @@
+---
+title: "Open data"
+chunk: 1/5
+source: "https://en.wikipedia.org/wiki/Open_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:21.318197+00:00"
+instance: "kb-cron"
+---
+
+Open data are data that are openly accessible, exploitable, editable and shareable by anyone for any purpose. Open data are generally licensed under an open license.
+The goals of the open data movement are similar to those of other "open(-source)" movements such as open-source software, open-source hardware, open content, open specifications, open education, open educational resources, open government, open knowledge, open access, open science, and the open web. The growth of the open data movement is paralleled by a rise in intellectual property rights. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives Data.gov, Data.gov.uk and Data.gov.in.
+Open data can be linked data—referred to as linked open data.
+One of the most important forms of open data is open government data (OGD), which is a form of open data created by ruling government institutions. The importance of open government data is born from it being a part of citizens' everyday lives, down to the most routine and mundane tasks that are seemingly far removed from government.
+The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the principles of FAIR data and carries an explicit data‑capable open license.
+
+== Overview ==
+The concept of open data is not new, but a formalized definition is relatively new. Open data as a phenomenon denotes that governmental data should be available to anyone with a possibility of redistribution in any form without any copyright restriction. Another definition is the Open Definition, which can be summarized as "a piece of data is open if anyone is free to use, reuse, and redistribute it—subject only, at most, to the requirement to attribute and/or share-alike." Other definitions, including the Open Data Institute's "open data is data that anyone can access, use or share," have an accessible short version of the definition but refer to the formal definition. Open data may include non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, and bioscience and biodiversity data.
+A major barrier to the open data movement is the commercial value of data. Access to, or re-use of, data is often controlled by public or private organizations. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions detract from the common good and that data should be available without restrictions or fees. There are many other, smaller barriers as well.
+Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use, instead presuming that not asserting copyright enters the data into the public domain. For example, many scientists do not consider the data published with their work to be theirs to control and consider the act of publication in a journal to be an implicit release of data into the commons. The lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an "Open" spirit. Because of this uncertainty, it is possible for public or private organizations to aggregate said data, claim that it is protected by copyright, and then resell it.
+
+== Major sources ==
+
+Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.
+
+=== In science ===
+
+The concept of open access to scientific data was established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957–1958. The International Council of Scientific Unions (now the International Council for Science) oversees several World Data Centres with the mission to minimize the risk of data loss and to maximize data accessibility.
+While the open-science-data movement long predates the Internet, the availability of fast, readily available networking has significantly changed the context of open science data, as publishing or obtaining data has become much less expensive and time-consuming.
+The Human Genome Project was a major initiative that exemplified the power of open data. It was built upon the so-called Bermuda Principles, stipulating that: "All human genomic sequence information … should be freely available and in the public domain in order to encourage research and development and to maximize its benefit to society". More recent initiatives such as the Structural Genomics Consortium have illustrated that the open data approach can be used productively within the context of industrial R&D.
+In 2004, the Science Ministers of all nations of the Organisation for Economic Co-operation and Development (OECD), which includes most developed countries of the world, signed a declaration which states that all publicly funded archive data should be made publicly available. Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.
+Examples of open data in science:
+
+https://www.earth-system-science-data.net/ – Journal of publications describing and linking to open scientific datasets related to Earth system sciences. Review of the dataset itself is an integral component of peer review. Launched in 2008.
+data.uni-muenster.de – Open data about scientific artifacts from the University of Muenster, Germany. Launched in 2011.
+Dataverse Network Project – archival repository software promoting data sharing, persistent data citation, and reproducible research.
+linkedscience.org/data – Open scientific datasets encoded as Linked Data. Launched in 2011, ended 2018.
+systemanaturae.org – Open scientific datasets related to wildlife classified by animal species. Launched in 2015.
+
+=== In government ===
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_data-1.md b/data/en.wikipedia.org/wiki/Open_data-1.md
new file mode 100644
index 000000000..123cd1a68
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_data-1.md
@@ -0,0 +1,42 @@
+---
+title: "Open data"
+chunk: 2/5
+source: "https://en.wikipedia.org/wiki/Open_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:21.318197+00:00"
+instance: "kb-cron"
+---
+
+There are a range of different arguments for government open data. Some advocates say that making government information available to the public as machine readable open data can facilitate government transparency, accountability and public participation. "Open data can be a powerful force for public accountability—it can make existing information easier to analyze, process, and combine than ever before, allowing a new level of public scrutiny." Governments that enable public viewing of data can help citizens engage within the governmental sectors and "add value to that data." Open data experts have nuanced the impact that opening government data may have on government transparency and accountability. In a widely cited paper, scholars David Robinson and Harlan Yu contend that governments may project a veneer of transparency by publishing machine-readable data that does not actually make government more transparent or accountable. Drawing from earlier studies on transparency and anticorruption, World Bank political scientist Tiago C. Peixoto extended Yu and Robinson's argument by highlighting a minimal chain of events necessary for open data to lead to accountability:
+
+relevant data is disclosed;
+the data is widely disseminated and understood by the public;
+the public reacts to the content of the data; and
+public officials either respond to the public's reaction or are sanctioned by the public through institutional means.
+Some make the case that opening up official information can support technological innovation and economic growth by enabling third parties to develop new kinds of digital applications and services.
+Several national governments have created websites to distribute a portion of the data they collect. It is a concept for a collaborative project in the municipal Government to create and organize culture for Open Data or Open government data.
+Additionally, other levels of government have established open data websites. There are many government entities pursuing Open Data in Canada. Data.gov lists the sites of a total of 40 US states and 46 US cities and counties with websites to provide open data, e.g., the state of Maryland, the state of California, US and New York City.
+At the international level, the United Nations has an open data website that publishes statistical data from member states and UN agencies, and the World Bank published a range of statistical data relating to developing countries. The European Commission has created two portals for the European Union: the EU Open Data Portal which gives access to open data from the EU institutions, agencies and other bodies and the European Data Portal that provides datasets from local, regional and national public bodies across Europe. The two portals were consolidated to data.europa.eu on April 21, 2021.
+Italy is the first country to release standard processes and guidelines under a Creative Commons license for widespread usage in the Public Administration. The open model is called the Open Data Management Cycle and was adopted in several regions such as Veneto and Umbria. Major cities like Reggio Calabria and Genova have also adopted this model.
+In October 2015, the Open Government Partnership launched the International Open Data Charter, a set of principles and best practices for the release of governmental open data formally adopted by seventeen governments of countries, states and cities during the OGP Global Summit in Mexico.
+In July 2024, the OECD adopted Creative Commons CC-BY-4.0 licensing for its published data and reports.
+
+=== In non-profit organizations ===
+Many non-profit organizations offer open access to their data, as long as it does not undermine the privacy rights of their users, members or third parties. In comparison to for-profit corporations, they do not seek to monetize their data. OpenNWT launched a website offering open data of elections. CIAT offers open data to anybody who is willing to conduct big data analytics in order to enhance the benefit of international agricultural research. DBLP, which is owned by the non-profit organization Dagstuhl, offers its database of scientific publications from computer science as open data.
+Hospitality exchange services, including Bewelcome, Warm Showers, and CouchSurfing (before it became for-profit) have offered scientists access to their anonymized data for analysis, public research, and publication.
+
+== Publication of open data ==
+
+=== Open repositories ===
+
+arXiv – open-access repository of electronic preprints and postprints (known as e-prints)
+Zenodo – open repository developed under the European OpenAIRE program and operated by CERN
+Figshare – open data and software hosting
+HAL (open archive) – open archive where authors can deposit scholarly documents from all academic fields
+Dryad (repository) – data and software related to science papers
+Open Science Framework – project management and sharing platform
+
+== Policies and strategies ==
+At a small level, a business or research organization's policies and strategies towards open data will vary, sometimes greatly. One common strategy employed is the use of a data commons. A data commons is an interoperable software and hardware platform that aggregates (or collocates) data, data infrastructure, and data-producing and data-managing applications in order to better allow a community of users to manage, analyze, and share their data with others over both short- and long-term timelines. Ideally, this interoperable cyberinfrastructure should be robust enough "to facilitate transitions between stages in the life cycle of a collection" of data and information resources while still being driven by common data models and workspace tools enabling and supporting robust data analysis. The policies and strategies underlying a data commons will ideally involve numerous stakeholders, including the data commons service provider, data contributors, and data users.
+Grossman et al. suggest six major considerations for a data commons strategy that better enables open data in businesses and research organizations. Such a strategy should address the need for:
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_data-2.md b/data/en.wikipedia.org/wiki/Open_data-2.md
new file mode 100644
index 000000000..6efc4dd3c
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_data-2.md
@@ -0,0 +1,54 @@
+---
+title: "Open data"
+chunk: 3/5
+source: "https://en.wikipedia.org/wiki/Open_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:21.318197+00:00"
+instance: "kb-cron"
+---
+
+permanent, persistent digital IDs, which enable access controls for datasets;
+permanent, discoverable metadata associated with each digital ID;
+application programming interface (API)-based access, tied to an authentication and authorization service;
+data portability;
+data "peering," without access, egress, and ingress charges; and
+a rationed approach to users computing data over the data commons.
+Beyond individual businesses and research centers, and at a more macro level, countries like Germany have launched their own official nationwide open data strategies, detailing how data management systems and data commons should be developed, used, and maintained for the greater public good.
+
+== Arguments for and against ==
+
+Opening government data is only a waypoint on the road to improving education, improving government, and building tools to solve other real-world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.
+Arguments made on behalf of open data include the following:
+
+"Data belongs to the human race". Typical examples are genomes, data on organisms, medical science, environmental data following the Aarhus Convention.
+Public money was used to fund the work, and so it should be universally available.
+It was created by or at a government institution (this is common in US National Laboratories and government agencies).
+Facts cannot legally be copyrighted.
+Sponsors of research do not get full value unless the resulting data are freely available.
+Restrictions on data re-use create an anticommons.
+Data are required for the smooth process of running communal human activities and are an important enabler of socio-economic development (health care, education, economic productivity, etc.).
+In scientific research, the rate of discovery is accelerated by better access to data.
+Making data open helps combat "data rot" and ensure that scientific research data are preserved over time.
+Statistical literacy benefits from open data. Instructors can use locally relevant data sets to teach statistical concepts to their students.
+Allowing open data in the scientific community is essential for increasing the rate of discoveries and recognizing significant patterns.
+It is generally held that factual data cannot be copyrighted. Publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.
+While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.
+Unlike open access, where groups of publishers have stated their concerns, open data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.
+Arguments against making all data available as open data include the following:
+
+Government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem).
+Governments have to be accountable for the efficient use of taxpayers' money: if public funds are used to aggregate the data and if the data will bring commercial (private) benefits to only a small number of users, the users should reimburse governments for the cost of providing the data.
+Open data may lead to exploitation of, and rapid publication of results based on, data pertaining to developing countries by rich and well-equipped research institutes, without any further involvement of and/or benefit to local communities (helicopter research), similarly to the historical open access to tropical forests that has led to the misappropriation ("Global Pillage") of plant genetic resources from developing countries.
+The revenue earned by publishing data can be used to cover the costs of generating and/or disseminating the data, so that the dissemination can continue indefinitely.
+The revenue earned by publishing data permits non-profit organizations to fund other activities (e.g. learned society publishing supports the society).
+The government gives specific legitimacy for certain organizations to recover costs (NIST in US, Ordnance Survey in UK).
+Privacy concerns may require that access to data is limited to specific users or to sub-sets of the data.
+Collecting, 'cleaning', managing and disseminating data are typically labour- and/or cost-intensive processes – whoever provides these services should receive fair remuneration for providing those services.
+Sponsors do not get full value unless their data is used appropriately – sometimes this requires quality management, dissemination and branding efforts that can best be achieved by charging fees to users.
+Often, targeted end-users cannot use the data without additional processing (analysis, apps etc.) – if anyone has access to the data, none may have an incentive to invest in the processing required to make data useful (typical examples include biological, medical, and environmental data).
+There is no control over the secondary use (aggregation) of open data.
+The paper entitled "Optimization of Soft Mobility Localization with Sustainable Policies and Open Data" argues that open data is a valuable tool for improving the sustainability and equity of soft mobility in cities. The author argues that open data can be used to identify the needs of different areas of a city, develop algorithms that are fair and equitable, and justify the installation of soft mobility resources.
+
+== Relation to other open activities ==
+The goals of the Open Data movement are similar to those of other "Open" movements.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_data-3.md b/data/en.wikipedia.org/wiki/Open_data-3.md
new file mode 100644
index 000000000..69b840526
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_data-3.md
@@ -0,0 +1,28 @@
+---
+title: "Open data"
+chunk: 4/5
+source: "https://en.wikipedia.org/wiki/Open_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:21.318197+00:00"
+instance: "kb-cron"
+---
+
+Open access is concerned with making scholarly publications freely available on the internet. In some cases, these articles include open datasets as well.
+Open specifications are documents describing file types or protocols, where the documents are openly licensed. These specifications are primarily meant to improve different software handling the same file types or protocols, but monopolists forced by law into open specifications might make it more difficult.
+Open content is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
+Open knowledge. Open Knowledge International argues for openness in a range of issues including, but not limited to, those of open data. It covers (a) data, whether scientific, historical, geographic or otherwise; (b) content such as music, films and books; and (c) government and other administrative information. Open data is included within the scope of the Open Knowledge Definition, which is alluded to in Science Commons' Protocol for Implementing Open Access Data.
+Open notebook science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.
+Open-source software is concerned with the open-source licenses under which computer programs can be distributed and is not normally concerned primarily with data.
+Open educational resources are freely accessible, openly licensed documents and media that are useful for teaching, learning, and assessing as well as for research purposes.
+Open research/open science/open science data (linked open science) means an approach to open and interconnect scientific assets like data, methods and tools with linked data techniques to enable transparent, reproducible and interdisciplinary research.
+Open-GLAM (Galleries, Libraries, Archives, and Museums) is an initiative and network that supports exchange and collaboration between cultural institutions that support open access to their digitized collections. The GLAM-Wiki Initiative helps cultural institutions share their openly licensed resources with the world through collaborative projects with experienced Wikipedia editors. Open Heritage Data is associated with Open GLAM, as openly licensed data in the heritage sector is now frequently used in research, publishing, and programming, particularly in the Digital Humanities.
+
+== Open Data as commons ==
+
+=== Ideas and definitions ===
+Formally both the definition of Open Data and commons revolve around the concept of shared resources with a low barrier to access. 
+Substantially, digital commons include Open Data in that they include resources maintained online, such as data. Overall, looking at the operational principles of Open Data, one can see the overlap between Open Data and (digital) commons in practice. Principles of Open Data sometimes differ depending on the type of data under scrutiny. Nonetheless, they overlap to some extent, and their key rationale is the lack of barriers to the re-use of data(sets). Regardless of their origin, principles across types of Open Data hint at the key elements of the definition of commons. These are, for instance, accessibility, re-use, findability, and non-proprietary status. Additionally, although to a lesser extent, the threats and opportunities associated with both Open Data and commons are similar. In short, they revolve around the (risks and) benefits associated with (uncontrolled) use of common resources by a large variety of actors.
+
+=== The System ===
+Both commons and Open Data can be defined by the features of the resources that fit under these concepts, but they can also be defined by the characteristics of the systems their advocates push for. Governance is a focus for both Open Data and commons scholars. The key elements that outline the peculiarities of commons and Open Data are their differences from (and perhaps opposition to) the dominant market logics as shaped by capitalism. Perhaps it is this feature that emerges in the recent surge of the concept of commons as related to a more social look at digital technologies in the specific forms of digital and, especially, data commons.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_data-4.md b/data/en.wikipedia.org/wiki/Open_data-4.md
new file mode 100644
index 000000000..2319d6cb8
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Open_data-4.md
@@ -0,0 +1,58 @@
+---
+title: "Open data"
+chunk: 5/5
+source: "https://en.wikipedia.org/wiki/Open_data"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:21.318197+00:00"
+instance: "kb-cron"
+---
+
+=== Real-life case ===
+Application of open data for societal good has been demonstrated in academic research works. The paper "Optimization of Soft Mobility Localization with Sustainable Policies and Open Data" uses open data in two ways. First, it uses open data to identify the needs of different areas of a city. For example, it might use data on population density, traffic congestion, and air quality to determine where soft mobility resources, such as bike racks and charging stations for electric vehicles, are most needed. Second, it uses open data to develop algorithms that are fair and equitable. For example, it might use data on the demographics of a city to ensure that soft mobility resources are distributed in a way that is accessible to everyone, regardless of age, disability, or gender. The paper also discusses the challenges of using open data for soft mobility optimization. One challenge is that open data is often incomplete or inaccurate. Another challenge is that it can be difficult to integrate open data from different sources. Despite these challenges, the paper argues that open data is a valuable tool for improving the sustainability and equity of soft mobility in cities.
+An example of the relationship between Open Data and commons, and of how their governance can potentially disrupt the market logic otherwise dominating big data, is a project conducted by Human Ecosystem Relazioni in Bologna (Italy).
+This project aimed at extrapolating and identifying online social relations surrounding "collaboration" in Bologna. Data was collected from social networks and online platforms for citizen collaboration. The data was then analyzed for content, meaning, location, timeframe, and other variables. Overall, online social relations for collaboration were analyzed based on network theory. The resulting datasets have been made available online as Open Data (aggregated and anonymized); nonetheless, individuals can reclaim all their data. This has been done with the idea of making data into a commons. The project exemplifies the relationship between Open Data and commons, and how they can disrupt the market logic driving big data use, in two ways. First, it shows how such projects, by following the rationale of Open Data, can trigger the creation of effective data commons; the project itself offered different types of support to help social network platform users have content removed. Second, opening data on online social network interactions has the potential to significantly reduce the monopolistic power of social network platforms over those data.
+
+== Funders' mandates ==
+Several funding bodies that mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR):
+
+to deposit bioinformatics, atomic and molecular coordinate data, and experimental data into the appropriate public database immediately upon publication of research results.
+to retain original data sets for at least five years after the grant. This applies to all data, whether published or not.
+Other bodies promoting the deposition of data and full text include the Wellcome Trust. An academic paper published in 2013 advocated that Horizon 2020 (the science funding mechanism of the EU) should mandate that funded projects hand in their databases as "deliverables" at the end of the project so that they can be checked for third-party usability and then shared.
+
+== See also ==
+Open knowledge
+Free content
+Openness
+Committee on Data of the International Science Council
+CKAN - Comprehensive Knowledge Archive Network
+Creative Commons license
+Data curation
+Data governance
+Data management
+Data publishing
+Data sharing
+Demand-responsive transport
+Digital preservation
+FAIR data principles
+International Open Data Day
+Linked data and Linked open data
+Open energy system databases
+Urban informatics
+Wikibase
+Wikidata
+List of datasets for machine-learning research
+Open Standard
+Digital public goods
+
+== References ==
+
+== External links ==
+
+Open Data – An Introduction – from the Open Knowledge Foundation
+Video of Tim Berners-Lee at TED (conference) 2009 calling for "Raw Data Now"
+Six-minute video of Tim Berners-Lee at TED (conference) 2010 showing examples of open data
+G8 Open Data Charter
+Towards a Genealogy of Open Data – research paper tracing different historical threads contributing to current conceptions of open data.
+
+Open data initiatives have been widely adopted by governments to increase transparency and encourage innovation in public services.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Open_science-0.md b/data/en.wikipedia.org/wiki/Open_science-0.md
index cb9dc65a5..8b3fcae4f 100644
--- a/data/en.wikipedia.org/wiki/Open_science-0.md
+++ b/data/en.wikipedia.org/wiki/Open_science-0.md
@@ -4,7 +4,7 @@ chunk: 1/8
 source: "https://en.wikipedia.org/wiki/Open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:25.714031+00:00"
+date_saved: "2026-05-05T10:15:45.346466+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_science-1.md b/data/en.wikipedia.org/wiki/Open_science-1.md
index 3d0a5f561..33d67bb2c 100644
--- a/data/en.wikipedia.org/wiki/Open_science-1.md
+++ b/data/en.wikipedia.org/wiki/Open_science-1.md
@@ -4,7 +4,7 @@ chunk: 2/8
 source: "https://en.wikipedia.org/wiki/Open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:25.714031+00:00"
+date_saved: "2026-05-05T10:15:45.346466+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_science-2.md b/data/en.wikipedia.org/wiki/Open_science-2.md
index 1d4766333..9a2a84f24 100644
--- a/data/en.wikipedia.org/wiki/Open_science-2.md
+++ b/data/en.wikipedia.org/wiki/Open_science-2.md
@@ -4,7 +4,7 @@ chunk: 3/8
 source: "https://en.wikipedia.org/wiki/Open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:25.714031+00:00"
+date_saved: "2026-05-05T10:15:45.346466+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_science-3.md b/data/en.wikipedia.org/wiki/Open_science-3.md
index 65333f1ee..bf096545b 100644
--- a/data/en.wikipedia.org/wiki/Open_science-3.md
+++ b/data/en.wikipedia.org/wiki/Open_science-3.md
@@ -4,7 +4,7 @@ chunk: 4/8
 source: "https://en.wikipedia.org/wiki/Open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:25.714031+00:00"
+date_saved: "2026-05-05T10:15:45.346466+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_science-4.md b/data/en.wikipedia.org/wiki/Open_science-4.md
index 4dccf0b96..e16734397 100644
--- a/data/en.wikipedia.org/wiki/Open_science-4.md
+++ b/data/en.wikipedia.org/wiki/Open_science-4.md
@@ -4,7 +4,7 @@ chunk: 5/8
 source: "https://en.wikipedia.org/wiki/Open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:25.714031+00:00"
+date_saved: "2026-05-05T10:15:45.346466+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_science-5.md b/data/en.wikipedia.org/wiki/Open_science-5.md
index 2a554f769..0204fd993 100644
--- a/data/en.wikipedia.org/wiki/Open_science-5.md
+++ b/data/en.wikipedia.org/wiki/Open_science-5.md
@@ -4,7 +4,7 @@ chunk: 6/8
 source: "https://en.wikipedia.org/wiki/Open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:25.714031+00:00"
+date_saved: "2026-05-05T10:15:45.346466+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_science-6.md b/data/en.wikipedia.org/wiki/Open_science-6.md
index 03c9204fb..4abef0ffd 100644
--- a/data/en.wikipedia.org/wiki/Open_science-6.md
+++ b/data/en.wikipedia.org/wiki/Open_science-6.md
@@ -4,7 +4,7 @@ chunk: 7/8
 source: "https://en.wikipedia.org/wiki/Open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:25.714031+00:00"
+date_saved: "2026-05-05T10:15:45.346466+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_science-7.md b/data/en.wikipedia.org/wiki/Open_science-7.md
index 32b27e7f1..d845b09a3 100644
--- a/data/en.wikipedia.org/wiki/Open_science-7.md
+++ b/data/en.wikipedia.org/wiki/Open_science-7.md
@@ -4,7 +4,7 @@ chunk: 8/8
 source: "https://en.wikipedia.org/wiki/Open_science"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:25.714031+00:00"
+date_saved: "2026-05-05T10:15:45.346466+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-0.md b/data/en.wikipedia.org/wiki/Open_source-0.md
index f41067175..cc565ee63 100644
--- a/data/en.wikipedia.org/wiki/Open_source-0.md
+++ b/data/en.wikipedia.org/wiki/Open_source-0.md
@@ -4,7 +4,7 @@ chunk: 1/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
@@ -20,7 +20,7 @@ Early instances of the free sharing of source code include IBM's source releases
 The sharing of source code on the Internet began when the Internet was relatively primitive, with software distributed via UUCP, Usenet, IRC, and Gopher. BSD, for example, was first widely distributed by posts to comp.os.linux on the Usenet, which is also where its development was discussed. Linux followed in this model.
 
 === Open source as a term ===
-Open source as a term emerged in the late 1990s by a group of people in the free software movement who were critical of the political agenda and moral philosophy implied in the term "free software" and sought to reframe the discourse to reflect a more commercially minded position. In addition, the ambiguity of the term "free software" was seen as discouraging business adoption. However, the ambiguity of the word "free" exists primarily in English as it can refer to cost. The group included Christine Peterson, Todd Anderson, Larry Augustin, Jon Hall, Sam Ockman, Michael Tiemann and Eric S. Raymond. Peterson suggested "open source" at a meeting held at Palo Alto, California, in reaction to Netscape's announcement in January 1998 of a source code release for Navigator. Linus Torvalds gave his support the following day, and Phil Hughes backed the term in Linux Journal. Richard Stallman, the founder of the Free Software Foundation (FSF) in 1985, quickly decided against endorsing the term. The FSF's goal was to promote the development and use of free software, which they defined as software that grants users the freedom to run, study, share, and modify the code. This concept is similar to open source but places a greater emphasis on the ethical and political aspects of software freedom. Netscape released its source code under the Netscape Public License and later under the Mozilla Public License.
+Open source as a term emerged in the late 1990s. It was coined by a group of people in the free software movement who were critical of the political agenda and moral philosophy implied in the term "free software" and sought to reframe the discourse to reflect a more commercially minded position. In addition, the ambiguity of the term "free software" was seen as discouraging business adoption. However, the ambiguity of the word "free" exists primarily in English as it can refer to cost. The group included Christine Peterson, Todd Anderson, Larry Augustin, Jon Hall, Sam Ockman, Michael Tiemann and Eric S. Raymond. Peterson suggested "open source" at a meeting held at Palo Alto, California, in reaction to Netscape's announcement in January 1998 of a source code release for Navigator. Linus Torvalds gave his support the following day, and Phil Hughes backed the term in Linux Journal. Richard Stallman, the founder of the Free Software Foundation (FSF) in 1985, quickly decided against endorsing the term. The FSF's goal was to promote the development and use of free software, which they defined as software that grants users the freedom to run, study, share, and modify the code. This concept is similar to open source but places a greater emphasis on the ethical and political aspects of software freedom. Netscape released its source code under the Netscape Public License and later under the Mozilla Public License.
 Raymond was especially active in the effort to popularize the new term. He made the first public call to the free software community to adopt it in February 1998. Shortly after, he founded The Open Source Initiative in collaboration with Bruce Perens.
 The term gained further visibility through an event organized in April 1998 by technology publisher O'Reilly Media . Originally titled the "Freeware Summit" and later known as the "Open Source Summit", the event was attended by the leaders of many of the most important free and open-source projects, including Linus Torvalds, Larry Wall, Brian Behlendorf, Eric Allman, Guido van Rossum, Michael Tiemann, Paul Vixie, Jamie Zawinski, and Eric Raymond. At that meeting, alternatives to the term "free software" were discussed. Tiemann argued for "sourceware" as a new term, while Raymond argued for "open source." The assembled developers took a vote, and the winner was announced at a press conference the same evening.
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-1.md b/data/en.wikipedia.org/wiki/Open_source-1.md
index 77f216cd7..11fdb7895 100644
--- a/data/en.wikipedia.org/wiki/Open_source-1.md
+++ b/data/en.wikipedia.org/wiki/Open_source-1.md
@@ -4,7 +4,7 @@ chunk: 2/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-10.md b/data/en.wikipedia.org/wiki/Open_source-10.md
index 6890c0503..62f80ff2a 100644
--- a/data/en.wikipedia.org/wiki/Open_source-10.md
+++ b/data/en.wikipedia.org/wiki/Open_source-10.md
@@ -4,7 +4,7 @@ chunk: 11/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-2.md b/data/en.wikipedia.org/wiki/Open_source-2.md
index dae81aefa..a3c297553 100644
--- a/data/en.wikipedia.org/wiki/Open_source-2.md
+++ b/data/en.wikipedia.org/wiki/Open_source-2.md
@@ -4,7 +4,7 @@ chunk: 3/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-3.md b/data/en.wikipedia.org/wiki/Open_source-3.md
index 3b5ca7f36..88d871464 100644
--- a/data/en.wikipedia.org/wiki/Open_source-3.md
+++ b/data/en.wikipedia.org/wiki/Open_source-3.md
@@ -4,7 +4,7 @@ chunk: 4/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-4.md b/data/en.wikipedia.org/wiki/Open_source-4.md
index b7c42e3ae..2f8e472f3 100644
--- a/data/en.wikipedia.org/wiki/Open_source-4.md
+++ b/data/en.wikipedia.org/wiki/Open_source-4.md
@@ -4,7 +4,7 @@ chunk: 5/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-5.md b/data/en.wikipedia.org/wiki/Open_source-5.md
index 7b0b0804e..044fbd4cc 100644
--- a/data/en.wikipedia.org/wiki/Open_source-5.md
+++ b/data/en.wikipedia.org/wiki/Open_source-5.md
@@ -4,7 +4,7 @@ chunk: 6/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-6.md b/data/en.wikipedia.org/wiki/Open_source-6.md
index 0442a6882..7eebe0aa7 100644
--- a/data/en.wikipedia.org/wiki/Open_source-6.md
+++ b/data/en.wikipedia.org/wiki/Open_source-6.md
@@ -4,7 +4,7 @@ chunk: 7/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-7.md b/data/en.wikipedia.org/wiki/Open_source-7.md
index 35cd30f1e..22d38462b 100644
--- a/data/en.wikipedia.org/wiki/Open_source-7.md
+++ b/data/en.wikipedia.org/wiki/Open_source-7.md
@@ -4,7 +4,7 @@ chunk: 8/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-8.md b/data/en.wikipedia.org/wiki/Open_source-8.md
index e39cb029f..97b69f5e3 100644
--- a/data/en.wikipedia.org/wiki/Open_source-8.md
+++ b/data/en.wikipedia.org/wiki/Open_source-8.md
@@ -4,7 +4,7 @@ chunk: 9/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Open_source-9.md b/data/en.wikipedia.org/wiki/Open_source-9.md
index 3bf7ba0ad..270c634ae 100644
--- a/data/en.wikipedia.org/wiki/Open_source-9.md
+++ b/data/en.wikipedia.org/wiki/Open_source-9.md
@@ -4,7 +4,7 @@ chunk: 10/11
 source: "https://en.wikipedia.org/wiki/Open_source"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:32:29.420413+00:00"
+date_saved: "2026-05-05T10:15:48.029597+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Overlay_journal-0.md b/data/en.wikipedia.org/wiki/Overlay_journal-0.md
new file mode 100644
index 000000000..1aeba5644
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Overlay_journal-0.md
@@ -0,0 +1,30 @@
+---
+title: "Overlay journal"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Overlay_journal"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:52.708920+00:00"
+instance: "kb-cron"
+---
+
+An overlay journal or overlay ejournal is a type of open access academic journal, almost always an online electronic journal (ejournal), that does not produce its own content, but selects from texts that are already freely available online. While many overlay journals derive their content from preprint servers, others, such as the Lund Medical Faculty Monthly, contain mainly papers published by commercial publishers, but with links to self-archived preprints or postprints when possible.
+The editors of an overlay journal locate suitable material from open access repositories and public domain sources, read it, and evaluate its worth. This evaluation may take the form of the judgement of a single editor or editors, or a full peer review process.
+Public validation of subsequently approved texts may take several forms. At its most formal, the editor may republish the article with explicit approval. Approval might take the form of an addition to the text or its metadata. Or the editor may simply link to the article, via the table of contents of the overlay journal. An alternative approach is to link to articles already published in various open access ejournals, adding value by grouping scattered articles together as a single themed issue of the overlay journal. Such themed issues allow the focussed coverage of relatively obscure or newly emerging topics.
+Episciences is an initiative by the Center for Direct Scientific Communication to host overlay journals. It hosts among others the computer science journals Logical Methods in Computer Science and Fundamenta Informaticae.
+In 2019, JMIR Publications, an open access publisher, announced the creation of a series of "superjournals", named JMIRx (JMIRx.org), which are overlay journals for preprint servers such as medRxiv, bioRxiv and PsyArXiv.
+
+
+== History ==
+The term 'overlay journal' was coined by Paul Ginsparg in 1996. That same year, the journal Physical Review began to link to preprints that it had accepted but not yet published. It was not until later that the first overlay journals were founded, including Journal of High Energy Physics, Logical Methods in Computer Science and Geometry and Topology, all of which were overlays for arXiv.
+
+
+== References ==
+
+
+== Further reading ==
+Open Video Project: Overlay Journal prototype demonstration (2006)
+"Investigating overlay journals: introducing the RIOJA Project". D-Lib Magazine. September/October 2007
+Lund Medical Faculty Monthly
+Gibney, Elizabeth (2016). "Open journals that piggyback on arXiv gather momentum". Nature. 530 (7588): 117–118. doi:10.1038/nature.2015.19102. PMID 26854297.
+JMIRx (JMIRx.org)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Persée_(web_portal)-0.md b/data/en.wikipedia.org/wiki/Persée_(web_portal)-0.md
new file mode 100644
index 000000000..b7d87bb9b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Persée_(web_portal)-0.md
@@ -0,0 +1,28 @@
+---
+title: "Persée (web portal)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Persée_(web_portal)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:53.890363+00:00"
+instance: "kb-cron"
+---
+
+Persée is a digital library of open access, mostly French-language scholarly journals, established by the Ministry of National Education of France. The website launched in 2005. The resource is maintained by the École normale supérieure de Lyon, the French National Centre for Scientific Research, and the University of Lyon.
+It is one of the largest francophone portals dedicated to the humanities and social sciences, with about 600,000 documents freely available.
+
+
+== See also ==
+List of journals in Persee.fr (fr)
+Open access journal
+List of open access bibliographic databases (fr)
+
+
+== References ==
+
+
+== Bibliography ==
+
+
+== External links ==
+Official site
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Polish-Studies.Interdisciplinary-0.md b/data/en.wikipedia.org/wiki/Polish-Studies.Interdisciplinary-0.md
new file mode 100644
index 000000000..5847bca42
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Polish-Studies.Interdisciplinary-0.md
@@ -0,0 +1,61 @@
+---
+title: "Polish-Studies.Interdisciplinary"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Polish-Studies.Interdisciplinary"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:55.063984+00:00"
+instance: "kb-cron"
+---
+
+Polish-Studies.Interdisciplinary (Pol-Int) is a free online platform for information on and international exchange in the field of Polish studies. The platform was launched in 2014 and serves as a tool for a growing interdisciplinary community of scholars worldwide to promote their own research and publications in Polish studies. Users can publish reviews, share information about conferences, events, and career opportunities as well as connect and engage in discussions on current issues. Pol-Int is headed by Dagmara Jajeśniak-Quast and based at the Center for Interdisciplinary Polish Studies (ZIP) at the European University Viadrina in Frankfurt (Oder) and at the Collegium Polonicum in Słubice. The platform is co-financed by the Foundation for Polish-German Cooperation, the Polish-German Foundation for Science, the European University Viadrina and the European Regional Development Fund.
+
+
+== Profile ==
+Pol-Int offers up-to-date information on Polish studies worldwide: publications, journals and articles, reviews, job offers (including scholarships and grants), events, conference reports, and calls for papers. Anyone interested in the field of Polish studies can register, set up an individual profile, as well as publish and obtain information, share and promote their own research projects, and find project partners. Thus, Pol-Int serves as a networking tool for scholars dealing with Poland past and present, its culture, society, economy, and so forth.
+All posts on the platform are further disseminated through a customizable newsletter. The platform is based on and prospers from the commitment and active participation of its academic community.
+In 2016, Pol-Int launched the academic blog “Salon” – a separate space within the platform to discuss current pressing issues in Poland, to debate them at expert meetings, such as panel discussions, and to later publish the results in articles, reviews and interviews online. The “Salon” is meant to bridge the gap between the traditional analogue and the digital sphere of contemporary research, in order to promote exchange between both realms.
+
+
+== Editorial board ==
+The editorial board is based in Frankfurt (Oder). Thanks to an expanding cooperation with more than 100 voluntary specialist editors from over 25 academic disciplines who peer-review texts before publication, Pol-Int publishes reviews in Polish, English and German. The editorial board reviews more than 150 recently published monographs or anthologies and academic articles per year and coordinates the workflow with the specialist editors and reviewers. All publications on Pol-Int are required and certified to maintain the level of renowned academic journals.
+
+
+== Partners ==
+The platform is supported by numerous academic partner institutions from Poland, Germany, the USA, and the UK:
+
+Aleksander Brückner Center for Polish Studies
+Archiwum Karla Dedeciusa
+Center for Interdisciplinary Polish Studies
+Centre for Historical Research of the Polish Academy of Sciences in Berlin
+Collegium Polonicum
+Cosmopolitan Review
+Deutsches Polen-Institut
+East Central European Center of the Columbia University
+European University Viadrina
+Faculty of International and Political Studies, University of Łódź
+Geisteswissenschaftliches Zentrum Geschichte und Kultur Ostmitteleuropas
+Governmental Research Institute – Silesian Institute in Opole
+Herder-Institut für historische Ostmitteleuropaforschung
+H-Soz-Kult
+Institute for Western Affairs
+Institute of History, University of Warsaw
+Institute of Political Studies of the Polish Academy of Sciences
+Instytut Filologii Polskiej, Uniwersytet Pedagogiczny w Krakowie
+Ośrodek Badań nad Mediami
+Polish Historical Association, Kraków
+Museum of Polish History
+Polish Studies Association
+Polsko-Niemiecki Instytut Badawczy
+Programme on Modern Poland, St. Antony's College, University of Oxford
+Projekt Nauka. Fundacja na rzecz promocji nauki polskiej
+Viadrina Center B/ORDERS IN MOTION
+Willy Brandt Center for German and European Studies
+Wydział Politologii i Studiów Międzynarodowych Uniwersytetu Mikołaja Kopernika w Toruniu
+
+
+== References ==
+
+
+== External links ==
+Polish-Studies.Interdisciplinary (Pol-Int)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Postprint-0.md b/data/en.wikipedia.org/wiki/Postprint-0.md
new file mode 100644
index 000000000..cda90af8b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Postprint-0.md
@@ -0,0 +1,39 @@
+---
+title: "Postprint"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Postprint"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:56.269010+00:00"
+instance: "kb-cron"
+---
+
+A postprint is a digital draft of a research journal article after it has been peer reviewed and accepted for publication, but before it has been typeset and formatted by the journal.
+
+
+== Related terminology ==
+A digital draft before peer review is called a preprint. 
+Postprints are also sometimes called accepted author manuscripts (AAMs), because they are the version accepted by the journal after the author has addressed the peer reviewer comments. Jointly, postprints and preprints are called eprints.
+Postprints are variously referred to by different publishers as pre-proofs, author's original version and variations of these.
+After typesetting by a journal, authors will often be provided with proofs (the draft of the final formatting) and finally the version that is published is called the published/publisher's version.
+The term postprint formerly also referred to the formatted publisher's version; however, usage has narrowed to the current definition of accepted but unformatted.
+
+
+== Role in open access ==
+Journal publication licenses typically claim copyright over the typeset and formatted version, but permit authors to release the postprint version as open access (self-archiving). This is often termed green open access, and enables access and reuse of material even in paywalled subscription journals (typically under a Creative Commons license). Permission by the journal to release a postprint may be immediate or after an embargo period, with licensing terms for most journals collected in the Sherpa/Romeo database.
+Since the advent of the Open Archives Initiative, preprints and postprints have been deposited in institutional repositories, which are interoperable because they are compliant with the Open Archives Initiative Protocol for Metadata Harvesting.
+Eprints are at the heart of the open access initiative to make research freely accessible online. Eprints were first deposited or self-archived in arbitrary websites and then harvested by virtual archives such as CiteSeer (and, more recently, Google Scholar), or they were deposited in central disciplinary archives such as arXiv or PubMed Central.
+
+
+== See also ==
+Ahead-of-print
+
+
+== References ==
+
+
+== Further reading ==
+Danny Smith (23 October 2018). "You say tomato, I say accepted manuscript". Open Access and Digital Scholarship Blog. Imperial College London.
+
+
+== External links ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Predatory_publishing-0.md b/data/en.wikipedia.org/wiki/Predatory_publishing-0.md
index 83c14dd3f..fbcc4381a 100644
--- a/data/en.wikipedia.org/wiki/Predatory_publishing-0.md
+++ b/data/en.wikipedia.org/wiki/Predatory_publishing-0.md
@@ -4,7 +4,7 @@ chunk: 1/7
 source: "https://en.wikipedia.org/wiki/Predatory_publishing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:42.480342+00:00"
+date_saved: "2026-05-05T10:15:57.520910+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Predatory_publishing-1.md b/data/en.wikipedia.org/wiki/Predatory_publishing-1.md
index 42d5eee62..2c8cfedfc 100644
--- a/data/en.wikipedia.org/wiki/Predatory_publishing-1.md
+++ b/data/en.wikipedia.org/wiki/Predatory_publishing-1.md
@@ -4,7 +4,7 @@ chunk: 2/7
 source: "https://en.wikipedia.org/wiki/Predatory_publishing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:42.480342+00:00"
+date_saved: "2026-05-05T10:15:57.520910+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Predatory_publishing-2.md b/data/en.wikipedia.org/wiki/Predatory_publishing-2.md
index 05eb2ca7e..23b348381 100644
--- a/data/en.wikipedia.org/wiki/Predatory_publishing-2.md
+++ b/data/en.wikipedia.org/wiki/Predatory_publishing-2.md
@@ -4,7 +4,7 @@ chunk: 3/7
 source: "https://en.wikipedia.org/wiki/Predatory_publishing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:42.480342+00:00"
+date_saved: "2026-05-05T10:15:57.520910+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Predatory_publishing-3.md b/data/en.wikipedia.org/wiki/Predatory_publishing-3.md
index 069f3b393..3c57ba19a 100644
--- a/data/en.wikipedia.org/wiki/Predatory_publishing-3.md
+++ b/data/en.wikipedia.org/wiki/Predatory_publishing-3.md
@@ -4,7 +4,7 @@ chunk: 4/7
 source: "https://en.wikipedia.org/wiki/Predatory_publishing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:42.480342+00:00"
+date_saved: "2026-05-05T10:15:57.520910+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Predatory_publishing-4.md b/data/en.wikipedia.org/wiki/Predatory_publishing-4.md
index ba1989128..f174b62e3 100644
--- a/data/en.wikipedia.org/wiki/Predatory_publishing-4.md
+++ b/data/en.wikipedia.org/wiki/Predatory_publishing-4.md
@@ -4,7 +4,7 @@ chunk: 5/7
 source: "https://en.wikipedia.org/wiki/Predatory_publishing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:42.480342+00:00"
+date_saved: "2026-05-05T10:15:57.520910+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Predatory_publishing-5.md b/data/en.wikipedia.org/wiki/Predatory_publishing-5.md
index 464e9dbe0..843bf6978 100644
--- a/data/en.wikipedia.org/wiki/Predatory_publishing-5.md
+++ b/data/en.wikipedia.org/wiki/Predatory_publishing-5.md
@@ -4,7 +4,7 @@ chunk: 6/7
 source: "https://en.wikipedia.org/wiki/Predatory_publishing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:42.480342+00:00"
+date_saved: "2026-05-05T10:15:57.520910+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Predatory_publishing-6.md b/data/en.wikipedia.org/wiki/Predatory_publishing-6.md
index 71c971cde..ce964ecd3 100644
--- a/data/en.wikipedia.org/wiki/Predatory_publishing-6.md
+++ b/data/en.wikipedia.org/wiki/Predatory_publishing-6.md
@@ -4,7 +4,7 @@ chunk: 7/7
 source: "https://en.wikipedia.org/wiki/Predatory_publishing"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T07:02:42.480342+00:00"
+date_saved: "2026-05-05T10:15:57.520910+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Public_Access_to_Public_Science_Act-0.md b/data/en.wikipedia.org/wiki/Public_Access_to_Public_Science_Act-0.md
new file mode 100644
index 000000000..eb80c6ef2
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Public_Access_to_Public_Science_Act-0.md
@@ -0,0 +1,26 @@
+---
+title: "Public Access to Public Science Act"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Public_Access_to_Public_Science_Act"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:58.721082+00:00"
+instance: "kb-cron"
+---
+
+The Public Access to Public Science Act (PAPS) is a bill intended to provide public access to research funded by specific federal agencies under the jurisdiction of the House Science Committee, including the National Aeronautics and Space Administration (NASA), the National Science Foundation (NSF), the National Institute of Standards and Technology (NIST) and the National Weather Service (NWS). The bill was introduced to the 113th Congress by Congressman Jim Sensenbrenner (R-WI) and Congresswoman Eddie Bernice Johnson (D-TX) and was referred to the Subcommittee on Research and Technology on December 13, 2013. It has been endorsed by the Association of Public and Land-grant Universities (APLU) and the Association of American Universities (AAU).
+The bill is often compared to and discussed in conjunction with the Fair Access to Science and Technology Research Act (FASTR), also introduced in 2013.
+
+
+== References ==
+
+Congress.gov. H.R.3157 -- 113th Congress (2013-2014).
+Meindertsma, Jessica. October 25, 2013. "Public Access Policies (Part 1): FASTER, PAPS, and the OSTP Directive." Ohio State University Libraries.
+Sensenbrenner, Jim. September 6, 2016. "Give The Public What It Pays For: Scientific Research." Forbes.
+
+
+== External links ==
+Full text of bill
+Letter of endorsement
+Notes on the Public Access to Public Science Act, Harvard Open Access Project
+FASTR v PAPS
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Registry_of_Open_Access_Repositories-0.md b/data/en.wikipedia.org/wiki/Registry_of_Open_Access_Repositories-0.md
new file mode 100644
index 000000000..1cd33d941
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Registry_of_Open_Access_Repositories-0.md
@@ -0,0 +1,26 @@
+---
+title: "Registry of Open Access Repositories"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Registry_of_Open_Access_Repositories"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:59.875819+00:00"
+instance: "kb-cron"
+---
+
+The Registry of Open Access Repositories (ROAR) is a searchable international database indexing the creation, location and growth of open access institutional repositories and their contents. ROAR was created by EPrints at the University of Southampton, UK, in 2003. It began as the Institutional Archives Registry and was renamed the Registry of Open Access Repositories in 2006. To date, over 3,000 institutional and cross-institutional repositories have been registered.
+As of 2015, ROAR and the UK-based Directory of Open Access Repositories (OpenDOAR) "are considered the two leading open access directories worldwide. ROAR is the larger directory and allows direct submissions to the directory. OpenDOAR controls submission of materials and is dependent on the discretion of its staff. OpenDOAR requires open access of scholarly publications; whereas ROAR allows other types of materials to be included. ROAR allows filtering by country, type of repository, and sorting by repository name."
+
+
+== ROARMAP ==
+ROAR's companion Registry of Open Access Repository Mandates and Policies (ROARMAP) is a searchable international database of policies. It charts the growth of open access mandates and policies adopted by universities, research institutions and research funders that require their researchers to provide open access to their peer-reviewed research article output by depositing it in an open access repository.
+It was created by EPrints at the University of Southampton in 2003. The Institutional Self-Archiving Policy Registry became the Registry of Open Access Repository Material Archiving Policies in 2006, then the Registry of Open Access Repositories Mandatory Archiving Policies, and then the Registry of Open Access Repository Mandates and Policies around 2014.
+ROARMAP mandates are classified in terms of strength and effectiveness in MELIBEA. As of October 2015, open-access mandates had been adopted by more than 520 universities and more than 75 research funders worldwide.
+
+
+== References ==
+
+
+== External links ==
+
+Official website
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Research_Works_Act-0.md b/data/en.wikipedia.org/wiki/Research_Works_Act-0.md
new file mode 100644
index 000000000..ba13cafcb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Research_Works_Act-0.md
@@ -0,0 +1,43 @@
+---
+title: "Research Works Act"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Research_Works_Act"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:02.275907+00:00"
+instance: "kb-cron"
+---
+
+The Research Works Act, 112 H.R. 3699, was a bill that was introduced in the United States House of Representatives at the 112th United States Congress on December 16, 2011, by Representative Darrell Issa (R-CA) and co-sponsored by Carolyn B. Maloney (D-NY). The bill contained provisions to prohibit open-access mandates for federally funded research and effectively revert the United States' National Institutes of Health Public Access Policy, which requires taxpayer-funded research to be freely accessible online. If enacted, it would have also severely restricted the sharing of scientific data. The bill was referred to the House Committee on Oversight and Government Reform, of which Issa was then the chair. Similar bills were introduced in 2008 and 2009 but were not enacted.
+On February 27, 2012, Elsevier, a major publisher, announced that it was withdrawing support for the Act. Later that day, Issa and Maloney issued a statement saying that they would not push for legislative action on the bill.
+
+
+== Reception ==
+The bill was supported by the Association of American Publishers (AAP) and the Copyright Alliance.
+The Scholarly Publishing and Academic Resources Coalition, the Alliance for Taxpayer Access, the American Library Association, the International Society for Computational Biology, the Confederation of Open Access Repositories and prominent open science and open access advocates criticized the Research Works Act, some of them urging scholarly societies to resign from the AAP because of its support for the bill. Several AAP members, including MIT Press, Rockefeller University Press, Nature Publishing Group, and the American Association for the Advancement of Science, stated their opposition to the bill but signaled no intention to leave the association. Other AAP members stated their opposition to the bill, as did the Association of American Universities (AAU) and the Association of Public and Land-grant Universities. Several public health groups opposed the bill.
+Opponents stressed particularly the effects on public availability of biomedical research results, such as those funded by NIH grants, submitting that under the bill "taxpayers who already paid for the research would have to pay again to read the results". Mike Taylor from the University of Bristol said that the bill's denial of access to scientific research would cause "preventable deaths in developing countries" and "an incalculable loss to science", and said Representatives Issa and Maloney were motivated by multiple donations they had received from the academic publisher Elsevier.
+An online petition – The Cost of Knowledge – inspired by British mathematician and Fields medalist Timothy Gowers to raise awareness of the bill, to call for lower prices for journals and to promote increased open access to information, was signed by more than 10,000 scholars. Signatories vowed to withhold their support from Elsevier journals as editors, reviewers or authors "unless they radically change how they operate". On February 27, 2012, Elsevier announced its withdrawal of support for the bill, citing concerns from journal authors, editors, and reviewers. While participants in the boycott celebrated the dropping of support for the Research Works Act, Elsevier denied that their action was a result of the boycott and stated that they took this action at the request of those researchers who did not participate in the boycott.
+
+
+== Related legislation and executive action ==
+The Research Works Act followed other attempts to challenge institutional open-access mandates in the US. On September 9, 2008, an earlier bill aimed at reversing the NIH's Public Access Policy – the Fair Copyright in Research Works Act, or Conyers Bill – was introduced as 110 H. R. 6845 in the House of Representatives at the 110th United States Congress by U.S. Representative John Conyers (D-MI), with three cosponsors. It was referred to the House Committee on the Judiciary, to which Conyers delivered an introduction on September 10, 2008. After the start of the 111th United States Congress, Conyers and six cosponsors reintroduced the bill to the House of Representatives as 111 H. R. 801 on February 3, 2009. It was referred on the same day to the House Committee on the Judiciary and on March 16 to the Subcommittee on Courts and Competition Policy.
+On the other hand, the Federal Research Public Access Act proposed to expand the open public access mandate to research funded by eleven U.S. federal agencies. Originally introduced to the Senate in 2006 by John Cornyn (R-TX) with two cosponsors, it was reintroduced in 2009 by Joe Lieberman, co-sponsored by Cornyn, and again in 2012. These bills proposed requiring that those eleven agencies with research expenditures over $100 million create online repositories of journal articles on research completed by that agency and make the articles publicly available without charge within six months of their publication in a peer-reviewed journal. On February 22, 2013, the Obama administration issued a similar policy memorandum, directing federal agencies with more than $100 million in annual research and development expenditures to develop plans to make research freely available to the public within one year of publication in most cases.
+
+
+== Later developments ==
+The controversy about the Research Works Act finally ended on August 25, 2022, when the US Office of Science and Technology Policy under the Biden administration issued a contractual mandate to make all publications reporting studies funded by the U.S. federal government freely available without delay, thus ending over 50 years of the serials crisis, albeit only for U.S. contributions.
+
+
+== See also ==
+
+PubMed Central
+Open-access journal
+
+
+== References ==
+
+
+== External links ==
+H.R. 3699 on Thomas – Library of Congress Archived 2013-05-29 at the Wayback Machine
+H.R. 3699 on GovTrack
+Notes on the Research Works Act from the Harvard Open Access Project
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/ScienceOpen-0.md b/data/en.wikipedia.org/wiki/ScienceOpen-0.md
new file mode 100644
index 000000000..9373b42bb
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/ScienceOpen-0.md
@@ -0,0 +1,48 @@
+---
+title: "ScienceOpen"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/ScienceOpen"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:08.138583+00:00"
+instance: "kb-cron"
+---
+
+ScienceOpen is a web-based platform that hosts open access journals. It is freely accessible for readers, authors and publishers, and it generates its revenues via promotional services for publishers and authors' institutions. The organization is based in Berlin and has a technical office in Boston. It is a member of CrossRef, ORCID, the Open Access Scholarly Publishers Association, STM Association and the Directory of Open Access Journals. The company was designated as one of “10 to Watch” by research advisory firm Outsell in its report “Open Access 2015: Market Size, Share, Forecast, and Trends.”
+
+
+== History ==
+ScienceOpen began in 2013 when Alexander Grossmann, a professor of Publishing Management at the Leipzig University of Applied Sciences and former publishing director at scholarly publishing house De Gruyter, and Tibor Tscheke, president and CEO of the content management system company Ovitas, decided to start a platform that would allow researchers to share scientific information, both formally by publishing articles and participating in post-publication peer review, and informally by reviewing their colleagues’ work, providing endorsements and comments, and by updating their own papers.
+Its beta version was introduced in November 2013, and release 1.0 launched in May 2014. In September 2015, ScienceOpen hit the 10 million article record mark. In June 2016 they announced their partnership with SciELO, the largest publisher in Latin America, to enable discovery of SciELO content in ScienceOpen. Additional publishing partners include UCL Press, Emerald Publishing, Cold Spring Harbor Laboratory Press, PeerJ, Open Library of the Humanities, Microbiology Society, Karger, Equinox, Hogrefe, EDPSciences, UTS ePRESS, Higher Education Press, Europe's Journal for Psychology, the Italian Society of Victimology and more.
+As of January 2018, the site had 38 million articles and records from PubMed Central, ArXiv, PubMed, SciELO, and numerous individual publishers. ScienceOpen provides a publicly available citation index which is free for researchers to use wherever they are and is provided at no cost to libraries, which in February 2016 was dubbed the Open Citation Index.
+
+
+== Business model ==
+ScienceOpen offers open access journal hosting services, as well as advanced indexing and promotional services that showcase customer content within the discovery platform.
+Every research article on ScienceOpen has a traceable genealogy through citations, a public peer review process, and social interaction tracked by altmetrics, which they call research "context". The technology behind the ScienceOpen platform is provided by Ovitas.
+ScienceOpen appoints members of the research community as Collection Editors who curate articles from multiple publishers in any topic. Collections support discovery of and within research communities. All content on the platform is available for post-publication peer review by scientific members with five or more peer-reviewed publications on their ORCID, and all articles can be publicly commented on by members with one or more items.
+ScienceOpen was originally intended primarily as a platform for peer reviewing and publishing open access science articles. Authors can upload their manuscripts free of charge onto ScienceOpen Preprints, where they immediately receive a DOI. The authors then need to invite potential reviewers for their manuscript. After receiving the reviews, the authors can modify their manuscript and submit a second version, which is usually accepted for publication. After receiving at least two independent reviews recommending acceptance, the authors have the option to publish their work on ScienceOpen Research for US$400. As of 2022-08-01, after several years of operation, ScienceOpen listed 115 reviews (only 2 of them negative) and 77 publications.
+ScienceOpen Posters allows academics to publish conference posters so that they can reach a wider audience than those who were present at their conference.
+There are several other web services in the same market niche as ScienceOpen, but they have been even less successful in attracting customers:
+Qeios, PREreview.org, Hypothesis, F1000, Publons, PubPeer, PubPub.
+
+
+== Headquarters ==
+ScienceOpen has its headquarters located at Pappelallee 78-79, 10437 Berlin, Germany and its technical hub is at 131 Hartwell Ave, Lexington, MA 02421, USA.
+
+
+== See also ==
+Open Access Scholarly Publishers Association (OASPA)
+Directory of Open Access Journals (DOAJ)
+Registry of Open Access Repositories Mandatory Archiving Policies (ROARMAP)
+Open access in Germany
+List of academic databases and search engines
+
+
+== References ==
+
+
+== External links ==
+Official website
+ScienceOpen Research, ISSN 2199-1006
+ScienceOpen Posters, ISSN 2199-8442
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Science_2.0-0.md b/data/en.wikipedia.org/wiki/Science_2.0-0.md
index 37e2c623a..85a8bf325 100644
--- a/data/en.wikipedia.org/wiki/Science_2.0-0.md
+++ b/data/en.wikipedia.org/wiki/Science_2.0-0.md
@@ -4,7 +4,7 @@ chunk: 1/3
 source: "https://en.wikipedia.org/wiki/Science_2.0"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:33:02.990995+00:00"
+date_saved: "2026-05-05T10:16:06.968333+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Science_2.0-1.md b/data/en.wikipedia.org/wiki/Science_2.0-1.md
index 74e93cc9a..0c27fb67b 100644
--- a/data/en.wikipedia.org/wiki/Science_2.0-1.md
+++ b/data/en.wikipedia.org/wiki/Science_2.0-1.md
@@ -4,7 +4,7 @@ chunk: 2/3
 source: "https://en.wikipedia.org/wiki/Science_2.0"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:33:02.990995+00:00"
+date_saved: "2026-05-05T10:16:06.968333+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Science_2.0-2.md b/data/en.wikipedia.org/wiki/Science_2.0-2.md
index 05d310d4a..c7b317725 100644
--- a/data/en.wikipedia.org/wiki/Science_2.0-2.md
+++ b/data/en.wikipedia.org/wiki/Science_2.0-2.md
@@ -4,7 +4,7 @@ chunk: 3/3
 source: "https://en.wikipedia.org/wiki/Science_2.0"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:33:02.990995+00:00"
+date_saved: "2026-05-05T10:16:06.968333+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Self-archiving-0.md b/data/en.wikipedia.org/wiki/Self-archiving-0.md
new file mode 100644
index 000000000..1c6ae21b2
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Self-archiving-0.md
@@ -0,0 +1,46 @@
+---
+title: "Self-archiving"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Self-archiving"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:09.346015+00:00"
+instance: "kb-cron"
+---
+
+Self-archiving is the act of (the author's) depositing a free copy of an electronic document online in order to provide open access to it. The term usually refers to the self-archiving of peer-reviewed research journal and conference articles, as well as theses and book chapters, deposited in the author's own institutional repository or open archive for the purpose of maximizing its accessibility, usage and citation impact. The term green open access has become common in recent years, distinguishing this approach from gold open access, where the journal itself makes the articles publicly available without charge to the reader.
+
+
+== Origins ==
+Self-archiving was first explicitly proposed as a universal practice by Stevan Harnad in his 1994 online posting "Subversive Proposal"
+(later published by the Association of Research Libraries), although computer scientists had been practicing self-archiving in anonymous FTP archives since at least the 1980s (see CiteSeer) and physicists had been doing it since the early 1990s on the web (see arXiv).
+The concept of green open access was coined in 2004 to describe a "mode of publishing in non open access journal but also self archiving it in an open access archive". Different drafts of a paper may be self-archived, such as the internal non-peer-reviewed version, or the peer-reviewed version published in a journal. Green open access through self-archiving was initially enabled through institutional or disciplinary repositories, as a growing number of universities adopted policies to encourage self-archiving. Self-archiving repositories do not peer-review articles, though they may hold copies of otherwise peer-reviewed articles. Self-archiving repositories also expect that the author who self-archives has the necessary rights to do so, as copyright may have been transferred to a publisher. Therefore it may only be possible to self-archive the preprint of the article.
+
+
+== Implementation ==
+
+Whereas the right to self-archive postprints is often a copyright matter (if the rights have been transferred to the publisher), the right to self-archive preprints is merely a question of journal policy.
+A 2003 study by Elizabeth Gadd, Charles Oppenheim, and Steve Probets of the Department of Information Science at Loughborough University analysed 80 journal publishers' copyright agreements and found that 90 percent of publishers asked for some form of copyright transfer and only 42.5 percent allowed self-archiving in some form. In 2014 the SHERPA/Romeo project recorded that of 1,275 publishers 70 percent allowed for some form of self-archiving, with 62 percent allowing both pre and postprint self-archiving of published papers. In 2017 the project recorded that of 2,375 publishers 41 percent allowed pre and postprint to be self-archived. 33 percent only allowed the self-archiving of the postprint, meaning the final draft post-refereeing. 6 percent of publishers only allowed self-archiving of the preprint, meaning the pre-refereeing draft.
+Publishers such as Cambridge University Press or the American Geophysical Union endorse self-archiving of the final published version of the article, not just peer-reviewed final drafts.
+Locations for self-archiving include institutional repositories, subject-based repositories, personal websites, and social networking websites that target researchers. Some publishers attempt to impose embargoes on self-archiving; embargo lengths can range from 6 to 12 months or longer after the date of publication (see SHERPA/RoMEO). For embargoed deposits, some institutional repositories have a request-a-copy button with which users can request, and authors can provide, a single copy with one click each during the embargo.
+Social reference management software websites such as Mendeley, Academia.edu, and ResearchGate facilitate sharing between researchers; however, these services are often subject to criticism for using scholars' contributions for commercial purposes as well as for copyright violation. They are also targeted by publishers for copyright compliance, such as when Elsevier (which purchased Mendeley) issued Digital Millennium Copyright Act takedown notices to Academia.edu for hosting scientific papers. Social networking services also do not fulfill the requirements of many self-archiving policies from grant funders, journals, and institutions.
+In 2013 Germany created a legal basis for green open access by introducing a secondary publication right into German copyright law, which gives scientists and researchers the legal right to self-archive their publications on the Internet, even if they have agreed to transfer all exploitation rights to a publisher. The secondary publication right applies to results of mainly publicly funded research, 12 months after the first publication. The right cannot be waived, and the author’s version is self-archived.
+
+
+== See also ==
+Internet Archive
+Manuscript (publishing)
+Open access mandate
+Registry of Open Access Repositories (ROAR)
+Subversive Proposal
+List of academic journals by preprint policy
+
+
+== References ==
+
+
+== External links ==
+"Self-Archiving FAQ for the Budapest Open Access Initiative (BOAI)". Archived from the original on 2008-10-12. Retrieved 2005-11-20.
+"Publisher copyright policies & self-archiving". SHERPA/RoMEO.
+"ROARMAP: Registry of Open Access Repositories Mandatory Archiving Policies".
+"Green Open Access FAQ". 6 November 2016.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Shadow_library-0.md b/data/en.wikipedia.org/wiki/Shadow_library-0.md
new file mode 100644
index 000000000..6fe7c5ba5
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Shadow_library-0.md
@@ -0,0 +1,32 @@
+---
+title: "Shadow library"
+chunk: 1/2
+source: "https://en.wikipedia.org/wiki/Shadow_library"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:10.574634+00:00"
+instance: "kb-cron"
+---
+
+Shadow libraries, also referred to as pirate libraries or black open access, are online repositories of freely available digital media that are normally paywalled, access-controlled, or otherwise not readily accessible. Shadow libraries usually contain textual works such as academic papers and ebooks, and may include other digital media like software, music, or films.
+Anna's Archive, Library Genesis, Sci-Hub, UbuWeb and Z-Library are some of the most popular shadow libraries for books and academic literature.
+
+== History ==
+
+Early predecessors to shadow libraries were informal collections of unauthorized digital copies of books, scholarly literature, and other textual media, often shared with small groups via mailing lists, forums, or social media websites. Online communities of scientists also collaborated to share paywalled literature among themselves.
+
+Many shadow libraries originate in Russia, which has a rich history of samizdat stemming from the Soviet era. There was strict state censorship and control of print materials, which gave rise to the dissident activity of copying and disseminating censored or underground works. Even after the dissolution of the Soviet Union and the end of the official censorship program, these sharing practices continued as a result of widespread economic hardship. Texts were widely digitized and shared on Russian FidoNet systems as computer and internet access became more widespread in Russia. One early collection of digitized texts was Maksim Moshkow's 1994 Lib.ru. The Russian Kolkhoz collection, named for the kolkhoz collective farms, was created by a community that worked in the early 2000s to download or digitize scientific texts, which they stored on FTP servers and DVDs. This collection eventually grew to around 50,000 documents.
+Some of these early collections later became shadow libraries as they attracted volunteer librarians who catalogued the archives' contents. Early academic shadow libraries in the 2000s included Textz.org, Monoskop, and Gigapedia (later Library.nu). Gigapedia focused more on academic texts than other shadow libraries, which mainly contained literature. Around 2006 or 2007, it incorporated the files amassed by the Kolkhoz collectors, and had become the largest shadow library by 2010. Gigapedia, by then renamed to Library.nu, was shut down in 2012 through a lawsuit from a coalition of seventeen publishing companies including HarperCollins, Oxford University Press, and MacMillan.
+Library Genesis (also known as LibGen) was founded in approximately 2007 or 2008 by a group of Russian scientists, who began by organizing a collection of Russian science and technology texts made available on a torrent site, aggregated from sources including the Kolkhoz collection and lib.ru. In 2011, LibGen absorbed the Library.nu collection, keeping it accessible even as Library.nu was forced to shut down. At the time, LibGen was unique in its focus on its open library infrastructure, prioritizing the free sharing of its collection, catalog, and source code to encourage many others to increase shadow libraries' collective resiliency by mirroring and forking the project. As of 2025, Library Genesis "claims to have more than 2.4 million non-fiction books, 80 million science magazine articles, 2 million comic files, 2.2 million fiction books, and 0.4 million magazine issues."
+
+== Motivation ==
+
+Shadow libraries are part of the open access and open knowledge movements. They seek to more freely disseminate academic scholarship and other media, often citing a moral imperative to make knowledge freely available.
+LibGen's operators have described the site's mission as enabling access to information for poor people and opposing the gating of knowledge by elite academic institutions, with one administrator writing "the target groups for LibGen are poors: Africa, India, Pakistan, Iran, Iraq, China, Russia and post-USSR etc., and on a separate note, people who do not belong to academia. If you are not at a university, you can't access anything or at least your access will be so much troubled that you won't be able to progress at all." Alexandra Elbakyan, the creator of Sci-Hub, has justified the site by arguing that the lack of open access to scholarship violates the human right to science and culture, captured in Article 27 of the United Nations Universal Declaration of Human Rights, which states: "Everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits." Elbakyan has also argued that "Any law against knowledge is fundamentally unjust". American activist Aaron Swartz captured the motivations of many shadow libraries in his 2008 Guerilla Open Access Manifesto, writing:
+
+The world's entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations. ... Those with access to these resources—students, librarians, scientists—you have been given a privilege. You get to feed at this banquet of knowledge while the rest of the world is locked out. But you need not—indeed, morally, you cannot—keep this privilege for yourselves.
+Shadow libraries have also cited the increasing cost of academic literature and books, also termed the "serials crisis".
+
+== Technologies ==
+
+Some shadow libraries (or their content databases) make use of BitTorrent (mainly for database dumps), dark web, and InterPlanetary File System (IPFS) technologies to increase their resilience or distribute loads. Shadow libraries including LibGen and Anna's Archive develop and make their software accessible as open source software, enabling code development by any volunteer and encouraging mirrors or forks. Anna's Archive claims that "if we get taken down we'll just pop right up elsewhere, since all our code and data is fully open source".
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Shadow_library-1.md b/data/en.wikipedia.org/wiki/Shadow_library-1.md
new file mode 100644
index 000000000..a151394c3
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Shadow_library-1.md
@@ -0,0 +1,33 @@
+---
+title: "Shadow library"
+chunk: 2/2
+source: "https://en.wikipedia.org/wiki/Shadow_library"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:10.574634+00:00"
+instance: "kb-cron"
+---
+
+== Legal status ==
+Shadow libraries often host or link to copyrighted material without the consent of copyright holders, making them illegal or dubiously legal in many countries. Such libraries are also described as pirate libraries. Many shadow libraries maintain bibliographic catalogs separate from the hosting of files themselves. This is both an organizational convenience and a protection against legal challenges, since the law is often ambiguous on the distinction between hosting and indexing copyrighted content. However, several shadow library catalogs have been the target of injunctions and takedown threats.
+The aggressive legal strategies pursued by Western music and film industries against online filesharing websites during the 2000s were not widely mirrored by academic or literary publishers against shadow libraries. However, as shadow libraries have grown larger and more visible, they have attracted more legal challenges. Library.nu (previously Gigapedia) was shut down in 2012 by a lawsuit from a coalition of seventeen publishing companies including HarperCollins, Oxford University Press, and Macmillan. In 2015, the academic publisher Elsevier sued LibGen and Sci-Hub in American courts, accusing them of "operat[ing] an international network of piracy and copyright infringement". Elsevier won a default judgment against the two groups, and was awarded $15 million in damages, but has not collected the money as LibGen's operators are unknown and Sci-Hub's are outside the reach of the US legal system. Although the judge in the Elsevier case granted an injunction against several domains used by the shadow libraries, briefly taking them offline, the libraries quickly moved to new domains and onion sites. A lawsuit by the American Chemical Society in 2017 against Sci-Hub also resulted in a judgment order for $4.8 million in damages. In November 2022, the FBI seized domains associated with Z-Library and charged two of its operators with criminal copyright infringement, wire fraud, and money laundering. Courts have ordered Internet service providers in countries including Denmark, France, Germany, Russia and the United Kingdom to block access to pirate libraries, although these blocks are of limited effectiveness.
+The legality of directing individuals to shadow libraries is undetermined. While there are legal theories that linking to copyright infringing material hosted by shadow libraries could constitute vicarious or contributory copyright infringement, there have been no cases brought with these theories. In 2019, Elsevier threatened legal action against Citationsy, the developer of a bibliography management tool, for publishing a blog post directing readers to Sci-Hub, and Citationsy removed the link.
+Although most academics are not penalized for distributing their own published works for free, academic publishers have threatened scientists for sharing or republishing their work.
+Some publishers have accused shadow libraries, including Sci-Hub, of illegally obtaining login credentials to academic databases, though Sci-Hub says the credentials are voluntarily donated.
+A class action lawsuit filed in June 2023 against ChatGPT developer OpenAI, led by authors Paul Tremblay and Mona Awad, alleged that the company used shadow libraries to source training data for their large language model. Meta has also been alleged to have used data from shadow libraries to train its AI model. DeepSeek's Vision-Language (VL) model was trained with data from the shadow library Anna's Archive.
+
+== Reception ==
+
+=== By academics ===
+Some academics have tacitly or explicitly endorsed shadow library efforts, with many viewing them as morally acceptable acts of civil disobedience against the abusive business models of academic publishers. Furthermore, shadow libraries may increase the impact of academics whose work is made available. According to one study from Cornell University, articles that are available on Sci-Hub receive 1.72 times as many citations as articles from journals of similar quality that are not available on Sci-Hub.
+
+=== By non-academic authors ===
+Non-academic writers have been more vocally opposed to shadow libraries.
+In February 2022, after joining a lawsuit with Amazon Publishing and Penguin Random House against a Ukrainian website selling pirated e-books, American bestselling fiction authors John Grisham and Scott Turow published an op-ed in The Hill calling on US lawmakers to pass a law prohibiting search engines from linking to piracy websites.
+In October 2022, the US-based Authors Guild submitted a complaint to the United States Trade Representative about LibGen and Z-Library, describing digital book piracy as "one of the biggest threats facing authors' livelihoods today". The Authors Guild and the UK-based Publishers Association both worked with the FBI in efforts against Z-Library, which culminated in the November 2022 arrest of two of its operators.
+However, some authors and writers' organizations have opposed such efforts. British novelist Alison Rumfitt wrote in Dazed that she was not celebrating the site's takedown, and that "the hunger to read is something to be encouraged, something which, in my opinion, is a societal good; even as publishing grows ever more overtly capitalist and monopolised, reading still thrives, and piracy allows it to take place despite borders and Digital Rights Management. Not everyone has access to a library, and not every library in the world is well-stocked." Dave Hansen, executive director of the Authors Alliance nonprofit, said that students and researchers would be negatively impacted by attempts to shut down shadow libraries, and that such projects were "a kind of symptom of how broken the system is, particularly when you're looking at access to scientific articles".
+
+== References ==
+
+== External links ==
+SLUM: The Shadow Library Uptime Monitor
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subscribe_to_Open-0.md b/data/en.wikipedia.org/wiki/Subscribe_to_Open-0.md
new file mode 100644
index 000000000..d57376dce
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subscribe_to_Open-0.md
@@ -0,0 +1,61 @@
+---
+title: "Subscribe to Open"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Subscribe_to_Open"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:40.450152+00:00"
+instance: "kb-cron"
+---
+
+Subscribe to Open (S2O) is an economic model used by peer-reviewed scholarly journals to provide readers with open access (OA) to the journal’s content, without charging costs to authors. S2O converts journals that have a traditional subscription model to open access.
+When the academic libraries subscribing to a journal are asked to renew their subscriptions to the journal, they are told that, with the libraries’ support, the journal will be made open access to all readers, while authors are able to publish in it at no cost. If enough libraries renew on these terms, the journal is converted to open access, and if not, access to the journal remains restricted to subscribing libraries.
+
+
+== Background ==
+The term “Open Access” (OA) was first coined in December 2001 in a call for greater access to academic journal articles, which were mainly available through subscriptions. Influential statements such as the Budapest Open Access Initiative (BOAI) of February 14, 2002, have called for improvement of the circulation of research through open access to scholarly journals. Since then, more than twenty financial models have been developed as alternatives to the traditional subscription model, beginning with authors paying article processing charges (APCs) to BioMed Central in 2002.
+The Subscribe to Open model was first introduced in 2017 by the publisher Annual Reviews in consultation with Raym Crow and with a grant from the Robert Wood Johnson Foundation. S2O was presented as “a practical approach for converting subscription journals to open access.”
+
+
+== Operation ==
+The Subscribe to Open financial model for open access to peer-reviewed research is based on a mutual assurance contract. Mutual assurance contracts are used to create public goods by securing financial support for such goods. Under S2O, libraries help to create a public good that is universally open to both readers and authors, through subscriptions. “If all libraries continue to subscribe, then not only will those libraries have access to the content for their users, but [the journals] will also make the content openly available to non-subscribers.”
+As with the traditional subscription model, S2O journal pricing and subscription options are announced a year in advance for one or more of the following years. If enough academic libraries agree to subscribe under the S2O model, the journal’s content is open to all readers. If the publisher judges that an insufficient number of institutions participate in the S2O offering, content may remain (or return to being) limited to subscribers.
+
+
+== Advantages and disadvantages ==
+Advantages of S2O over other OA publishing models, such as 'read and publish agreements', include minimizing disruption, reducing data management and administration, utilizing existing structures such as subscription agencies, and operating within academic libraries’ current budgetary commitments.
+Some S2O publishers may offer incentives to subscribing libraries for renewing under S2O terms, such as discounted subscription rates or enhanced back-volume availability. Incentives may counter the free-rider sustainability challenge, which can accompany mutual assurance contracts and is the most commonly expressed concern about S2O’s viability.
+S2O is presented not as a donation but as “a categorical business offer within the journal's existing subscription process” that appeals to the libraries’ economic self-interest. In a survey of academics, including librarians, from 27 countries, 44 percent of respondents felt that “their administrations would allow them to participate in S2O,” while the majority felt more information would be needed before such a move would be approved.
+Concerns have also been raised about whether such a model can support increased subscription and financial growth for publishers, whether the model will be vulnerable to rapid changes and budget pressures, how to access publishing support from funding bodies, the potential to launch new journals, and the type of metrics that would be useful to administrators and librarians.
+
+
+== Impact ==
+Introduced in 2017, S2O had been employed by 15 publishers by 2023, offering over 150 journals under open access. As of 2022, only one publisher had retracted an S2O offer, due to what they judged to be insufficient library support for the model. Publishers, interested librarians, funders, and related organizations have formed the S2O Community of Practice, which among other activities commissioned a survey of academic librarians and others on perceptions of S2O.
+In terms of journal readership, the Annual Review of Public Health (ARPH), which was the first title to implement S2O, had an eightfold increase in usage from May 2016 to May 2019. During the same time period, usage levels for Annual Reviews’ traditional subscription journals remained relatively unchanged. The number of countries accessing ARPH increased from 57 to 137 during this period. In terms of the model’s perceived value among publishers, a survey of 102 scholarly society publishers, conducted in 2019, concluded that “transformative agreements, including models such as Subscribe to Open, emerged as the most promising approaches” because “they offer a predictable, steady funding stream.” S2O was noted as being unique in positioning “the publisher as the choreographer of change.”
+In searching for a workable open-access model, the International Water Association (IWA) initially experimented with an article processing charge (APC) model for two of its journals, as well as negotiating read and publish agreements. IWA found that although it increased impact, the APC model reduced journal revenue to unsustainable levels. Read and publish agreements were time-consuming and difficult to arrange. In 2021, IWA offered all ten of its subscription journals under a Subscribe to Open model. With S2O, IWA met its revenue target to open all its journals and saw an "unprecedented increase" in their usage.
+S2O has been endorsed by cOAlition S, a group of national research funders, charitable foundations, and European and international organizations, as providing “a rapid route to open access that is applicable to research from all disciplines and all countries.” S2O has also inspired similar models for monographs and books through MIT Press (Direct to Open, D2O) and the Central European University Press (Opening the Future).
+
+
+== S2O publishers ==
+Amsterdam University Press
+Annual Reviews
+Berghahn Books
+Brepols
+Bloomsbury
+De Gruyter
+Duncker & Humblot
+EDP Sciences
+EMS Press (European Mathematical Society)
+IWA Publishing (International Water Association)
+Liverpool University Press
+Mathematical Sciences Publishers
+Pluto Journals
+Royal Society
+University of Toronto Press, The Journal of City Climate Policy and Economy (JCCPE)
+
+
+== References ==
+
+
+== External links ==
+Subscribe to Open
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Subversive_Proposal-0.md b/data/en.wikipedia.org/wiki/Subversive_Proposal-0.md
new file mode 100644
index 000000000..53727274a
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Subversive_Proposal-0.md
@@ -0,0 +1,36 @@
+---
+title: "Subversive Proposal"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Subversive_Proposal"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:13.908815+00:00"
+instance: "kb-cron"
+---
+
+The "Subversive Proposal" was an Internet posting by Stevan Harnad on June 27, 1994 (presented at the 1994 Network Services Conference in London) calling on all authors of "esoteric" research writings to archive their articles for free for everyone online (in anonymous FTP archives or websites). It initiated a series of online exchanges, many of which were collected and published as a book in 1995: Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing. This led to the creation in 1997 of Cogprints, an open access archive for self-archived articles in the cognitive sciences and in 1998 to the creation of the American Scientist Open Access Forum (initially called the "September98 Forum" until the founding of the Budapest Open Access Initiative which first coined the term "open access"). The Subversive Proposal also led to the development of the GNU EPrints software used for creating OAI-compliant open access institutional repositories, and inspired CiteSeer, a tool to locate and index the resulting eprints.
+The proposal was updated gradually across the years, as summarized in the American Scientist Open Access Forum on its 10th anniversary.
+A retrospective was written by Richard Poynder.
+A self-critique was posted on its 15th anniversary in 2009.
+An online interview of Stevan Harnad was conducted by Richard Poynder on the occasion of the 20th anniversary of the Subversive Proposal.
+
+
+== References ==
+
+Bosc, Hélène Les idées et la technique : une rétrospective de ces 15 dernières années
+
+
+== Further reading ==
+Harnad, Stevan (1995):
+(2001/2003/2004) For Whom the Gate Tolls? Published as:
+(2003) Open Access to Peer-Reviewed Research Through Author/Institution Self-Archiving: Maximizing Research Impact by Maximizing Online Access. In: Law, Derek & Judith Andrews, Eds. Digital Libraries: Policy Planning and Practice. Ashgate Publishing 2003.
+(2003) Journal of Postgraduate Medicine 49: 337–342.
+(2004) Historical Social Research (HSR) 29:1
+(2003) Ciélographie et ciélolexie: Anomalie post-gutenbergienne et comment la résoudre in: Origgi, G. & Arikha, N. (eds) Le texte à l'heure de l'Internet. Bibliothèque Centre Pompidou: Pp. 77–103.
+Okerson, Ann Shumelda & O'Donnell, James J. (1995) (Eds.) Scholarly Journals at the Crossroads: A Subversive Proposal for Electronic Publishing. Washington, DC., Association of Research Libraries, June 1995.
+Suber, Peter Timeline of the Open Access Movement (February 2009; archived copy from 2016)
+
+
+== External links ==
+American Scientist Open Access Forum Official Site
+Global Open Access Forum Official Site
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/The_Cost_of_Knowledge-0.md b/data/en.wikipedia.org/wiki/The_Cost_of_Knowledge-0.md
index 70e1b372f..63206e636 100644
--- a/data/en.wikipedia.org/wiki/The_Cost_of_Knowledge-0.md
+++ b/data/en.wikipedia.org/wiki/The_Cost_of_Knowledge-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/The_Cost_of_Knowledge"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:48.132696+00:00"
+date_saved: "2026-05-05T10:14:56.579118+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/The_Cost_of_Knowledge-1.md b/data/en.wikipedia.org/wiki/The_Cost_of_Knowledge-1.md
index b5f232c79..b0ff717b7 100644
--- a/data/en.wikipedia.org/wiki/The_Cost_of_Knowledge-1.md
+++ b/data/en.wikipedia.org/wiki/The_Cost_of_Knowledge-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/The_Cost_of_Knowledge"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:31:48.132696+00:00"
+date_saved: "2026-05-05T10:14:56.579118+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Timeline_of_the_open-access_movement-0.md b/data/en.wikipedia.org/wiki/Timeline_of_the_open-access_movement-0.md
new file mode 100644
index 000000000..14d102f06
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Timeline_of_the_open-access_movement-0.md
@@ -0,0 +1,121 @@
+---
+title: "Timeline of the open-access movement"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Timeline_of_the_open-access_movement"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:36.706613+00:00"
+instance: "kb-cron"
+---
+
+The following is a timeline of the international movement for open access to scholarly communication.
+
+
+== 1940s-1990s ==
+1942
+American sociologist Robert King Merton declares: "Each researcher must contribute to the 'common pot' and give up intellectual property rights to allow knowledge to move forward."
+1971
+"World's first online digital library is launched, Project Gutenberg."
+1987
+Syracuse University in the US issues one of the world's first open access journals, New Horizons in Adult Education (ISSN 1062-3183).
+1991
+14 August: ArXiv repository of physics research papers established at Los Alamos National Laboratory in the US.
+1994
+27 June: Stevan Harnad posts a "Subversive Proposal" for authors to archive their articles for free for everyone online.
+July: Electronic Green Journal (EGJ) launched by the University of Idaho Library. Since 2009 it has been published by the University of California eScholarship. The EGJ is a peer-reviewed publication devoted to information about international sources on environmental protection, conservation, management of natural resources, and sustainability.
+1998
+Brazil-based SciELO (Scientific Electronic Library Online) launched.
+Public Knowledge Project founded in Canada.
+Scholarly Publishing and Academic Resources Coalition founded in North America.
+1999
+October: Open Archives Initiative on interoperability standards holds its first meeting, in New Mexico, US.
+
+
+== 2000s ==
+2000
+BioMed Central publisher established.
+2001
+15 January: Creative Commons founded in the United States.
+Public Library of Science publisher active.
+Open Journal Systems free software published.
+SPARC Europe established to promote open access in Europe.
+2002
+14 February: Budapest Open Access Initiative statement issued.
+28 June: US-based OAIster catalog begins.
+2003
+11 April: Bethesda Statement on Open Access Publishing formed.
+22 October: Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities published.
+25 December: Institutional Self-Archiving Policy Registry launched (later called ROARMAP).
+Redalyc (Red de Revistas Científicas de América Latina y El Caribe, España y Portugal) established in Mexico.
+2004
+UK Digital Curation Centre founded.
+Bielefeld Academic Search Engine launched by Bielefeld University, Germany.
+Publisher Springer begins "hybrid option 'Open Choice' for their full portfolio of over 1,000 subscription journals."
+30 January: Organisation for Economic Co-operation and Development issues "Declaration on Access to Research Data from Public Funding."
+2005
+Directory of Open Access Repositories begins publication.
+2007
+European Research Council issues "its first Scientific Council Guidelines for open access."
+2008
+Durham Statement on Open Access to Legal Scholarship written.
+7 April: United States National Institutes of Health Public Access Policy effected.
+July: Aaron Swartz releases the "Guerilla Open Access Manifesto", to send "a strong message against the privatization of knowledge".
+2009
+12 January: European Commission-funded OpenAIRE project begins, supporting implementation of open access in Europe.
+Confederation of Open Access Repositories founded.
+
+
+== 2010s ==
+
+2010
+"Beall's list" of predatory open access publishers begins circulating.
+2011
+20 January: #icanhazPDF begins on Twitter.
+5 September: Sci-Hub launched by Alexandra Elbakyan.
+16 December: United States Research Works Act bill introduced.
+UK-based CORE (COnnecting REpositories) aggregation service founded.
+2012
+Knowledge Unlatched established.
+Pasteur4OA (Open Access Policy Alignment Strategies for European Union Research) begins.
+The Cost of Knowledge protest begins against high prices charged by large publisher Elsevier.
+22 October: Brussels Declaration signed, on open access to Belgian publicly funded research.
+2013
+PeerJ megajournal begins publication.
+Registry of Research Data Repositories begins operating.
+4 October: "Who's Afraid of Peer Review?" published in Science.
+2014
+FOSTER Project (Facilitate Open Science Training for European Research) begins.
+2016
+7 March: Open Data Button (browser extension) launched.
+2017
+April: Unpaywall button (browser extension) launched, with 90 million articles indexed.
+10 October: Jussieu Call statement issued.
+Plug-in search tool Canary Haz launched to enable access to PDF versions of articles (later renamed Kopernio.com).
+
+
+== See also ==
+Access to Knowledge movement
+History of open access
+Open access: history
+Timeline of free and open-source software
+
+
+== References ==
+
+
+== Citations ==
+"Origins of OA". US: University of Pittsburgh. (Includes timeline)
+"History of", Open Access Tracking Project, Harvard University. Also:  Milestones. (News feed)
+Peter Suber. "History of open access". Harvard University. Compilation of  Peter Suber's contributions to the history of open access, 1992–present.
+"Timeline of the open access movement". Open Access Directory. This timeline was created and initially maintained by Peter Suber, who crowd-sourced it in February 2009 by moving it to the Open Access Directory.
+
+
+== Further reading ==
+Mikael Laakso; et al. (2011). "Development of Open Access Journal Publishing from 1993 to 2009". PLOS One. 6 (6) e20961. Bibcode:2011PLoSO...620961L. doi:10.1371/journal.pone.0020961. PMC 3113847. PMID 21695139.
+"Evolution of Open Access: A Brief History", SciElo in Perspective, Brazil: SciElo, 21 October 2013. (Timeline)
+Marie Lebert (2015), Open Access: a "chronology" (or timeline)
+
+
+== External links ==
+Declarations in support of OA
+Timeline of the open access movement at the Open Access Directory since 2009.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Toby_Green_(publisher)-0.md b/data/en.wikipedia.org/wiki/Toby_Green_(publisher)-0.md
new file mode 100644
index 000000000..e71b5cd30
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Toby_Green_(publisher)-0.md
@@ -0,0 +1,45 @@
+---
+title: "Toby Green (publisher)"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Toby_Green_(publisher)"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:15:15.693892+00:00"
+instance: "kb-cron"
+---
+
+Toby Green is a scholarly publisher who has worked with Academic Press, Applied Science Publishers, Pergamon Press, Elsevier Science, and the Organisation for Economic Co-operation and Development (OECD). At OECD, Green launched important digital publishing initiatives such as SourceOECD (2001), its successor the OECD iLibrary (2010), and the OECD Better Life Index (2011). In 2017, OECD Publishing won the Academic and Professional Publisher Award from The London Book Fair.
+Green has chaired the Council of the Association of Learned and Professional Society Publishers (ALPSP, 2010) and served on the publishing board of the Royal Society of Chemistry and the board of Annual Reviews. In 2019, Green co-founded Coherent Digital to improve access to grey literature policy documents from IGOs, NGOs, research centers and think tanks.
+
+
+== Early life and education ==
+Green was born in Broadwindsor, Dorset, England and educated at Marlborough College. He attended the University of Warwick in Coventry, receiving a BSc in Microbiology and Virology in 1980.
+
+
+== Career ==
+
+Toby Green is a scholarly publisher who has worked with a wide range of organizations, publishing content for commercial organizations, scholarly societies, intergovernmental organizations (IGOs) and nongovernmental organizations (NGOs). Since 1982, Green has worked for Academic Press (1982–1984), Applied Science Publishers (1984–1986), Pergamon Press (1987–1991; taken over by Elsevier Science in 1991), Elsevier Science (1991–1997) and the Organisation for Economic Co-operation and Development (OECD, 1998–2019).
+Green joined OECD Publishing in France in 1998 and has served as chief operating officer (COO) of its Public Affairs and Communications Directorate and Head of Publishing. In January 2001, Green launched SourceOECD, the first digital initiative to distribute books, journals and statistical databases through a commonly searchable subscription service that grouped publications into thematic clusters. In 2004, Green premiered OECD's StatLink service, connecting text publications and underlying data. In 2007, OECD made all of its books free to read in a basic form under a freemium business model, with additional premium services and print copies available for a fee. In 2010, Green launched the OECD iLibrary, a redesign of SourceOECD. In 2011, Green introduced the OECD Better Life Index, which received an Award for Innovation in Publishing from the Association of Learned and Professional Society Publishers (ALPSP).
+He served as a Member of the Council of the Association of Learned and Professional Society Publishers (ALPSP, 2002–2006) and became chair of ALPSP on 1 January 2010. He was the first chair of ALPSP to be based outside of the UK. As of 2018, Green served on the publishing board of the Royal Society of Chemistry. He serves on the board of Annual Reviews.
+As of 2020, Green was the Managing Director of Coherent Digital, which he co-founded in 2019 with Stephen Rhind-Tutt. Coherent Digital's initial Policy Commons collection provided access to almost 2.5 million policy documents published by IGOs, NGOs, researchers and think tanks, with the goals of making content more stable and findable. In June 2022, Coherent Digital reached an agreement with the Center for Research Libraries (CRL) under which many CRL-affiliated institutions became founding institutional members of Policy Commons and received discounts or complimentary access. In May 2023, Policy Commons was expanded to include World Cities, worldwide archives of municipal reports, in addition to its existing policy collections for North American City Reports and Global Think Tanks. In June 2023, Coherent Digital acquired Accessible Archives Inc., adding its digital collections in American social, economic, and political history to its History Commons. In 2023, Coherent Digital was named one of Outsell’s ‘Top 50 Emerging Companies’.
+Green has been a speaker at the Society for Scholarly Publishing (SSP), Association of Learned and Professional Society Publishers (ALPSP), Open Access Scholarly Publishing Association (OASPA), Fiesole Collection Development Retreat, Frankfurt Book Fair, London Book Fair, and the Researcher to Reader (R2R) conference. He discusses topics such as open source, grey literature and the impact of digitization on research and publishing. He helped to develop the guidelines for the production of scientific and technical reports of the Grey Literature International Steering Committee.
+
+
+== Awards and honors ==
+2005, StatLink was "Highly Commended" for Innovation in Publishing by the Association of Learned and Professional Society Publishers (ALPSP).
+2011, OECD Better Life Index received an Award for Innovation in Publishing from the ALPSP.
+2017, OECD Publishing won the Academic and Professional Publisher Award from The London Book Fair.
+2021, Coherent Digital's Mindscape Commons, a collection of Virtual Reality (VR) videos for use in mental health education, was a co-recipient of the Award for Innovation in Publishing from ALPSP.
+2023, Coherent Digital was named one of Outsell’s ‘Top 50 Emerging Companies’.
+
+
+== Selected publications ==
+Green, Toby (19 December 2022). "Wait! What? There's stuff missing from the scholarly record?". Medical Writing. 31 (4): 44–48. doi:10.56012/ajel9043.
+Green, Toby (October 2019). "Maximizing dissemination and engaging readers: The other 50% of an author's day: A case study". Learned Publishing. 32 (4): 395–405. doi:10.1002/leap.1251.
+Green, Toby (January 2019). "Is open access affordable? Why current models do not work and why we need internet-era transformation of scholarly communications". Learned Publishing. 32 (1): 13–25. doi:10.1002/leap.1219.
+Green, Toby (October 2017). "We've failed: Pirate black open access is trumping green and gold and we must change our approach". Learned Publishing. 30 (4): 325–329. doi:10.1002/leap.1116.
+Green, Toby (15 August 2012). "Publishing Data Alongside Analysis, Books and Journals". Necessity is the Mother of Invention. pp. 523–548. doi:10.5703/1288284314789. ISBN 978-0-9834043-0-9.
+Green, Toby (1 July 2002). "Can the monograph help solve the library 'serials' funding crisis?". Serials: The Journal for the Serials Community. 15 (2): 135–139. doi:10.1629/15135.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Tri-Agency_Open_Access_Policy_on_Publications-0.md b/data/en.wikipedia.org/wiki/Tri-Agency_Open_Access_Policy_on_Publications-0.md
new file mode 100644
index 000000000..aba8f0ec0
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Tri-Agency_Open_Access_Policy_on_Publications-0.md
@@ -0,0 +1,26 @@
+---
+title: "Tri-Agency Open Access Policy on Publications"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Tri-Agency_Open_Access_Policy_on_Publications"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:15.102611+00:00"
+instance: "kb-cron"
+---
+
+Canada introduced the Tri-Agency Open Access Policy on Publications in May 2015 to mandate open access to research articles funded by Canada's three major research agencies: the Natural Sciences and Engineering Research Council (NSERC), the Social Sciences and Humanities Research Council (SSHRC) and the Canadian Institutes of Health Research (CIHR). CIHR has had an open access policy since 2008 and the new Tri-Agency policy is largely based on CIHR's pre-existing policy.
+The policy stipulates that peer-reviewed journal articles produced from funded research must be made open access within 12 months of publication by either:
+
+publication in an open access journal
+archiving in a subject repository or institutional repository
+
+
+== Applicability ==
+The policy affects Tri-Agency grants awarded on or after May 1, 2015. All funded researchers are affected except graduate students and postdoctoral fellows. Only peer-reviewed journal articles are covered by the policy; other research outputs, such as books or media, are not affected. Only postprints or final published articles may be archived in a subject or institutional repository; other article versions, e.g. preprints, are not acceptable.
+
+
+== Compliance ==
+Enforcement of the Tri-Agency policy has not been explicitly described. Compliance with the CIHR open access policy has been managed in conjunction with the Research Reporting Service.
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/UFluids@Home-0.md b/data/en.wikipedia.org/wiki/UFluids@Home-0.md
index 8f633903b..80d798292 100644
--- a/data/en.wikipedia.org/wiki/UFluids@Home-0.md
+++ b/data/en.wikipedia.org/wiki/UFluids@Home-0.md
@@ -4,7 +4,7 @@ chunk: 1/1
 source: "https://en.wikipedia.org/wiki/UFluids@Home"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T06:50:52.077085+00:00"
+date_saved: "2026-05-05T10:14:33.980869+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/UNZA_Institutional_repository-0.md b/data/en.wikipedia.org/wiki/UNZA_Institutional_repository-0.md
new file mode 100644
index 000000000..362c2a016
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/UNZA_Institutional_repository-0.md
@@ -0,0 +1,39 @@
+---
+title: "UNZA Institutional repository"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/UNZA_Institutional_repository"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:17.342037+00:00"
+instance: "kb-cron"
+---
+
+An institutional repository (IR) is a "digital archive of the intellectual products created by faculty research staff and students of an institution and accessible to end users both within and outside of the institution, with few if any barriers to access". To make their content easier to find and access, open access repositories are registered with the Directory of Open Access Repositories (OpenDOAR), a list of open academic repositories. Many universities have established IRs to promote open access to knowledge and information. The University of Zambia Institutional Repository (UNZA-IR) was established in 2010 with the support of the Netherlands Government to help archive the intellectual output of the university. The repository falls under the UNZA main Library and is headed by the repository manager, who oversees its operations. The UNZA repository was created using DSpace, an open source repository software package used for creating open access repositories.
+The UNZA repository houses research outputs including postgraduate research dissertations and theses, research reports, conference presentations, book chapters and research articles (preprints and postprints). Currently, the repository houses more than 8,000 research publications, with postgraduate dissertations and theses the most numerous.
+Content in the UNZA IR is organised according to communities of users or depositors. Because users of an institutional repository come from within a research community or organisation, much of the content in the UNZA repository is deposited by students and academic staff members from the various schools and departments. However, the content is publicly available to users outside the university.
+Users can browse and search for content using the search bar or by following a specific community that they might be interested in.
+Communities in the UNZA IR:
+The UNZA IR has about 20 communities, including:
+
+Agricultural Sciences
+Education
+Engineering
+Examination Past Papers
+Graduate School of Business
+Humanities and Social Sciences
+Institute of Distance Education
+Institute of Economic and Social Research (INESOR)
+Law
+Library
+Medicine
+Mines
+Natural Sciences
+Students' Project/Research Reports
+Technical Development and Advisory Unit (TDAU)
+Theses and Dissertations
+University Collection
+University of Zambia Press (UNZA Press)
+Veterinary Medicine
+
+
+== References ==
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review-0.md b/data/en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review-0.md
index 79c5027bb..494f69069 100644
--- a/data/en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review-0.md
+++ b/data/en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review-0.md
@@ -4,7 +4,7 @@ chunk: 1/2
 source: "https://en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review?"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:53:26.673104+00:00"
+date_saved: "2026-05-05T10:16:18.559856+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review-1.md b/data/en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review-1.md
index 834d0f452..cc6908ebe 100644
--- a/data/en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review-1.md
+++ b/data/en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review-1.md
@@ -4,7 +4,7 @@ chunk: 2/2
 source: "https://en.wikipedia.org/wiki/Who's_Afraid_of_Peer_Review?"
 category: "reference"
 tags: "science, encyclopedia"
-date_saved: "2026-05-05T09:53:26.673104+00:00"
+date_saved: "2026-05-05T10:16:18.559856+00:00"
 instance: "kb-cron"
 ---
 
diff --git a/data/en.wikipedia.org/wiki/WorldPop_Project-0.md b/data/en.wikipedia.org/wiki/WorldPop_Project-0.md
new file mode 100644
index 000000000..178a27f1b
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/WorldPop_Project-0.md
@@ -0,0 +1,68 @@
+---
+title: "WorldPop Project"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/WorldPop_Project"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:19.821014+00:00"
+instance: "kb-cron"
+---
+
+WorldPop is a research programme based in the School of Geography and Environmental Science, University of Southampton. The programme employs a multidisciplinary team of researchers, analysts, GIS technicians, and project specialists who construct open data on populations and population attributes at high spatial resolution. Created in 2013 by combining the AfriPop, AmeriPop, and AsiaPop projects, WorldPop engages in geospatial demographic projects with governments and institutions in low- and middle-income countries (LMICs) and collaborates with partner organisations such as the Bill & Melinda Gates Foundation, Gavi, the Vaccine Alliance, United Nations agencies, the UK Foreign, Commonwealth and Development Office, commercial data providers and other international development organisations. The programme provides training in population modelling to ministries of health and national statistical offices in LMICs and works with them to support health and demographic surveys to achieve the Sustainable Development Goals.
+
+
+== Areas of interest ==
+Demography
+Geographic information system data
+Satellite imagery
+Remote sensing
+Effects of non-pharmaceutical interventions on COVID-19
+Geospatial predictive modeling
+
+
+== Population estimation ==
+WorldPop develops statistical population modelling methods to produce gridded population estimates that support census activities. The programme develops new methods for data synthesis that use demographic and health surveys, census, satellite imagery, cell phone and other data to create consistent gridded outputs and map detailed population densities.
+A case study evaluating several geospatial datasets against the 'gold-standard' census data for Bioko Island, Equatorial Guinea found that while the WorldPop Constrained dataset for the area matched best at lower population densities, WorldPop Unconstrained data performed poorly at all densities.
+
+
+=== Population of Papua New Guinea ===
+Although the government of Papua New Guinea had estimated the country's population at 9.4 million, unpublished findings of a population estimation study funded by the United Nations Population Fund and conducted by WorldPop in November 2022 suggested the true population was close to 17 million. This estimate was reviewed and amended to less than 11 million and the methodology used to calculate this figure was published in July 2023.
+
+
+== Academic debate on rural population estimates ==
+In 2025, the accuracy of WorldPop and other global gridded population datasets was discussed in the academic literature with respect to the representation of rural populations. The debate centred on whether observed discrepancies indicate a systemic bias in population modelling or reflect localised methodological limitations.
+
+
+=== Rural underrepresentation study (2025) ===
+A study published in Nature Communications by Láng-Ritter et al. (2025) analysed five major global population datasets, including WorldPop, and reported systematic underestimation of rural populations.
+Using historical resettlement records from 307 large dam projects across 35 countries as an independent reference, the authors estimated a 53.4% negative bias in WorldPop’s rural population estimates. The study attributed this discrepancy to incomplete rural census data and modelling approaches primarily calibrated for urban environments, and suggested potential implications for development planning and resource allocation.
+
+
+=== Response from data producers ===
+In a published rebuttal, a group of population data producers, including WorldPop Director Andrew J. Tatem, disputed the study’s conclusions and argued that its findings resulted from methodological shortcomings rather than systemic bias.
+The rebuttal raised several points:
+
+The study was said to measure known technical limitations, such as the use of static water masks and growth-only building models, in areas affected by reservoir flooding where population allocation is not intended.
+The authors argued that displacement linked to large dam reservoirs represents rare and localised cases that should not be generalised to global rural population accuracy.
+They stated that population in such areas is typically redistributed to nearby grid cells rather than omitted from datasets.
+The rebuttal estimated that reservoir-related population misplacement affects less than 2% of the global rural population, contrasting with the larger global underestimation proposed by Láng-Ritter et al.
+The data producers acknowledged the value of improving rural demographic data but maintained that the study does not demonstrate a fundamental flaw in WorldPop or other global gridded population datasets.
+
+
+== WorldPop Database ==
+Outputs from WorldPop research contribute to a spatial database of linked information on contemporary census data, satellite-imagery-derived settlement maps, and land cover information. The resultant API, datasets, methods, and maps are available under a Creative Commons license on the project's websites. Through collaboration with Esri, gridded population datasets produced by WorldPop are also available in the ArcGIS Living Atlas of the World.
+
+
+== See also ==
+Developing country
+World Population
+Malaria Atlas Project
+
+
+== References ==
+
+
+== External links ==
+WorldPop—official website
+WorldPop Open Population Repository—access to gridded population estimates and related data.
+WorldPop Applications—access to WorldPop's interactive web map, do-it-yourself gridded population estimate tool and demographics portal.
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/XDrawChem-0.md b/data/en.wikipedia.org/wiki/XDrawChem-0.md
new file mode 100644
index 000000000..105ebe914
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/XDrawChem-0.md
@@ -0,0 +1,35 @@
+---
+title: "XDrawChem"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/XDrawChem"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:29.287450+00:00"
+instance: "kb-cron"
+---
+
+XDrawChem is a free software program for drawing chemical structural formulas, available for Unix and macOS. It is distributed under the GNU GPL. In Microsoft Windows this program is called WinDrawChem.
+
+
+== Major features ==
+Fixed length and fixed angle drawing
+Automatic alignment of figures
+Detection of structures, text, and arrows, and their automatic placement
+Automatic drawing of rings and other structures, with all standard amino acids and nucleic acids in a built-in library
+Retrieval of structures from a network database based on CAS number, formula, or name
+Retrieval of information on a molecule based on a drawing
+Symbols such as partial charge and radicals
+Reading MDL Molfiles, CML (Chemical Markup Language), ChemDraw binary format, ChemDraw XML text format
+Writing MDL Molfiles, CML, ChemDraw XML text format
+Integration with OpenBabel, allowing XDrawChem to read and write over 20 different chemical file formats
+Image export in Portable Network Graphics (PNG), Windows bitmap, Encapsulated PostScript (EPS), and Scalable Vector Graphics (SVG)
+3D structure generation with the help of the external program BUILD3D
+Simple spectra predictions, including 13C-NMR, 1H-NMR (based on additive rules and functional group lookup methods), and IR
+Simple property estimation, including pKa, octanol-water partition coefficient, and gas-phase enthalpy change
+
+
+== References ==
+
+
+== External links ==
+XDrawChem at SourceForge
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/XMD-0.md b/data/en.wikipedia.org/wiki/XMD-0.md
new file mode 100644
index 000000000..0c57f3eba
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/XMD-0.md
@@ -0,0 +1,24 @@
+---
+title: "XMD"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/XMD"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:30.458640+00:00"
+instance: "kb-cron"
+---
+
+XMD is a classical molecular dynamics program designed to
+simulate problems related to materials science. The code was
+developed by Jon Rifkin of the University of Connecticut and is
+distributed under the GNU General Public License.
+The source code is written in C and can be compiled with POSIX thread
+functions to take advantage of multi-CPU computers.
+
+
+== See also ==
+Molecular design software
+
+
+== External links ==
+XMD Homepage
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Xplanet-0.md b/data/en.wikipedia.org/wiki/Xplanet-0.md
new file mode 100644
index 000000000..ba480de85
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Xplanet-0.md
@@ -0,0 +1,51 @@
+---
+title: "Xplanet"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Xplanet"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:31.609791+00:00"
+instance: "kb-cron"
+---
+
+Xplanet is a renderer for planetary and Solar System images, capable of producing various types of graphics depicting the Solar System. It is normally used to create computer wallpapers, which may be updated with the latest cloud maps or the regions of Earth which are in sunlight. Xplanet is free software released under the GNU GPL.
+
+
+== Flat maps ==
+Xplanet can be used to produce projected maps of any planet (but typically Earth), for example Mollweide projections which show the whole Earth at once, or Mercator projections with a rectangular appearance suitable for filling the screen.
+It is possible to overlay clouds or text (such as the location of recent events) onto these maps; a popular option is shading areas currently experiencing night.
+
+
+== Planetary images ==
+Xplanet can also be used to render more general views of objects in the Solar System, such as a view of the Earth from the Moon. In more recent versions, Xplanet depicts eclipses, and some of its images show Jupiter's moons casting an eclipse onto the planet.
+
+
+== Technical ==
+Xplanet runs on Linux, Mac OS X and other Unix operating systems and also on Microsoft Windows, and was derived from an older Unix application called xearth.
+It can either generate wallpaper, save the resulting image, or produce textual output detailing the locations of various objects.
+Configuration is done by modifying a text file. The Windows version comes with a simple editor called winXPlanetBG, which assists in updating the configuration and helps download the cloud maps automatically. OSXplanet is an interactive wallpaper derivative for Mac OS X.
+
+
+== Incorporation into other utilities ==
+
+The c-squares mapper, a web-based mapping utility constructed at CSIRO in Australia in 2002 for displaying the spatial extent of c-squares on the surface of the Earth, was upgraded in 2005–2006 to incorporate Xplanet software in order to display "globe views". These views are user-rotatable and zoomable and can offer more realistic views of Pacific Ocean-centred or polar-centred data than are possible with a flat map (e.g. equirectangular) projection. A technical description of the c-squares mapper installation process (version 3 onwards), which also requires installation of Xplanet, is available at http://www.marine.csiro.au/csquares/mapper_README.html (also available via Sourceforge).
+
+
+== XplanetFX ==
+Until 2013, XplanetFX was a GTK frontend for Xplanet under Linux. It provided a simple-to-use GUI to configure Xplanet and schedule renderings. It also claimed to produce higher quality renderings. XplanetFX was free software released under a permissive vanity license. It closed down when many of the sources it used were shut down or paywalled.
+
+
+== See also ==
+Wikipedia:Producing maps with xplanet
+EarthDesk
+
+
+== References ==
+
+
+== External links ==
+Project page
+winXPlanetBG, a Windows GUI
+Sample output of xplanet: ISS-Position
+Sample output of xplanet: Mercator view of Earth enhanced with various data such as satellites, quakes, and active volcanoes
+XplanetFX, a Linux GUI
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/Yooreeka-0.md b/data/en.wikipedia.org/wiki/Yooreeka-0.md
new file mode 100644
index 000000000..4d3600f51
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/Yooreeka-0.md
@@ -0,0 +1,43 @@
+---
+title: "Yooreeka"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/Yooreeka"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:14:32.751714+00:00"
+instance: "kb-cron"
+---
+
+Yooreeka is a library for data mining, machine learning, soft computing, and mathematical analysis. The project started with the code of the book "Algorithms of the Intelligent Web". Although the book's title emphasizes the Web, the algorithms are useful in any software application.
+It covers all major algorithms and provides many examples.
+Yooreeka 2.x is licensed under the Apache License rather than the somewhat more restrictive LGPL (which was the license of v1.x).
+The library is written entirely in Java.
+
+
+== Algorithms ==
+The following algorithms are covered:
+
+Clustering
+Hierarchical—Agglomerative (e.g. MST single link; ROCK) and Divisive
+Partitional (e.g. k-means)
+Classification
+Bayesian
+Decision trees
+Neural Networks
+Rule based (via Drools)
+Recommendations
+Collaborative filtering
+Content based
+Search
+PageRank
+DocRank
+Personalization
+
+
+== References ==
+
+
+== External links ==
+Baynoo Website
+Yooreeka on GitHub
+Yooreeka on Google Code (old repository)
\ No newline at end of file
diff --git a/data/en.wikipedia.org/wiki/‡biblios.net-0.md b/data/en.wikipedia.org/wiki/‡biblios.net-0.md
new file mode 100644
index 000000000..193f3a44d
--- /dev/null
+++ b/data/en.wikipedia.org/wiki/‡biblios.net-0.md
@@ -0,0 +1,25 @@
+---
+title: "‡biblios.net"
+chunk: 1/1
+source: "https://en.wikipedia.org/wiki/‡biblios.net"
+category: "reference"
+tags: "science, encyclopedia"
+date_saved: "2026-05-05T10:16:28.685475+00:00"
+instance: "kb-cron"
+---
+
+‡biblios.net is a free browser-based cataloging service with a data store containing over thirty million records. Records are licensed under the Open Data Commons Public Domain Dedication and License, making the service the world's largest repository of freely licensed library records. The service was created and is maintained by LibLime.
+
+
+== Features ==
+‡biblios.net (pronounced 'biblios dot net') features a metadata editor with templates, macros, authority auto-completion and embedded context-sensitive help. The central record repository contains 25 million bibliographic records and just under eight million authority records. The data is maintained by ‡biblios.net users. Catalogers can use and contribute to the database without restrictions because records in ‡biblios.net are freely licensed under the Open Data Commons Public Domain Dedication and License.
+‡biblios.net also includes a built-in federated search system, allowing catalogers to find records from any Z39.50 target. Additionally, there is a central Search Target Registry, seeded with over 2,000 Z39.50 servers, for catalogers to find, create and share Z39.50 targets.
+In addition to offering a traditional cataloging interface, ‡biblios.net offers social cataloging features. Built-in forums and private messaging make finding help and communicating with others possible within the software.
+
+
+== References ==
+
+
+== External links ==
+‡biblios.net Website
+LibLime's Homepage
\ No newline at end of file